My research focuses on Robot Learning, Representation Learning, and Multimodal Large Language Models.
I am looking for collaborators to work on the following topics:
Vision-Language-Action Models
World Models
And other interesting topics in Robot Learning.
If you are interested in these topics, please feel free to contact me.
Grounded in cognitive psychology, we introduce a comprehensive and challenging spatial reasoning benchmark comprising 50 detailed categories and 1.5K manually labeled QA pairs.
We propose a framework that adapts VLMs to downstream tasks by designing visual prompts from an attention perspective, thereby reducing the transfer solution space.
Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion
Probabilistic Feedback
Xin Jin, Bohan Li, Baao Xie, Wenyao Zhang, Jinming Liu, Ziqiang Li, Tao Yang, Wenjun Zeng
European Conference on Computer Vision (ECCV 2024) [Paper][Code]
We introduce CL-Dis, a closed-loop unsupervised disentanglement framework that integrates β-VAE distillation with
diffusion-based feedback to learn semantically disentangled representations without labels.
Hierarchical Temporal Context Learning for Camera-Based Semantic Scene Completion
Bohan Li, Jiajun Deng, Wenyao Zhang, Zhujin Liang, Dalong Du, Xin Jin, Wenjun Zeng
European Conference on Computer Vision (ECCV 2024) [Paper][Code]
We introduce HTCL, a hierarchical temporal context learning paradigm for camera-based 3D semantic scene
completion.
Predict the Rover Mobility Over Soft Terrain Using Articulated Wheeled Bevameter
Wenyao Zhang, Shipeng Lyv, Feng Xue, Chen Yao, Zheng Zhu, Zhenzhong Jia
IEEE Robotics and Automation Letters (RA-L 2022) [Paper][Code]
We propose an on-board mobility prediction approach using an articulated wheeled bevameter, which consists of a force-controlled arm with an instrumented bevameter (equipped with force and vision sensors) as its end-effector.