Awesome Robotics Manipulation · full_paper

Unifying Perception and Action: A Hybrid-Modality Pipeline with Implicit Visual Chain-of-Thought for Robotic Action Generation

Authors: Xiangkai Ma, Lekai Xing, Han Zhang, Wenzhong Li, Sanglu Lu · Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, Google Robot · Venue: arXiv · Date: 2025-11-25 · Source: Low-Level Learning-Based Action Modelling / Input Modelling / 2D Vision Language Action Models with Auxiliary Tasks - Visual Goal Extraction

Auxiliary Tasks · Vision-Language-Action · Perception · Robot Learning · Manipulation
