S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight

Authors: Haodong Yan, Zhide Zhong, Jiaguan Zhu, Junjie He, Weilin Yuan, Wenxuan Song, Xin Gong, Yingjie CAI, Guanyi Zhao, Xu Yan, Bingbing Liu, Ying-Cong Chen, Haoang Li · Affiliation: The Hong Kong University of Science and Technology (Guangzhou) · Venue: arXiv · Date: 2026-03-17 · Source: Low-Level Learning-Based Action Modelling / Input Modelling / 2D Vision Language Action Models with Auxiliary Tasks - World Model & Visual Prediction / Visual/State Prediction/Generation

Video planning · Auxiliary tasks · Vision-language-action · World model · Perception · Robot learning
