
CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos

Authors: Chubin Zhang, Jianan Wang, Zifeng Gao, Yue Su, Tianru Dai, Cai Zhou, Jiwen Lu, Yansong Tang · Affiliations: Tsinghua University, University of Hong Kong, MIT · Venue: arXiv · Date: 2026-01-07 · Source: Low-Level Learning-Based Action Modelling / Input Modelling / 2D Vision Language Action Models with Latent Learning

Video Planning · Vision-Language-Action · Latent Variable Learning · Foundation Models · Robot Learning
