Awesome Robotics Manipulation · full_paper

2D or 3D: Who Governs Salience in VLA Models? -- Tri-Stage Token Pruning Framework with Modality Salience Awareness

Authors: Zihao Zheng, Sicheng Tian, Zhihao Mao, Lingyue Zhang, Chenyue Li, Ziyun Zhang, Hong Gao, Yuchen Huang, Yutong Xu, Guojie Luo, Xiang Chen · Affiliations: School of Computer Science, Peking University; ZTE Corporation; School of Artificial Intelligence, Beijing Normal University; School of Computer Science, China University of Geosciences (Wuhan); School of Electronics Engineering and Computer Science, Peking University · Venue: arXiv · Date: 2026-04-10 · Source: Low-Level Learning-Based Action Modelling / Input Modelling / 2D Vision Language Action Models with Efficiency / Token Pruning

3D Representation · Vision-Language-Action · Robot Learning
