
GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models

Authors: Md Selim Sarowar, Omer Tariq, Sungho Kim · Affiliations: Yeungnam University, Korea Advanced Institute of Science and Technology (KAIST) · Venue: arXiv · Date: 2026-03-10 · Source: Low-Level Learning-Based Action Modelling / Input Modelling / 3D Vision Language Action Models

Tags: 3D Representation · Vision-Language-Action · Robot Learning
