3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation
AlphaXiv Overview