DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning