VIPA-VLA: Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos