MAPS: Preserving Vision-Language Representations via Module-Wise Proximity Scheduling for Better Vision-Language-Action Generalization