Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance