VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation