Figure: Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification
AlphaXiv Chinese-language paper page (scroll to view)