CoA-VLA: Improving Vision-Language-Action Models via Visual-Text Chain-of-Affordance