HALO: A Unified Vision-Language-Action Model for Embodied Multimodal Chain-of-Thought Reasoning