Figure: Environmental Understanding Vision-Language Model for Embodied Agent