Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
AlphaXiv Chinese-language paper page (scroll to view)