NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
AlphaXiv Overview