VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model figure
AlphaXiv Chinese-language paper page (scrollable)