Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight