D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI