CoVAR: Co-generation of Video and Action for Robotic Manipulation via Multi-Modal Diffusion