Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers