S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight