VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation
AlphaXiv Chinese-language paper page (scroll to view)