Audio-VLA: Adding Contact Audio Perception to Vision-Language-Action Model for Robotic Manipulation
AlphaXiv Chinese-language paper page (scroll to view)