Towards the Vision-Sound-Language-Action Paradigm: The HEAR Framework for Sound-Centric Manipulation