AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation figure
AlphaXiv Chinese-language paper page (scroll to view)