VCA: Vision-Click-Action Framework for Precise Manipulation of Segmented Objects in Target Ambiguous Environments