PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models