CaRL: Learning Scalable Planning Policies with Simple Rewards
AlphaXiv overview (in Chinese)