MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention