M2-VLA: Boosting Vision-Language Models for Generalizable Manipulation via Layer Mixture and Meta-Skills