VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers