DepthCache: Depth-Guided Training-Free Visual Token Merging for Vision-Language-Action Model Inference