TokenGS: Decoupling 3D Gaussian Prediction from Pixels with Learnable Tokens

Jiawei Ren, Michal Tyszkiewicz, Zan Gojcic, Jiahui Huang

April 2026, CVPR 2026 Highlight

Abstract

TokenGS revisits key design choices in feed-forward 3D Gaussian Splatting prediction. Instead of regressing Gaussian means as depths along camera rays, it directly regresses 3D mean coordinates with a self-supervised rendering loss. This enables an encoder-decoder architecture with learnable Gaussian tokens, decoupling the number of predicted primitives from input image resolution and view count. TokenGS improves robustness to pose noise and multiview inconsistencies, supports efficient test-time optimization in token space, and achieves state-of-the-art feed-forward reconstruction performance on static and dynamic scenes.
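
The contrast between the two parameterizations can be illustrated with a minimal NumPy sketch. It is not the TokenGS implementation: the function names, the single linear head, and all shapes are assumptions chosen for illustration, and the decoder that would let the tokens attend to image features is omitted. It only shows that ray-based means yield one Gaussian per pixel per view, while token-based means yield a count fixed by the number of tokens.

```python
import numpy as np

def pixel_aligned_means(origins, directions, depths):
    """Means as depths along camera rays (one Gaussian per pixel per view).

    origins:    (V, 3)       camera centers
    directions: (V, H, W, 3) unit ray directions
    depths:     (V, H, W)    predicted depth per pixel
    """
    return origins[:, None, None, :] + depths[..., None] * directions

def token_means(tokens, weight, bias):
    """Means regressed directly as 3D coordinates from N tokens.

    Hypothetical linear head: (N, D) @ (D, 3) + (3,) -> (N, 3).
    """
    return tokens @ weight + bias

rng = np.random.default_rng(0)
V, H, W = 2, 4, 6   # views and image resolution (illustrative values)
N, D = 16, 8        # number of Gaussian tokens and token width (illustrative)

origins = rng.normal(size=(V, 3))
directions = rng.normal(size=(V, H, W, 3))
directions /= np.linalg.norm(directions, axis=-1, keepdims=True)
depths = rng.uniform(0.5, 5.0, size=(V, H, W))

# Ray-based: the primitive count is tied to V * H * W.
ray_means = pixel_aligned_means(origins, directions, depths).reshape(-1, 3)

# Token-based: the primitive count is N, whatever the input resolution.
tokens = rng.normal(size=(N, D))   # stands in for learned token embeddings
weight = rng.normal(size=(D, 3))
bias = np.zeros(3)
tok_means = token_means(tokens, weight, bias)

print(ray_means.shape, tok_means.shape)
```

In the ray-based form each mean is constrained to lie on its pixel's ray, so an error in the camera pose moves the ray and the mean with it. In the token-based form the means are free 3D points, which is consistent with the abstract's point that the number of primitives no longer depends on image resolution or view count.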