TokenGS revisits key design choices in feed-forward 3D Gaussian Splatting prediction. Instead of regressing Gaussian means as depths along camera rays, it directly regresses 3D mean coordinates with a self-supervised rendering loss. This choice enables an encoder-decoder architecture with learnable Gaussian tokens, decoupling the number of predicted primitives from the input image resolution and the number of views. TokenGS improves robustness to pose noise and multi-view inconsistencies, supports efficient test-time optimization in token space, and achieves state-of-the-art feed-forward reconstruction performance on static and dynamic scenes.
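To make the contrast between the two parameterizations concrete, the following is a minimal NumPy sketch, not the TokenGS implementation. It assumes a ray-based predictor places one Gaussian per pixel at `origin + depth * direction`, and it uses a single linear map as a hypothetical stand-in for the decoder head that turns tokens into 3D means. The function names, shapes, and the random inputs are all illustrative assumptions.

```python
import numpy as np


def ray_based_means(origins, directions, depths):
    """Depth-along-ray parameterization: one Gaussian mean per pixel.

    origins, directions: (V, H, W, 3); depths: (V, H, W).
    The number of means is tied to V * H * W.
    """
    means = origins + depths[..., None] * directions
    return means.reshape(-1, 3)


def token_based_means(tokens, weight, bias):
    """Direct regression: each token is mapped to an xyz coordinate.

    A linear layer stands in (hypothetically) for a learned decoder head.
    tokens: (N, D); weight: (D, 3); bias: (3,).
    The number of means is N, independent of the input images.
    """
    return tokens @ weight + bias


rng = np.random.default_rng(0)

# Ray-based: 2 views of 4 x 6 pixels, cameras at the world origin.
V, H, W = 2, 4, 6
origins = np.zeros((V, H, W, 3))
directions = rng.normal(size=(V, H, W, 3))
directions /= np.linalg.norm(directions, axis=-1, keepdims=True)
depths = rng.uniform(1.0, 5.0, size=(V, H, W))
ray_means = ray_based_means(origins, directions, depths)

# Token-based: 16 tokens of width 8, chosen independently of V, H, W.
num_tokens, dim = 16, 8
tokens = rng.normal(size=(num_tokens, dim))
weight = rng.normal(size=(dim, 3))
bias = np.zeros(3)
token_means = token_based_means(tokens, weight, bias)

print(ray_means.shape)    # (48, 3): one mean per pixel per view
print(token_means.shape)  # (16, 3): one mean per token
```

In this sketch, changing the resolution or adding views changes the number of ray-based means, while the token-based count stays fixed at `num_tokens`. Each ray-based mean is also confined to its camera ray, whereas a directly regressed mean can lie anywhere in space.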