Recent video generation models enable a workflow in which camera-controlled walkthrough videos are generated first and then lifted to 3D with feed-forward reconstruction. Lyra 2.0 scales this idea to persistent, explorable 3D worlds by addressing two failure modes: spatial forgetting, which it counters by using per-frame geometry for information routing, and temporal drift, which it counters with self-augmented training that teaches the model to correct its own accumulated errors. The resulting long-horizon, 3D-consistent video trajectories can be reconstructed into high-quality 3D scenes.