Details
- NVIDIA AI has released SANA-WM, a 2.6B-parameter open-source world model for long-horizon video generation.
- The model is natively trained to generate up to 60-second videos from a single image plus a specified camera trajectory.
- It runs on a single GPU, making controllable 3D-like world synthesis more accessible to researchers and developers.
- The architecture includes Hybrid Linear Attention to improve efficiency and throughput on long video sequences.
- Dual-Branch Camera Control follows user-defined camera paths more accurately while preserving scene coherence.
- A two-stage generation pipeline separates coarse scene evolution from higher-fidelity refinement to maintain visual quality over long durations.
- An annotation pipeline underpins the training data and supplies the labels used for action-following and camera-control conditioning.
- NVIDIA has open-sourced the full project, including the paper, code, and model checkpoints, for community experimentation and extension.
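
To illustrate why linear attention helps on long video sequences, here is a minimal NumPy sketch of generic kernelized linear attention. It is not SANA-WM's Hybrid Linear Attention, whose details are not described here; the function names, the elu(x) + 1 feature map, and the tensor sizes are all assumptions chosen for illustration. The point is only that reordering the matrix products avoids building an n-by-n score matrix, so cost grows linearly with the number of tokens n instead of quadratically.

```python
import numpy as np


def feature_map(x):
    # Assumed positive feature map: elu(x) + 1, i.e. x + 1 for x > 0, exp(x) otherwise.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def quadratic_attention(q, k, v):
    # Reference form: materializes the full (n, n) score matrix.
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.T                      # shape (n, n)
    return (scores @ v) / scores.sum(axis=1, keepdims=True)


def linear_attention(q, k, v):
    # Same result, but keys and values are summarized first,
    # so no (n, n) matrix is ever formed.
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                              # shape (d, d_v)
    z = phi_k.sum(axis=0)                         # shape (d,)
    return (phi_q @ kv) / (phi_q @ z)[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, d_v = 256, 16, 8                        # hypothetical sizes
    q = rng.standard_normal((n, d))
    k = rng.standard_normal((n, d))
    v = rng.standard_normal((n, d_v))
    diff = np.abs(linear_attention(q, k, v) - quadratic_attention(q, k, v)).max()
    print("max abs difference:", diff)
```

Both functions compute the same non-causal attention output up to floating-point error; only the order of multiplication differs. In the linear form the intermediate state has size d by d_v regardless of sequence length, which is the property that makes minute-long token sequences more tractable on a single GPU.
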
Impact
By open-sourcing a long-horizon, camera-controllable world model that fits on a single GPU, NVIDIA lowers the barrier to experimenting with simulation-style video generation, an area of interest for robotics, gaming, and digital content creation. The release may pressure rivals in generative video and world models to match its openness and controllability, and it could accelerate research into agent training, embodied AI, and data-efficient simulation, especially for labs and startups without large compute budgets.
