Details
- Upgraded model generates high-fidelity 4D assets (3D + time) from a single input video, eliminating the need for multi-view references.
- Achieves state-of-the-art benchmark results, improving on its predecessor by 14% in LPIPS (detail) and 44% in FV4D (4D consistency).
- Enables professional workflows for game sprite sheets, film assets, and virtual worlds through improved temporal coherence.
- Redesigned architecture handles occlusions and large motions better, and generates a 5-frame, 8-view output in 40 seconds.
- Released under a permissive commercial license via Hugging Face, GitHub, and arXiv, with full technical documentation.
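To make the 5-frame/8-view figure and the sprite-sheet use case concrete, here is a minimal sketch of how such an output could be tiled into a single sheet, with one row per frame and one column per view. It assumes the images are already available as a NumPy array of shape (frames, views, H, W, C); the helper name `tile_sprite_sheet` and the placeholder data are hypothetical and do not reflect the model's actual API or output format.

```python
import numpy as np

def tile_sprite_sheet(images: np.ndarray) -> np.ndarray:
    """Tile (frames, views, H, W, C) images into one (frames*H, views*W, C) sheet."""
    frames, views, h, w, c = images.shape
    # Reorder to (frames, H, views, W, C), then merge frames with H and views with W.
    return images.transpose(0, 2, 1, 3, 4).reshape(frames * h, views * w, c)

# Placeholder data standing in for model output (an assumption, not real output):
# 5 frames x 8 views of 64x64 RGB images, each filled with a distinct flat color.
frames, views, size = 5, 8, 64
images = np.zeros((frames, views, size, size, 3), dtype=np.uint8)
for f in range(frames):
    for v in range(views):
        images[f, v] = (f * 40, v * 30, 128)

sheet = tile_sprite_sheet(images)
print(sheet.shape)  # (320, 512, 3)
```

Each of the 40 images lands in its own cell: frame `f`, view `v` occupies rows `f*64` to `(f+1)*64` and columns `v*64` to `(v+1)*64` of the sheet. A game engine could then index the sheet by time along one axis and by camera angle along the other.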
Impact
This advancement accelerates 3D/4D content creation pipelines while maintaining cross-view consistency, which is critical for immersive media. By making complex 4D generation more accessible, it lowers barriers for indie developers and aligns with the industry shift toward dynamic asset workflows.