Details

  • Upgraded model generates high-fidelity 4D assets (3D + time) from a single video, eliminating the need for multi-view references.
  • Reports state-of-the-art benchmark results, with a 14% improvement in LPIPS (image detail) and a 44% improvement in FV4D (4D consistency) over its predecessor.
  • Enables professional workflows for game sprite sheets, film assets, and virtual worlds through improved temporal coherence.
  • Redesigned architecture handles occlusions and large motions better, and generates 5-frame/8-view outputs in 40 seconds.
  • Released under a permissive commercial license via Hugging Face, GitHub, and arXiv, with full technical documentation.
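
To put the quoted figures in perspective, here is a minimal back-of-the-envelope sketch. It assumes a 5-frame/8-view output is a full grid of 5 × 8 images, and that the 14% LPIPS improvement is a relative reduction (LPIPS is a distance, so lower is better). The baseline score of 0.100 is purely hypothetical and does not come from the release. The function names are illustrative only.

```python
# Figures quoted in the summary above.
FRAMES = 5
VIEWS = 8
GENERATION_SECONDS = 40.0
LPIPS_RELATIVE_IMPROVEMENT = 0.14


def images_per_run(frames: int, views: int) -> int:
    """Number of images in one output, assuming a full frames x views grid."""
    return frames * views


def seconds_per_image(total_seconds: float, frames: int, views: int) -> float:
    """Average generation time per image under the same grid assumption."""
    return total_seconds / images_per_run(frames, views)


def score_after_relative_reduction(baseline: float, reduction: float) -> float:
    """New value of a lower-is-better metric after a relative reduction."""
    return baseline * (1.0 - reduction)


if __name__ == "__main__":
    print(images_per_run(FRAMES, VIEWS))  # 40 images per run
    print(seconds_per_image(GENERATION_SECONDS, FRAMES, VIEWS))  # 1.0 second per image

    # Hypothetical baseline LPIPS, chosen only to illustrate the arithmetic.
    hypothetical_baseline = 0.100
    new_score = score_after_relative_reduction(
        hypothetical_baseline, LPIPS_RELATIVE_IMPROVEMENT
    )
    print(round(new_score, 3))  # 0.086
```

Under these assumptions, the quoted 40-second generation time works out to about one second per generated image.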

Impact

This advancement accelerates 3D/4D content creation pipelines while maintaining cross-view consistency, which is critical for immersive media. By making complex 4D generation more accessible, it lowers barriers for indie developers and aligns with the industry shift toward dynamic asset workflows.