Details
- Google DeepMind introduces Veo 3.1, the latest version of its text-to-video model, now available to users of Flow by Google and to developers through the Gemini API (see the sketch after this list).
- The update significantly advances narrative comprehension, enabling the model to maintain consistent characters, lighting, and photorealistic textures in longer video sequences.
- A new "ingredients to video" mode allows creators to upload multiple reference images, blending elements from each into a unified, sound-enabled scene.
- The "scene extension" feature auto-generates continuing footage of 60 seconds or more, linking clips by leveraging the previous clip's final moments for seamless continuity.
- The "first-and-last-frame" capability generates smooth transitions and camera movements, automatically filling in between endpoints without the need for manual key-framing.
- Parallel audio track generation syncs sound precisely to onscreen action, cutting down on post-production sound design efforts.
- The improvements stem from enhanced video-language pre-training and a reinforced diffusion decoder, though Veo's model weights are not open source.
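For developers, access through the Gemini API follows the google-genai SDK's long-running video-generation pattern. Below is a minimal sketch in Python; the model identifier, prompt, and config values are illustrative assumptions rather than details confirmed by this announcement.

```python
import time

from google import genai
from google.genai import types

# Create a client; the SDK reads the API key from the environment if none is passed.
client = genai.Client()

# Kick off a long-running video-generation job.
# The model ID below is assumed for illustration; check the Gemini API model list.
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt=(
        "A slow dolly shot through a rain-soaked neon alley at night, "
        "puddles reflecting storefront signs, ambient city sound"
    ),
    config=types.GenerateVideosConfig(aspect_ratio="16:9"),
)

# Video generation is asynchronous; poll the operation until it completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the first generated clip to a local MP4 file.
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("veo_clip.mp4")
```

The "ingredients to video" and "first-and-last-frame" modes would presumably extend this same request shape with reference images, though the announcement does not specify the exact parameters.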
Impact
Veo 3.1 directly takes on OpenAI's Sora in long-form generative video, raising the bar for minute-long, story-consistent output. Its integration with the Gemini API positions Google DeepMind as a strong multimodal platform contender, pushing competitors to innovate in both video and audio generation. As AI-generated video becomes more sophisticated, it could reshape content creation workflows and prompt new regulatory debates around intellectual property.