Google DeepMind launches Gemini Omni video-first generative model and Omni Flash rollout

Details

Google DeepMind announced Gemini Omni, a new model family designed to create content from almost any input, starting with video.
Omni combines Gemini’s language reasoning with Google’s generative media systems to improve world understanding, multimodality, and editing.
The model is designed to reason about physics, history, biology, and culture, so that actions in generated scenes have logical consequences and narratives stay coherent.
Users can define a character once and reuse it consistently across different scenes, locations, actions, and lighting conditions.
Styles, motion, and effects can be applied via reference images or natural-language prompts, blending inputs into cohesive clips.
Existing videos can be edited by asking Omni to reimagine the action, change environments, add objects, or transform the overall scene.
The first model in the family, Gemini Omni Flash, is available in the Gemini app, Google Flow, and YouTube Shorts, with API access promised in the coming weeks.
According to Google’s blog, Omni Flash supports multimodal inputs (text, images, audio, video) and focuses on high-quality, knowledge-grounded video generation and conversational editing.

Impact

Gemini Omni positions Google more directly against OpenAI’s Sora and other emerging video-generation systems, but with tighter integration into the broader Gemini stack and Google products like YouTube Shorts. By emphasizing physics-aware, narrative-consistent video and offering conversational editing, with API access to follow, Google is pushing toward agentic, multimodal creative workflows that could reshape both consumer content creation and professional post-production pipelines.
