Details

  • Google AI Developers announced Gemini 3.1 Flash Live, now available in preview via the Live API and Google AI Studio.
  • The model enables developers to build low-latency voice and vision agents that process real-time information, supporting text, images, audio, and video inputs with audio outputs.
  • Key improvements include higher task completion in noisy environments thanks to filtering of background noise such as traffic, stronger instruction-following for complex scenarios, more natural low-latency dialogue with acoustic-nuance detection, and support for over 90 languages.
  • Compared to Gemini 2.5 Flash Native Audio, it offers reduced latency, improved reliability, and broader turn coverage that includes audio activity and all video frames.
  • A demo in the Stitch app showcases the agent handling voice-based design interactions: it sees the canvas, provides feedback, and suggests variations.
  • Features include function calling, session management, and ephemeral tokens; the input limit is 131,072 tokens, the output limit 65,536; knowledge cutoff is January 2025.
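To make the Live API workflow above concrete, here is a minimal sketch of a streaming voice-agent session using the google-genai Python SDK. The model id, system instruction, and prompt text are illustrative assumptions (check Google AI Studio for the actual preview model name); the overall connect/send/receive pattern follows the SDK's Live API surface.

```python
"""Sketch: low-latency audio agent session via the Gemini Live API."""

# Hypothetical preview model id -- an assumption, not confirmed by the source.
MODEL_ID = "gemini-3.1-flash-live-preview"


def build_live_config() -> dict:
    """Assemble a Live API session config: audio output plus a system prompt."""
    return {
        "response_modalities": ["AUDIO"],
        "system_instruction": "You are a low-latency voice design assistant.",
    }


async def run_session(api_key: str) -> list[bytes]:
    """Open a Live session, send one text turn, and collect streamed audio."""
    # Imported here so build_live_config() works without the SDK installed.
    from google import genai

    client = genai.Client(api_key=api_key)
    audio_chunks: list[bytes] = []

    async with client.aio.live.connect(
        model=MODEL_ID, config=build_live_config()
    ) as session:
        # One client turn; image/video frames could be sent as additional parts.
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "Describe the canvas."}]}
        )
        async for message in session.receive():
            if message.data is not None:  # streamed audio bytes from the model
                audio_chunks.append(message.data)

    return audio_chunks
```

In practice this would be driven with `asyncio.run(run_session(api_key))`; ephemeral tokens rather than a long-lived API key would be the safer credential for client-side use.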

Impact

Google's Gemini 3.1 Flash Live advances real-time multimodal AI, improving noise robustness and latency over its Gemini 2.5 predecessor and positioning Google competitively against OpenAI's GPT-4o Realtime API and Anthropic's voice features in Claude. It lowers the barrier for voice-first apps in noisy settings, which could accelerate adoption in consumer devices and enterprise tools. Multilingual support widens access globally, while rivals such as Meta's Llama models still lack comparably integrated live APIs. The early preview status suggests Google aims to lead in low-latency agentic experiences amid intensifying competition in voice AI.