Details
- Google has released Gemini 2.5 Flash-Lite as a stable model, now the fastest and most cost-effective option in the Gemini 2.5 family, priced at $0.10 per 1M input tokens and $0.40 per 1M output tokens.
- Flash-Lite targets latency-sensitive applications such as translation and classification, lets developers toggle advanced reasoning on or off, and ships with comprehensive tool support.
- The model delivers 45% lower latency and 30% lower power consumption than its predecessors, with real-world impact demonstrated in deployments such as Satlyt’s space diagnostics system.
- Flash-Lite rounds out Google’s 2.5 lineup, complementing 2.5 Pro and Flash, with each supporting a 1M-token context window and full integration capabilities.
- Adopters include Satlyt in space computing, HeyGen for multilingual video avatars, DocsHound for video documentation, and Evertune for brand analysis, highlighting the model’s versatility across industries.
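At the list prices above ($0.10 per 1M input tokens, $0.40 per 1M output tokens), per-request cost is straightforward to estimate. A minimal sketch; the token counts in the example are hypothetical placeholders, not figures from the announcement:

```python
# Published Gemini 2.5 Flash-Lite list prices (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request at list prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a hypothetical 2,000-token prompt with a 500-token reply.
print(f"${request_cost(2_000, 500):.6f}")  # → $0.000400
```

At this rate, a million such requests would cost roughly $400, which illustrates why the model is positioned for high-volume, latency-sensitive workloads.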
Impact
Gemini 2.5 Flash-Lite strengthens Google’s position in delivering high-performance, cost-efficient AI models for enterprise use. Its low pricing and efficiency gains make it attractive for organizations expanding real-time and large-scale AI applications, and the launch is poised to shape the economics of foundation models while accelerating AI adoption in latency-sensitive sectors.