Details
- Gemma 3n runs on just 2 GB of RAM using the Matryoshka Transformer (MatFormer) architecture, enabling high-performance AI on mobile and low-power devices.
- It processes multimodal input (audio, text, and image) with a 32,000-token context window and support for over 140 languages.
- Incorporates conditional parameter loading and Per-Layer Embedding (PLE) caching, reducing memory use by up to 60 percent.
- First in the Gemma series to support native audio input, achieving 6.25 tokens per second without relying on cloud services.
- Built on the same architecture as Gemini Nano but adds modular support for vision and audio, which can be turned off for lighter text-only workloads.
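The MatFormer idea mentioned above can be illustrated with a toy sketch: a sub-model reuses a prefix of the full model's hidden dimensions, so a smaller network can be "sliced out" of the same trained weights. This is a minimal illustration of the nesting concept, not Gemma 3n's actual implementation; all sizes and weights here are made up.

```python
# Toy sketch of Matryoshka-style nested FFN slicing (illustrative only;
# not Gemma 3n's real architecture or dimensions).
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 8        # embedding width (toy size)
D_FF_FULL = 16     # full feed-forward width

# One "trained" FFN layer: up-projection then down-projection.
W_up = rng.standard_normal((D_MODEL, D_FF_FULL))
W_down = rng.standard_normal((D_FF_FULL, D_MODEL))

def ffn(x, d_ff):
    """Run the FFN using only the first d_ff hidden units (a nested sub-model)."""
    h = np.maximum(x @ W_up[:, :d_ff], 0.0)  # ReLU over the sliced prefix
    return h @ W_down[:d_ff, :]

x = rng.standard_normal(D_MODEL)
full = ffn(x, D_FF_FULL)        # full-capacity path
small = ffn(x, D_FF_FULL // 2)  # half-width sub-model, same weight tensors
print(full.shape, small.shape)  # both outputs have shape (8,)
```

Because both paths share one set of weights, a device can pick the slice width that fits its memory budget without storing a second model.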
Impact
Gemma 3n brings true multimodal AI to edge devices, even those with limited resources, making advanced language and media understanding accessible in offline and low-connectivity environments. This furthers Google's effort to lead the market in privacy-preserving, on-device AI while setting a new bar for efficient model deployment on mobile hardware.