Details
- Gemma 3n runs on just 2 GB of RAM using the Matryoshka Transformer (MatFormer) architecture, enabling high-performance AI on mobile and low-power devices.
- It processes multimodal input (audio, text, and image) with a 32,000-token context window and support for over 140 languages.
- Incorporates conditional parameter loading and Per-Layer Embedding (PLE) caching, reducing memory use by up to 60 percent.
- First in the Gemma series to support native audio input, achieving 6.25 tokens per second without relying on cloud services.
- Built on the same architecture as Gemini Nano but adds modular support for vision and audio, which can be turned off for lighter text-only workloads.
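The MatFormer idea mentioned above can be illustrated with a toy sketch: a sub-model reuses a prefix of the full model's hidden dimensions, so a smaller network can be "sliced out" of the same trained weights. This is a minimal illustration of the nesting concept, not Gemma 3n's actual implementation; all sizes and weights here are made up.

```python
# Toy sketch of Matryoshka-style nested FFN slicing (illustrative only;
# not Gemma 3n's real architecture or dimensions).
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 8        # embedding width (toy size)
D_FF_FULL = 16     # full feed-forward width

# One "trained" FFN layer: up-projection then down-projection.
W_up = rng.standard_normal((D_MODEL, D_FF_FULL))
W_down = rng.standard_normal((D_FF_FULL, D_MODEL))

def ffn(x, d_ff):
    """Run the FFN using only the first d_ff hidden units (a nested sub-model)."""
    h = np.maximum(x @ W_up[:, :d_ff], 0.0)  # ReLU over the sliced prefix
    return h @ W_down[:d_ff, :]

x = rng.standard_normal(D_MODEL)
full = ffn(x, D_FF_FULL)        # full-capacity path
small = ffn(x, D_FF_FULL // 2)  # half-width sub-model, same weight tensors
print(full.shape, small.shape)  # both outputs have shape (8,)
```

Because both paths share one set of weights, a device can pick the slice width that fits its memory budget without storing a second model.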
Impact
Gemma 3n brings true multimodal AI to edge devices, even those with limited resources, making advanced language and media understanding accessible in offline and low-connectivity environments. This furthers Google's effort to lead the market in privacy-preserving, on-device AI while setting a new bar for efficient model deployment on mobile hardware.