Details
- Qwen introduced Qwen3-Next-80B-A3B, an 80-billion-parameter large language model, on September 11, 2025.
- The model activates only about 3 billion parameters per token through Mixture-of-Experts routing, allowing for sparse computation (see the illustrative routing sketch after this list).
- Qwen claims roughly 10 times lower training cost and 10 times higher inference throughput than its Qwen3-32B model, with the largest gains on contexts longer than 32,000 tokens.
- It uses a hybrid architecture, "Gated DeltaNet + Gated Attention," to raise throughput while maintaining output quality (a generic gated-attention sketch also follows the list).
- The model is aimed at cloud API providers and enterprise fine-tuning, boasting reduced GPU memory usage and support for scalable multi-node training.
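To make the sparse-activation claim concrete, here is a minimal sketch of top-k Mixture-of-Experts routing in PyTorch. The expert count, hidden sizes, and top-k value are hypothetical placeholders, not Qwen's published configuration; the point is only that each token's forward pass touches a small fraction of the total expert parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k MoE layer; sizes and expert count are placeholders."""
    def __init__(self, d_model=512, d_ff=1024, num_experts=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Loop form for clarity; real implementations batch/fuse this dispatch.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens assigned to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out  # only k experts run per token, so most parameters stay idle

moe = TopKMoE()
tokens = torch.randn(8, 512)
print(moe(tokens).shape)  # torch.Size([8, 512])
```

With 64 experts and top-2 routing, each token exercises roughly 1/32 of the expert parameters, which is the mechanism behind "80B total, ~3B active" style figures.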
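The "gated attention" half of the hybrid can be pictured as ordinary self-attention whose output is modulated by a learned sigmoid gate before the residual connection. The sketch below is a generic illustration under that assumption; the Gated DeltaNet (linear-recurrence) layers and Qwen's exact gating details are not shown.

```python
import torch
import torch.nn as nn

class GatedSelfAttention(nn.Module):
    """Generic output-gated self-attention block (illustrative, not Qwen's exact design)."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(d_model, d_model)  # per-channel sigmoid gate from the input
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        gated = torch.sigmoid(self.gate(x)) * attn_out  # gate scales the attention output
        return x + self.proj(gated)                     # residual connection

blk = GatedSelfAttention()
x = torch.randn(2, 16, 512)
print(blk(x).shape)  # torch.Size([2, 16, 512])
```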
Impact
This launch challenges competitors like OpenAI’s GPT-4o and Anthropic’s Claude 3, both widely assumed to rely on dense architectures. By reducing GPU and energy demands, Qwen’s model could make advanced AI more accessible to mid-sized SaaS companies and teams previously priced out by hardware costs. The move underscores industry momentum toward sparse activation, echoing trends set by Google and Mistral, and may shift R&D focus toward smarter model architectures rather than simply scaling up parameter counts.