Details
- Qwen introduced Qwen3-Next-80B-A3B, an 80-billion-parameter large language model, on September 11, 2025.
- The model activates only about 3 billion parameters per token through Mixture-of-Experts routing, allowing for sparse computation (see the illustrative routing sketch after this list).
- Qwen claims roughly 10 times lower training cost and 10 times higher inference throughput than its Qwen3-32B model, with the largest gains on contexts longer than 32,000 tokens.
- It uses a hybrid architecture, "Gated DeltaNet + Gated Attention," to raise throughput while maintaining output quality (a generic gated-attention sketch also follows the list).
- The model is aimed at cloud API providers and enterprise fine-tuning, boasting reduced GPU memory usage and support for scalable multi-node training.
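To make the sparse-activation claim concrete, here is a minimal sketch of top-k Mixture-of-Experts routing in PyTorch. The expert count, hidden sizes, and top-k value are hypothetical placeholders, not Qwen's published configuration; the point is only that each token's forward pass touches a small fraction of the total expert parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k MoE layer; sizes and expert count are placeholders."""
    def __init__(self, d_model=512, d_ff=1024, num_experts=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Loop form for clarity; real implementations batch/fuse this dispatch.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens assigned to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out  # only k experts run per token, so most parameters stay idle

moe = TopKMoE()
tokens = torch.randn(8, 512)
print(moe(tokens).shape)  # torch.Size([8, 512])
```

With 64 experts and top-2 routing, each token exercises roughly 1/32 of the expert parameters, which is the mechanism behind "80B total, ~3B active" style figures.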
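The "gated attention" half of the hybrid can be pictured as ordinary self-attention whose output is modulated by a learned sigmoid gate before the residual connection. The sketch below is a generic illustration under that assumption; the Gated DeltaNet (linear-recurrence) layers and Qwen's exact gating details are not shown.

```python
import torch
import torch.nn as nn

class GatedSelfAttention(nn.Module):
    """Generic output-gated self-attention block (illustrative, not Qwen's exact design)."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(d_model, d_model)  # per-channel sigmoid gate from the input
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        gated = torch.sigmoid(self.gate(x)) * attn_out  # gate scales the attention output
        return x + self.proj(gated)                     # residual connection

blk = GatedSelfAttention()
x = torch.randn(2, 16, 512)
print(blk(x).shape)  # torch.Size([2, 16, 512])
```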
Impact
This launch challenges competitors like OpenAI’s GPT-4o and Anthropic’s Claude 3, both widely assumed to rely on dense architectures. By reducing GPU and energy demands, Qwen’s model could make advanced AI more accessible to mid-sized SaaS companies and teams previously priced out by hardware costs. The move underscores industry momentum toward sparse activation, echoing trends set by Google and Mistral, and may shift R&D focus toward smarter model architectures rather than simply scaling up parameter counts.