Details
- Alibaba introduced Qwen3-Next, a new AI architecture that combines hybrid attention with an ultra-sparse Mixture of Experts (MoE) design, selectively activating just 3.7% of its 80 billion parameters (roughly 3 billion per token) to make large-scale language models more efficient; a sparse-routing sketch follows this list.
- The lead model, Qwen3-Next-80B-A3B-Base, achieves 10 times the throughput of the previous Qwen3-32B model on long-context tasks, with training costs reduced to less than 10% of those of prior models. Both Instruct and Thinking variants are open-sourced on Hugging Face and ModelScope.
- Notable advancements include a hybrid of Gated DeltaNet and Gated Attention layers in place of standard attention, an ultra-sparse MoE layer for efficiency, and Multi-Token Prediction to boost both performance and inference speed (see the multi-token prediction sketch after this list).
- Additional releases include Qwen3-ASR-Flash, a multilingual automatic speech recognition model supporting 11 languages and several Chinese dialects, along with a preview of Qwen3-Max, a trillion-parameter model that placed 6th in LMArena’s Text Arena benchmark.
- The Qwen3-Next-80B offers a native 256,000-token context window, expandable to one million tokens, and delivers performance on par with the flagship Qwen3-235B, excelling in particular at extended-context and complex reasoning tasks.
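The ultra-sparse MoE idea above can be illustrated with a short routing sketch. This is a generic top-k mixture-of-experts layer in PyTorch, not Qwen3-Next's actual implementation; the layer sizes, expert count, and top_k value are hypothetical and chosen only to show how a router keeps most parameters inactive for any given token.

```python
# Illustrative sparse-MoE routing, not Qwen3-Next's real layer: d_model, d_ff,
# num_experts, and top_k below are hypothetical. Each token runs through only
# top_k of num_experts expert MLPs, so the share of parameters active per token
# stays small even as the total parameter count grows.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, num_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                               # x: (num_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)      # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top_k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        rows = []
        for t in range(x.size(0)):                      # naive per-token loop, for clarity only
            rows.append(sum(w * self.experts[int(e)](x[t])
                            for w, e in zip(weights[t], idx[t])))
        return torch.stack(rows)

tokens = torch.randn(8, 512)
layer = SparseMoELayer()
print(layer(tokens).shape)  # torch.Size([8, 512]); only 4 of 64 expert MLPs ran per token
```

At production scale, the same routing principle is what lets a model keep roughly 3 billion of its 80 billion parameters (about 3.7%) active per token, as described in the first bullet.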
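Multi-Token Prediction can likewise be sketched. The snippet below is a generic illustration, not Qwen3-Next's actual design: the MTPHead module and its sizes are hypothetical, and it only shows the core idea that extra output heads draft several future tokens from one forward pass.

```python
# Illustrative multi-token prediction heads, not Qwen3-Next's real design: the
# MTPHead module and its sizes are hypothetical. Extra output heads predict
# tokens several positions ahead from the same hidden state, so a single
# forward pass can draft a short run of tokens instead of just one.
import torch
import torch.nn as nn

class MTPHead(nn.Module):
    def __init__(self, d_model=512, vocab_size=32000, n_future=3):
        super().__init__()
        # heads[k] predicts the token at offset k+1 from the current position
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_future))

    def forward(self, hidden):                        # hidden: (batch, seq, d_model)
        return [head(hidden) for head in self.heads]  # one logits tensor per future offset

hidden = torch.randn(2, 16, 512)                          # stand-in for transformer hidden states
logits = MTPHead()(hidden)
draft = [step[:, -1].argmax(dim=-1) for step in logits]   # greedily draft the next 3 tokens
print(torch.stack(draft, dim=-1).shape)                   # torch.Size([2, 3])
```

In deployments of this family of techniques, drafted tokens are typically verified against the main next-token path before being accepted, similar to speculative decoding, which is one way multi-token prediction translates into faster inference.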
Impact
Alibaba’s Qwen3-Next represents a major step towards making ultra-large AI models both efficient and accessible, addressing a core challenge as the sector moves into the trillion-parameter era. While competitors like OpenAI and Google prioritize raw capability, Alibaba’s architecture could broaden access by enabling sophisticated AI on mainstream hardware. The simultaneous rollout of multiple models underscores Alibaba’s push to lead across different AI domains, not just with one blockbuster release.