Details
- NVIDIA introduced Nemotron 3 Super, a 120B-parameter model (12B active per token) that uses a hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture with a native 1M-token context window (a generic expert-routing sketch follows this list).
- Part of the Nemotron 3 family, which includes Nano (30B total/3B active, available now), Super (optimized for collaborative agents and high-volume workloads like IT automation), and Ultra (500B total/50B active for complex reasoning, release pending in early 2026).
- Key innovations: Latent MoE for hardware-aware expert design, multi-token prediction (MTP) for efficient long-form generation, and training in 4-bit NVFP4 precision on Blackwell GPUs, which reduces memory use without accuracy loss (a rough FP4 block-quantization sketch follows this list).
- Nemotron 3 improves on Nemotron 2 with up to 4x higher token throughput, 60% fewer reasoning tokens, and expanded context, outperforming models like GPT-OSS-20B and Qwen3-30B on benchmarks.
- Designed for multi-agent efficiency, high throughput, and self-hosting with open weights; Nano is available on Hugging Face and platforms like Baseten, with enterprise support via NVIDIA NIM microservices (a hedged loading example follows this list).
- Accompanied by open-source tooling: NeMo Gym and NeMo RL for building and running training environments, and NeMo Evaluator for safety and performance validation.
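
To make the "12B active per token" figure concrete, the sketch below shows a minimal, generic top-k MoE routing layer in PyTorch: a router scores all experts for each token, and only the top-k experts actually run, so most parameters sit idle on any given token. This illustrates standard sparse MoE routing, not NVIDIA's Latent MoE; all sizes (`d_model`, `n_experts`, `top_k`) are toy values chosen for readability.

```python
# Illustrative top-k Mixture-of-Experts layer: only a few experts run per token,
# which is how a large total parameter count can translate into a much smaller
# active parameter count per token. Generic sketch, not NVIDIA's Latent MoE.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores every expert, keeps the top-k.
        scores = self.router(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # (tokens, top_k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Dispatch each token only to the experts it selected.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out


moe = TopKMoE(d_model=64, d_ff=256, n_experts=8, top_k=2)
tokens = torch.randn(16, 64)
print(moe(tokens).shape)  # torch.Size([16, 64])
```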
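
The NVFP4 point rests on storing values in a 4-bit floating-point format with shared block scales. The sketch below shows the general idea using the standard FP4 (E2M1) value grid and one scale per block; the block size of 16 and the scaling scheme are illustrative assumptions, not NVIDIA's exact NVFP4 recipe.

```python
# Rough sketch of 4-bit floating-point (FP4 E2M1) block quantization: each weight
# is snapped to one of 16 representable FP4 values, with a shared per-block scale.
# Block size and scale handling here are assumptions for illustration only.
import numpy as np

# The 16 representable E2M1 values: sign * {0, 0.5, 1, 1.5, 2, 3, 4, 6}.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_GRID[::-1], FP4_GRID])


def quantize_dequantize_fp4(weights: np.ndarray, block_size: int = 16) -> np.ndarray:
    """Quantize a 1-D weight vector to FP4 with one scale per block, then dequantize."""
    w = weights.reshape(-1, block_size)
    # Scale each block so its largest magnitude lands on the largest FP4 value (6.0).
    scale = np.abs(w).max(axis=1, keepdims=True) / 6.0
    scale = np.where(scale == 0, 1.0, scale)
    scaled = w / scale
    # Snap every scaled weight to the nearest representable FP4 value.
    nearest = FP4_GRID[np.abs(scaled[..., None] - FP4_GRID).argmin(axis=-1)]
    return (nearest * scale).reshape(weights.shape)


w = np.random.randn(64).astype(np.float32)
w_q = quantize_dequantize_fp4(w)
print("max abs error:", np.abs(w - w_q).max())
```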
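
Since the Nano weights are published on Hugging Face, the standard `transformers` loading flow should apply. The repo id below is a placeholder (assumption), not a confirmed checkpoint name; swap in the actual Nemotron 3 Nano repository once published.

```python
# Hedged sketch of pulling the open-weight Nano model with the standard
# transformers API. The repo id is a placeholder, not a confirmed model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/nemotron-3-nano"  # placeholder repo id (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Draft a plan to automate password-reset tickets."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```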
Impact
NVIDIA's Nemotron 3 Super advances multi-agent AI systems by combining a hybrid Mamba-Transformer MoE with NVFP4 training on Blackwell, delivering throughput and a 1M-token context that outpace open rivals like Mistral Large 3 and DeepSeek in agent coordination and efficiency, while matching or exceeding proprietary frontier models in reasoning at lower compute cost. This positions NVIDIA to pressure incumbents like OpenAI and Anthropic, whose dense models lag in multi-agent scalability and open-weight accessibility, and could accelerate adoption in enterprise workflows such as IT automation and long-horizon planning. By open-sourcing tools like NeMo Gym, NVIDIA also lowers the barrier for developers building agent swarms, shifting market dynamics toward hardware-optimized, efficient inference on NVIDIA GPUs and easing GPU bottlenecks. Over the next 12-24 months, expect this release to steer R&D toward latent MoE and low-precision training, draw more funding into agentic systems, and reinforce NVIDIA's lead in the AI hardware-software stack amid rising demand for on-device and edge deployment.
