Details
- NVIDIA announced that Megatron Core now offers end-to-end support for higher-order optimizers such as Muon, along with the research optimizers MOP and REKLS.
- This enables efficient training of large-scale models such as Kimi K2 and Qwen3 at the 30B-parameter scale.
- The update goes beyond standard data-parallel techniques to push training performance boundaries.
- Key organizations involved include NVIDIA AI, with models from Moonshot AI (Kimi K2) and Alibaba (Qwen3).
- Higher-order optimizers improve convergence speed and stability for massive models compared with traditional element-wise methods like AdamW (see the sketch after this list).
- Megatron Core is NVIDIA's library for scaling transformer training on GPUs, building on prior Megatron-LM versions with optimized tensor, pipeline, and data parallelism.
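
For intuition on what Muon does differently from AdamW, here is a minimal NumPy sketch of the publicly described Muon update: momentum accumulation followed by approximate orthogonalization of the momentum matrix via Newton-Schulz iterations. This is not Megatron Core's API; the function names, hyperparameters, and shape-based scaling are illustrative and follow the open-source reference implementation rather than NVIDIA's code.

```python
import numpy as np

def newton_schulz_orthogonalize(M, steps=5, eps=1e-7):
    """Approximately orthogonalize M (push its singular values toward 1)
    using the quintic Newton-Schulz iteration from the public Muon reference code."""
    a, b, c = 3.4445, -4.7750, 2.0315    # coefficients from the reference implementation
    X = M / (np.linalg.norm(M) + eps)    # Frobenius normalization bounds the spectral norm by 1
    transposed = M.shape[0] > M.shape[1]
    if transposed:                       # iterate on the "wide" orientation for efficiency
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(W, G, momentum_buf, lr=0.02, beta=0.95, ns_steps=5):
    """One Muon-style update for a 2D weight matrix W with gradient G.
    Sketch only: lr, beta, and the shape-based scale are illustrative defaults."""
    momentum_buf = beta * momentum_buf + G             # momentum accumulation
    O = newton_schulz_orthogonalize(momentum_buf, ns_steps)
    scale = max(1.0, W.shape[0] / W.shape[1]) ** 0.5   # heuristic from public implementations
    W = W - lr * scale * O
    return W, momentum_buf

# Toy usage: one optimizer step on a random layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))
G = rng.standard_normal((256, 128))
M = np.zeros_like(W)
W, M = muon_step(W, G, M)
```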
Impact
NVIDIA's Megatron Core update positions it as a leader in optimizing frontier model training, directly helping labs such as Moonshot AI and Alibaba compete with OpenAI-scale models. By integrating Muon, an optimizer that orthogonalizes momentum updates and has been shown in benchmarks to reduce training costs by up to 50%, it lowers the barrier for 30B-plus-parameter runs on H100/H200 clusters. This pressures cloud providers like AWS and Google Cloud to match the efficiency of NVIDIA's stack, accelerating the shift toward cost-effective AI development amid rising compute demands. As one of the early frameworks to support such optimizers, Megatron Core also narrows the gap with hyperscalers' custom in-house training stacks.
