Details

  • NVIDIA’s Blackwell architecture set new performance records across all MLPerf Training v5.0 benchmarks, with standout results in Llama 3.1 405B pretraining.
  • The submissions used two in-house supercomputers, Tyche (built on GB200 NVL72 rack systems) and Nyx (based on DGX B200 systems); in collaboration with CoreWeave and IBM, the largest GB200 NVL72 submission scaled to 2,496 Blackwell GPUs.
  • Key Blackwell advancements include fifth-generation NVLink, 13.4TB of coherent memory per rack, liquid cooling, and the NVIDIA NeMo Framework software stack.
  • Blackwell achieved a 2.2x speedup over the previous-generation Hopper architecture on Llama 3.1 405B pretraining, and a 2.5x gain on Llama 2 70B LoRA fine-tuning at the same GPU count as the prior round (a brief LoRA sketch follows this list).
  • NVIDIA’s partner ecosystem—spanning ASUS, Cisco, Dell, HPE, and Oracle Cloud—contributed across a range of AI workloads.
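
For readers unfamiliar with the LoRA technique behind that fine-tuning benchmark, the sketch below illustrates the core idea: the pretrained weight stays frozen while a small trainable low-rank update is learned on top of it. This is a minimal, generic PyTorch illustration, not NVIDIA's benchmark implementation; the class name, rank, scaling, and layer dimensions are illustrative assumptions.

```python
# Minimal sketch of the LoRA idea: keep a frozen base weight W and train
# a low-rank update B @ A, so the effective weight is W + (alpha/r) * B @ A.
# Toy example only; production runs use optimized NeMo/Megatron kernels.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        # A is small-random, B is zero, so the update starts at zero
        # and the model initially matches the frozen base exactly.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r  # standard LoRA scaling factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base path plus the trainable low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(2, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # only the small A/B matrices are trainable
```

Because only the low-rank A/B matrices receive gradients, fine-tuning touches a tiny fraction of the parameters of a 70B-class model, which is what makes this benchmark far cheaper than full pretraining.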

Impact

NVIDIA’s show of strength in MLPerf Training v5.0 underscores its leadership in powering the next generation of AI, fueling the rise of AI factories and advanced agentic applications. The Blackwell platform’s scale and efficiency give NVIDIA an edge as enterprises race to build and deploy larger, more capable AI models. As industry adoption grows, Blackwell sets a new standard that competitors must now strive to match.