Details

  • Microsoft Azure has introduced the NDv6 GB300 VM series, featuring the world’s first production-ready cluster of NVIDIA GB300 NVL72 systems. The deployment is designed specifically for OpenAI’s advanced inference workloads.
  • The cluster consists of over 4,600 NVIDIA Blackwell Ultra GPUs, linked by NVIDIA Quantum-X800 InfiniBand networking, and is expected to expand to hundreds of thousands of Blackwell GPUs across Azure datacenters globally.
  • Each GB300 NVL72 rack integrates 72 liquid-cooled Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs as a cohesive system, offering 37 TB of fast-access memory, 1.44 exaflops of FP4 Tensor Core performance, and 130 TB/s of NVLink bandwidth. Each GPU gets 800 Gb/s of cross-rack connectivity, double the per-GPU throughput of the previous GB200 generation.
  • This marks an infrastructure leap over Azure's earlier GB200 NVL72 deployments in 2025. Benchmark results show the GB300 systems delivering up to 5x higher per-GPU throughput on the DeepSeek-R1 (671B) reasoning model, along with leading results on Llama 3.1 405B tasks, compared with NVIDIA's Hopper architecture.
  • The deployment leverages advanced networking features such as SHARP v4, adaptive routing, and real-time congestion management, while Microsoft has re-engineered its data centers with custom liquid cooling and tailored power and orchestration systems to accommodate this next-gen hardware.
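The per-rack figures above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below scales the stated rack-level specs (72 GPUs, 37 TB memory, 1.44 FP4 exaflops per rack, 800 Gb/s per GPU) up to the roughly 4,600-GPU cluster; the aggregate totals are illustrative extrapolations, not figures from the announcement.

```python
# Illustrative cluster arithmetic from the publicly stated per-rack specs.
GPUS_PER_RACK = 72              # Blackwell Ultra GPUs per GB300 NVL72 rack
TOTAL_GPUS = 4_600              # "over 4,600" per the announcement
RACK_MEMORY_TB = 37             # fast-access memory per rack
RACK_FP4_EXAFLOPS = 1.44        # FP4 Tensor Core performance per rack
CROSS_RACK_GBPS_PER_GPU = 800   # Quantum-X800 InfiniBand per GPU

racks = TOTAL_GPUS // GPUS_PER_RACK                     # full racks
memory_pb = racks * RACK_MEMORY_TB / 1_000              # TB -> PB
fp4_exaflops = racks * RACK_FP4_EXAFLOPS
cross_rack_tbps = TOTAL_GPUS * CROSS_RACK_GBPS_PER_GPU / 8 / 1_000  # Gb/s -> TB/s

print(f"full racks: {racks}")                           # 63
print(f"aggregate fast memory: ~{memory_pb:.1f} PB")    # ~2.3 PB
print(f"aggregate FP4 compute: ~{fp4_exaflops:.0f} exaflops")  # ~91
print(f"aggregate cross-rack bandwidth: ~{cross_rack_tbps:.0f} TB/s")  # ~460
```

At this scale, roughly 63 racks of 72 GPUs each, the claim of expanding to "hundreds of thousands" of GPUs implies thousands of such racks across Azure datacenters.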

Impact

The launch of Azure’s GB300 cluster demonstrates the commercial viability of NVIDIA’s Blackwell Ultra for at-scale AI inference and reasoning workloads. With Microsoft planning to scale to hundreds of thousands of these GPUs, hyperscaler competition for AI supercomputing leadership is intensifying. The move positions Microsoft and OpenAI at the frontier of AI infrastructure, accelerating the industry toward new advances in reasoning and multimodal intelligence.