Details

  • NVIDIA's Blackwell GB200 NVL72 system achieved 1.5 million tokens per second running OpenAI's gpt-oss-120b model, highlighting major advancements in large model inference.
  • NVIDIA and OpenAI collaborated to optimize and publicly release two open-weight models, gpt-oss-120b and gpt-oss-20b, aimed at driving generative and physical AI across various industries.
  • The Blackwell architecture leverages NVFP4 4-bit precision and CUDA-optimized inference software such as FlashInfer, Hugging Face Transformers, and TensorRT-LLM to deliver lower power consumption, reduced memory demands, and real-time execution of trillion-parameter LLMs.
  • This marks a significant shift from previous models trained on NVIDIA H100 GPUs: Blackwell drastically shortens training times and increases inference throughput, with the potential to cut model training from months to days.
  • Leveraging the extensive reach of the CUDA ecosystem—450 million downloads and 6.5 million developers worldwide—these models are accessible on platforms ranging from DGX Cloud to RTX PRO cards, ensuring broad developer and enterprise access.
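To make the 4-bit precision point above concrete, here is a minimal, illustrative sketch of block-scaled 4-bit weight quantization in plain NumPy. This is not NVIDIA's actual NVFP4 implementation (real NVFP4 uses an E2M1 4-bit floating-point element format with a per-block FP8 scale factor); the simplified symmetric int4 scheme below only demonstrates the general idea of why per-block 4-bit storage cuts memory roughly 4x versus FP16 while keeping reconstruction error bounded.

```python
import numpy as np

def quantize_block_int4(x, block_size=16):
    # Split the weights into fixed-size blocks, each with its own scale.
    # (Simplified stand-in for NVFP4-style block scaling; int4 codes
    # span -8..7, so we map each block's max magnitude to 7.)
    x = np.asarray(x, dtype=np.float32).reshape(-1, block_size)
    scales = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(x / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_block_int4(q, scales):
    # Reconstruct approximate FP32 weights from 4-bit codes and scales.
    return (q.astype(np.float32) * scales).ravel()

# Hypothetical weight tensor, just for demonstration.
weights = np.random.default_rng(0).normal(size=64).astype(np.float32)
q, s = quantize_block_int4(weights)
restored = dequantize_block_int4(q, s)
# Storage: one 4-bit code per value plus one scale per 16 values,
# versus 16 bits per value in FP16 -- roughly a 4x reduction.
print("max reconstruction error:", np.abs(weights - restored).max())
```

Per-block scaling is the key design choice: a single global scale would let one outlier weight crush the resolution of the entire tensor, while a scale per small block keeps the rounding error of each element bounded by half that block's quantization step.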

Impact

The partnership cements NVIDIA’s role as a cornerstone of AI infrastructure, enabling faster, more efficient open-source model deployment for enterprises and developers globally. This high-profile collaboration demonstrates how hardware-software integration is pushing generative AI forward and helping meet industry demand for scalable, high-performance solutions. With CUDA’s wide adoption, NVIDIA is poised to further fuel innovation in AI-powered fields from research to real-world applications.