Details

  • NVIDIA and OpenAI have partnered to release two new open-source language models, gpt-oss-20b and gpt-oss-120b, specifically optimized for inference on NVIDIA GPUs, including consumer RTX cards.
  • The collaboration leverages tools such as Ollama, llama.cpp, and Microsoft AI Foundry Local, with backing from NVIDIA CEO Jensen Huang, who highlighted the models' role in bringing advanced AI to more users.
  • Both models use a mixture-of-experts architecture with chain-of-thought reasoning, were trained on NVIDIA H100 GPUs, support context windows of up to 131,072 tokens, use MXFP4 precision, and are designed for seamless deployment across multiple frameworks.
  • The context window size ranks among the largest on the market, while MXFP4 precision enables efficient inference, achieving speeds up to 256 tokens per second on the NVIDIA RTX 5090 GPU.
  • NVIDIA's open-source contributions, including integrations with Hugging Face, TensorRT-LLM, and ONNX, aim to drive broad adoption of the models across use cases such as search, coding assistance, and document analysis.
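One reason the smaller model can target consumer RTX cards is MXFP4's compact weight storage. The sketch below estimates weight memory under the commonly cited assumption that MXFP4 packs 4-bit elements with one shared 8-bit scale per block of 32 values (about 4.25 bits per weight); the figures are back-of-envelope illustrations, not official model sizes.

```python
# Rough memory-footprint estimate: why MXFP4 helps these models fit on
# consumer GPUs. Assumption (not from the article): MXFP4 stores 4-bit
# weights in blocks of 32 sharing an 8-bit scale, i.e. ~4.25 bits/weight,
# versus 16 bits per weight for FP16.

def weight_gigabytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

MXFP4_BITS = 4 + 8 / 32  # 4-bit elements + one 8-bit scale per 32-block

for name, params in [("gpt-oss-20b", 20e9), ("gpt-oss-120b", 120e9)]:
    fp16 = weight_gigabytes(params, 16)
    mxfp4 = weight_gigabytes(params, MXFP4_BITS)
    print(f"{name}: FP16 ~{fp16:.0f} GB -> MXFP4 ~{mxfp4:.1f} GB")
```

By this estimate the 20B model's weights shrink from roughly 40 GB in FP16 to about 10.6 GB in MXFP4, small enough for a single high-end consumer GPU, while the 120B model drops from about 240 GB to roughly 64 GB.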

Impact

This joint release lowers the barrier for developers and enterprises to harness large-scale AI models locally using affordable NVIDIA GPUs. The efficiency gains and wide compatibility reinforce NVIDIA's leadership in AI hardware and software, while supporting the broader industry shift toward open-source and accessible AI solutions. By enabling high-performance inference on consumer-grade devices, NVIDIA and OpenAI together set a new standard for democratizing AI technology.