Details
- NVIDIA introduced the Rubin CPX GPU and SMART optimization framework on September 9, 2025, targeting long-context AI workloads such as code generation and video synthesis that demand processing of million-token sequences.
- The Rubin CPX delivers 30 petaFLOPs of NVFP4 compute with 128GB of GDDR7 memory, providing three times the attention acceleration of the earlier GB300 NVL72 systems.
- The platform leverages a disaggregated inference architecture, separating compute-heavy context handling from memory-bound token generation, and is orchestrated by the NVIDIA Dynamo framework for efficient resource use.
- The Vera Rubin NVL144 CPX platform packs 144 Rubin CPX GPUs into a single rack, delivering 8 exaFLOPs of AI performance and 100TB of high-speed memory, a 7.5x performance gain over the GB300 NVL72.
- NVIDIA projects a 30-50x return on investment, estimating $5 billion in token revenue from a $100 million infrastructure deployment, with general availability slated by late 2026.
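The disaggregated architecture described above can be sketched as a two-pool scheduler: the compute-bound prefill (context) phase and the memory-bandwidth-bound decode (generation) phase run on separate worker pools. This is a minimal illustrative sketch, not NVIDIA's Dynamo API; all class and function names here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: int   # length of the input context to prefill
    max_new_tokens: int  # number of tokens to generate afterward

@dataclass
class WorkerPool:
    # Hypothetical stand-in for a pool of accelerators; in the disaggregated
    # design, one pool is compute-optimized (context) and one is
    # bandwidth-optimized (generation).
    name: str
    completed: list = field(default_factory=list)

    def run(self, req: Request, phase: str) -> None:
        self.completed.append((phase, req.prompt_tokens))

def serve(req: Request, context_pool: WorkerPool, gen_pool: WorkerPool) -> None:
    # Phase 1: prefill the entire context once on a compute-heavy worker
    # (the role a context accelerator like Rubin CPX plays).
    context_pool.run(req, "prefill")
    # Phase 2: hand off to a memory-bound worker that emits tokens one at a
    # time against the cached context.
    for _ in range(req.max_new_tokens):
        gen_pool.run(req, "decode")

ctx = WorkerPool("context")
gen = WorkerPool("generation")
serve(Request(prompt_tokens=1_000_000, max_new_tokens=3), ctx, gen)
print(len(ctx.completed), len(gen.completed))  # → 1 3
```

The point of the split is resource matching: prefill saturates FLOPs while decode saturates memory bandwidth, so giving each phase its own hardware pool avoids one phase idling the resources the other needs.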
Impact
NVIDIA’s latest platform solidifies its position at the forefront of AI infrastructure, catering to enterprise demand for scalable, context-intensive AI as industry focus shifts from model training to efficient inference. By targeting the compute and memory bottlenecks of long-context inference directly, the platform could catalyze broader adoption of advanced agentic AI in real-world enterprise workflows.