Details
- OpenAI announced a partnership with AMD, Broadcom, Intel, Microsoft, and NVIDIA to release Multipath Reliable Connection (MRC), an open networking protocol.
- MRC improves speed and reliability for large AI training clusters, reducing wasted GPU time during model training.
- Already deployed on OpenAI's largest supercomputers, including the Abilene, Texas, site with Oracle Cloud Infrastructure (OCI) and the Microsoft Fairwater supercomputers used for frontier models.
- MRC is now publicly available via openai.org/mrc.
- The protocol uses multiple network paths for redundancy, minimizing downtime and packet loss in large-scale AI environments.
- Designed specifically for AI workloads: unlike traditional networking, it prioritizes the low-latency, high-throughput connections that distributed training depends on.
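To illustrate the general idea of using multiple paths for redundancy, here is a toy model, assuming simple round-robin spraying of packets over healthy paths and resending a failed path's packets on the survivors. This is not MRC's actual path-selection or recovery logic, which the announcement summary does not describe; `Path`, `spray`, and `fail_over` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Path:
    """One network path between two endpoints (hypothetical toy model, not MRC)."""
    name: str
    healthy: bool = True
    carried: list[int] = field(default_factory=list)  # packet IDs assigned to this path


def spray(packets: list[int], paths: list[Path]) -> dict[str, list[int]]:
    """Assign packets round-robin across the currently healthy paths.

    A down path is simply skipped, so its share of traffic moves to the
    remaining paths instead of stalling the transfer.
    """
    live = [p for p in paths if p.healthy]
    if not live:
        raise RuntimeError("no healthy path available")
    for pkt, path in zip(packets, cycle(live)):
        path.carried.append(pkt)
    return {p.name: p.carried for p in paths}


def fail_over(failed: Path, paths: list[Path]) -> None:
    """Mark a path as down and resend the packets it was carrying on the others."""
    failed.healthy = False
    stranded, failed.carried = failed.carried, []
    spray(stranded, paths)


# Usage: four paths, eight packets, then path2 fails mid-transfer.
paths = [Path(f"path{i}") for i in range(4)]
spray(list(range(8)), paths)
fail_over(paths[2], paths)
for p in paths:
    print(p.name, p.healthy, p.carried)
# path0 True [0, 4, 2]
# path1 True [1, 5, 6]
# path2 False []
# path3 True [3, 7]
```

Every packet is still carried after one path fails, which is the redundancy property described above. A real transport also has to detect failures, handle congestion, and reorder packets that arrive out of sequence over different paths; the sketch omits all of that.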
Impact
OpenAI's MRC launch puts pressure on rivals such as Google and Meta, which rely on proprietary networking for their AI clusters, because an open protocol could standardize faster, more reliable training infrastructure across the industry. Coming amid the Department of Energy's (DOE) recent investments in AI supercomputers at national labs such as Argonne and Oak Ridge (ORNL), some of them targeting exascale AI, the protocol democratizes advanced networking, potentially accelerating non-proprietary research and narrowing the gap between commercial leaders and academic and government efforts. By reducing GPU idle time, it also lowers effective training costs, widening access for open-source AI development.
