
Nvidia integrates new open networking protocol into Spectrum-X platform to tackle AI data centre bottleneck

The MRC protocol, developed with OpenAI and Microsoft, spreads data across hundreds of network paths to keep GPUs working rather than waiting.

by Defused News Writer

Photo by Mariia Shalabaieva / Unsplash

Nvidia, the dominant designer of artificial intelligence chips, has added a new networking protocol called MRC to its Spectrum-X Ethernet platform, an upgrade aimed at solving one of the less visible but increasingly critical problems in AI infrastructure: keeping data flowing fast enough between tens of thousands of processors working in unison.

MRC stands for Multipath Reliable Connection, a protocol that governs how data moves between processors inside the vast clusters of graphics processing units (GPUs) used to train AI models such as those behind ChatGPT.

Training a large AI model involves splitting the work across thousands of GPUs, which must constantly exchange data with one another over a high-speed network.

If even a small part of that network becomes congested or fails, every GPU in the cluster can be forced to wait, wasting expensive computing time.

Traditional networking protocols send data along a single path between two points, which creates a bottleneck if that path becomes overloaded.

MRC takes a different approach: it spreads a single data transfer across hundreds of network paths simultaneously, balancing the load and reducing the risk that any one link becomes a chokepoint.

When a path does fail, MRC can detect the problem and reroute traffic in microseconds, fast enough that the GPUs continue working without meaningful interruption.
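The load-balancing idea described above can be illustrated in a few lines of Python. This is a conceptual sketch only, not the MRC specification: the function name, the chunking scheme, and the `failed` set (standing in for MRC's microsecond-scale failure detection) are all hypothetical.

```python
import itertools

def spray_transfer(data, paths, failed, chunk_size=4):
    """Spread one transfer across many paths, routing around failures.

    Splits `data` into fixed-size chunks and assigns them round-robin
    over the paths not in the `failed` set, so no single link carries
    the whole transfer and a dead path simply receives nothing.
    """
    healthy = [p for p in paths if p not in failed]
    if not healthy:
        raise RuntimeError("no healthy paths available")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    assignment = {p: [] for p in healthy}
    # Round-robin: cycle through healthy paths as chunks are assigned.
    for chunk, path in zip(chunks, itertools.cycle(healthy)):
        assignment[path].append(chunk)
    return assignment

# Example: a 16-byte transfer sprayed over four paths, one of which
# has failed; the three survivors share the load between them.
plan = spray_transfer(b"0123456789abcdef", ["p0", "p1", "p2", "p3"],
                      failed={"p2"})
```

A real implementation would also reorder chunks at the receiver and re-detect failures mid-transfer; the sketch only shows the core idea of spreading load and skipping bad links.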

The protocol was developed jointly by Nvidia, OpenAI, Microsoft, AMD, Broadcom and Intel, and is already deployed across OpenAI's largest GPU clusters, including systems at Oracle's data centre in Abilene, Texas, and in Microsoft's Fairwater supercomputers.

OpenAI released the MRC specification through the Open Compute Project, an industry body that publishes open hardware and software standards, making it freely available for others to adopt.

Nvidia's Spectrum-X is an Ethernet-based networking platform purpose-built for AI workloads, comprising switches and network interface cards designed to connect large numbers of GPUs at speeds of up to 800 gigabits per second.

Ethernet is the standard networking technology used in most data centres, but conventional Ethernet equipment was not designed for the extreme demands of AI training, where vast amounts of data must move between processors with minimal delay.

Spectrum-X modifies Ethernet to handle those demands, and the addition of MRC gives it a more resilient transport layer capable of sustaining performance even as clusters scale towards hundreds of thousands of GPUs.

The announcement underlines how networking, once an afterthought in data centre design, has become a central battleground in AI infrastructure as the size of training clusters grows faster than the capacity of traditional networks to connect them.

The recap

  • Nvidia adds MRC to its Spectrum‑X Ethernet fabric.
  • Spectrum‑X is described as "the Open, AI‑Native Ethernet Fabric".
  • The technology is positioned for gigascale AI deployments.