
Nvidia's next AI supercomputer has 1.3 million parts, weighs two tons and costs up to $4 million. It may also be the only thing that can keep AI's energy problem from getting worse

Vera Rubin, the company's new rack-scale system, delivers ten times more performance per watt than its predecessor. With over 80 suppliers across 20 countries, getting it built is almost as impressive as what it can do.

by Ian Lyall

Why Vera Rubin exists

The biggest constraint on AI right now is not processing power. It is energy. Data centres can only draw so much power from the grid, and the current generation of AI hardware is pushing against that ceiling.

Vera Rubin is Nvidia's answer. The company's new rack-scale system delivers roughly ten times more performance per watt than Grace Blackwell, its current flagship, which means AI companies can run significantly more compute without a proportional increase in their electricity bill. For an industry where power costs are becoming a defining factor in what can and cannot be built, that gap matters enormously.

The system is currently in volume production and scheduled to ship in the second half of 2026.

What is actually in the box

A single Vera Rubin rack weighs close to two tons. It contains around 1,300 chips, 220 trillion transistors, and approximately 1.3 million individual components sourced from more than 80 suppliers across more than 20 countries.

The computing core of each rack is the Vera Rubin superchip. Each one pairs a Vera CPU with two Rubin GPUs. The Vera CPU delivers roughly twice the performance per watt of the Grace CPU it replaces. Each Rubin GPU produces around 50 petaflops of AI performance, about 2.5 times that of its predecessor.

There are 18 compute trays in a single rack. Each tray holds two superchips. Each superchip contains approximately 17,000 components.

One meaningful design change from the previous generation: memory is no longer soldered in. Vera Rubin uses SoCAMM memory modules that can be removed and replaced. Grace Blackwell's memory was fixed in place, which made repairs and upgrades harder. The new system also uses HBM4, the latest generation of High Bandwidth Memory, with eight stacks per GPU, four on each side.

A full Vera Rubin deployment is called a pod. It contains 16 racks and 1,152 GPUs.
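
The figures hang together: 18 trays times two superchips times two GPUs gives 72 GPUs per rack, and 16 racks gives the 1,152-GPU pod. A minimal Python sketch of the arithmetic, using only numbers quoted above:

```python
# Back-of-envelope arithmetic for a Vera Rubin rack and pod,
# using only figures quoted in this article.

TRAYS_PER_RACK = 18          # compute trays per rack
SUPERCHIPS_PER_TRAY = 2      # Vera Rubin superchips per tray
GPUS_PER_SUPERCHIP = 2       # Rubin GPUs per superchip (plus one Vera CPU)
COMPONENTS_PER_SUPERCHIP = 17_000
PETAFLOPS_PER_GPU = 50       # quoted AI performance per Rubin GPU
RACKS_PER_POD = 16

superchips_per_rack = TRAYS_PER_RACK * SUPERCHIPS_PER_TRAY   # 36
gpus_per_rack = superchips_per_rack * GPUS_PER_SUPERCHIP     # 72
gpus_per_pod = gpus_per_rack * RACKS_PER_POD                 # 1,152
rack_petaflops = gpus_per_rack * PETAFLOPS_PER_GPU           # 3,600 PF

print(f"GPUs per rack: {gpus_per_rack}")      # matches the 72 GPUs NVLink connects
print(f"GPUs per pod:  {gpus_per_pod:,}")     # matches the 1,152-GPU pod figure
print(f"AI performance per rack: {rack_petaflops:,} petaflops (~3.6 exaflops)")

# Superchips alone account for ~612,000 of the ~1.3 million components;
# the remainder presumably sits in switch trays, power delivery and
# cabling (an inference, not a quoted breakdown).
print(f"Components in superchips: {superchips_per_rack * COMPONENTS_PER_SUPERCHIP:,}")
```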

Cooling and power, explained plainly

Running 1.3 million components at full load, around the clock, generates a serious amount of heat. Early Blackwell deployments ran into overheating problems, most of which turned out to be installation errors rather than design flaws. Nvidia has since resolved those issues, but the lesson informed how Vera Rubin handles thermal management.

The compute trays in Vera Rubin have no fans, no hoses attached directly to them, and no cables running through them. Cooling is handled entirely by liquid, using cold plates with internal piping. The approach uses less water overall than traditional air and evaporative cooling systems, which is a meaningful advantage as data centres face increasing scrutiny over water consumption.

Power consumption per rack sits at around 220 kW, roughly twice that of a Grace Blackwell rack. That increase is the trade-off for the performance gains, and it requires data centres to upgrade their power delivery infrastructure before they can run Vera Rubin.
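
Scaled to a full pod, that draw adds up quickly. A rough sketch, assuming the 220 kW per-rack figure holds at sustained load (a simplification; real utilisation varies):

```python
# Rough power math for a full 16-rack pod, assuming the quoted
# ~220 kW per rack holds at sustained load (a simplification).

KW_PER_RACK = 220
RACKS_PER_POD = 16
HOURS_PER_YEAR = 24 * 365

pod_kw = KW_PER_RACK * RACKS_PER_POD           # 3,520 kW, i.e. ~3.5 MW
annual_mwh = pod_kw * HOURS_PER_YEAR / 1_000   # ~30,800 MWh per year

print(f"Pod draw: {pod_kw / 1_000:.2f} MW")
print(f"Annual energy at full load: {annual_mwh:,.0f} MWh")
```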

How the chips talk to each other

Getting 1,152 GPUs to behave as a single coordinated system requires serious networking. Nvidia handles this with NVLink, its proprietary interconnect. Nine NVLink Switch trays per rack connect 72 GPUs at a combined data rate of 260 terabytes per second. The line rate for NVLink is 3.6 terabytes per second.
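
Those two numbers are consistent if the aggregate is simply the per-GPU line rate multiplied across the rack's 72 GPUs; a quick check, with that reading made explicit as an assumption:

```python
# Consistency check on the NVLink figures quoted above, assuming the
# aggregate is line rate x GPU count (our reading, not a stated formula).

GPUS_PER_RACK = 72
LINE_RATE_TB_S = 3.6   # NVLink line rate in terabytes per second

aggregate_tb_s = GPUS_PER_RACK * LINE_RATE_TB_S
print(f"Aggregate NVLink bandwidth: {aggregate_tb_s:.1f} TB/s")  # 259.2, i.e. the ~260 quoted
```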

Each rack also contains BlueField data processing units, which manage storage and security, and ConnectX-9 networking controllers for external connectivity. These products originated with Mellanox, a networking company Nvidia acquired in 2020.

Racks connect to each other through separate networking racks filled with Nvidia's Spectrum-X switches. Thousands of these racks together form what Nvidia calls an AI factory.

The price and what you get for it

Vera Rubin racks are expected to cost between $3 million and $4 million each, an increase of around 25% over Grace Blackwell, which makes for a meaningfully higher upfront commitment from customers.

The counterargument, and the one Nvidia makes consistently, is cost per token. A token is the basic unit of AI output (a word, part of a word, or a character), and the economics of running a large language model are largely determined by how cheaply those tokens can be generated. Vera Rubin is expected to deliver roughly ten times lower cost per token than its predecessor. For companies running these systems at scale, that efficiency gain offsets the higher purchase price.
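
To see why a pricier rack can still win, a toy comparison helps. The 25% price increase and the tenfold cost-per-token improvement come from the figures above; the baseline rack price, per-token cost, and workload size are hypothetical, chosen purely for illustration:

```python
# Toy cost-per-token comparison. Only the 1.25x price ratio and the
# 10x cost-per-token improvement come from the article; the absolute
# numbers below are hypothetical, chosen for illustration.

baseline_rack_cost = 3_000_000    # hypothetical previous-gen rack price, USD
baseline_usd_per_mtok = 1.00      # hypothetical: $1 per million tokens

rubin_rack_cost = baseline_rack_cost * 1.25        # ~25% more upfront
rubin_usd_per_mtok = baseline_usd_per_mtok / 10    # ~10x cheaper per token

tokens = 10 ** 13  # hypothetical lifetime workload: 10 trillion tokens

def lifetime_cost(rack_cost: float, usd_per_mtok: float) -> float:
    """Hardware plus generation cost over the whole workload."""
    return rack_cost + usd_per_mtok * tokens / 1_000_000

print(f"Previous gen: ${lifetime_cost(baseline_rack_cost, baseline_usd_per_mtok):,.0f}")
print(f"Vera Rubin:   ${lifetime_cost(rubin_rack_cost, rubin_usd_per_mtok):,.0f}")
```

On these toy numbers the Rubin deployment comes out at roughly a third of the total lifetime cost despite the higher sticker price, because at scale token volume, not hardware, dominates the bill.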

Tariffs have introduced some uncertainty into component pricing, but Nvidia has been working to diversify its supply chain. The company is participating in the US government's reshoring initiative and has plans to manufacture up to $500 billion of AI infrastructure in the United States through 2029. Blackwell production is already underway at TSMC's new Arizona fabs, with assembly at sites in the US, Taiwan, and a new Foxconn plant in Mexico.

The competition and what comes next

Nvidia is not running unopposed. AMD is preparing its own rack-scale system, Helios, for later this year. The primary appeal for customers is having a second viable supplier, which reduces dependence on any single company and adds negotiating leverage. AMD is unlikely to match Nvidia's performance figures at launch, but for workloads that do not require the absolute frontier, it represents a credible alternative.

Nvidia's largest customers, including AWS, Google and Microsoft, are also developing their own AI chips and servers. All three continue to buy Nvidia hardware in large volumes regardless, which is a reasonable indication that in-house silicon is not yet a replacement for what Nvidia offers.

The generation after Vera Rubin already has a name. Kyber is expected to arrive in 2027 as part of the Vera Rubin Ultra system, with 288 GPUs per rack. The design philosophy is further integration and fewer connection points, which reduces latency and lowers the total cost of maintenance over the system's lifetime.

Nvidia intends customers to think of hardware purchases in annual cycles, upgrading continuously rather than waiting for a step change. Vera Rubin and Blackwell are designed to coexist, with different workloads running on whichever generation suits them better.
