Nvidia and Google Cloud used the Cloud Next conference in Las Vegas to unveil a suite of new infrastructure, software and managed services designed to move agentic and physical artificial intelligence workloads from research into production.
The companies said the announcements mark a milestone in a co-engineering partnership spanning more than a decade of work across libraries, systems and cloud services.
At the centre of the launch are A5X instances powered by Nvidia's Vera Rubin NVL72 rack-scale systems, the chipmaker's latest data centre platform built around its Rubin GPU architecture.
The companies said A5X delivers up to ten times lower inference cost per token and ten times higher token throughput per megawatt compared to the prior generation.
Each Vera Rubin NVL72 rack combines 72 Rubin GPUs with 36 Vera CPUs connected via sixth-generation NVLink, providing 260 terabytes per second of scale-up bandwidth.
A5X uses Nvidia ConnectX-9 SuperNICs alongside Google's Virgo networking fabric to scale clusters to up to 80,000 Rubin GPUs at a single site and up to 960,000 across multiple sites.
"We believe the next decade of AI will be shaped by customers' ability to run their most demanding workloads on a truly integrated, AI-optimised infrastructure stack," said Mark Lohmeyer, vice president and general manager of AI and computing infrastructure at Google Cloud.
The companies also previewed Google's Gemini model running on Google Distributed Cloud with Nvidia Blackwell and Blackwell Ultra GPUs, and introduced confidential virtual machines using Blackwell hardware for workloads requiring enhanced data security.
On the software side, Nvidia's Nemotron open models, including the Nemotron 3 Super model, and its NeMo framework are now available on Google's Gemini Enterprise Agent Platform.
Google and Nvidia have also added Managed Training Clusters with a managed reinforcement learning API built on NeMo RL, aimed at simplifying fine-tuning and training of models at scale.
Customers including OpenAI, Thinking Machines Lab and CrowdStrike are already using the combined stack for large-scale inference, training and industry-specific applications.
More than 90,000 developers have joined the joint developer community in just over a year, the companies said.
The partnership also extends into physical AI and industrial applications, with tools including Nvidia Omniverse and Isaac Sim now available on Google Cloud to support digital twins, robotics simulation and industrial workflows.
Partners Cadence and Siemens Digital Industries Software are participating in the effort.
The recap
- Nvidia and Google Cloud unveil expanded AI infrastructure.
- A5X promises up to 10x lower inference cost per token.