OpenAI adds 750MW of ultra-low-latency AI compute via Cerebras partnership
Deal brings purpose-built inference hardware onto OpenAI’s platform, aiming to speed real-time AI interactions as capacity rolls out through 2028.
OpenAI said it will add 750 megawatts of ultra-low-latency artificial intelligence compute from Cerebras, integrating the capacity into its inference platform in phases.
The company said Cerebras designs purpose-built AI systems that place massive compute, memory and bandwidth onto a single, wafer-scale chip, reducing the data-movement bottlenecks that typically slow inference on conventional GPU-based architectures. The approach is designed to deliver extremely fast response times, which are critical for real-time AI applications.
OpenAI said the new capacity will be incorporated progressively across its inference stack and expanded to additional workloads over time. The rollout will occur in multiple tranches through 2028.
“OpenAI’s compute strategy is to build a resilient portfolio that matches the right systems to the right workloads,” said Sachin Katti of OpenAI. “Cerebras adds a dedicated low-latency inference solution to our platform. That means faster responses, more natural interactions, and a stronger foundation to scale real-time AI to many more people.”
Cerebras’ systems are designed around a single, giant processor rather than clusters of smaller chips, a technical choice intended to minimise communication delays between compute units. By keeping model parameters and data closer together on one chip, the company says inference can be delivered with far lower latency than traditional architectures.
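To make the data-movement argument concrete, here is a rough back-of-envelope sketch of why keeping model weights in on-chip memory can translate into lower per-token latency. It is purely illustrative: the model size, the bandwidth figures and the assumption that decoding is memory-bandwidth-bound are round-number assumptions made for this sketch, not figures from OpenAI or Cerebras.

```python
# Illustrative back-of-envelope model of memory-bandwidth-bound inference latency.
# All figures below are hypothetical round numbers, not vendor specifications.

def per_token_latency_ms(param_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Estimate per-token decode latency when every model weight must be
    streamed from memory once per generated token (a common lower bound
    for autoregressive inference)."""
    return param_bytes / bandwidth_bytes_per_s * 1_000

# Assume an 8B-parameter model stored at 2 bytes per weight (~16 GB).
param_bytes = 8e9 * 2

# Hypothetical aggregate bandwidths: off-chip HBM on a conventional accelerator
# versus on-chip SRAM on a wafer-scale processor.
off_chip_bw = 3e12    # ~3 TB/s (illustrative)
on_chip_bw = 100e12   # ~100 TB/s (illustrative)

print(f"Off-chip bound: {per_token_latency_ms(param_bytes, off_chip_bw):.2f} ms/token")
print(f"On-chip bound:  {per_token_latency_ms(param_bytes, on_chip_bw):.2f} ms/token")
```

The absolute numbers are not the point; the sketch simply shows that when inference is limited by how fast weights can be moved to the compute units, a large gap in memory bandwidth maps directly onto a large gap in response time.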
Andrew Feldman, co-founder and chief executive officer of Cerebras, said the partnership reflects a shift in how AI will be deployed at scale. “We are delighted to partner with OpenAI, bringing the world’s leading AI models to the world’s fastest AI processor,” he said. “Just as broadband transformed the internet, real-time inference will transform AI, enabling entirely new ways to build and interact with AI models.”
The announcement underscores growing demand for specialised inference infrastructure as AI models are increasingly deployed in interactive, user-facing applications, where response times can be as important as raw model capability.
The Recap
- OpenAI adds Cerebras low-latency AI compute to its platform.
- The deal adds 750MW of ultra-low-latency compute.
- Capacity will come online in multiple tranches through 2028.