Microsoft unveils new Maia AI chip
Microsoft has introduced Maia 200, a next-generation inference accelerator built for large-scale AI workloads.
Microsoft said Maia 200 is the most efficient inference system it has ever deployed, delivering 30% better performance per dollar than the latest-generation hardware in its fleet.
Microsoft highlighted native FP8/FP4 tensor cores and a redesigned memory system, listing 216GB of HBM3e memory with 7TB/s of bandwidth and 272MB of on-chip SRAM.
Microsoft said each chip contains over 140 billion transistors and delivers more than 10 petaFLOPS at 4-bit precision (FP4) and more than 5 petaFLOPS at 8-bit precision (FP8), within a 750W SoC TDP envelope.
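Taken at face value, those figures imply an efficiency of roughly 13 FP4 teraFLOPS per watt and a compute-to-bandwidth ratio in the thousands of FLOPs per byte. The back-of-the-envelope sketch below uses only the numbers Microsoft quoted; the derived ratios are our own arithmetic, not figures from the announcement.

```python
# Back-of-the-envelope check using only the figures quoted above.
# The derived ratios (FLOPS per watt, FLOPs per byte) are our own
# arithmetic, not numbers published by Microsoft.

fp4_flops = 10e15      # >10 petaFLOPS at FP4 (stated)
fp8_flops = 5e15       # >5 petaFLOPS at FP8 (stated)
hbm_bandwidth = 7e12   # 7 TB/s of HBM3e bandwidth (stated)
tdp_watts = 750        # 750W SoC TDP envelope (stated)

print(f"FP4 efficiency: {fp4_flops / tdp_watts / 1e12:.1f} TFLOPS per watt")
print(f"FP8 efficiency: {fp8_flops / tdp_watts / 1e12:.1f} TFLOPS per watt")
# Arithmetic intensity needed to keep the FP4 units busy from HBM alone:
print(f"FP4 FLOPs per byte of HBM bandwidth: {fp4_flops / hbm_bandwidth:.0f}")
```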
In performance comparisons, Microsoft said Maia 200 offers three times the FP4 performance of Amazon's third-generation Trainium, and FP8 performance above Google's seventh-generation TPU.
Microsoft said Maia 200 is part of its heterogeneous AI infrastructure and will serve multiple models, including the latest GPT-5.2 models from OpenAI, in support of Microsoft Foundry and Microsoft 365 Copilot.
The company said Maia 200 is deployed in its US Central datacenter region near Des Moines, Iowa. It said the US West 3 region near Phoenix, Arizona, is coming next.
Microsoft said it is previewing the Maia SDK, a toolset that includes PyTorch integration, a Triton compiler, an optimized kernel library, and access to a low-level programming language.
The SDK also includes a Maia simulator and a cost calculator, so developers can optimize for efficiency earlier in the development cycle.
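Microsoft did not publish SDK code alongside the announcement. As a rough illustration of the kind of kernel authoring a Triton compiler target enables, the sketch below is a standard Triton vector-add kernel written against the public Triton and PyTorch APIs; nothing in it is Maia-specific, and the Maia toolchain's actual integration points and device placement are not assumed here.

```python
import torch
import triton
import triton.language as tl

# A standard Triton elementwise-add kernel. This is generic Triton code,
# not Maia-specific; it only illustrates the style of kernel a Triton
# compiler backend consumes.
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch one program instance per BLOCK_SIZE-sized chunk of the input.
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

On today's hardware this compiles for GPU backends; the announcement suggests kernels in this style are what the Maia Triton path would accept, though Microsoft has not detailed the specifics.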
At the systems level, Microsoft described a two-tier scale-up network design built on standard Ethernet. It said each accelerator exposes 2.8TB/s of dedicated, bidirectional scale-up bandwidth, and that the design supports collective operations across clusters of up to 6,144 accelerators.
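Collective operations are the multi-accelerator primitives, such as all-reduce and all-gather, that a scale-up fabric like this has to carry. The sketch below is a minimal, generic PyTorch all-reduce using torch.distributed with the Gloo backend on a single process; it only illustrates what a collective is and assumes nothing about Maia 200's networking stack or backend names.

```python
import torch
import torch.distributed as dist

# Minimal single-process demo of a collective operation (all-reduce).
# Generic torch.distributed code with the Gloo backend; nothing here is
# specific to Maia 200 or its Ethernet scale-up fabric.
dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:29500",
    rank=0,
    world_size=1,
)

t = torch.arange(4, dtype=torch.float32)
# With world_size > 1, every rank would hold the elementwise sum of all
# ranks' tensors after this call completes.
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(t)

dist.destroy_process_group()
```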
The Recap
- Microsoft introduced Maia 200, a next-generation inference accelerator chip.
- Maia 200 delivers 30% better performance per dollar than the latest-generation hardware in Microsoft's fleet.
- It is part of Microsoft's heterogeneous AI infrastructure and will serve multiple models, including OpenAI's GPT-5.2.