Microsoft unveils new Maia AI chip
Microsoft has introduced Maia 200, a next-generation inference accelerator built for large-scale AI workloads.
Microsoft said Maia 200 is the most efficient inference system it has ever deployed, delivering 30% better performance per dollar than the latest-generation hardware in its fleet.
Microsoft highlighted native FP8/FP4 tensor cores and a redesigned memory system, listing 216GB of HBM3e memory with 7TB/s of bandwidth and 272MB of on-chip SRAM.
Microsoft said each chip contains over 140 billion transistors and delivers more than 10 petaFLOPS at 4-bit precision (FP4) and more than 5 petaFLOPS at 8-bit precision (FP8), within a 750W SoC TDP envelope.
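Taken at face value, those figures imply an efficiency of roughly 13 FP4 teraFLOPS per watt and a compute-to-bandwidth ratio in the thousands of FLOPs per byte. The back-of-the-envelope sketch below uses only the numbers Microsoft quoted; the derived ratios are our own arithmetic, not figures from the announcement.

```python
# Back-of-the-envelope check using only the figures quoted above.
# The derived ratios (FLOPS per watt, FLOPs per byte) are our own
# arithmetic, not numbers published by Microsoft.

fp4_flops = 10e15      # >10 petaFLOPS at FP4 (stated)
fp8_flops = 5e15       # >5 petaFLOPS at FP8 (stated)
hbm_bandwidth = 7e12   # 7 TB/s of HBM3e bandwidth (stated)
tdp_watts = 750        # 750W SoC TDP envelope (stated)

print(f"FP4 efficiency: {fp4_flops / tdp_watts / 1e12:.1f} TFLOPS per watt")
print(f"FP8 efficiency: {fp8_flops / tdp_watts / 1e12:.1f} TFLOPS per watt")
# Arithmetic intensity needed to keep the FP4 units busy from HBM alone:
print(f"FP4 FLOPs per byte of HBM bandwidth: {fp4_flops / hbm_bandwidth:.0f}")
```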
In performance comparisons, Microsoft said Maia 200 offers three times the FP4 performance of Amazon's third-generation Trainium, and FP8 performance above Google's seventh-generation TPU.
Microsoft said Maia 200 is part of its heterogeneous AI infrastructure and will serve multiple models, including the latest GPT-5.2 models from OpenAI, in support of Microsoft Foundry and Microsoft 365 Copilot.
The company said Maia 200 is deployed in its US Central datacenter region near Des Moines, Iowa. It said the US West 3 region near Phoenix, Arizona, is coming next.
Microsoft said it is previewing the Maia SDK, a toolset that includes PyTorch integration, a Triton compiler, an optimized kernel library, and access to a low-level programming language.
The SDK also includes a Maia simulator and a cost calculator, so developers can optimize for efficiency earlier in the development cycle.
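Microsoft did not publish SDK code alongside the announcement. As a rough illustration of the kind of kernel authoring a Triton compiler target enables, the sketch below is a standard Triton vector-add kernel written against the public Triton and PyTorch APIs; nothing in it is Maia-specific, and the Maia toolchain's actual integration points and device placement are not assumed here.

```python
import torch
import triton
import triton.language as tl

# A standard Triton elementwise-add kernel. This is generic Triton code,
# not Maia-specific; it only illustrates the style of kernel a Triton
# compiler backend consumes.
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch one program instance per BLOCK_SIZE-sized chunk of the input.
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

On today's hardware this compiles for GPU backends; the announcement suggests kernels in this style are what the Maia Triton path would accept, though Microsoft has not detailed the specifics.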
At the systems level, Microsoft described a two-tier scale-up network design built on standard Ethernet. It said each accelerator exposes 2.8TB/s of dedicated, bidirectional scale-up bandwidth, and that the design supports collective operations across clusters of up to 6,144 accelerators.
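Collective operations are the multi-accelerator primitives, such as all-reduce and all-gather, that a scale-up fabric like this has to carry. The sketch below is a minimal, generic PyTorch all-reduce using torch.distributed with the Gloo backend on a single process; it only illustrates what a collective is and assumes nothing about Maia 200's networking stack or backend names.

```python
import torch
import torch.distributed as dist

# Minimal single-process demo of a collective operation (all-reduce).
# Generic torch.distributed code with the Gloo backend; nothing here is
# specific to Maia 200 or its Ethernet scale-up fabric.
dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:29500",
    rank=0,
    world_size=1,
)

t = torch.arange(4, dtype=torch.float32)
# With world_size > 1, every rank would hold the elementwise sum of all
# ranks' tensors after this call completes.
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(t)

dist.destroy_process_group()
```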
The Recap
- Microsoft introduced Maia 200, a next-generation inference accelerator chip.
- Maia 200 delivers 30% better performance per dollar than the latest-generation hardware in Microsoft's fleet.
- It is part of Microsoft's heterogeneous AI infrastructure and will serve multiple models, including OpenAI's GPT-5.2.