
Microsoft brings high-speed AI model engine Fireworks AI to its Azure cloud platform

The partnership gives developers faster, cheaper access to open AI models with enterprise security controls built in

by Defused News Writer

Microsoft has added Fireworks AI, a high-performance artificial intelligence inference engine, to its Azure cloud platform in public preview, giving developers a faster and more flexible way to run open AI models at scale.

Inference, in this context, refers to the process of running a trained AI model to generate responses, the step that happens every time a user sends a message or an application requests an output.

The partnership pairs Fireworks AI's speed-focused engine with Microsoft Foundry, Azure's platform for building, deploying and managing AI models and agents, combining raw processing performance with the security, compliance and governance controls that large organisations require.

The move responds to a growing preference among businesses for open models, AI systems whose underlying weights and architecture are publicly available. These offer more control over cost, customisation and vendor independence than proprietary alternatives tied to a single provider.

Fireworks AI says its engine already processes more than 13 trillion tokens daily, handling around 180,000 requests per second and generating more than 1,000 tokens per second on large models. Tokens are the small chunks of text that AI models read and produce.
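As a rough sanity check on those figures, the quoted daily total can be converted into per-second and per-request averages. The inputs come from the article; the derived numbers are illustrative back-of-envelope arithmetic, not Fireworks' own accounting:

```python
# Back-of-envelope check on the throughput figures quoted above.
# Input numbers are from the article; derived values are illustrative.

TOKENS_PER_DAY = 13e12          # "more than 13 trillion tokens daily"
REQUESTS_PER_SECOND = 180_000   # "around 180,000 requests per second"
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tokens_per_second = TOKENS_PER_DAY / SECONDS_PER_DAY
tokens_per_request = tokens_per_second / REQUESTS_PER_SECOND

print(f"{tokens_per_second:,.0f} tokens/s fleet-wide")      # ~150 million
print(f"{tokens_per_request:,.0f} tokens per request avg")  # ~836
```

That works out to roughly 150 million tokens per second across the fleet, or about 840 tokens per average request, consistent with typical chat-length responses.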

Models available through the integration include DeepSeek V3.2, OpenAI's gpt-oss-120b, Kimi K2.5 and MiniMax M2.5.

Developers can upload their own customised or compressed model weights through a bring-your-own-weights option, and choose between paying per token used or buying provisioned throughput units (PTUs), a fixed-capacity arrangement that delivers more predictable performance for high-volume applications.
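The choice between the two billing modes is essentially a volume question: pay-per-token scales linearly with usage, while a PTU is a fixed monthly cost. A minimal sketch of the breakeven logic, with entirely hypothetical prices (the article quotes no pricing; `PRICE_PER_1M_TOKENS` and `PTU_MONTHLY_COST` are made-up placeholders):

```python
# Hypothetical breakeven between pay-per-token and provisioned throughput
# units (PTUs). Prices below are invented placeholders for illustration;
# real Azure/Fireworks pricing is not given in the article.

PRICE_PER_1M_TOKENS = 0.50      # hypothetical $ per million tokens
PTU_MONTHLY_COST = 10_000.00    # hypothetical $ per month for one PTU bundle

def cheaper_option(tokens_per_month: float) -> str:
    """Return whichever billing mode is cheaper at the given volume."""
    pay_as_you_go = tokens_per_month / 1_000_000 * PRICE_PER_1M_TOKENS
    return "PTU" if PTU_MONTHLY_COST < pay_as_you_go else "pay-per-token"

# At these placeholder prices, breakeven is 20 billion tokens/month.
print(cheaper_option(1e9))    # low volume  -> pay-per-token
print(cheaper_option(50e9))   # high volume -> PTU
```

Beyond raw cost, the article notes that PTUs also buy more predictable performance, which can tip the decision for latency-sensitive, high-volume applications even below the pure cost breakeven.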

Yina Arenas, Microsoft's corporate vice president for AI platform, said the integration offers unified governance, observability and agent-ready tooling for production use.

Fireworks AI models are available now in the Foundry model catalogue, with serverless and PTU deployment options.

The recap

  • Microsoft Foundry begins public preview of Fireworks AI inference.
  • Fireworks' engine processes more than 13 trillion tokens daily.
  • Developers can deploy serverlessly or buy provisioned throughput units (PTUs).
