
AI inference providers claim up to 10x cost cuts on Nvidia's Blackwell chips

Baseten, DeepInfra, Fireworks AI and Together AI say open source models run far cheaper on new platform

by Defused News Writer

Four artificial intelligence inference providers say they have reduced the cost per token of running open source models by up to ten times by deploying them on Nvidia's Blackwell computing platform.

Baseten, DeepInfra, Fireworks AI and Together AI, which host and serve AI models on behalf of other companies, said the savings came from combining frontier open source models with Blackwell's hardware, Nvidia's software tools and their own optimised inference stacks.

The cost reductions are measured against Nvidia's previous-generation Hopper platform.

DeepInfra said it cut the cost per million tokens on a large mixture-of-experts model from 20 cents on Hopper to 10 cents on Blackwell, and to five cents using Blackwell's native NVFP4 numerical format.
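
As a back-of-the-envelope illustration of those figures, the sketch below computes the implied reduction factors and a monthly bill at an assumed volume; the 50-billion-token workload is a hypothetical assumption, not a number from DeepInfra.

```python
# Cost comparison using DeepInfra's reported per-million-token prices.
# The monthly token volume is a hypothetical illustration only.
prices_per_million = {
    "Hopper": 0.20,             # reported baseline
    "Blackwell": 0.10,          # reported
    "Blackwell + NVFP4": 0.05,  # reported, native 4-bit format
}
monthly_tokens = 50_000_000_000  # assumed workload: 50B tokens/month

baseline = prices_per_million["Hopper"]
for platform, price in prices_per_million.items():
    monthly_cost = price * monthly_tokens / 1_000_000
    print(f"{platform}: ${price:.2f}/M tokens, "
          f"${monthly_cost:,.0f}/month, {baseline / price:.0f}x vs Hopper")
```

At that assumed volume, the reported prices translate to $10,000 a month on Hopper, $5,000 on Blackwell and $2,500 with NVFP4.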

In healthcare, Sully.ai, which builds AI tools for physicians, said its inference costs fell by 90% after moving to Baseten's platform on Blackwell, while response times improved by 65% for critical workflows.

Baseten said the deployment delivered up to 2.5 times better throughput per dollar compared with Hopper.

Fireworks AI said its Blackwell-optimised stack helped Sentient Labs, an AI startup, achieve 25% to 50% better cost efficiency and handle a surge of 1.8 million waitlisted users within 24 hours of a product launch, processing 5.6 million queries in a single week.

Together AI and Decagon, a customer service AI company, reported response times below 400 milliseconds for voice queries and a sixfold drop in cost per query by combining speculative decoding, caching and automatic scaling on Blackwell.
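
Speculative decoding, the first of those techniques, is the main latency lever: a small draft model proposes a few tokens cheaply and the large target model verifies them together, so several tokens can be accepted per expensive forward pass. The sketch below is a minimal, hypothetical illustration of the greedy variant; the toy draft_model and target_model functions are stand-ins, not Decagon's or Together AI's implementation.

```python
# Minimal sketch of greedy speculative decoding. The "models" here are
# toy stand-ins: a real deployment pairs a small draft model with the
# large target model it is meant to accelerate.

def draft_model(tokens):   # hypothetical cheap model: guesses the next token
    return (tokens[-1] + 1) % 100

def target_model(tokens):  # hypothetical expensive model: the ground truth
    return (tokens[-1] + 1) % 100 if tokens[-1] % 7 else 0

def speculative_decode(prompt, n_new, k=4):
    """Generate n_new tokens, letting the draft model propose k tokens
    at a time and the target model verify the whole batch."""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n_new:
        # Draft model proposes k tokens autoregressively (cheap).
        draft, ctx = [], tokens[:]
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # Target model checks the proposals (one batched pass in a real
        # system, simulated here sequentially); keep the matching prefix.
        accepted, ctx = 0, tokens[:]
        for t in draft:
            if t != target_model(ctx):
                break
            accepted += 1
            ctx.append(t)
        tokens = ctx
        if accepted < len(draft):
            # On the first mismatch, take the target model's token, so
            # the output is identical to pure target-model decoding.
            tokens.append(target_model(tokens))
    return tokens[len(prompt):][:n_new]

print(speculative_decode([1, 2, 3], n_new=10))
```

Because rejected drafts fall back to the target model's own token, the output matches plain target-model decoding exactly; the speed-up comes entirely from how often the cheap model's guesses are accepted.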

The announcements reflect growing competition among inference providers to drive down the cost of serving open source AI models, which have become increasingly popular as alternatives to proprietary systems from OpenAI and Anthropic.

Nvidia's Blackwell architecture, which began shipping to data centres last year, was designed to deliver large performance gains for AI workloads over its Hopper predecessor.

The Recap

  • Providers cut cost per token by up to 10x on Blackwell.
  • Sully.ai reduced inference costs by 90% and cut response times by 65%.
  • DeepInfra lowered token cost to five cents per million tokens.