
Nvidia says cost per token is the only metric that matters for AI infrastructure

The chipmaker argues buyers who focus on raw compute costs are measuring the wrong thing entirely

by Defused News Writer

Nvidia is pushing enterprises to judge artificial intelligence infrastructure on cost per token, the all-in expense of producing each unit of AI output, rather than on raw compute metrics such as FLOPS per dollar or GPU hourly rates.

The argument positions modern data centres as what Nvidia calls "AI token factories," where inference, the process of running models to generate responses, has overtaken storage as the dominant workload.

Nvidia's case rests on a distinction between the numerator and denominator of the cost equation: buyers typically focus on GPU hourly rates, but the real leverage lies in how many tokens those GPUs actually deliver.

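As a rough illustration of that numerator-and-denominator split, the sketch below computes cost per million tokens from a GPU's hourly rate and its delivered throughput. The helper name and all figures are hypothetical, not vendor pricing.

```python
# Cost-per-token arithmetic with hypothetical figures (not vendor pricing).

def cost_per_million_tokens(gpu_hourly_rate_usd: float,
                            tokens_per_second: float) -> float:
    """All-in cost of producing one million tokens on one GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_rate_usd / tokens_per_hour * 1_000_000

# A pricier GPU that delivers far more tokens is cheaper where it counts.
slow = cost_per_million_tokens(gpu_hourly_rate_usd=2.0, tokens_per_second=500)
fast = cost_per_million_tokens(gpu_hourly_rate_usd=4.0, tokens_per_second=10_000)
print(f"${slow:.2f} vs ${fast:.2f} per million tokens")  # $1.11 vs $0.11
```

Doubling the hourly rate while delivering twenty times the tokens leaves the more expensive machine ten times cheaper per token, which is the substance of Nvidia's argument.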
The company identifies a range of technical factors that determine delivered token output: interconnect architecture for mixture-of-experts models, FP4 precision support, speculative decoding, KV-aware routing and disaggregated serving, as well as the capacity to handle agentic AI workloads that demand ultra-low latency and long sequence lengths.

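To make one of those factors concrete, here is a toy sketch of greedy speculative decoding: a cheap draft model proposes several tokens and an expensive target model verifies them. The "models" are stand-in functions written for this illustration, not any real inference stack.

```python
# Toy greedy speculative decoding (stand-in models, not a real stack).
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "the", "mat"]

def draft_model(context):
    """Cheap drafter: fast but sometimes wrong."""
    return random.choice(VOCAB)

def target_model(context):
    """Expensive target: deterministic 'correct' next token."""
    return VOCAB[len(context) % len(VOCAB)]

def speculative_decode(prompt, n_tokens, k=4):
    """Accept drafted tokens while they match the target's choice;
    on the first mismatch, emit the target's token and redraft.
    (A real system verifies all k drafts in one batched target pass,
    so accepted tokens cost a fraction of a full decode step.)"""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        ctx = list(out)
        drafts = []
        for _ in range(k):
            token = draft_model(ctx)
            drafts.append(token)
            ctx.append(token)
        for token in drafts:
            expected = target_model(out)
            out.append(expected)  # the output always matches the target
            if token != expected:
                break             # mismatch: discard the remaining drafts
    return out[len(prompt):len(prompt) + n_tokens]

print(speculative_decode(["the"], 8))
```

The output is identical to what the target model alone would produce; the technique only changes how many expensive passes are needed to get there, which is exactly the kind of lever that raises tokens delivered per GPU-hour.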
Nvidia uses its own Blackwell generation of chips to illustrate the argument, contrasting it with the prior Hopper generation.

On surface metrics, Blackwell can look like the weaker buy: it appears roughly twice as expensive on a compute-cost basis, and FLOPS per dollar suggests only a 2x improvement. But Nvidia and the SemiAnalysis InferenceX v2 benchmark indicate Blackwell delivers more than 50 times greater token output per watt, which translates to nearly 35 times lower cost per million tokens.

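The ratios fit together straightforwardly: if the hardware costs about 2x more per hour, a roughly 70x increase in delivered tokens per hour would yield the claimed 35x lower cost per million tokens. The 70x figure is inferred here purely to reconcile the article's numbers, not a figure Nvidia states (its stated 50x is per watt, a different denominator).

```python
# Reconciling the article's ratios (normalised units, illustration only).
price_ratio = 2.0        # Blackwell vs Hopper, cost per GPU-hour (article)
throughput_ratio = 70.0  # tokens per GPU-hour (inferred, hypothetical)

cost_per_token_ratio = price_ratio / throughput_ratio
print(f"cost per token falls by {1 / cost_per_token_ratio:.0f}x")  # -> 35x
```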
Nvidia attributes the gain to hardware and software codesign, combined with continued optimisation across inference stacks including vLLM, SGLang, Nvidia TensorRT-LLM and Nvidia Dynamo.

Cloud and infrastructure partners including CoreWeave, Nebius, Nscale and Together AI have already deployed Blackwell infrastructure at scale, according to the company.

The framing is an implicit challenge to rival chip vendors and cloud providers that compete on headline compute pricing, redirecting the commercial conversation toward a metric where Nvidia's latest hardware shows the most favourable numbers.

"Cost per token determines whether enterprises can profitably scale AI," Nvidia said.

The recap

  • Industry evaluation is shifting to a cost-per-token metric.
  • Blackwell delivers nearly 35x lower cost per million tokens.
  • Cloud partners deploy Nvidia Blackwell infrastructure at scale.