Aura-2 leads real-time text-to-speech benchmark on speed and cost

Coval’s public tests show the Deepgram model delivers faster responses, more consistent latency and lower effective costs, a combination that matters for large-scale voice and contact-centre systems.

by Defused News Writer
Photo by Masha S / Unsplash

Aura-2 has topped a public, real-time text-to-speech benchmark run by Coval, posting the lowest end-to-end latency, a narrower spread of response times and one of the lowest effective cost tiers among models tested.

In a statement, Coval said its benchmark suite simulates real-world voice agent scenarios and measures how systems perform across metrics that matter in practice, including latency, accuracy and how well agents handle interruptions. The platform runs repeated tests to capture not just average performance but consistency, which Coval said is critical for customer-facing deployments.

Coval said Aura-2 delivered the lowest effective end-to-end text-to-speech latency across repeated runs. In practical terms, that means a shorter pause between a user finishing a sentence and hearing a spoken response from an AI agent. The company said the model produced less “dead air” and allowed more overlap between processing and audio playback, making conversations feel more natural.
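The overlap between processing and audio playback can be illustrated with a toy model. The sketch below uses made-up per-chunk synthesis times, not Aura-2 measurements, and the function name is hypothetical. It assumes each later chunk finishes synthesizing before the previous one finishes playing, so the only silence the caller hears is the wait before the first chunk.

```python
from typing import Sequence

def wait_before_audio_ms(chunk_times_ms: Sequence[float], streaming: bool) -> float:
    """Silence the listener hears before any audio plays (toy model).

    Without streaming, playback waits until every chunk is synthesized.
    With streaming, playback starts once the first chunk is ready, and later
    chunks are synthesized while earlier ones play (assumed to keep up).
    """
    if not chunk_times_ms:
        raise ValueError("need at least one chunk")
    return chunk_times_ms[0] if streaming else sum(chunk_times_ms)

if __name__ == "__main__":
    chunks = [90.0, 60.0, 60.0, 60.0]  # illustrative values only
    print(wait_before_audio_ms(chunks, streaming=False))  # 270.0
    print(wait_before_audio_ms(chunks, streaming=True))   # 90.0
```

If synthesis of a chunk takes longer than playback of the one before it, the model breaks down and the caller hears gaps mid-sentence, which is one reason consistency matters as much as the first-chunk delay.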

For a lay reader, latency is the delay between asking a question and hearing an answer. Even small delays can feel awkward in conversation. Coval said that in contact centres handling more than 10,000 calls a day, cutting just 100 milliseconds from each response can add up to hours of reduced waiting time across all callers.
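Whether the saving reaches "hours" depends on how many agent responses a typical call contains, which the article does not state. A back-of-the-envelope sketch, treating `responses_per_call` as an assumed parameter (the function name is hypothetical):

```python
def hours_saved_per_day(calls_per_day: int, responses_per_call: int, saving_ms: float) -> float:
    """Total caller waiting time removed per day, in hours.

    Assumes the same saving applies to every agent response in every call.
    """
    total_ms = calls_per_day * responses_per_call * saving_ms
    return total_ms / 1000 / 3600

if __name__ == "__main__":
    # responses_per_call is an assumption; the article gives no figure for it.
    for turns in (1, 4, 10, 20):
        print(turns, round(hours_saved_per_day(10_000, turns, 100), 2))
```

At 10,000 calls a day, one response per call saves about 0.28 hours (roughly 17 minutes); the total passes one hour at four or more responses per call and reaches about 5.6 hours at twenty.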

Beyond speed, Coval said Aura-2 showed a tight latency distribution, meaning responses were not only fast on average but also predictable, with fewer long pauses caused by occasional slow responses. The company said the model maintained accuracy suitable for customer service use while meeting strict real-time performance targets.
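An average can hide occasional slow responses. The sketch below uses invented samples, not Coval's data, and a nearest-rank percentile; Coval's exact statistic is not described in the article. It shows how a tail percentile such as p95 separates two systems with identical mean latency.

```python
import math
from statistics import mean, median

def percentile(samples, p):
    """Nearest-rank percentile of `samples` for p in (0, 100]."""
    ordered = sorted(samples)
    # For integer p, p * n is an integer, which avoids float rounding surprises.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def summarize(samples):
    """Mean, median (p50), p95 and the p95 - p50 gap of latencies in ms."""
    p50 = median(samples)
    p95 = percentile(samples, 95)
    return {"mean": mean(samples), "p50": p50, "p95": p95, "spread": p95 - p50}

# Two hypothetical systems, both averaging 100 ms.
steady = [100] * 20
spiky = [80] * 18 + [280] * 2

if __name__ == "__main__":
    print(summarize(steady))
    print(summarize(spiky))
```

Both lists average 100 ms, but the second makes one caller in ten wait 280 ms; that shows up in p95 and in the spread while the mean stays unchanged.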

In blinded preference tests cited by Coval, evaluators consistently rated Aura-2 highest for customer service scenarios when voices were assessed under realistic, real-time conditions. Coval also said Aura-2 operates in one of the lowest effective cost tiers among comparable models, combining performance with pricing that supports large-scale deployment.

Aura-2 is developed by Deepgram, which has focused on optimising both the software and infrastructure behind the model. In a separate technical post, Deepgram’s chief technology officer, Adam Sypniewski, said Aura-2 achieved a sub-200-millisecond time to first byte (TTFB) at launch. He said the team has since reduced steady-state response times to around 90 milliseconds, with 95% of responses arriving in under 200 milliseconds.
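Time to first byte is the delay until the first piece of audio arrives, not until the whole reply is ready. A minimal sketch of measuring it on any iterable of audio chunks, plus the "share under 200 ms" check, follows. `fake_tts_stream` is a hypothetical stand-in, not Deepgram's API; with a real client, the timer would start when the request is sent.

```python
import time
from typing import Iterable, Iterator, Sequence

def time_to_first_byte(stream: Iterable[bytes]) -> float:
    """Seconds from starting to read `stream` until the first non-empty chunk."""
    start = time.perf_counter()
    for chunk in stream:
        if chunk:
            return time.perf_counter() - start
    raise ValueError("stream produced no audio")

def share_under(latencies_ms: Sequence[float], threshold_ms: float = 200.0) -> float:
    """Fraction of responses faster than `threshold_ms`."""
    if not latencies_ms:
        raise ValueError("no samples")
    return sum(1 for x in latencies_ms if x < threshold_ms) / len(latencies_ms)

def fake_tts_stream(delay_s: float) -> Iterator[bytes]:
    """Hypothetical streaming response: waits, then yields two silent chunks."""
    time.sleep(delay_s)
    yield b"\x00" * 320
    yield b"\x00" * 320

if __name__ == "__main__":
    ttfb_ms = time_to_first_byte(fake_tts_stream(0.09)) * 1000
    print(f"TTFB: {ttfb_ms:.1f} ms")
    print(share_under([90, 120, 250, 95]))
```

Repeating the measurement many times and applying `share_under` to the results gives the kind of "95% under 200 ms" figure quoted above.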

Sypniewski said the improvements came from increasing the number of concurrent audio streams each graphics processor (GPU) handles, tightening scheduling and batching, and building the runtime in the Rust programming language. Separating prompt processing from audio synthesis and optimising how GPUs are orchestrated helped reduce variability as well as improve raw speed.
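The article does not describe Deepgram's scheduler, which it says is written in Rust. As an illustration only, the sketch below shows one common technique, size-or-deadline micro-batching: requests are grouped so a GPU serves several streams at once, but no request waits longer than a fixed bound before its batch is dispatched. `max_batch` and `max_wait_ms` are hypothetical parameters, and the simulation works on arrival timestamps rather than live requests.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Batch:
    dispatch_ms: float
    request_ids: List[int] = field(default_factory=list)

def form_batches(arrivals_ms: List[float], max_batch: int, max_wait_ms: float) -> List[Batch]:
    """Group requests (sorted by arrival time) into batches.

    A batch opens when its first request arrives and is dispatched when it
    holds `max_batch` requests or `max_wait_ms` after opening, whichever
    comes first. Request ids are indices into `arrivals_ms`.
    """
    if any(b < a for a, b in zip(arrivals_ms, arrivals_ms[1:])):
        raise ValueError("arrivals must be sorted")
    batches: List[Batch] = []
    current: List[int] = []
    opened = 0.0
    for i, t in enumerate(arrivals_ms):
        if current and t > opened + max_wait_ms:
            # The deadline expired before this request arrived: flush the open batch.
            batches.append(Batch(opened + max_wait_ms, current))
            current = []
        if not current:
            opened = t
        current.append(i)
        if len(current) == max_batch:
            # Full batch: dispatch immediately without waiting for the deadline.
            batches.append(Batch(t, current))
            current = []
    if current:
        batches.append(Batch(opened + max_wait_ms, current))
    return batches

if __name__ == "__main__":
    for b in form_batches([0, 1, 2, 3, 20, 21], max_batch=4, max_wait_ms=5):
        print(b)
```

The deadline is what limits variability: a larger `max_wait_ms` tends to fill batches and raise GPU utilisation, but adds up to that much delay for the first request in a sparse batch.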

Coval noted that its benchmark explorer is public, allowing developers and buyers to inspect and compare results directly, while Deepgram makes Aura-2 available through its online Playground for hands-on testing.

The results highlight a growing focus in voice AI on conversational quality rather than headline model size. As more companies deploy AI agents for support and sales, the ability to respond quickly, consistently and at low cost is emerging as a competitive differentiator alongside accuracy and natural-sounding speech.

The Recap

  • Aura-2 led Coval's public real-time TTS benchmark suite.
  • Aura-2 posted the lowest median latency and the tightest latency distribution among tested models.
  • Coval said its benchmark explorer is publicly accessible for comparison.