
Deepgram says protocol choice can make or break real-time voice AI performance

The audio AI company warns that the wrong streaming approach can add hundreds of milliseconds of lag to conversational voice agents

by Defused News Writer
Photo by Detail .co / Unsplash

Deepgram, the audio intelligence company, has published guidance warning developers that their choice of streaming protocol for text-to-speech (TTS) can materially affect the conversational latency of real-time voice agents.

The company contrasts three delivery patterns: REST request-response, which requires the client to wait for a complete audio file before playback; HTTP chunked streaming, which enables progressive one-way playback; and WebSocket streaming, which delivers audio as it is generated and allows mid-utterance interruption.
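The practical difference between the first pattern and the other two is when playback can start. The following is a minimal model, assuming a synthesizer that produces audio in chunks with hypothetical generation times and ignoring network overhead entirely. Under REST nothing plays until every chunk exists; under either streaming pattern playback can begin once the first chunk arrives.

```python
from typing import Sequence


def time_to_first_audio_ms(chunk_gen_ms: Sequence[float], streaming: bool) -> float:
    """Return how long the listener waits before playback can start.

    chunk_gen_ms: hypothetical generation time for each audio chunk, in ms.
    streaming: True for chunked HTTP or WebSocket delivery (play on first chunk),
               False for REST request-response (wait for the complete file).
    """
    if not chunk_gen_ms:
        raise ValueError("need at least one chunk")
    if streaming:
        # Playback starts as soon as the first chunk has been generated.
        return chunk_gen_ms[0]
    # REST: the whole file must be generated before anything is returned.
    return sum(chunk_gen_ms)


# Hypothetical utterance split into five chunks of 80 ms generation time each.
chunks = [80.0] * 5
print(time_to_first_audio_ms(chunks, streaming=False))  # 400.0
print(time_to_first_audio_ms(chunks, streaming=True))   # 80.0
```

The model deliberately does not distinguish chunked HTTP from WebSocket. Per the guidance, what separates those two is that a WebSocket also lets the client interrupt the audio mid-utterance, which a one-way stream cannot.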

Deepgram says WebSocket connections can save 50 to 100 milliseconds per request compared with REST in multi-turn conversations, a gap that compounds significantly in high-concurrency environments such as contact centres.
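Because the quoted saving is per request, it accumulates over a multi-turn conversation. The following is a back-of-the-envelope sketch, assuming one TTS request per turn and a constant saving per request. The 10-turn conversation is hypothetical, and concurrency effects are not modelled.

```python
def cumulative_saving_ms(turns: int, per_request_saving_ms: float) -> float:
    """Total latency avoided over one conversation, assuming one TTS request
    per turn and a constant per-request saving (a simplifying assumption)."""
    if turns < 0:
        raise ValueError("turns must be non-negative")
    return turns * per_request_saving_ms


# Deepgram's quoted range is 50-100 ms per request; 10 turns is hypothetical.
low = cumulative_saving_ms(10, 50)
high = cumulative_saving_ms(10, 100)
print(f"{low:.0f}-{high:.0f} ms saved over 10 turns")  # 500-1000 ms saved over 10 turns
```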

The guidance places particular emphasis on human perception thresholds, citing research suggesting users begin to notice delays at around 300 milliseconds, with 500 milliseconds commonly perceived as unresponsive.

Deepgram says teams targeting a conversational feel should aim for well under 200 milliseconds end-to-end.
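The thresholds quoted above can be read as a rough grading scheme. The following is a minimal sketch, assuming the 200, 300 and 500 millisecond figures are used as hard cut-offs, although the guidance presents them as approximate. The band labels are illustrative wording, not Deepgram terminology.

```python
def perceived_responsiveness(end_to_end_ms: float) -> str:
    """Place an end-to-end delay in the bands cited in Deepgram's guidance.

    Labels are illustrative; cut-offs are treated as exact for simplicity.
    """
    if end_to_end_ms < 0:
        raise ValueError("latency cannot be negative")
    if end_to_end_ms < 200:
        return "within conversational target"  # aim: well under 200 ms
    if end_to_end_ms < 300:
        return "over target, probably unnoticed"  # noticing starts near 300 ms
    if end_to_end_ms < 500:
        return "noticeable delay"
    return "feels unresponsive"  # ~500 ms and above


for ms in (150, 250, 350, 600):
    print(ms, perceived_responsiveness(ms))
```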

The company also warns that telephony infrastructure can undermine these transport-level gains. It notes that the public switched telephone network's 8kHz sampling rate and carrier jitter buffers often absorb the latency advantages of faster protocols, so in those environments interruption control and pacing matter more than raw protocol speed.

Deepgram recommends that developers test both WebSocket and REST against their actual traffic patterns and playback targets rather than relying on general benchmarks.
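Deepgram does not describe a test harness. One protocol-agnostic way to follow the advice is to time each delivery method over repeated runs and compare medians, so a single slow run does not dominate. In the sketch below, `measure` and its `stream_factory` hook are hypothetical names. The hook is expected to issue one real TTS request through whichever client you use and yield audio chunks as they arrive, and `fake_stream` merely stands in for such a client.

```python
import statistics
import time
from typing import Callable, Iterable, List, Optional, Tuple


def measure(stream_factory: Callable[[], Iterable[bytes]], runs: int = 5) -> Tuple[float, float]:
    """Return median (time_to_first_chunk_ms, total_ms) over several runs.

    stream_factory is a hypothetical hook: it should start one TTS request
    (REST, chunked HTTP or WebSocket) and yield audio chunks as they arrive.
    """
    firsts: List[float] = []
    totals: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        first: Optional[float] = None
        for _chunk in stream_factory():
            if first is None:
                # Record the moment the first audio bytes became available.
                first = time.perf_counter() - start
        total = time.perf_counter() - start
        if first is None:
            raise RuntimeError("stream produced no audio")
        firsts.append(first * 1000)
        totals.append(total * 1000)
    return statistics.median(firsts), statistics.median(totals)


def fake_stream() -> Iterable[bytes]:
    # Stand-in for a real client: three chunks of silence, 10 ms apart.
    for _ in range(3):
        time.sleep(0.01)
        yield b"\x00" * 320


first_ms, total_ms = measure(fake_stream, runs=3)
print(f"first chunk {first_ms:.0f} ms, complete {total_ms:.0f} ms")
```

A REST client wrapped this way would yield the whole file as a single chunk, so its two figures coincide; for the streaming methods, the gap between them shows how much playback can overlap with generation.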

The company's TTS application programming interface supports all three delivery methods, and Deepgram is offering new users $200 in free credits to run comparative tests via its developer console.

The guidance is aimed primarily at teams building voice agents for customer service, telephony and other latency-sensitive applications.

The recap

  • Deepgram publishes guidance on choosing WebSocket or REST for TTS.
  • WebSocket can save 50–100ms per request in multi-turn conversations.
  • Developers can try both protocols with $200 free credits.
