Default interruption handling is not enough for production call centres, ElevenLabs warns

The gap between a platform's built-in barge-in behaviour and what noisy, high-volume telephony environments actually require is wider than most engineering teams realise

by Defused News Writer

ElevenLabs has published a technical assessment warning that its native barge-in handling, the mechanism that allows a caller to interrupt an AI agent mid-sentence, is insufficient for production call centre deployments without additional engineering work.

The report also cites an estimate that poor customer service costs US businesses $62 billion per year.

Barge-in, or interruption detection, is the capability that determines whether a voice agent stops talking and begins processing a new response when a human caller speaks over it.

The problem is harder than it appears because reliable interruption handling is not a single component but a chain of dependent systems, each of which must perform within tight latency constraints for the overall behaviour to feel natural.

The chain runs from voice activity detection, which identifies whether a human is speaking, through streaming speech-to-text transcription, which must produce stable interim results quickly enough to act on, to text-to-speech cancellation, which must stop audio playback at the right moment without creating an audible glitch.
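The last link in that chain, stopping playback once an interruption is confirmed, can be illustrated with a minimal Python sketch. It is not ElevenLabs' implementation: the function names are hypothetical, a 20 ms sleep stands in for writing one audio frame to the line, and a fixed delay stands in for the detection and transcription stages.

```python
import asyncio

async def play_tts(chunks, played):
    """Pretend to stream TTS audio, one frame at a time (hypothetical helper)."""
    for chunk in chunks:
        await asyncio.sleep(0.02)  # stands in for sending one 20 ms audio frame
        played.append(chunk)

async def run_turn(chunks, interrupt_after_s):
    """Start playback, then cancel it when the caller barges in."""
    played = []
    playback = asyncio.create_task(play_tts(chunks, played))
    # Assumption: detection plus interim transcription takes this long to confirm.
    await asyncio.sleep(interrupt_after_s)
    playback.cancel()  # request that playback stop at its next await point
    try:
        await playback
    except asyncio.CancelledError:
        pass  # expected: the agent stopped speaking mid-sentence
    return played

if __name__ == "__main__":
    frames = list(range(50))  # one second of audio in 20 ms frames
    spoken = asyncio.run(run_turn(frames, interrupt_after_s=0.1))
    print(f"played {len(spoken)} of {len(frames)} frames before the interruption")
```

A real system would also need to fade out or flush any audio already buffered downstream, which is where the audible glitch tends to come from.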

ElevenLabs acknowledges that its platform handles straightforward interruptions automatically, but does not expose the low-level controls that engineering teams need to tune behaviour for difficult conditions: variable voice activity detection thresholds, overlapping-speech handling and conditional logic that can distinguish a genuine interruption from background noise or affirmative sounds such as "mm-hmm" that should not trigger a stop.
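The kind of conditional logic described here might look something like the following Python sketch. It is illustrative only: the affirmation list, the 200 ms minimum speech duration and the function names are assumptions, not values taken from ElevenLabs or any other vendor.

```python
import re

# Hypothetical set of affirmations that should not stop the agent.
BACKCHANNELS = {"mm-hmm", "mhm", "uh-huh", "yeah", "yes", "ok", "okay", "right", "sure"}

def tokenise(text: str) -> list[str]:
    """Lower-case an interim transcript and split it into word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def should_interrupt(interim_transcript: str, speech_ms: int, min_speech_ms: int = 200) -> bool:
    """Decide whether detected caller audio is a genuine interruption."""
    if speech_ms < min_speech_ms:
        return False  # too short: more likely a cough or a noise burst
    tokens = tokenise(interim_transcript)
    if not tokens:
        return False  # audio detected but no words: likely background noise
    if all(token in BACKCHANNELS for token in tokens):
        return False  # caller is only acknowledging, so keep talking
    return True

if __name__ == "__main__":
    print(should_interrupt("mm-hmm", speech_ms=400))
    print(should_interrupt("wait, I need to change my address", speech_ms=600))
```

The trade-off is that waiting for enough transcript to classify the utterance spends part of the latency budget, so thresholds like these have to be tuned against real call audio.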

The latency budget is unforgiving.

Natural human conversation typically has gaps between turns of under 300 milliseconds, meaning a voice AI system has roughly a third of a second to detect an interruption, stop generating audio and begin processing the new input before the interaction starts to feel broken.

Under telephony conditions, with added network latency, compressed audio and the acoustic variability of real call centre environments, including background noise and diverse accents, that budget becomes even tighter.
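To see how quickly that budget disappears, consider a back-of-the-envelope calculation in Python. Only the 300 millisecond figure comes from the article; the per-stage latencies and the network overhead are made-up numbers for illustration.

```python
TURN_GAP_BUDGET_MS = 300  # typical gap between turns cited above

# Illustrative assumptions, not measurements from any platform.
STAGE_LATENCY_MS = {
    "voice activity detection": 60,
    "interim speech-to-text": 120,
    "text-to-speech cancellation": 40,
}

def remaining_budget_ms(stages: dict[str, int], network_ms: int = 0,
                        budget_ms: int = TURN_GAP_BUDGET_MS) -> int:
    """Return the headroom left after every stage and the network take their share."""
    return budget_ms - sum(stages.values()) - network_ms

if __name__ == "__main__":
    print(remaining_budget_ms(STAGE_LATENCY_MS))                  # clean, local audio
    print(remaining_budget_ms(STAGE_LATENCY_MS, network_ms=100))  # assumed telephony overhead
```

With these assumed figures the pipeline has 80 milliseconds of headroom on a clean connection and overshoots the budget by 20 milliseconds once 100 milliseconds of network delay is added.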

ElevenLabs recommends that engineering teams test interruption handling against production-realistic conditions, including noisy audio, injected affirmations and concurrent-load simulation, before deploying at scale, and points teams towards Deepgram's Voice Agent API as an integrated runtime with lower-level controls for turn-taking behaviour.
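One way to build the noisy test audio is to mix recorded background noise into clean caller speech at a chosen signal-to-noise ratio. The Python sketch below shows the arithmetic on plain lists of samples; the function names are hypothetical and the sine wave and random noise are stand-ins for real recordings.

```python
import math
import random

def rms(samples: list[float]) -> float:
    """Root-mean-square level of a block of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Scale noise so the speech-to-noise RMS ratio equals snr_db, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must be the same length")
    noise_level = rms(noise)
    if noise_level == 0:
        return list(speech)  # silent noise track: nothing to mix in
    target_noise_level = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_level / noise_level
    return [s + gain * n for s, n in zip(speech, noise)]

if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for real recordings: a 440 Hz tone at 8 kHz and Gaussian noise.
    speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
    noise = [rng.gauss(0, 1) for _ in range(8000)]
    noisy = mix_at_snr(speech, noise, snr_db=10)
    print(f"mixed {len(noisy)} samples at 10 dB SNR")
```

Running the same interruption scenarios at several noise levels shows where false stops and missed barge-ins begin to appear.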

The recap

  • ElevenLabs' native interruption handling suits scripted, clean-audio flows.
  • Poor customer service costs US businesses $62 billion per year.
  • The announcement recommends realistic tests and a $200 trial.