Google updates Gemini native audio model for live voice agents
Google updated its Gemini 2.5 Flash Native Audio model to improve live voice agents and add live speech translation across Google products.
Google has rolled out an updated Gemini 2.5 Flash Native Audio model to Google AI Studio and Vertex AI and has begun deploying it in Gemini Live and Search Live, the company said. The update brings native audio to Search Live for the first time and introduces a beta live speech translation experience in the Google Translate app.
The company said the model improves in three areas: more reliable function calling, stronger instruction following, and better multi‑turn conversation quality. On ComplexFuncBench Audio, an evaluation of multi‑step function calling, the model scored 71.5%. Google reported that the model adheres to developer instructions 90% of the time, up from 84%, and said it retrieves context from earlier turns in a conversation more effectively.
Customers testing the model include Shopify, United Wholesale Mortgage and Newo.ai. “Users often forget they’re talking to AI within a minute of using Sidekick, and in some cases have thanked the bot after a long chat… New Live API AI capabilities offered through Gemini [2.5 Flash Native Audio] empower our merchants to win,” said David Wurtz, VP of Product at Shopify.
Gemini’s live speech translation supports continuous listening and two‑way conversation and preserves the speaker’s intonation, pacing and pitch, the company said. It supports more than 70 languages and 2,000 language pairs, handles multilingual input, auto‑detects the spoken language, and filters out ambient noise. The Translate app beta is rolling out to Android devices in the US, Mexico and India, with iOS and more regions coming soon, and Google plans to expand the feature to more of its products, including the Gemini API.
The company said Gemini 2.5 Flash Native Audio is generally available on Vertex AI and available as a preview in the Gemini API, and that Gemini 2.5 Flash and 2.5 Pro text‑to‑speech models are available via the Gemini API in Google AI Studio.
The Recap
- Google updated Gemini 2.5 Flash Native Audio for live voice agents.
- Model scored 71.5% on the ComplexFuncBench Audio benchmark.
- Live speech translation beta rolling out in Google Translate app.