Google has released Gemini 3.1 Flash TTS, a text-to-speech model that gives developers granular control over AI-generated voices, including pace, tone, accent and mid-sentence expression changes.
The model is available from today in preview through the Gemini API, Google AI Studio, Vertex AI for enterprise customers and Google Workspace via Google Vids.
Gemini 3.1 Flash TTS supports more than 70 languages and handles native multi-speaker dialogue, allowing developers to assign distinct voice profiles to different speakers within a single output.
The model uses what Google calls "audio tags", natural language commands embedded directly in input text to steer vocal delivery without requiring separate configuration files.
Google AI Studio provides configurable tools including scene direction, speaker-level audio profiles and "director's notes" that toggle pace, tone and accent, while inline tags allow expression changes within individual sentences.
Developers can export these parameters as Gemini API code to reproduce consistent voices across projects.
The model achieved an Elo score of 1,211 on the Artificial Analysis text-to-speech leaderboard, placing it in what the benchmarking service describes as its highest-quality, most cost-effective quadrant.
All generated audio carries SynthID, Google's imperceptible watermarking system designed to flag AI-created content.
Related reading
- The race to replace ElevenLabs in live AI systems puts latency and reliability above voice quality
- Deepgram says protocol choice can make or break real-time voice AI performance
- Google opens its music generation model to developers, with watermarking built in from the start
Google said early testers reported improved controllability and expressivity compared with existing tools, and invited developers to experiment with the model through the Google AI Studio playground.
The launch reflects a broader push across major AI providers to move text-to-speech beyond simple narration toward more nuanced, production-grade audio generation.
The recap
- Google launches Gemini 3.1 Flash TTS in preview channels.
- It achieved an Elo score of 1,211 on Artificial Analysis.
- Developers can test features in Google AI Studio Playground today.