Gemini 3.1 Flash TTS
Gemini 3.1 Flash TTS is a text-to-speech model for expressive spoken audio generation from text. It supports granular control over delivery through audio tags, native multi-speaker dialogue, and speech generation across 70+ languages, making it suitable for narration, conversational voice apps, podcasts, audiobooks, and other production-oriented voice workflows.
API Options
Platform-level options for task execution and delivery.
-
taskType
string required value: audioInference -
Identifier for the type of task being performed
-
taskUUID
string required UUID v4 -
UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.
-
outputType
string default: URL -
Audio output type.
Allowed values 3 values
-
outputFormat
string default: MP3 -
Specifies the file format of the generated output. The available values depend on the task type and the specific model's capabilities.
- `MP3`: Compressed audio, smaller file size.
- `WAV`: Uncompressed, high-quality audio.
- `FLAC`: Lossless compression.
- `OGG`: Open-source compressed audio format (Vorbis codec).
Allowed values 4 values
-
audioSettings
object -
Audio encoding settings for controlling the bitrate, number of channels, and sample rate of the generated audio. Only applicable for lossy output formats (`MP3` and `OGG`). When using lossless formats (`WAV` or `FLAC`), this parameter must not be provided.
The available sample rates and valid bitrate ranges depend on the output format. For `OGG`, bitrate limits also vary by the number of channels.
MP3 bitrate limits
Bitrate limits for MP3 are the same regardless of mono or stereo.
| Sample Rate | Min Bitrate | Max Bitrate |
| --- | --- | --- |
| 8,000 Hz | 8 kbps | 64 kbps |
| 11,025 Hz | 8 kbps | 64 kbps |
| 12,000 Hz | 8 kbps | 64 kbps |
| 16,000 Hz | 8 kbps | 160 kbps |
| 22,050 Hz | 8 kbps | 160 kbps |
| 24,000 Hz | 8 kbps | 160 kbps |
| 32,000 Hz | 32 kbps | 320 kbps |
| 44,100 Hz | 32 kbps | 320 kbps |
| 48,000 Hz | 32 kbps | 320 kbps |
OGG bitrate limits: Mono (1 channel)
| Sample Rate | Min Bitrate | Max Bitrate |
| --- | --- | --- |
| 8,000 Hz | 8 kbps | 40 kbps |
| 12,000 Hz | 16 kbps | 48 kbps |
| 16,000 Hz | 16 kbps | 96 kbps |
| 24,000 Hz | 16 kbps | 80 kbps |
| 48,000 Hz | 32 kbps | 224 kbps |
OGG bitrate limits: Stereo (2 channels)
| Sample Rate | Min Bitrate | Max Bitrate |
| --- | --- | --- |
| 8,000 Hz | 16 kbps | 80 kbps |
| 12,000 Hz | 16 kbps | 96 kbps |
| 16,000 Hz | 24 kbps | 192 kbps |
| 24,000 Hz | 32 kbps | 160 kbps |
| 48,000 Hz | 48 kbps | 256 kbps |
Lossless formats: When `outputFormat` is set to `WAV` or `FLAC`, the `audioSettings` parameter is not available since these formats produce uncompressed or lossless audio with no configurable encoding settings.
Properties 3 properties
-
audioSettings»bitrate
integer min: 8 -
Audio bitrate in kbps.
-
audioSettings»channels
integer default: 2 -
Number of audio channels. 1 for mono, 2 for stereo.
Allowed values 2 values
-
audioSettings»sampleRate
integer -
Audio sample rate in Hz.
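The bitrate tables above can be turned into a small client-side check before a request is sent. The following is a minimal sketch, not part of the API: the function name `check_audio_settings` is hypothetical, and it assumes the listed minimum and maximum bitrates are inclusive.

```python
# Bitrate limits in kbps as (min, max), transcribed from the tables above.
# MP3 limits depend on sample rate only; OGG limits also depend on channels.
MP3_LIMITS = {
    8000: (8, 64), 11025: (8, 64), 12000: (8, 64),
    16000: (8, 160), 22050: (8, 160), 24000: (8, 160),
    32000: (32, 320), 44100: (32, 320), 48000: (32, 320),
}
OGG_LIMITS = {
    1: {8000: (8, 40), 12000: (16, 48), 16000: (16, 96),
        24000: (16, 80), 48000: (32, 224)},
    2: {8000: (16, 80), 12000: (16, 96), 16000: (24, 192),
        24000: (32, 160), 48000: (48, 256)},
}


def check_audio_settings(output_format, sample_rate, bitrate, channels=2):
    """Return a list of problems; an empty list means the values fit the tables."""
    fmt = output_format.upper()
    if fmt in ("WAV", "FLAC"):
        return ["audioSettings must not be provided for lossless formats"]
    if channels not in (1, 2):
        return ["channels must be 1 (mono) or 2 (stereo)"]
    if fmt == "MP3":
        limits = MP3_LIMITS
    elif fmt == "OGG":
        limits = OGG_LIMITS[channels]
    else:
        return [f"unknown output format: {output_format}"]
    if sample_rate not in limits:
        return [f"sample rate {sample_rate} Hz is not listed for {fmt}"]
    low, high = limits[sample_rate]
    if not low <= bitrate <= high:  # inclusive bounds are an assumption
        return [f"bitrate {bitrate} kbps is outside {low}-{high} kbps"]
    return []
```

For example, `check_audio_settings("OGG", 24000, 96, channels=1)` reports a problem because the mono limit at 24,000 Hz is 80 kbps, while the same call with `channels=2` passes.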
-
-
webhookURL
string URI -
Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.
Learn more 1 resource
- Webhooks PLATFORM
-
deliveryMethod
string default: sync -
Determines how the API delivers task results.
Allowed values 2 values
- `sync`: Returns complete results directly in the API response.
- `async`: Returns an immediate acknowledgment with the task UUID. Poll for results using `getResponse`.
Learn more 1 resource
- Task Polling PLATFORM
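Polling after an asynchronous submission might look like the sketch below. It is illustrative only: the `send` callable stands in for whatever transport you use, and both the request shape (a list containing a `getResponse` task) and the `data` wrapper on the reply are assumptions rather than documented facts on this page. Only the `taskUUID` and `audioURL` field names are taken from the sample responses further down.

```python
import time


def poll_for_result(task_uuid, send, interval_s=1.0, max_attempts=30,
                    sleep=time.sleep):
    """Ask for a task's result until it has an audioURL or attempts run out.

    `send` is a hypothetical callable that posts a list of tasks and
    returns the decoded JSON reply as a dict.
    """
    request = [{"taskType": "getResponse", "taskUUID": task_uuid}]
    for _ in range(max_attempts):
        reply = send(request)
        # Assumed reply shape: {"data": [ {...result...}, ... ]}
        for item in reply.get("data", []):
            if item.get("taskUUID") == task_uuid and "audioURL" in item:
                return item
        sleep(interval_s)
    return None  # caller decides whether to keep waiting or give up
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.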
-
uploadEndpoint
string URI -
Specifies a URL where the generated content will be automatically uploaded using the HTTP PUT method. The raw binary data of the media file is sent directly as the request body. For secure uploads to cloud storage, use presigned URLs that include temporary authentication credentials.
Common use cases:
- Cloud storage: Upload directly to S3 buckets, Google Cloud Storage, or Azure Blob Storage using presigned URLs.
- CDN integration: Upload to content delivery networks for immediate distribution.
// S3 presigned URL for secure upload
https://your-bucket.s3.amazonaws.com/generated/content.mp4?X-Amz-Signature=abc123&X-Amz-Expires=3600
// Google Cloud Storage presigned URL
https://storage.googleapis.com/your-bucket/content.jpg?X-Goog-Signature=xyz789
// Custom storage endpoint
https://storage.example.com/uploads/generated-image.jpg
The content data will be sent as the request body to the specified URL when generation is complete.
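To check that an endpoint accepts this kind of upload before passing it as `uploadEndpoint`, you could send a similar request yourself. The sketch below only builds the request with Python's standard library. The helper name `build_upload_request` is hypothetical, and the `Content-Type` header is an assumption, since the text above does not say which headers accompany the upload.

```python
import urllib.request


def build_upload_request(upload_endpoint, audio_bytes):
    """Build an HTTP PUT request whose body is the raw media bytes."""
    return urllib.request.Request(
        upload_endpoint,
        data=audio_bytes,
        method="PUT",
        # Assumed header; the platform's actual headers are not documented here.
        headers={"Content-Type": "application/octet-stream"},
    )


# Usage (performs a real network call, so it is left commented out):
# with urllib.request.urlopen(build_upload_request(url, payload)) as resp:
#     print(resp.status)
```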
-
ttl
integer min: 60 -
Time-to-live (TTL) in seconds for generated content. Only applies when `outputType` is `URL`.
-
includeCost
boolean default: false -
Include task cost in the response.
-
numberResults
integer min: 1 max: 4 default: 1 -
Number of results to generate. Each result uses a different seed, producing variations of the same parameters.
Generation Parameters
Core parameters for controlling the generated content.
-
model
string required value: google:gemini@3.1-flash-tts -
Identifier of the model to use for generation.
Learn more 3 resources
-
seed
integer min: 0 max: 2147483647 -
Random seed for reproducible generation. When not provided, a random seed is generated in the unsigned 32-bit range.
-
speech
object required -
Settings for speech generation.
Properties 4 properties
-
speech»text
string required min: 1 max: 4000 -
Text to convert to speech. For dialogue mode, start each turn with a speaker tag, for example `[Sam] Hello [Bob] Hi there`. Provider markup tags such as `[laughs]` and `[short pause]` are also supported.
Learn more 2 resources
- Speech Generation: Prompting Guide EXTERNAL
- Speech Generation: Transcript Tags EXTERNAL
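A tagged transcript for `speech.text` can be assembled from a list of turns. This is a minimal sketch with a hypothetical helper name, `build_dialogue_text`. It assumes the documented `min: 1 max: 4000` limits are counted in characters, which this page does not state explicitly.

```python
MAX_TEXT_LENGTH = 4000  # documented maximum for speech.text (unit assumed: characters)


def build_dialogue_text(turns):
    """Join (speaker, line) pairs into one transcript with [Speaker] tags.

    Markup tags such as [laughs] can be placed inside the line itself.
    """
    parts = [f"[{speaker}] {line.strip()}" for speaker, line in turns]
    text = " ".join(parts)
    if not 1 <= len(text) <= MAX_TEXT_LENGTH:
        raise ValueError(
            f"text length {len(text)} is outside 1-{MAX_TEXT_LENGTH} characters"
        )
    return text
```

For example, `build_dialogue_text([("Sam", "Hello"), ("Bob", "[laughs] Hi there")])` yields `[Sam] Hello [Bob] [laughs] Hi there`.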
-
speech»voice
string default: Zephyr -
Voice identifier to use. Set to `auto` for automatic selection.
Allowed values 30 values
- Female, Soft
- Male, Friendly
- Male, Gravelly
- Male, Smooth
- Male, Firm
- Female, Breezy
- Female, Bright
- Female, Easy-going
- Male, Informative
- Female, Smooth
- Male, Breathy
- Female, Clear
- Male, Excitable
- Female, Mature
- Male, Clear
- Female, Firm
- Female, Upbeat
- Female, Youthful
- Male, Firm
- Male, Upbeat
- Female, Forward
- Male, Informative
- Male, Lively
- Male, Knowledgeable
- Male, Even
- Female, Warm
- Male, Easy-going
- Female, Gentle
- Female, Bright
- Male, Casual
-
speech»voices
array of objects items: 2 -
Two-speaker dialogue configuration. Use instead of `voice` for multi-speaker generation.
Properties 2 properties
-
speech»voices»speaker
string required min: 1 -
Unique alphanumeric speaker alias. Must appear in the text as `[Alias]`.
-
speech»voices»voice
string required default: Zephyr -
Voice identifier to use. Set to `auto` for automatic selection.
Allowed values 30 values
- Female, Soft
- Male, Friendly
- Male, Gravelly
- Male, Smooth
- Male, Firm
- Female, Breezy
- Female, Bright
- Female, Easy-going
- Male, Informative
- Female, Smooth
- Male, Breathy
- Female, Clear
- Male, Excitable
- Female, Mature
- Male, Clear
- Female, Firm
- Female, Upbeat
- Female, Youthful
- Male, Firm
- Male, Upbeat
- Female, Forward
- Male, Informative
- Male, Lively
- Male, Knowledgeable
- Male, Even
- Female, Warm
- Male, Easy-going
- Female, Gentle
- Female, Bright
- Male, Casual
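The constraints on `speech.voices` (two entries, unique alphanumeric aliases, each alias present in the text as `[Alias]`) can be checked locally before sending a request. The sketch below uses a hypothetical helper name, `check_voices`, and assumes "alphanumeric" means ASCII letters and digits. The API itself may accept a wider set.

```python
import re


def check_voices(text, voices):
    """Return a list of problems with a two-speaker voices configuration."""
    problems = []
    if len(voices) != 2:
        problems.append("voices must contain exactly 2 entries")
    aliases = [entry.get("speaker", "") for entry in voices]
    if len(set(aliases)) != len(aliases):
        problems.append("speaker aliases must be unique")
    for alias in aliases:
        # ASCII-only matching is an assumption about "alphanumeric".
        if not re.fullmatch(r"[A-Za-z0-9]+", alias):
            problems.append(f"alias {alias!r} is not alphanumeric")
        elif f"[{alias}]" not in text:
            problems.append(
                f"alias {alias!r} never appears in the text as [{alias}]"
            )
    return problems
```

An empty list means the configuration is consistent with the text. Voice names are not validated here.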
-
-
speech»language
string min: 1 default: en-US -
Language code for speech generation.
Allowed values 87 values
- Arabic (Egypt)
- Bangla (Bangladesh)
- Dutch (Netherlands)
- English (India)
- English (United States)
- French (France)
- German (Germany)
- Hindi (India)
- Indonesian (Indonesia)
- Italian (Italy)
- Japanese (Japan)
- Korean (South Korea)
- Marathi (India)
- Polish (Poland)
- Portuguese (Brazil)
- Romanian (Romania)
- Russian (Russia)
- Spanish (Spain)
- Tamil (India)
- Telugu (India)
- Thai (Thailand)
- Turkish (Turkey)
- Ukrainian (Ukraine)
- Vietnamese (Vietnam)
- Afrikaans (South Africa)
- Albanian (Albania)
- Amharic (Ethiopia)
- Arabic (World)
- Armenian (Armenia)
- Azerbaijani (Azerbaijan)
- Basque (Spain)
- Belarusian (Belarus)
- Bulgarian (Bulgaria)
- Burmese (Myanmar)
- Catalan (Spain)
- Cebuano (Philippines)
- Chinese, Mandarin (China)
- Chinese, Mandarin (Taiwan)
- Croatian (Croatia)
- Czech (Czech Republic)
- Danish (Denmark)
- English (Australia)
- English (United Kingdom)
- Estonian (Estonia)
- Filipino (Philippines)
- Finnish (Finland)
- French (Canada)
- Galician (Spain)
- Georgian (Georgia)
- Greek (Greece)
- Gujarati (India)
- Haitian Creole (Haiti)
- Hebrew (Israel)
- Hungarian (Hungary)
- Icelandic (Iceland)
- Javanese (Java)
- Kannada (India)
- Konkani (India)
- Lao (Laos)
- Latin (Vatican City)
- Latvian (Latvia)
- Lithuanian (Lithuania)
- Luxembourgish (Luxembourg)
- Macedonian (North Macedonia)
- Maithili (India)
- Malagasy (Madagascar)
- Malay (Malaysia)
- Malayalam (India)
- Mongolian (Mongolia)
- Nepali (Nepal)
- Norwegian, Bokmål (Norway)
- Norwegian, Nynorsk (Norway)
- Odia (India)
- Pashto (Afghanistan)
- Persian (Iran)
- Portuguese (Portugal)
- Punjabi (India)
- Serbian (Serbia)
- Sindhi (India)
- Sinhala (Sri Lanka)
- Slovak (Slovakia)
- Slovenian (Slovenia)
- Spanish (Latin America)
- Spanish (Mexico)
- Swahili (Kenya)
- Swedish (Sweden)
- Urdu (Pakistan)
-
Settings
Technical parameters to fine-tune the inference process. These must be nested inside the `settings` object.
-
settings»temperature
float min: 0 max: 2 step: 0.01 default: 1 -
Controls randomness in generation. Lower values produce more deterministic outputs, higher values increase variation and creativity.
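Putting the required fields together, a single-voice task with `temperature` nested under `settings` could be built as follows. This is a sketch under stated assumptions: `build_tts_task` is a hypothetical helper, and rounding to two decimals is simply one way to respect the documented `step: 0.01`.

```python
import uuid

MODEL_ID = "google:gemini@3.1-flash-tts"


def build_tts_task(text, voice="Zephyr", language="en-US", temperature=1.0):
    """Build one audioInference task dict for single-voice speech."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    return {
        "taskType": "audioInference",
        "taskUUID": str(uuid.uuid4()),  # must be unique per task
        "model": MODEL_ID,
        "speech": {"text": text, "voice": voice, "language": language},
        # temperature belongs inside the settings object, not at the top level
        "settings": {"temperature": round(temperature, 2)},
    }
```

The resulting dict can be serialized with `json.dumps` and sent with whichever client you use.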
Midnight Museum Security Banter
{
"taskType": "audioInference",
"taskUUID": "522826a1-17aa-4c4c-8c92-b7babb2a00a3",
"model": "google:gemini@3.1-flash-tts",
"seed": 75937,
"speech": {
"language": "en-US",
"voices": [
{
"speaker": "Mara",
"voice": "Gacrux"
},
{
"speaker": "Leo",
"voice": "Zubenelgenubi"
}
],
"text": "[Mara] [quietly] Eleven minutes to closing rounds, and the bronze owl is staring at me again. [short pause] Tell me you moved it as a joke. [Leo] I did not. [dry chuckle] Last time I touched that thing, the curator gave me a fifteen-minute lecture about patina. [Mara] [whispering] Then why is it facing the east hallway now? It was pointed at the staircase an hour ago. [Leo] Maybe it appreciates architecture. [short pause] Or maybe you need stronger coffee. [Mara] I switched to mint gum at nine. [soft exhale] Listen. [short pause] Did you hear that little tap? [Leo] Yeah. From the gallery with the ship maps. [lower voice] Okay, that one I don't love. [Mara] We do this properly. Slow walk, flashlights low, no dramatic hero speeches. [Leo] Agreed. If anything lunges, I'm donating my bravery to science. [Mara] [amused] Noted. [footing steady tone] East hallway clear. Marble busts still judgmental. [Leo] Ship-map room ahead. Door's cracked open by... maybe two inches. [short pause] That was definitely closed. [Mara] On three. One... two... [both inhale] [Leo] Wait. [short pause] There it is again. Tiny metal tapping, like... claws? [Mara] [relieved laugh] Leo. Look down. [Leo] Oh, you've got to be kidding me. [warmly] It's the curator's wind-up beetle. [Mara] Missing since Tuesday. Marching straight into the wall with full confidence. [Leo] [laughs] So the haunted tapping menace is eight centimeters long and losing a fight with baseboard trim. [Mara] Case closed. Retrieve the beetle, secure the door, and never speak of the owl incident again. [Leo] Deal. [playful whisper] But if the owl moves before sunrise, I'm requesting a transfer."
},
"settings": {
"temperature": 0.82
}
}
{
"taskType": "audioInference",
"taskUUID": "522826a1-17aa-4c4c-8c92-b7babb2a00a3",
"audioUUID": "4882b7c4-3f4a-43e1-82eb-3d65b6d7eb89",
"audioURL": "https://am.runware.ai/audio/os/a06dlim3/ws/5/ai/4882b7c4-3f4a-43e1-82eb-3d65b6d7eb89.mp3",
"seed": 75937,
"cost": 0.05442
}
Riverside Bento Train Farewell
{
"taskType": "audioInference",
"taskUUID": "45e1eaa6-28e2-467b-b223-c7df9cc645ad",
"model": "google:gemini@3.1-flash-tts",
"seed": 76964,
"speech": {
"language": "ja-JP",
"text": "[Mika] [softly] お弁当、まだ温かいよ。急いで包んだから、見た目は少しだけ許してね。[short pause] でも、鮭は上手に焼けたと思う。[Ren] そんなこと言うと、開ける前から泣きそうになるよ。[laughs] 朝のホームで、これは反則だな。[Mika] 泣くのはまだ早いよ。列車が橋を渡るまでは、ちゃんと笑って見送るって決めたの。[short pause] だから……はい、受け取って。[Ren] ありがとう。重さがちょうどいいね。卵焼き、入ってる?[Mika] 入ってる。甘いやつ。あと、きんぴらと、しそごはん。[laughs] 子どもの遠足みたいって言わないでね。[Ren] 言わないよ。むしろ、その感じがうれしい。知らない町に着いても、最初の昼ごはんが君の味なら、少し平気でいられる。[short pause] [Mika] じゃあ、ひとつ約束。向こうで忙しくなっても、お昼はちゃんと食べること。景色の写真も送って。曇りの日も、何でも。[Ren] うん。毎週日曜日、必ず電話する。橋が見えたら、最初に君を思い出す。[short pause] [Mika] あ、ベル……もう行かなきゃ。[voice trembles slightly] ねえ、振り向かなくてもいいから、窓際に座って。[Ren] どうして?[Mika] 列車が出るとき、顔が見えたら、たぶん私、笑えなくなるから。[short pause] [Ren] ……わかった。でも、最後にひとつだけ。[softly] 行ってきます。[Mika] [warmly] いってらっしゃい。お弁当、冷める前に半分食べてね。[laughs softly]",
"voices": [
{
"speaker": "Mika",
"voice": "Sulafat"
},
{
"speaker": "Ren",
"voice": "Achird"
}
]
},
"settings": {
"temperature": 0.82
}
}
{
"taskType": "audioInference",
"taskUUID": "45e1eaa6-28e2-467b-b223-c7df9cc645ad",
"audioUUID": "4c6a588f-152f-4c39-be94-99638fce893d",
"audioURL": "https://am.runware.ai/audio/os/a06dlim3/ws/5/ai/4c6a588f-152f-4c39-be94-99638fce893d.mp3",
"seed": 76964,
"cost": 0.05724
}
Faro Costero de Emergencia
{
"taskType": "audioInference",
"taskUUID": "7373192d-ef7f-4b14-a447-355aaf113930",
"model": "google:gemini@3.1-flash-tts",
"seed": 32004,
"speech": {
"language": "es-ES",
"voices": [
{
"speaker": "Nora",
"voice": "Kore"
},
{
"speaker": "Hugo",
"voice": "Iapetus"
}
],
"text": "[Nora] Central del faro Punta Grana, aquí Nora. [short pause] Se ha ido la luz principal, pero la lente auxiliar sigue girando. [short pause] Necesito confirmación de la marea y del carguero en ruta. [Hugo] Recibido, Nora. [short pause] El carguero Belmonte está a dieciséis minutos de tu costa y reporta niebla espesa. [short pause] Mantén la baliza manual encendida. [Nora] Entendido. [short pause] [breathes] Escucho la sirena del buque, pero no veo su señal. [Hugo] Nora, quiero tu voz firme. [short pause] Vas a dar el aviso por canal abierto, despacio y claro. [Nora] De acuerdo. [short pause] Atención, Belmonte, atención. Les habla el faro Punta Grana. La luz principal está fuera de servicio. Repito: la luz principal está fuera de servicio. Sigan rumbo dos grados al este y reduzcan velocidad. [short pause] [Hugo] Muy bien. [short pause] El capitán responde... te ha oído. Corrigen rumbo ahora. [Nora] [softly] Gracias. [short pause] Pensé que llegarían demasiado cerca de las rocas. [Hugo] No esta noche. [short pause] Cuando amanezca, el puerto entero sabrá que los guiaste con una lámpara de reserva y mucho pulso."
},
"settings": {
"temperature": 0.82
}
}
{
"taskType": "audioInference",
"taskUUID": "7373192d-ef7f-4b14-a447-355aaf113930",
"audioUUID": "db128e30-124e-4133-a098-bc2b57e8edac",
"audioURL": "https://am.runware.ai/audio/os/a05d22/ws/5/ai/db128e30-124e-4133-a098-bc2b57e8edac.mp3",
"seed": 32004,
"cost": 0.03673
}
High-Altitude Balloon Logbook Dialogue
{
"taskType": "audioInference",
"taskUUID": "40e17de5-ae3d-4a66-8dbd-fc15544a3171",
"model": "google:gemini@3.1-flash-tts",
"seed": 97799,
"speech": {
"language": "en-US",
"voices": [
{
"speaker": "Mara",
"voice": "Gacrux"
},
{
"speaker": "Jules",
"voice": "Fenrir"
}
],
"text": "[Mara] Flight log, ascent hour three. Altitude steady. External temperature is far below forgiving, and the sunrise is spreading copper along the curve of the Earth. [short pause] Jules, status on the particle sampler? [Jules] [excitedly] Sampler is live! And Mara, you have to look right now. The cloud deck below us looks like crumpled silk. I know I am supposed to sound professional, but this is unbelievable. [Mara] [warmly] You may sound amazed in the official record. Amazement is appropriate at twenty-eight kilometers. [short pause] Just keep your glove away from the latch this time. [Jules] That happened once. [laughs] Fine, twice. But this time I am graceful. [short pause] Wait— reading spike on channel four. A clean one. Do you want me to mark it? [Mara] Mark it, timestamp it, and say exactly what you see. Nice and slow. [Jules] Channel four spike at zero six fourteen local. Sharp rise, short tail, no matching noise on the control band. [lower voice] That is either a lovely little data gift or the universe showing off. [Mara] In science, those are sometimes the same thing. [short pause] Rotate the array five degrees east. [Jules] Rotating now. [softly] You know, when I was ten, I used to climb onto the shed roof with binoculars and imagine this exact view. [Mara] And now? [Jules] Now I am above weather, holding a pencil with numb fingers, trying not to cry on a very expensive instrument package. [Mara] [gentle chuckle] For the logbook: morale remains exceptionally high. [short pause] Jules, confirm rotation complete. [Jules] Rotation complete. Signal stabilizing. Oh— there it is again, broader this time. Mara, I think we caught something real. [Mara] Then let the record show this clearly. [measured] At sunrise over the upper atmosphere, the team observed a repeat event on channel four. [softly] And for one quiet minute, the whole sky seemed to lean in and listen."
},
"settings": {
"temperature": 0.78
}
}
{
"taskType": "audioInference",
"taskUUID": "40e17de5-ae3d-4a66-8dbd-fc15544a3171",
"audioUUID": "301d9440-8d3d-4dc3-bbef-33d963cb2fbf",
"audioURL": "https://am.runware.ai/audio/os/a06dlim3/ws/5/ai/301d9440-8d3d-4dc3-bbef-33d963cb2fbf.mp3",
"seed": 97799,
"cost": 0.06569
}
Subway Violinist Interview
{
"taskType": "audioInference",
"taskUUID": "fa329722-8162-4bb6-854b-e8e43371d3af",
"model": "google:gemini@3.1-flash-tts",
"seed": 25439,
"speech": {
"text": "[Mara] [softly] We are standing beneath the eastbound platform, where every note seems to bounce off tile and steel. [short pause] Sir, your case is open, your bow is frayed, and yet people keep stopping. Why here? [Tomas] [warm chuckle] Because this tunnel listens. Concert halls judge you. Down here, the walls give the music back a little bruised, a little honest. [Mara] You began with something delicate, then turned suddenly fierce. Was that planned? [Tomas] Not exactly. [short pause] A child dropped a coin, someone missed a train, a couple started arguing near the stairs. I heard all of it. So the melody changed its mind. [Mara] [smiling] You make it sound like the station is your duet partner. [Tomas] On good nights, yes. On bad nights, it is a critic with bad manners. [both laugh] [Mara] One last question. If a tired commuter only gives you ten seconds of attention, what do you hope they carry upstairs with them? [Tomas] [gentler] Proof that the day did not flatten everything. Just that. A small lift in the ribs. A reason to walk slower when they reach the street.",
"language": "en-US",
"voices": [
{
"speaker": "Mara",
"voice": "Erinome"
},
{
"speaker": "Tomas",
"voice": "Umbriel"
}
]
},
"settings": {
"temperature": 0.87
}
}
{
"taskType": "audioInference",
"taskUUID": "fa329722-8162-4bb6-854b-e8e43371d3af",
"audioUUID": "e83d4e58-b123-49c3-8923-f3a5d4c5a8e7",
"audioURL": "https://am.runware.ai/audio/os/a07dlim3/ws/5/ai/e83d4e58-b123-49c3-8923-f3a5d4c5a8e7.mp3",
"seed": 25439,
"cost": 0.04299
}
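The response objects shown in the examples above can be read into a small typed structure. The sketch below is illustrative: `AudioResult` and `parse_audio_result` are hypothetical names, and `audioURL`, `seed`, and `cost` are treated as optional on the assumption that they depend on `outputType` and `includeCost`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioResult:
    task_uuid: str
    audio_uuid: str
    audio_url: Optional[str]
    seed: Optional[int]
    cost: Optional[float]


def parse_audio_result(item):
    """Convert one audioInference response object into an AudioResult."""
    if item.get("taskType") != "audioInference":
        raise ValueError(f"unexpected taskType: {item.get('taskType')!r}")
    return AudioResult(
        task_uuid=item["taskUUID"],
        audio_uuid=item["audioUUID"],
        audio_url=item.get("audioURL"),  # assumed absent for non-URL output types
        seed=item.get("seed"),
        cost=item.get("cost"),  # assumed present only when cost is included
    )
```

Matching `task_uuid` against the `taskUUID` sent in the request is how a result is tied back to its task.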