Inworld Realtime TTS-2

by Inworld AI, May 5, 2026

Inworld Realtime TTS-2 is a conversational text-to-speech model built for realtime voice interaction rather than static narration. It supports free-form voice direction, carries tone and pacing forward from prior audio in a session, preserves one voice identity across 100+ languages, and is designed for expressive, low-latency speech in assistants, characters, support agents, and interactive products.

Formatting LLM output for speech
How to write LLM system prompts that produce text TTS-2 can synthesize naturally, with normalization, filler words, and emphasis cues handled before the audio call.
Controlling voice delivery with steering tags
How to use natural-language steering tags to control emotion, pacing, volume, and vocal style in TTS-2 speech output.
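The first guide's rules are not reproduced here, and nothing below is an Inworld API. This is a minimal, generic sketch of the normalization step only. It assumes the LLM emits Markdown and symbols that should not be read aloud. The function name `normalize_for_speech` and the `SYMBOL_WORDS` table are hypothetical illustrations, not part of TTS-2.

```python
import re

# Hypothetical symbol expansions; a real pipeline would need many more rules
# (currency, dates, units, abbreviations) and locale-aware handling.
SYMBOL_WORDS = {
    "%": " percent",
    "&": " and ",
}


def normalize_for_speech(text: str) -> str:
    """Strip common Markdown markup and expand a few symbols before a TTS call."""
    # Remove heading markers at the start of a line ("## Title" -> "Title").
    text = re.sub(r"^[ \t]*#+[ \t]*", "", text, flags=re.MULTILINE)
    # Remove list bullets at the start of a line ("- item" / "* item" -> "item").
    text = re.sub(r"^[ \t]*[-*][ \t]+", "", text, flags=re.MULTILINE)
    # Unwrap bold/italic asterisks but keep the words ("**word**" -> "word").
    text = re.sub(r"\*{1,2}([^*]+)\*{1,2}", r"\1", text)
    # Drop inline-code backticks so they are not read aloud.
    text = text.replace("`", "")
    # Expand symbols a speech model might skip or mispronounce.
    for symbol, words in SYMBOL_WORDS.items():
        text = text.replace(symbol, words)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize_for_speech("## Results\n- **Revenue** grew 12% & margins held"))
```

Running the example prints `Results Revenue grew 12 percent and margins held`. The sketch deliberately leaves out filler words and emphasis cues, because their conventions for TTS-2 belong to the guide itself.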