Inworld Realtime TTS-2
Inworld Realtime TTS-2 is a conversational text-to-speech model built for realtime voice interaction rather than static narration. It supports free-form voice direction, carries tone and pacing forward from prior audio in a session, preserves one voice identity across 100+ languages, and is designed for expressive, low-latency speech in assistants, characters, support agents, and interactive products.
API Reference
INTEGRATE
Complete technical specification for integration
Request Response
Examples 5
CODE
Ready-to-use code snippets for common workflows
Guides 2
LEARN
Step-by-step tutorials for advanced use cases
- Formatting LLM output for speech How to write system prompts that make LLM output sound natural when synthesized by TTS-2. Covers text normalization, filler words, emphasis, and ready-to-use prompt templates.
- Controlling voice delivery with steering tags How to use natural-language steering tags to control emotion, pacing, volume, and vocal style in TTS-2 speech output.