Inworld AI

Expressive AI speech synthesis with low-latency voice generation

Inworld AI develops real-time speech synthesis models that deliver emotionally nuanced, low-latency voice generation with support for zero-shot voice cloning, multilingual output, and emotional markup. On Runware, Inworld AI is a provider of high-fidelity text-to-speech generation, offering models optimized for both real-time responsiveness and broadcast-ready audio quality across a diverse library of natural-sounding voices.

Models by Inworld AI

Inworld Realtime TTS-2

Inworld Realtime TTS-2 is a conversational text-to-speech model built for real-time voice interaction rather than static narration. It supports free-form voice direction, carries tone and pacing forward from prior audio in a session, and preserves one voice identity across 100+ languages. It is designed for expressive, low-latency speech in assistants, characters, support agents, and interactive products.

Inworld TTS-1.5 Mini

Inworld TTS-1.5 Mini is a lightweight text-to-speech model designed for real-time voice experiences with ultra-low latency and efficient performance. It delivers natural, expressive audio suitable for interactive agents, voice assistants, and conversational applications where responsiveness is critical. The Mini variant balances speed and quality, enabling responsive speech output even under constrained compute conditions.

Inworld TTS-1.5 Max

Inworld TTS-1.5 Max is a high-fidelity text-to-speech model engineered for expressive voice synthesis with rich prosody, nuanced emotional range, and broadcast-ready audio quality. It supports a wide range of languages and delivers natural pronunciation and expressive variation suited to narration, content creation, and immersive character voices. The Max variant prioritizes audio quality and expressiveness while still supporting responsive generation.