Inworld AI

Expressive AI speech synthesis with low-latency voice generation

Inworld AI develops real-time speech synthesis models that deliver emotionally nuanced, low-latency voice generation with support for zero-shot voice cloning, multilingual output, and emotional markup. On Runware, Inworld AI is a provider of high-fidelity text-to-speech generation, offering models optimized for both real-time responsiveness and broadcast-ready audio quality across a diverse library of natural-sounding voices.

Models by Inworld AI

Inworld TTS-1.5 Mini is a lightweight text-to-speech model designed for real-time voice experiences with ultra-low latency and efficient performance. It delivers natural, expressive audio suitable for interactive agents, voice assistants, and conversational applications where responsiveness is critical. The Mini variant balances speed and quality, enabling responsive speech output even under constrained compute conditions.

Inworld TTS-1.5 Max is a high-fidelity text-to-speech model engineered for expressive voice synthesis with rich prosody, nuanced emotional range, and broadcast-ready audio quality. It supports a wide range of languages and delivers natural pronunciation and expressive variation suited to narration, content creation, and immersive character voices. The Max variant prioritizes audio quality and expressiveness while still supporting responsive generation.