Inworld

Audio models

Inworld TTS-1.5 Mini

Inworld TTS-1.5 Mini is a lightweight text-to-speech model built for real-time voice experiences where low latency matters. It produces natural, expressive audio for interactive agents, voice assistants, and conversational applications. The Mini variant balances speed and quality, so speech output stays responsive even under constrained compute conditions.

Model AIR ID: inworld:tts@1.5-mini.

Supported workflows: Text-to-audio.

Technical specifications:

Speech text: 2–2,000 characters (required).
Speech voice: Required. Specifies the voice for synthesis.
Speech speed: 0.5–1.5 (multiples of 0.1, default: 1). Controls the playback speed of the generated audio.
Temperature: 0.1–2 (default: 1.1). Controls the expressiveness and variability of the generated speech.
Audio settings: Supports sampleRate and bitrate configuration.
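
The ranges above can be checked on the client before a task is submitted. The following is a minimal sketch, not part of the API: the function name `validate_speech_params` and its return shape are hypothetical, and only the limits listed in the specifications are enforced.

```python
def validate_speech_params(text, voice, speed=1.0, temperature=1.1):
    """Return a list of problems; an empty list means the values fit the documented ranges.

    Hypothetical helper: the name and return shape are assumptions,
    only the numeric limits come from the specifications above.
    """
    problems = []

    # Speech text: 2-2,000 characters (required).
    if not 2 <= len(text) <= 2000:
        problems.append("text must be 2-2,000 characters")

    # Speech voice: required.
    if not voice:
        problems.append("voice is required")

    # Speech speed: 0.5-1.5 in multiples of 0.1.
    # Compare in tenths with a small tolerance to absorb float rounding noise.
    tenths = speed * 10
    if not 0.5 <= speed <= 1.5 or abs(tenths - round(tenths)) > 1e-9:
        problems.append("speed must be 0.5-1.5 in multiples of 0.1")

    # Temperature: 0.1-2.
    if not 0.1 <= temperature <= 2:
        problems.append("temperature must be 0.1-2")

    return problems
```

Calling `validate_speech_params("Hello", "alloy")` returns an empty list, while a one-character text, a speed of 0.85, or a temperature of 2.5 each add one entry.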

Provider-specific settings:

Parameters supported: voice.

Basic text-to-speech

{
  "taskType": "audioInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "model": "inworld:tts@1.5-mini",
  "speech": {
    "text": "Welcome to our platform. We're excited to have you here.",
    "voice": "alloy"
  }
}

With speed control

{
  "taskType": "audioInference",
  "taskUUID": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
  "model": "inworld:tts@1.5-mini",
  "speech": {
    "text": "This is a slower, more deliberate narration for a documentary-style presentation.",
    "voice": "alloy",
    "speed": 0.8
  }
}

With temperature and audio settings

{
  "taskType": "audioInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440000",
  "model": "inworld:tts@1.5-mini",
  "speech": {
    "text": "A dramatic reading with expressive intonation and emotional depth.",
    "voice": "alloy",
    "speed": 1.2
  },
  "settings": {
    "temperature": 1.5
  },
  "audioSettings": {
    "sampleRate": 44100,
    "bitrate": 192
  }
}
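
The request shapes shown so far differ only in which optional blocks are present. As an illustration, the sketch below assembles the same structure in Python and omits any block whose values are not set. The helper name `build_tts_task` and its keyword arguments are hypothetical, and generating the `taskUUID` with `uuid.uuid4()` is an assumption: the examples only show UUID-formatted values. How the task is sent (endpoint, authentication) is outside the scope of this section and is not shown.

```python
import json
import uuid


def build_tts_task(text, voice, model="inworld:tts@1.5-mini", speed=None,
                   temperature=None, sample_rate=None, bitrate=None):
    """Assemble an audioInference task dict (hypothetical helper)."""
    speech = {"text": text, "voice": voice}
    if speed is not None:
        speech["speed"] = speed

    task = {
        "taskType": "audioInference",
        "taskUUID": str(uuid.uuid4()),  # assumption: a random UUID per task
        "model": model,
        "speech": speech,
    }

    # Optional blocks are added only when at least one value is provided.
    if temperature is not None:
        task["settings"] = {"temperature": temperature}

    audio_settings = {}
    if sample_rate is not None:
        audio_settings["sampleRate"] = sample_rate
    if bitrate is not None:
        audio_settings["bitrate"] = bitrate
    if audio_settings:
        task["audioSettings"] = audio_settings

    return task


if __name__ == "__main__":
    task = build_tts_task(
        "A dramatic reading with expressive intonation and emotional depth.",
        "alloy",
        speed=1.2,
        temperature=1.5,
        sample_rate=44100,
        bitrate=192,
    )
    print(json.dumps(task, indent=2))
```

Leaving every optional argument unset yields the minimal form from the basic example: `taskType`, `taskUUID`, `model`, and a `speech` object with `text` and `voice`.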

With provider-specific voice

{
  "taskType": "audioInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
  "model": "inworld:tts@1.5-mini",
  "speech": {
    "text": "A conversational response from an AI assistant with natural pacing.",
    "voice": "alloy"
  },
  "providerSettings": {
    "inworld": {
      "voice": "custom-voice-id"
    }
  }
}

Inworld TTS-1.5 Max

Inworld TTS-1.5 Max is a high-fidelity text-to-speech model engineered for expressive voice synthesis with rich prosody, nuanced emotional range, and broadcast-ready audio quality. It supports a wide range of languages and delivers natural pronunciation and expressive variation suited to narration, content creation, and immersive character voices. The Max variant prioritizes audio quality and expressiveness while still supporting responsive generation.

Model AIR ID: inworld:tts@1.5-max.

Supported workflows: Text-to-audio.

Technical specifications:

Speech text: 2–2,000 characters (required).
Speech voice: Required. Specifies the voice for synthesis.
Speech speed: 0.5–1.5 (multiples of 0.1, default: 1). Controls the playback speed of the generated audio.
Temperature: 0.1–2 (default: 1.1). Controls the expressiveness and variability of the generated speech.
Audio settings: Supports sampleRate and bitrate configuration.
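
Narration scripts often exceed the 2,000-character limit on speech text, so longer input has to be split across several tasks. The sketch below shows one possible client-side approach, assuming it is acceptable to break the text at sentence boundaries. The function `split_for_tts` is hypothetical and not part of the API, and it does not enforce the 2-character minimum on very short trailing chunks.

```python
import re

MAX_CHARS = 2000  # documented upper bound for speech text


def split_for_tts(text, max_chars=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""

    for sentence in sentences:
        # A single sentence longer than the limit is hard-split at max_chars.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]

        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be used as the `text` of its own task. Joining the resulting audio segments is left to the caller.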

Provider-specific settings:

Parameters supported: voice.

High-fidelity narration

{
  "taskType": "audioInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d7",
  "model": "inworld:tts@1.5-max",
  "speech": {
    "text": "In a world where technology and creativity converge, new possibilities emerge every day.",
    "voice": "alloy"
  }
}

Expressive with high temperature

{
  "taskType": "audioInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d59",
  "model": "inworld:tts@1.5-max",
  "speech": {
    "text": "The stage lights dimmed, and the audience held its breath as the final act began.",
    "voice": "alloy",
    "speed": 0.9
  },
  "settings": {
    "temperature": 1.8
  }
}

Broadcast-ready with audio settings

{
  "taskType": "audioInference",
  "taskUUID": "4192bff0-e1e0-43ce-a4db-912808c32493",
  "model": "inworld:tts@1.5-max",
  "speech": {
    "text": "Breaking news: scientists have discovered a new method for sustainable energy production that could revolutionize the industry.",
    "voice": "alloy",
    "speed": 1.1
  },
  "settings": {
    "temperature": 1.1
  },
  "audioSettings": {
    "sampleRate": 48000,
    "bitrate": 256
  }
}

With provider-specific voice

{
  "taskType": "audioInference",
  "taskUUID": "2b30193e-83b3-c392-1192-9cad0e1f2031",
  "model": "inworld:tts@1.5-max",
  "speech": {
    "text": "Welcome back to the show. Today we have an incredible lineup of guests joining us.",
    "voice": "alloy"
  },
  "providerSettings": {
    "inworld": {
      "voice": "custom-voice-id"
    }
  }
}
