xAI

xAI Text-to-Speech

Expressive text-to-speech with five voices, speech tags, and multilingual support

Text to Audio

xAI Text-to-Speech Overview

xAI Text-to-Speech converts text into natural-sounding spoken audio with a single API call. It offers five expressive voices (Eve, Ara, Leo, Rex, and Sal) and inline speech tags for fine-grained control over pauses, laughter, whispers, and emphasis, and it supports more than 20 languages with automatic detection.
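As a rough illustration of the "single API call" shape, the sketch below validates a voice name against the five listed voices and serializes a request body. The field names `text` and `voice`, the helper `build_tts_request`, and the JSON layout are assumptions made for illustration only; this page does not document the actual request schema, endpoint, or speech-tag syntax, so none of those are shown.

```python
import json

# The five voices named in the overview above.
VOICES = ("Eve", "Ara", "Leo", "Rex", "Sal")


def build_tts_request(text: str, voice: str = "Eve") -> str:
    """Return a JSON request body for a hypothetical TTS call.

    The "text" and "voice" keys are assumed names, not a documented schema.
    """
    canonical = {v.lower(): v for v in VOICES}
    key = voice.strip().lower()
    if key not in canonical:
        raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
    if not text.strip():
        raise ValueError("text must not be empty")
    return json.dumps({"text": text, "voice": canonical[key]})
```

Validating the voice locally keeps an obviously bad request from being sent; the real service may accept different identifiers or casing.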

From $0.0042 per 1,000 characters
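The listed rate of $0.0042 per 1,000 characters can be turned into a quick cost estimate. The sketch below assumes cost scales linearly with character count (no minimum charge and no rounding up to whole 1,000-character blocks), which this page does not confirm; `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Rate taken from the listing: USD per 1,000 characters of input text.
PRICE_PER_1000_CHARS = Decimal("0.0042")


def estimate_cost(text: str) -> Decimal:
    """Estimate the USD cost of synthesizing `text`.

    Assumes simple linear proration by character count.
    """
    return PRICE_PER_1000_CHARS * len(text) / 1000
```

Under that assumption, a 2,500-character script would cost about $0.0105, and one million characters about $4.20.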

Commercial use

More models from xAI

Grok Imagine Image Pro is the higher-quality variant of the Grok Imagine image model developed by xAI. It generates detailed images from text prompts and supports iterative editing of existing images through natural language instructions. Compared with the standard model, it provides stronger prompt adherence, improved rendering quality, and more reliable control over composition, style, and aspect ratio. It supports multiple image styles and resolutions up to 2K, enabling workflows for design, illustration, and creative prototyping.

Grok Imagine Image is a multimodal generative image model that creates high-quality still images from text prompts or image inputs. It supports flexible visual synthesis across a range of styles, enabling developers to generate creative imagery directly from structured prompts or to expand on existing visuals with coherent, detailed outputs.

Grok Imagine Video is a multimodal generative video model that produces short video clips with native audio from text descriptions or static images. It supports text-to-video and image-to-video generation with synchronized sound effects and dialogue, enabling developers to animate scenes with motion, camera dynamics, and audio in a single API workflow.