xAI

Multimodal generative AI with strong reasoning and visual capabilities

xAI develops foundation models that combine reasoning, real-time knowledge, and multimodal generation. Its Grok model family delivers context-aware, expressive outputs across text and visual generation tasks. xAI is a Runware provider, so its models are available through a single inference pipeline alongside those of other creators.

Models by xAI

xAI Text-to-Speech converts text into natural-sounding spoken audio with a single API call. It offers five expressive voices (Eve, Ara, Leo, Rex, and Sal), inline speech tags for fine-grained control over pauses, laughter, whispers, and emphasis, and supports over 20 auto-detected languages.

Grok Imagine Image Pro is the higher-quality variant of the Grok Imagine image model developed by xAI. It generates detailed images from text prompts and supports iterative editing of existing images through natural language instructions. Compared with the standard variant, it provides stronger prompt adherence, improved rendering quality, and more reliable control over composition, style, and aspect ratio. It supports multiple image styles and resolutions up to 2K, enabling workflows for design, illustration, and creative prototyping.

Grok Imagine Image is a multimodal generative image model that creates high-quality still images from text prompts or image inputs. It supports flexible visual synthesis across a range of styles, enabling developers to generate creative imagery directly from structured prompts or to expand on existing visuals with coherent, detailed outputs.
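
As a rough illustration of generating imagery from a structured prompt, the sketch below assembles a text-to-image request body in Python. It only builds and prints JSON and sends nothing over the network. The field names (`taskType`, `taskUUID`, `positivePrompt`, `model`, `width`, `height`) are assumptions about the request shape and should be checked against the Runware API reference. `MODEL_ID` is a hypothetical placeholder because this page does not state the model identifier, and `build_image_task` is a helper name invented for this example.

```python
import json
import uuid

# Hypothetical placeholder: the real model identifier is not given on this page.
MODEL_ID = "<grok-imagine-image-model-id>"


def build_image_task(prompt: str, model: str = MODEL_ID,
                     width: int = 1024, height: int = 1024) -> dict:
    """Build one text-to-image task object.

    All field names here are assumptions about the request schema,
    not a confirmed description of the API.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "taskType": "imageInference",     # assumed task type for image generation
        "taskUUID": str(uuid.uuid4()),    # client-generated ID to match the response
        "positivePrompt": prompt,
        "model": model,
        "width": width,
        "height": height,
    }


if __name__ == "__main__":
    task = build_image_task("A lighthouse at dusk, watercolor style")
    # Wrapped in a list on the assumption that a request carries an array of tasks.
    print(json.dumps([task], indent=2))
```

Generating the task ID on the client side keeps the helper free of network state, so the same function can feed a batch of prompts and each response can be matched back to its request.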

Grok Imagine Video is a multimodal generative video model that produces short video clips with native audio from text descriptions or static images. It supports text-to-video and image-to-video generation with synchronized sound effects and dialogue, enabling developers to animate scenes with motion, camera dynamics, and audio in a single API workflow.