xAI

Multimodal generative AI with strong reasoning and visual capabilities

xAI develops foundational models that combine reasoning, real-time knowledge, and multimodal generation. Its Grok model family delivers context-aware, expressive outputs across text and visual generation tasks. As a Runware provider, xAI makes its models available through a single inference pipeline alongside models from other creators.

Models by xAI

Grok Imagine Video 1.5

Grok Imagine Video 1.5 is xAI's newer image-to-video model, positioned above the earlier Grok Imagine Video release and priced higher per second. It supports durations up to 15 seconds and generates 480p or 720p video from a single still-image starting frame, which suits cinematic clips, animated visuals, and prompt-guided short-form video creation.
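The limits described above (a single starting image, at most 15 seconds, 480p or 720p) can be checked on the client before a request is sent. The following is a minimal sketch, not the Runware or xAI API: the class name `VideoRequest`, the payload field names, and the model identifier string are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Limits taken from the model description above.
MAX_DURATION_SECONDS = 15
ALLOWED_RESOLUTIONS = ("480p", "720p")


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical client-side container for an image-to-video request."""

    prompt: str
    start_image: str  # path or URL of the single still-image starting frame
    duration_seconds: int
    resolution: str

    def __post_init__(self) -> None:
        # Reject requests that fall outside the documented limits.
        if not self.start_image:
            raise ValueError("a starting image is required")
        if not 0 < self.duration_seconds <= MAX_DURATION_SECONDS:
            raise ValueError(
                f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
            )
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")

    def to_payload(self) -> dict:
        # Field names and the model string are assumptions, not a real schema.
        return {
            "model": "grok-imagine-video-1.5",
            "prompt": self.prompt,
            "startImage": self.start_image,
            "duration": self.duration_seconds,
            "resolution": self.resolution,
        }
```

Validating locally like this avoids spending a request on parameters the model would reject; the real request schema should be taken from the provider's API reference.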

Grok Imagine Image Quality

Grok Imagine Image Quality is xAI's quality-focused image generation and editing model. It is designed for higher realism, stronger multilingual text rendering, tighter prompt following, deeper scene understanding, and more consistent brand-oriented output across both text-to-image and image editing workflows.

Grok 4.3

Grok 4.3 is xAI's flagship language model, built for agentic reasoning, strong instruction following, and minimal hallucinations. It supports text and image input, a 1 million token context window, configurable reasoning effort (including a non-reasoning mode), function calling, and structured outputs, targeting production assistants, coding workflows, and long-context analysis.

xAI Text-to-Speech

xAI Text-to-Speech converts text into natural-sounding spoken audio with a single API call. It offers five expressive voices (Eve, Ara, Leo, Rex, and Sal) and inline speech tags for fine-grained control over pauses, laughter, whispers, and emphasis, and it supports more than 20 auto-detected languages.
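Because the voice set is fixed at five names, a client can check the voice before making the call. The sketch below is illustrative only: the function `build_tts_request`, the payload field names, and the model identifier string are hypothetical, and the inline speech tag syntax is left out because the description above does not specify it.

```python
# Voice names taken from the model description above.
VOICES = ("Eve", "Ara", "Leo", "Rex", "Sal")


def build_tts_request(text: str, voice: str) -> dict:
    """Build a hypothetical text-to-speech payload after basic validation."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if voice not in VOICES:
        raise ValueError(f"voice must be one of {VOICES}")
    # No language field is set: the description says the language is
    # auto-detected. The field names here are assumptions, not a real schema.
    return {"model": "xai-text-to-speech", "text": text, "voice": voice}
```

A call such as `build_tts_request("Welcome back.", "Ara")` returns a dictionary that could then be sent with whatever HTTP client the integration already uses.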

Grok Imagine Image

Grok Imagine Image is a multimodal generative image model that creates high-quality still images from text prompts or image inputs. It supports flexible visual synthesis across a range of styles, enabling developers to generate creative imagery directly from structured prompts or to expand on existing visuals with coherent, detailed outputs.

Grok Imagine Video

Grok Imagine Video is a multimodal generative video model that produces short video clips with native audio from text descriptions or static images. It supports text-to-video and image-to-video generation with synchronized sound effects and dialogue, enabling developers to animate scenes with motion, camera dynamics, and audio in a single API workflow.