
xAI
Multimodal generative AI with strong reasoning and visual capabilities
xAI develops foundational models that combine reasoning, real-time knowledge, and multimodal generation. Its Grok model family delivers context-aware, expressive outputs across text and visual generation tasks. As a Runware provider, xAI models are available through a single inference pipeline alongside those of other providers.
Models by xAI
Grok Imagine Image Quality is xAI's quality-focused image generation and editing model. It is designed for higher realism, stronger multilingual text rendering, tighter prompt following, deeper scene understanding, and more consistent brand-oriented output across both text-to-image and image editing workflows.
Grok 4.3 is xAI's flagship language model for agentic reasoning, strong instruction following, and minimal hallucinations. It supports text and image input, a 1 million token context window, configurable reasoning effort including non-reasoning mode, function calling, and structured outputs for production assistants, coding workflows, and long-context analysis.
xAI Text-to-Speech converts text into natural-sounding spoken audio with a single API call. It offers five expressive voices (Eve, Ara, Leo, Rex, and Sal), inline speech tags for fine-grained control over pauses, laughter, whispers, and emphasis, and supports over 20 auto-detected languages.
Grok Imagine Image Pro is the higher-quality variant of the Grok Imagine image model developed by xAI. It generates detailed images from text prompts and supports iterative editing of existing images through natural language instructions. The model provides stronger prompt adherence, improved rendering quality, and more reliable control over composition, style, and aspect ratio. It supports multiple image styles and resolutions up to 2K, enabling workflows for design, illustration, and creative prototyping.
Grok Imagine Image is a multimodal generative image model that creates high-quality still images from text prompts or image inputs. It supports flexible visual synthesis across a range of styles, enabling developers to generate creative imagery directly from structured prompts or to expand on existing visuals with coherent, detailed outputs.
Grok Imagine Video is a multimodal generative video model that produces short video clips with native audio from text descriptions or static images. It supports text-to-video and image-to-video generation with synchronized sound effects and dialogue, enabling developers to animate scenes with motion, camera dynamics, and audio in a single API workflow.





