xAI
Introduction
xAI's Grok Imagine models are integrated into the Runware platform through our unified API, providing access to advanced multimodal generation technology for both images and videos. Built for creative workflows requiring fast inference and flexible visual synthesis, Grok Imagine excels at text-to-image generation, image editing, and video creation with synchronized audio capabilities.
The Grok Imagine suite enables developers to generate high-quality visual content from text prompts or existing media, supporting dynamic content creation, rapid prototyping, and automated visual asset generation for modern AI products.
Image models
Grok Imagine Image
xAI's Grok Imagine Image model creates high-quality still images from text prompts or image inputs, supporting flexible visual synthesis across a range of styles with coherent, detailed outputs.
Model AIR ID: xai:grok-imagine@image.
Supported workflows: Text-to-image, image-to-image.
Technical specifications:
- Positive prompt: 1+ characters.
- Supported dimensions: 1024×1024 (1:1), 896×1280 (3:4), 1280×896 (4:3), 768×1408 (9:16), 1408×768 (16:9), 864×1296 (2:3), 1296×864 (3:2), 576×1248 (9:19.5), 1248×576 (19.5:9), 576×1280 (9:20), 1280×576 (20:9), 704×1408 (1:2), 1408×704 (2:1).
- Dimension behavior:
  - Text-to-image: Specify explicit width and height from the supported dimensions above.
  - Image-to-image: Two options available:
    - Specify width and height explicitly for precise control.
    - Use the resolution parameter (1k) to automatically match the aspect ratio from the first reference image.
- Reference images: Supports up to 1 image via referenceImages.
- Resolution: 1k (default: 1k).
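The dimension rules above can be checked on the client before a task is submitted. The following is a minimal sketch, not part of any official SDK: build_image_task and IMAGE_DIMENSIONS are hypothetical names introduced here for illustration, and the payload fields simply mirror the specifications listed above.

```python
import uuid

# Supported (width, height) pairs, copied from the specification list above.
IMAGE_DIMENSIONS = {
    (1024, 1024), (896, 1280), (1280, 896), (768, 1408), (1408, 768),
    (864, 1296), (1296, 864), (576, 1248), (1248, 576), (576, 1280),
    (1280, 576), (704, 1408), (1408, 704),
}


def build_image_task(prompt, width=None, height=None, reference_image=None):
    """Hypothetical helper: build an imageInference task for xai:grok-imagine@image."""
    if len(prompt) < 1:
        raise ValueError("positivePrompt needs at least 1 character")

    task = {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),
        "model": "xai:grok-imagine@image",
        "positivePrompt": prompt,
    }
    if reference_image is not None:
        # Image-to-image: at most one reference image is supported.
        task["inputs"] = {"referenceImages": [reference_image]}

    if width is not None or height is not None:
        # Explicit size: both values must form one of the supported pairs.
        if (width, height) not in IMAGE_DIMENSIONS:
            raise ValueError(f"unsupported dimensions: {width}x{height}")
        task["width"] = width
        task["height"] = height
    elif reference_image is not None:
        # No explicit size: let the aspect ratio follow the reference image.
        task["resolution"] = "1k"
    else:
        raise ValueError("text-to-image requires explicit width and height")
    return task
```

Calling build_image_task("A futuristic cityscape", 1408, 768) yields a text-to-image task, while passing only reference_image produces an image-to-image task that relies on the 1k resolution setting.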
{
  "taskType": "imageInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d7",
  "model": "xai:grok-imagine@image",
  "positivePrompt": "A futuristic cityscape at sunset with flying vehicles and neon lights",
  "width": 1408,
  "height": 768
}

{
  "taskType": "imageInference",
  "taskUUID": "6ba7b833-9dad-11d1-80b4-00c04fd430c8",
  "model": "xai:grok-imagine@image",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
  },
  "positivePrompt": "Transform into a cyberpunk style with enhanced neon elements",
  "resolution": "1k"
}

{
  "taskType": "imageInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440015",
  "model": "xai:grok-imagine@image",
  "positivePrompt": "Professional portrait photography with dramatic lighting and shallow depth of field",
  "width": 768,
  "height": 1408
}

Video models
Grok Imagine Video
xAI's Grok Imagine Video model produces short video clips with native audio from text descriptions, static images, or existing video clips, supporting synchronized sound effects and dialogue in a single generation workflow.
Model AIR ID: xai:grok-imagine@video.
Supported workflows: Text-to-video, image-to-video, video-to-video.
Technical specifications:
- Positive prompt: 1+ characters.
- Supported dimensions:
  - 480p: 480×480 (1:1), 848×480 (16:9), 480×848 (9:16), 640×480 (4:3), 480×640 (3:4), 720×480 (3:2), 480×720 (2:3).
  - 720p: 720×720 (1:1), 1280×720 (16:9), 720×1280 (9:16), 960×720 (4:3), 720×960 (3:4), 720×480 (3:2), 480×720 (2:3).
- Dimension behavior:
  - Text-to-video: Specify explicit width and height from the supported dimensions above.
  - Image-to-video: Two options available:
    - Specify width and height explicitly for precise control.
    - Use the resolution parameter (480p or 720p) to automatically match the aspect ratio from the first frame image.
- Duration: 1-15 seconds (default: 6).
- Resolution: 480p or 720p (default: 480p).
- Frame images: Supports first frame for frameImages.
- Reference videos: Supports video-to-video editing (MP4 format, maximum 8.7 seconds).
Grok Imagine Video generates videos with synchronized audio, including sound effects and environmental audio that matches the visual content.
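The video constraints above (supported dimensions, the 1-15 second duration range, and the two sizing options for image-to-video) can be validated locally before submission. This is a minimal sketch under stated assumptions: build_video_task and VIDEO_DIMENSIONS are hypothetical names, the dimension table is copied as listed above, and video-to-video inputs are deliberately left out.

```python
import uuid

# Supported (width, height) pairs per resolution tier, copied as listed above.
VIDEO_DIMENSIONS = {
    "480p": {(480, 480), (848, 480), (480, 848), (640, 480),
             (480, 640), (720, 480), (480, 720)},
    "720p": {(720, 720), (1280, 720), (720, 1280), (960, 720),
             (720, 960), (720, 480), (480, 720)},
}


def build_video_task(prompt, width=None, height=None, duration=None,
                     first_frame=None, resolution=None):
    """Hypothetical helper: build a videoInference task for xai:grok-imagine@video."""
    if len(prompt) < 1:
        raise ValueError("positivePrompt needs at least 1 character")

    task = {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),
        "model": "xai:grok-imagine@video",
        "positivePrompt": prompt,
    }
    if duration is not None:
        # Documented range is 1-15 seconds; omitting it leaves the default (6).
        if not 1 <= duration <= 15:
            raise ValueError("duration must be between 1 and 15 seconds")
        task["duration"] = duration
    if first_frame is not None:
        # Image-to-video: only the first frame position is supported.
        task["frameImages"] = [{"inputImage": first_frame, "frame": "first"}]

    if width is not None or height is not None:
        # Explicit size: accept any pair listed under either tier.
        allowed = set().union(*VIDEO_DIMENSIONS.values())
        if (width, height) not in allowed:
            raise ValueError(f"unsupported dimensions: {width}x{height}")
        task["width"] = width
        task["height"] = height
    elif first_frame is not None:
        # No explicit size: pick a tier and let the frame image set the ratio.
        tier = resolution or "480p"
        if tier not in VIDEO_DIMENSIONS:
            raise ValueError(f"unsupported resolution: {tier}")
        task["resolution"] = tier
    else:
        raise ValueError("text-to-video requires explicit width and height")
    return task
```

For example, build_video_task("Ocean waves at sunrise", 1280, 720, duration=6) produces a text-to-video task, and passing first_frame with resolution="720p" produces an image-to-video task without explicit dimensions.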
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
  "model": "xai:grok-imagine@video",
  "positivePrompt": "A serene ocean wave crashing on a beach at sunrise with ambient sound",
  "width": 1280,
  "height": 720,
  "duration": 6
}

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b834-9dad-11d1-80b4-00c04fd430c8",
  "model": "xai:grok-imagine@video",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "positivePrompt": "Animate this scene with gentle camera movement and natural environmental sounds",
  "duration": 8,
  "resolution": "720p"
}

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440016",
  "model": "xai:grok-imagine@video",
  "inputs": {
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "positivePrompt": "Add dramatic lighting effects and enhance the atmospheric audio"
}

{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
  "model": "xai:grok-imagine@video",
  "positivePrompt": "A time-lapse of clouds moving across a mountain landscape with wind sounds",
  "width": 848,
  "height": 480,
  "duration": 15
}
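
Task objects like the ones above still need to be wrapped in an HTTP request. The sketch below shows one way to do that with Python's standard library. Several details are assumptions that this section does not state and that should be checked against the API reference: the endpoint URL, bearer-token authentication, and sending the tasks as a JSON array. The build_request helper is a hypothetical name, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Assumption: the endpoint is not given in this section; verify before use.
API_URL = "https://api.runware.ai/v1"


def build_request(tasks, api_key, url=API_URL):
    """Hypothetical helper: wrap a list of task dicts in a POST request."""
    body = json.dumps(tasks).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
        },
    )


tasks = [
    {
        "taskType": "videoInference",
        "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
        "model": "xai:grok-imagine@video",
        "positivePrompt": "A time-lapse of clouds moving across a mountain landscape with wind sounds",
        "width": 848,
        "height": 480,
        "duration": 15,
    }
]
request = build_request(tasks, "YOUR_API_KEY")

# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any credits are spent on a generation.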