xAI

Access xAI's Grok Imagine models for image and video generation through Runware's unified API. Learn about multimodal capabilities, synchronized audio generation, and provider-specific settings.

Introduction

xAI's Grok Imagine models are available on the Runware platform through our unified API, covering both image and video generation. Built for creative workflows that need fast inference and flexible visual synthesis, Grok Imagine supports text-to-image generation, image editing, and video creation with synchronized audio.

With the Grok Imagine suite, developers can generate images and videos from text prompts or existing media for use cases such as dynamic content creation, rapid prototyping, and automated visual asset generation in AI products.

Image models

Grok Imagine Image

xAI's Grok Imagine Image model creates high-quality still images from text prompts or image inputs, supporting flexible visual synthesis across a range of styles with coherent, detailed outputs.

Model AIR ID: xai:grok-imagine@image.

Supported workflows: Text-to-image, image-to-image.

Technical specifications:

  • Positive prompt: 1+ characters.
  • Supported dimensions: 1024×1024 (1:1), 896×1280 (3:4), 1280×896 (4:3), 768×1408 (9:16), 1408×768 (16:9), 864×1296 (2:3), 1296×864 (3:2), 576×1248 (9:19.5), 1248×576 (19.5:9), 576×1280 (9:20), 1280×576 (20:9), 704×1408 (1:2), 1408×704 (2:1).
  • Dimension behavior:
    • Text-to-image: Specify explicit width and height from the supported dimensions above.
    • Image-to-image: Two options available:
      • Specify width and height explicitly for precise control.
      • Use the resolution parameter ("1k") to automatically match the aspect ratio of the first reference image.
  • Reference images: Supports up to 3 images via inputs.referenceImages.
  • Resolution: 1k (default: 1k).
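For image-to-image requests that set width and height explicitly, you must choose one of the supported sizes yourself. The Python sketch below picks the supported size whose aspect ratio is closest to that of a reference image. The helper name closest_supported_size and the log-ratio distance are illustrative assumptions. They are not part of the Runware API, and the service's own matching when you pass resolution may differ.

```python
import math

# Supported Grok Imagine Image sizes (width, height), copied from the list above.
SUPPORTED_SIZES = [
    (1024, 1024), (896, 1280), (1280, 896), (768, 1408), (1408, 768),
    (864, 1296), (1296, 864), (576, 1248), (1248, 576), (576, 1280),
    (1280, 576), (704, 1408), (1408, 704),
]


def closest_supported_size(ref_width: int, ref_height: int) -> tuple[int, int]:
    """Return the supported size whose aspect ratio is nearest the reference image's.

    Ratios are compared on a log scale, so 2:1 and 1:2 are equally far from 1:1.
    This is a hypothetical client-side heuristic, not Runware's matching logic.
    """
    if ref_width <= 0 or ref_height <= 0:
        raise ValueError("reference image dimensions must be positive")
    target = math.log(ref_width / ref_height)
    return min(
        SUPPORTED_SIZES,
        key=lambda size: abs(math.log(size[0] / size[1]) - target),
    )


# A 1920x1080 (16:9) photo maps to the 1408x768 entry listed as 16:9.
print(closest_supported_size(1920, 1080))  # (1408, 768)
```

Comparing log ratios keeps portrait and landscape mismatches symmetric. A plain difference of width/height ratios would penalize wide formats more heavily than tall ones.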
{
  "taskType": "imageInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d7",
  "model": "xai:grok-imagine@image",
  "positivePrompt": "A futuristic cityscape at sunset with flying vehicles and neon lights",
  "width": 1408,
  "height": 768
}

{
  "taskType": "imageInference",
  "taskUUID": "6ba7b833-9dad-11d1-80b4-00c04fd430c8",
  "model": "xai:grok-imagine@image",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
  },
  "positivePrompt": "Transform into a cyberpunk style with enhanced neon elements",
  "resolution": "1k"
}

{
  "taskType": "imageInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440015",
  "model": "xai:grok-imagine@image",
  "positivePrompt": "Professional portrait photography with dramatic lighting and shallow depth of field",
  "width": 768,
  "height": 1408
}
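You can check the constraints above on the client before sending a request. The following Python sketch assembles an imageInference task for xai:grok-imagine@image in the same shape as the JSON examples. The function build_image_task is hypothetical. Rejecting requests that set both width/height and resolution is a conservative assumption; it is not a documented rule for this model.

```python
import uuid

MODEL = "xai:grok-imagine@image"
MAX_REFERENCE_IMAGES = 3

# Supported output sizes (width, height), copied from the specification above.
SUPPORTED_SIZES = {
    (1024, 1024), (896, 1280), (1280, 896), (768, 1408), (1408, 768),
    (864, 1296), (1296, 864), (576, 1248), (1248, 576), (576, 1280),
    (1280, 576), (704, 1408), (1408, 704),
}


def build_image_task(prompt, width=None, height=None, reference_images=None, resolution=None):
    """Return an imageInference task dict for Grok Imagine Image (hypothetical helper).

    Text-to-image requires explicit width/height. Image-to-image uses explicit
    width/height when given, otherwise resolution (defaulting to "1k").
    """
    if len(prompt) < 1:
        raise ValueError("positivePrompt must contain at least 1 character")
    refs = list(reference_images or [])
    if len(refs) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are supported")

    task = {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),  # fresh UUID for each request
        "model": MODEL,
        "positivePrompt": prompt,
    }
    if refs:
        task["inputs"] = {"referenceImages": refs}

    if width is not None or height is not None:
        # Assumption: explicit sizing and resolution are never sent together.
        if resolution is not None:
            raise ValueError("pass either width/height or resolution, not both")
        if (width, height) not in SUPPORTED_SIZES:
            raise ValueError(f"unsupported size {width}x{height}")
        task["width"], task["height"] = width, height
    elif refs:
        # Image-to-image without explicit size: match the first reference's aspect ratio.
        resolution = resolution or "1k"
        if resolution != "1k":
            raise ValueError('only resolution "1k" is documented for this model')
        task["resolution"] = resolution
    else:
        raise ValueError("text-to-image requires explicit width and height")
    return task
```

For example, build_image_task("A futuristic cityscape", 1408, 768) produces a text-to-image task like the first JSON example. build_image_task("Cyberpunk style", reference_images=["c64351d5-4c59-42f7-95e1-eace013eddab"]) produces an image-to-image task with resolution "1k".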

Grok Imagine Image Pro

Grok Imagine Image Pro is the higher-quality variant of the Grok Imagine Image model. It generates detailed images from text prompts and supports iterative editing of existing images through natural language instructions. The model provides stronger prompt adherence, improved rendering quality, and more reliable control over composition, style, and aspect ratio, with resolutions up to 2K.

Model AIR ID: xai:grok-imagine@image-pro.

Supported workflows: Text-to-image, image-to-image.

Technical specifications:

  • Positive prompt: 1+ characters (required).
  • Supported dimensions:
    • 1K: 1024×1024 (1:1), 896×1280 (3:4), 1280×896 (4:3), 768×1408 (9:16), 1408×768 (16:9), 864×1296 (2:3), 1296×864 (3:2), 576×1248 (9:19.5), 1248×576 (19.5:9), 576×1280 (9:20), 1280×576 (20:9), 704×1408 (1:2), 1408×704 (2:1).
    • 2K: 2048×2048 (1:1), 1792×2560 (3:4), 2560×1792 (4:3), 1536×2816 (9:16), 2816×1536 (16:9), 1728×2592 (2:3), 2592×1728 (3:2), 1152×2496 (9:19.5), 2496×1152 (19.5:9), 1152×2560 (9:20), 2560×1152 (20:9), 1408×2816 (1:2), 2816×1408 (2:1).
  • Dimension behavior:
    • Text-to-image: Specify explicit width and height, or use resolution (1K or 2K) to select a default square dimension.
    • Image-to-image: Dimensions are automatically inferred from the reference image. Do not specify width, height, or resolution.
  • Reference images: Supports a single image via inputs.referenceImages.
  • Resolution: 1K or 2K (default: 1K). Cannot be combined with explicit width/height.
{
  "taskType": "imageInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "model": "xai:grok-imagine@image-pro",
  "positivePrompt": "A cinematic landscape photograph of misty mountains at golden hour with dramatic cloud formations",
  "width": 1408,
  "height": 768
}

{
  "taskType": "imageInference",
  "taskUUID": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
  "model": "xai:grok-imagine@image-pro",
  "positivePrompt": "Ultra-detailed product photography of a luxury watch on black marble with studio lighting",
  "width": 2048,
  "height": 2048
}

{
  "taskType": "imageInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440000",
  "model": "xai:grok-imagine@image-pro",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
  },
  "positivePrompt": "Transform the background into a vibrant sunset scene while preserving the subject"
}

{
  "taskType": "imageInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
  "model": "xai:grok-imagine@image-pro",
  "positivePrompt": "Professional editorial photography with contemporary styling and precise color grading",
  "resolution": "2K"
}
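The Pro model's sizing rules are stricter: resolution cannot be combined with width/height, and image-to-image requests must omit all three. The Python sketch below encodes those rules. It derives the 2K table by doubling the 1K sizes, which matches every pair in the specification. The helper build_pro_task is hypothetical. Omitting all sizing fields for text-to-image relies on the documented 1K default.

```python
import uuid

MODEL = "xai:grok-imagine@image-pro"

# 1K sizes copied from the specification above; each 2K size is exactly 2x a 1K size.
SIZES_1K = [
    (1024, 1024), (896, 1280), (1280, 896), (768, 1408), (1408, 768),
    (864, 1296), (1296, 864), (576, 1248), (1248, 576), (576, 1280),
    (1280, 576), (704, 1408), (1408, 704),
]
SIZES_2K = [(2 * w, 2 * h) for w, h in SIZES_1K]
SUPPORTED_SIZES = set(SIZES_1K) | set(SIZES_2K)


def build_pro_task(prompt, *, width=None, height=None, resolution=None, reference_image=None):
    """Return an imageInference task dict for Grok Imagine Image Pro (hypothetical helper)."""
    if not prompt:
        raise ValueError("positivePrompt is required")
    task = {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),
        "model": MODEL,
        "positivePrompt": prompt,
    }
    sizing = [v for v in (width, height, resolution) if v is not None]

    if reference_image is not None:
        # Image-to-image: dimensions are inferred from the single reference image.
        if sizing:
            raise ValueError("omit width, height and resolution for image-to-image")
        task["inputs"] = {"referenceImages": [reference_image]}
    elif resolution is not None:
        if width is not None or height is not None:
            raise ValueError("resolution cannot be combined with width/height")
        if resolution not in ("1K", "2K"):
            raise ValueError('resolution must be "1K" or "2K"')
        task["resolution"] = resolution
    elif width is not None or height is not None:
        if (width, height) not in SUPPORTED_SIZES:
            raise ValueError(f"unsupported size {width}x{height}")
        task["width"], task["height"] = width, height
    # With no sizing fields at all, the documented default resolution (1K) applies.
    return task
```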

Video models

Grok Imagine Video

xAI's Grok Imagine Video model produces short video clips with native audio from text descriptions, static images, or existing videos, generating synchronized sound effects and dialogue in a single workflow.

Model AIR ID: xai:grok-imagine@video.

Supported workflows: Text-to-video, image-to-video, video-to-video.

Technical specifications:

  • Positive prompt: 1+ characters.
  • Supported dimensions:
    • 480p: 480×480 (1:1), 848×480 (16:9), 480×848 (9:16), 640×480 (4:3), 480×640 (3:4), 720×480 (3:2), 480×720 (2:3).
    • 720p: 720×720 (1:1), 1280×720 (16:9), 720×1280 (9:16), 960×720 (4:3), 720×960 (3:4), 720×480 (3:2), 480×720 (2:3).
  • Dimension behavior:
    • Text-to-video: Specify explicit width and height from the supported dimensions above.
    • Image-to-video: Two options available:
      • Specify width and height explicitly for precise control.
      • Use the resolution parameter ("480p" or "720p") to automatically match the aspect ratio of the first frame image.
  • Duration: 1-15 seconds (default: 6).
  • Resolution: 480p or 720p (default: 480p).
  • Frame images: Supports a first-frame image via inputs.frameImages (with frame set to "first").
  • Reference videos: Supports video-to-video editing via inputs.referenceVideos (MP4 format, maximum 8.7 seconds).

Grok Imagine Video generates videos with synchronized audio, including sound effects and environmental audio that match the visual content.
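The video limits above can be checked before submission. The Python sketch below validates duration, resolution tier, and explicit dimensions, and pre-checks a clip intended for video-to-video editing. The function names are hypothetical. The size tables are copied verbatim from the specification. Checking the ".mp4" file extension is only a rough stand-in for the MP4 format requirement; the service validates the actual container.

```python
# Documented Grok Imagine Video sizes (width, height) per resolution tier,
# copied from the specification above.
VIDEO_SIZES = {
    "480p": {(480, 480), (848, 480), (480, 848), (640, 480), (480, 640), (720, 480), (480, 720)},
    "720p": {(720, 720), (1280, 720), (720, 1280), (960, 720), (720, 960), (720, 480), (480, 720)},
}
MIN_DURATION, MAX_DURATION = 1, 15  # seconds
MAX_REFERENCE_VIDEO_SECONDS = 8.7


def check_video_settings(duration=6, width=None, height=None, resolution=None):
    """Raise ValueError if the settings fall outside the documented limits."""
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError(f"duration must be between {MIN_DURATION} and {MAX_DURATION} seconds")
    if resolution is not None and resolution not in VIDEO_SIZES:
        raise ValueError('resolution must be "480p" or "720p"')
    if width is not None or height is not None:
        # If a tier is given, the size must belong to it; otherwise to any tier.
        tiers = VIDEO_SIZES.values() if resolution is None else [VIDEO_SIZES[resolution]]
        if not any((width, height) in sizes for sizes in tiers):
            raise ValueError(f"unsupported size {width}x{height}")


def check_reference_video(filename, seconds):
    """Rough pre-check for video-to-video input: .mp4 extension, at most 8.7 s long."""
    if not filename.lower().endswith(".mp4"):
        raise ValueError("reference videos must be MP4 files")
    if seconds > MAX_REFERENCE_VIDEO_SECONDS:
        raise ValueError(f"reference videos may be at most {MAX_REFERENCE_VIDEO_SECONDS} s long")
```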

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
  "model": "xai:grok-imagine@video",
  "positivePrompt": "A serene ocean wave crashing on a beach at sunrise with ambient sound",
  "width": 1280,
  "height": 720,
  "duration": 6
}

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b834-9dad-11d1-80b4-00c04fd430c8",
  "model": "xai:grok-imagine@video",
  "inputs": {
    "frameImages": [
      {
        "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      }
    ]
  },
  "positivePrompt": "Animate this scene with gentle camera movement and natural environmental sounds",
  "duration": 8,
  "resolution": "720p"
}

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440016",
  "model": "xai:grok-imagine@video",
  "inputs": {
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "positivePrompt": "Add dramatic lighting effects and enhance the atmospheric audio"
}

{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
  "model": "xai:grok-imagine@video",
  "positivePrompt": "A time-lapse of clouds moving across a mountain landscape with wind sounds",
  "width": 848,
  "height": 480,
  "duration": 15
}
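The three video workflows differ mainly in the inputs object. The Python sketch below builds videoInference tasks in the same shape as the JSON examples above. The helper build_video_task is hypothetical. Inputs are passed as identifiers such as the UUIDs used in those examples. Refusing to combine a first frame with a reference video is an assumption, since the specification does not describe that combination.

```python
import uuid

MODEL = "xai:grok-imagine@video"


def build_video_task(prompt, *, width=None, height=None, duration=None,
                     resolution=None, first_frame=None, reference_video=None):
    """Return a videoInference task dict for Grok Imagine Video (hypothetical helper).

    text-to-video:  pass width and height
    image-to-video: pass first_frame, plus width/height or resolution
    video-to-video: pass reference_video
    """
    if not prompt:
        raise ValueError("positivePrompt must contain at least 1 character")
    if first_frame is not None and reference_video is not None:
        # Assumption: the two input types are not combined in one request.
        raise ValueError("choose either a first frame or a reference video")

    task = {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),
        "model": MODEL,
        "positivePrompt": prompt,
    }
    if first_frame is not None:
        task["inputs"] = {"frameImages": [{"inputImage": first_frame, "frame": "first"}]}
    elif reference_video is not None:
        task["inputs"] = {"referenceVideos": [reference_video]}
    elif width is None or height is None:
        raise ValueError("text-to-video requires explicit width and height")

    # Copy only the optional fields the caller actually set.
    for key, value in (("width", width), ("height", height),
                       ("duration", duration), ("resolution", resolution)):
        if value is not None:
            task[key] = value
    return task
```

Fields left unset are omitted, so the service defaults (6 seconds, 480p) apply.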