xAI

Access xAI's Grok Imagine models for image and video generation through Runware's unified API. Learn about multimodal capabilities, synchronized audio generation, and provider-specific settings.

Introduction

xAI's Grok Imagine models are available on the Runware platform through our unified API for both image and video generation. Built for creative workflows that need fast inference and flexible visual synthesis, Grok Imagine covers text-to-image generation, image editing, and video creation with synchronized audio.

The Grok Imagine suite enables developers to generate high-quality visual content from text prompts or existing media, supporting dynamic content creation, rapid prototyping, and automated visual asset generation for modern AI products.

Image models

Grok Imagine Image

xAI's Grok Imagine Image model creates high-quality still images from text prompts or image inputs, supporting flexible visual synthesis across a range of styles with coherent, detailed outputs.

Model AIR ID: xai:grok-imagine@image.

Supported workflows: Text-to-image, image-to-image.

Technical specifications:

  • Positive prompt: 1+ characters.
  • Supported dimensions: 1024×1024 (1:1), 896×1280 (3:4), 1280×896 (4:3), 768×1408 (9:16), 1408×768 (16:9), 864×1296 (2:3), 1296×864 (3:2), 576×1248 (9:19.5), 1248×576 (19.5:9), 576×1280 (9:20), 1280×576 (20:9), 704×1408 (1:2), 1408×704 (2:1).
  • Dimension behavior:
    • Text-to-image: Specify explicit width and height from the supported dimensions above.
    • Image-to-image: Two options available:
      • Specify width and height explicitly for precise control.
      • Use the resolution parameter (1k) to automatically match the aspect ratio of the first reference image.
  • Reference images: Supports up to 1 image via referenceImages.
  • Resolution: 1k (default: 1k).

Text-to-image (16:9):

{
  "taskType": "imageInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d7",
  "model": "xai:grok-imagine@image",
  "positivePrompt": "A futuristic cityscape at sunset with flying vehicles and neon lights",
  "width": 1408,
  "height": 768
}

Image-to-image (aspect ratio matched via resolution):

{
  "taskType": "imageInference",
  "taskUUID": "6ba7b833-9dad-11d1-80b4-00c04fd430c8",
  "model": "xai:grok-imagine@image",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
  },
  "positivePrompt": "Transform into a cyberpunk style with enhanced neon elements",
  "resolution": "1k"
}

Text-to-image (9:16 portrait):

{
  "taskType": "imageInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440015",
  "model": "xai:grok-imagine@image",
  "positivePrompt": "Professional portrait photography with dramatic lighting and shallow depth of field",
  "width": 768,
  "height": 1408
}
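
The dimension rules for Grok Imagine Image can be checked client-side before a request is sent. The following is a minimal Python sketch, not part of any Runware SDK: the helper name build_image_task and its arguments are hypothetical, and the dimension set is copied from the specification list above. It only builds the task object; sending it is left to whatever client you already use.

```python
import uuid

# Supported (width, height) pairs, copied from the specification list above.
IMAGE_DIMENSIONS = {
    (1024, 1024), (896, 1280), (1280, 896), (768, 1408), (1408, 768),
    (864, 1296), (1296, 864), (576, 1248), (1248, 576), (576, 1280),
    (1280, 576), (704, 1408), (1408, 704),
}


def build_image_task(prompt, width=None, height=None, reference_image=None):
    """Hypothetical helper: build an imageInference task for xai:grok-imagine@image."""
    if len(prompt) < 1:
        raise ValueError("positivePrompt needs at least 1 character")

    task = {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),
        "model": "xai:grok-imagine@image",
        "positivePrompt": prompt,
    }

    if reference_image is not None:
        # Image-to-image: the model accepts at most one reference image.
        task["inputs"] = {"referenceImages": [reference_image]}

    if width is not None or height is not None:
        # Explicit sizing: the pair must be one of the supported dimensions.
        if (width, height) not in IMAGE_DIMENSIONS:
            raise ValueError(f"unsupported dimensions: {width}x{height}")
        task["width"] = width
        task["height"] = height
    elif reference_image is not None:
        # No explicit size: 1k resolution follows the reference image's aspect ratio.
        task["resolution"] = "1k"
    else:
        raise ValueError("text-to-image requires explicit width and height")

    return task
```

Calling build_image_task("A futuristic cityscape at sunset", 1408, 768) yields a text-to-image task, while passing only reference_image produces an image-to-image task that relies on the resolution parameter.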

Video models

Grok Imagine Video

xAI's Grok Imagine Video model produces short video clips with native audio from text descriptions or static images, supporting synchronized sound effects and dialogue in a single generation workflow.

Model AIR ID: xai:grok-imagine@video.

Supported workflows: Text-to-video, image-to-video, video-to-video.

Technical specifications:

  • Positive prompt: 1+ characters.
  • Supported dimensions:
    • 480p: 480×480 (1:1), 848×480 (16:9), 480×848 (9:16), 640×480 (4:3), 480×640 (3:4), 720×480 (3:2), 480×720 (2:3).
    • 720p: 720×720 (1:1), 1280×720 (16:9), 720×1280 (9:16), 960×720 (4:3), 720×960 (3:4), 720×480 (3:2), 480×720 (2:3).
  • Dimension behavior:
    • Text-to-video: Specify explicit width and height from the supported dimensions above.
    • Image-to-video: Two options available:
      • Specify width and height explicitly for precise control.
      • Use the resolution parameter (480p or 720p) to automatically match the aspect ratio of the first frame image.
  • Duration: 1-15 seconds (default: 6).
  • Resolution: 480p or 720p (default: 480p).
  • Frame images: Supports a first frame only, passed via frameImages.
  • Reference videos: Supports video-to-video editing (MP4 format, maximum 8.7 seconds).

Grok Imagine Video generates videos with synchronized audio, including sound effects and environmental audio that matches the visual content.
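
The video constraints listed above (dimensions, duration range, and reference video length) lend themselves to a pre-flight check. The sketch below is illustrative only: check_video_params is a hypothetical helper, the dimension table is copied as stated in the specifications, and the function accepts any pair that appears under either resolution tier.

```python
# Supported (width, height) pairs per resolution tier, copied from the list above.
VIDEO_DIMENSIONS = {
    "480p": {(480, 480), (848, 480), (480, 848), (640, 480),
             (480, 640), (720, 480), (480, 720)},
    "720p": {(720, 720), (1280, 720), (720, 1280), (960, 720),
             (720, 960), (720, 480), (480, 720)},
}
MIN_DURATION, MAX_DURATION = 1, 15      # seconds
MAX_REFERENCE_VIDEO_SECONDS = 8.7       # video-to-video input limit


def check_video_params(width, height, duration=6, reference_video_seconds=None):
    """Hypothetical helper: return a list of problems; an empty list means the values fit the spec."""
    problems = []

    # The pair must appear under at least one resolution tier.
    if not any((width, height) in dims for dims in VIDEO_DIMENSIONS.values()):
        problems.append(f"unsupported dimensions: {width}x{height}")

    if not (MIN_DURATION <= duration <= MAX_DURATION):
        problems.append(f"duration {duration}s is outside {MIN_DURATION}-{MAX_DURATION}s")

    # Only checked when the caller knows the length of the reference video.
    if (reference_video_seconds is not None
            and reference_video_seconds > MAX_REFERENCE_VIDEO_SECONDS):
        problems.append(
            f"reference video is {reference_video_seconds}s, "
            f"limit is {MAX_REFERENCE_VIDEO_SECONDS}s"
        )

    return problems
```

Returning a list rather than raising on the first failure lets a caller report every violated constraint at once.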

Text-to-video (720p, 16:9):

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
  "model": "xai:grok-imagine@video",
  "positivePrompt": "A serene ocean wave crashing on a beach at sunrise with ambient sound",
  "width": 1280,
  "height": 720,
  "duration": 6
}

Image-to-video (aspect ratio matched via resolution):

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b834-9dad-11d1-80b4-00c04fd430c8",
  "model": "xai:grok-imagine@video",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "positivePrompt": "Animate this scene with gentle camera movement and natural environmental sounds",
  "duration": 8,
  "resolution": "720p"
}

Video-to-video:

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440016",
  "model": "xai:grok-imagine@video",
  "inputs": {
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "positivePrompt": "Add dramatic lighting effects and enhance the atmospheric audio"
}

Text-to-video (480p, maximum duration):

{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
  "model": "xai:grok-imagine@video",
  "positivePrompt": "A time-lapse of clouds moving across a mountain landscape with wind sounds",
  "width": 848,
  "height": 480,
  "duration": 15
}
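
The three video workflows differ only in which input fields are present. As a rough sketch of how the request objects above could be assembled in Python, the hypothetical build_video_task helper below mirrors their field layout (top-level frameImages for a first frame, inputs.referenceVideos for a reference video). It is not part of any Runware SDK, and it assumes optional fields can simply be omitted so that documented defaults apply.

```python
import uuid


def build_video_task(prompt, *, width=None, height=None, duration=None,
                     resolution=None, first_frame=None, reference_video=None):
    """Hypothetical helper: build a videoInference task for xai:grok-imagine@video."""
    is_text_to_video = first_frame is None and reference_video is None
    if is_text_to_video and (width is None or height is None):
        # Text-to-video has no input media to take an aspect ratio from.
        raise ValueError("text-to-video requires explicit width and height")

    task = {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),
        "model": "xai:grok-imagine@video",
        "positivePrompt": prompt,
    }

    if first_frame is not None:
        # Image-to-video: only a first frame is supported.
        task["frameImages"] = [{"inputImage": first_frame, "frame": "first"}]

    if reference_video is not None:
        # Video-to-video: the reference clip goes under inputs.
        task["inputs"] = {"referenceVideos": [reference_video]}

    # Copy optional settings only when given, leaving defaults to the API.
    optional = {"width": width, "height": height,
                "duration": duration, "resolution": resolution}
    for key, value in optional.items():
        if value is not None:
            task[key] = value

    return task
```

For example, build_video_task("Animate this scene", first_frame="<image UUID>", duration=8, resolution="720p") produces an image-to-video task with no width or height, so the output follows the aspect ratio of the first frame.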