Alibaba

Access Alibaba's AI models including Wan for video generation through Runware's unified API. Learn about Alibaba-specific parameters and multimodal capabilities.

Introduction

Alibaba Cloud's AI models are integrated into the Runware platform through our unified API, providing access to advanced generative capabilities across language, vision, and video domains. The Wan model family specializes in video generation with support for multi-shot sequencing, native audio, and strong temporal consistency.

Through the providerSettings.alibaba object, you can access Alibaba-specific features such as prompt extension, automatic audio generation, and multi-shot composition while keeping Runware's standard API structure. This page documents the technical specifications, parameter requirements, and provider-specific settings for all Alibaba models available through our platform.

Image models

providerSettings » alibaba
alibaba
object

Configuration object for Alibaba-specific image generation settings. These parameters provide control over prompt enhancement for Wan image models.

Example:
{
  "taskType": "imageInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
  "model": "alibaba:wan@2.5-image",
  "positivePrompt": "A cinematic still with rich detail",
  "width": 1280,
  "height": 1280,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}
Properties (1 property)
providerSettings » alibaba » promptExtend
promptExtend
boolean Default: true

Enables LLM-based prompt rewriting, which expands and clarifies the input prompt before generation. When enabled, the system analyzes and enhances the prompt to produce more detailed and coherent image output.

Enabling prompt extension increases generation time but typically yields higher-quality output with better scene composition.
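To illustrate where the setting sits in a request, the following Python sketch assembles an image task and turns prompt extension off. The helper name build_wan_image_task is hypothetical, and generating taskUUID with uuid.uuid4() is an assumption based on the UUID-shaped values used in this page's examples. The sketch only builds the payload; it does not send it.

```python
import json
import uuid


def build_wan_image_task(prompt: str, width: int = 1280, height: int = 1280,
                         prompt_extend: bool = True) -> dict:
    """Assemble an imageInference task for alibaba:wan@2.5-image (hypothetical helper)."""
    return {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),  # assumption: any fresh UUID is acceptable
        "model": "alibaba:wan@2.5-image",
        "positivePrompt": prompt,
        "width": width,
        "height": height,
        "providerSettings": {"alibaba": {"promptExtend": prompt_extend}},
    }


if __name__ == "__main__":
    # Disabling prompt extension sends the prompt as written and skips the rewrite step.
    task = build_wan_image_task("A cinematic still with rich detail", prompt_extend=False)
    print(json.dumps(task, indent=2))
```

Leaving prompt_extend at its default mirrors the documented default of true, so the providerSettings block could also be omitted in that case.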

Wan2.5-Preview Image

Alibaba's Wan2.5-Preview Image delivers high-fidelity single frame generation built from the Wan2.5 video architecture. This model focuses on detailed depth structure, strong prompt following, multilingual text rendering, and video-grade visual quality for production-ready stills.

Model AIR ID: alibaba:wan@2.5-image.

Supported workflows: Text-to-image.

Technical specifications:

  • Positive prompt: 1-2000 characters (supports English and Chinese).
  • Negative prompt: 1-500 characters (optional).
  • Supported dimensions: Total pixel count between 589,824 (768×768) and 2,073,600 (1440×1440), with an aspect ratio between 1:4 and 4:1 (default: 1280×1280).

Provider-specific settings:

Parameters supported: promptExtend.

{
  "taskType": "imageInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
  "model": "alibaba:wan@2.5-image",
  "positivePrompt": "A cinematic still of a dramatic landscape with detailed depth structure and rich atmospheric lighting",
  "width": 1280,
  "height": 1280,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}
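
The dimension limits above can be checked on the client before a task is submitted. The sketch below is a minimal Python check, assuming the limits apply to the product of width and height and that the aspect ratio is width divided by height. The function name check_wan25_image_size is hypothetical, and any further constraints the API may enforce (such as step sizes) are not covered.

```python
from fractions import Fraction

MIN_PIXELS = 768 * 768    # 589,824 total pixels
MAX_PIXELS = 1440 * 1440  # 2,073,600 total pixels


def check_wan25_image_size(width: int, height: int) -> list[str]:
    """Return a list of problems; an empty list means the size fits the documented limits."""
    if width <= 0 or height <= 0:
        return ["width and height must be positive integers"]

    problems = []
    pixels = width * height
    if pixels < MIN_PIXELS:
        problems.append(f"{pixels:,} pixels is below the minimum of {MIN_PIXELS:,}")
    if pixels > MAX_PIXELS:
        problems.append(f"{pixels:,} pixels is above the maximum of {MAX_PIXELS:,}")

    # Exact fractions avoid floating-point rounding at the 1:4 and 4:1 boundaries.
    ratio = Fraction(width, height)
    if not Fraction(1, 4) <= ratio <= 4:
        problems.append(f"aspect ratio {width}:{height} is outside 1:4 to 4:1")
    return problems


if __name__ == "__main__":
    for size in [(1280, 1280), (2048, 512), (512, 512)]:
        print(size, check_wan25_image_size(*size) or "ok")
```

Under this reading, non-square sizes such as 2048×512 pass because only the total pixel count and the ratio are constrained, not each side individually.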

Video models

providerSettings » alibaba
alibaba
object

Configuration object for Alibaba-specific video generation settings. These parameters provide control over prompt enhancement, audio generation, and shot composition for Wan video models.

Example:
{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
  "model": "alibaba:wan@2.6",
  "positivePrompt": "A cinematic scene with multiple shots",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true,
      "shotType": "multi"
    }
  }
}
Properties (3 properties)
providerSettings » alibaba » promptExtend
promptExtend
boolean Default: true

Enables LLM-based prompt rewriting to improve generation quality by expanding and clarifying the input prompt. When enabled, the system analyzes and enhances the prompt to produce more detailed and coherent video output.

Enabling prompt extension increases generation time but typically results in higher quality output with better scene composition and narrative flow.

providerSettings » alibaba » audio
audio
boolean Default: true

Controls automatic audio generation for the video. When enabled, the model generates native audio that aligns with the visual content and scene progression.

This parameter is ignored if custom audio is provided via inputs.audio.

providerSettings » alibaba » shotType
shotType
"single" | "multi" Default: single

Determines the shot composition style for the generated video. This parameter controls whether the video is generated as a continuous single shot or as multiple shots with transitions.

Available values:

  • single: Generate video as a continuous single shot.
  • multi: Generate video with multiple shots and transitions between them.

This parameter only takes effect when promptExtend is set to true. Multi-shot composition works best with prompts that explicitly describe shot changes or scene transitions.
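
The interactions between these three settings are easy to miss, so a small pre-flight check can flag combinations where a setting would be silently ignored. The Python sketch below assumes only the behavior described above: shotType needs promptExtend to be true, and audio is ignored when inputs.audio is supplied. The function name alibaba_settings_warnings is hypothetical.

```python
def alibaba_settings_warnings(settings: dict, has_custom_audio: bool = False) -> list[str]:
    """Report provider settings that would have no effect, based on the documented rules."""
    warnings = []

    # promptExtend defaults to true when it is not set explicitly.
    prompt_extend = settings.get("promptExtend", True)

    if "shotType" in settings:
        if settings["shotType"] not in ("single", "multi"):
            raise ValueError('shotType must be "single" or "multi"')
        if not prompt_extend:
            warnings.append("shotType has no effect because promptExtend is false")

    if has_custom_audio and "audio" in settings:
        warnings.append("audio is ignored because inputs.audio is provided")

    return warnings


if __name__ == "__main__":
    print(alibaba_settings_warnings({"promptExtend": False, "shotType": "multi"}))
    print(alibaba_settings_warnings({"audio": True}, has_custom_audio=True))
```

The check returns warnings rather than rewriting the settings, since this page does not state what the service falls back to when a setting is ignored.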

Wan2.5-Preview

Alibaba's Wan2.5-Preview model is a research preview of multimodal video generation with native audio support. It offers strong prompt adherence, smooth motion, and multilingual audio capabilities for narrative scenes of up to 10 seconds, making it suitable for short-form storytelling and creative video workflows.

Model AIR ID: alibaba:wan@2.5-preview.

Supported workflows: Text-to-video, image-to-video, audio-to-video.

Technical specifications:

  • Positive prompt: 1-2000 characters (supports English and Chinese).
  • Negative prompt: 1-500 characters (optional).
  • Frame images: Supports first frame via inputs.frameImages (image-to-video only).
  • Audio input: Supports custom audio via inputs.audio.
  • Supported dimensions:
    • 480p: 854×480 (16:9), 480×854 (9:16), 640×640 (1:1).
    • 720p: 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), 1088×832 (17:13), 832×1088 (13:17).
    • 1080p: 1920×1080 (16:9), 1080×1920 (9:16), 1440×1440 (1:1), 1632×1248 (17:13), 1248×1632 (13:17).
  • Dimension behavior:
    • Text-to-video: Specify explicit width and height from the supported dimensions above.
    • Image-to-video: Two options available:
      • Specify width and height explicitly for precise control.
      • Use resolution parameter (480p, 720p, or 1080p) to automatically match the aspect ratio from the first frame image.
  • Duration: 5 or 10 seconds (default: 5).
  • Input image requirements: 360-2000 pixels, 10MB file size limit.
  • Audio requirements: WAV/MP3, 3-30 seconds duration, 15MB file size limit.
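
The audio requirements above can be partly verified locally before upload. The sketch below uses only the Python standard library, so it checks the file extension and size for both formats but measures duration for WAV files only; MP3 duration would need a third-party decoder and is left out. It assumes "15MB" means 15 × 1024 × 1024 bytes, and check_audio_file is a hypothetical helper name.

```python
import os
import wave

MAX_AUDIO_BYTES = 15 * 1024 * 1024  # assumption: "15MB" read as 15 MiB
MIN_SECONDS, MAX_SECONDS = 3, 30


def check_audio_file(path: str) -> list[str]:
    """Return problems found when comparing a local file to the documented audio limits."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in (".wav", ".mp3"):
        problems.append(f"unsupported format {ext or '(none)'}; expected WAV or MP3")

    if os.path.getsize(path) > MAX_AUDIO_BYTES:
        problems.append("file is larger than 15MB")

    if ext == ".wav":
        try:
            with wave.open(path, "rb") as wav:
                seconds = wav.getnframes() / wav.getframerate()
        except wave.Error as exc:
            problems.append(f"could not read WAV header: {exc}")
        else:
            if not MIN_SECONDS <= seconds <= MAX_SECONDS:
                problems.append(f"duration {seconds:.1f}s is outside 3-30 seconds")
    return problems
```

A call such as check_audio_file("narration.wav") returns an empty list when nothing is flagged; the file name here is only a placeholder.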

Provider-specific settings:

Parameters supported: promptExtend, audio.

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
  "model": "alibaba:wan@2.5-preview",
  "positivePrompt": "A cinematic narrative scene with smooth character movement and atmospheric storytelling",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b834-9dad-11d1-80b4-00c04fd430c8",
  "model": "alibaba:wan@2.5-preview",
  "inputs": {
    "frameImages": [
      {
        "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      }
    ]
  },
  "positivePrompt": "The character begins to move naturally through the scene with smooth motion",
  "duration": 5,
  "resolution": "720p",
  "providerSettings": {
    "alibaba": {
      "audio": true
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440016",
  "model": "alibaba:wan@2.5-preview",
  "positivePrompt": "Visual narrative synchronized with the provided audio track",
  "inputs": {
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  },
  "duration": 10,
  "width": 1280,
  "height": 720,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}
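
The dimension and duration rules for Wan2.5-Preview lend themselves to a lookup-based check. The Python sketch below encodes the supported sizes listed above and applies the documented dimension behavior. It assumes that any task without a first frame image needs explicit width and height, which this page states for text-to-video and shows in the audio example. The function name check_wan25_video_task is hypothetical.

```python
SUPPORTED_SIZES = {
    "480p": {(854, 480), (480, 854), (640, 640)},
    "720p": {(1280, 720), (720, 1280), (960, 960), (1088, 832), (832, 1088)},
    "1080p": {(1920, 1080), (1080, 1920), (1440, 1440), (1632, 1248), (1248, 1632)},
}


def check_wan25_video_task(task: dict) -> list[str]:
    """Return problems found in a Wan2.5-Preview task; an empty list means none were found."""
    problems = []
    if task.get("duration", 5) not in (5, 10):
        problems.append("duration must be 5 or 10 seconds")

    has_first_frame = bool(task.get("inputs", {}).get("frameImages"))
    width, height = task.get("width"), task.get("height")

    if width is not None or height is not None:
        # Explicit dimensions must match one of the listed pairs exactly.
        if not any((width, height) in sizes for sizes in SUPPORTED_SIZES.values()):
            problems.append(f"{width}x{height} is not a supported size")
    elif has_first_frame:
        # Image-to-video may use a resolution tier instead of explicit dimensions.
        if task.get("resolution") not in SUPPORTED_SIZES:
            problems.append("set width/height or a resolution of 480p, 720p, or 1080p")
    else:
        problems.append("without a first frame image, width and height are required")
    return problems


if __name__ == "__main__":
    print(check_wan25_video_task({"duration": 10, "width": 1920, "height": 1080}))
    print(check_wan25_video_task({"duration": 15, "width": 1000, "height": 1000}))
```

Because the lookup compares integer pairs, dimensions passed as strings are also reported as unsupported.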

Wan2.6

Alibaba's Wan2.6 model delivers multimodal video generation with native audio support and multi-shot sequencing capabilities. This model emphasizes temporal stability, consistent visual structure across shots, and reliable alignment between visuals and audio for short-form narrative video production.

Model AIR ID: alibaba:wan@2.6.

Supported workflows: Text-to-video, image-to-video, reference-to-video.

Technical specifications:

  • Positive prompt: 1-1500 characters (supports English and Chinese).
  • Negative prompt: 1-500 characters (optional).
  • Frame images: Supports first frame via inputs.frameImages (image-to-video only).
  • Reference videos: Supports up to 3 videos via inputs.referenceVideos (reference-to-video only).
  • Audio input: Supports custom audio via inputs.audio.
  • Supported dimensions:
    • 720p: 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), 1088×832 (17:13), 832×1088 (13:17).
    • 1080p: 1920×1080 (16:9), 1080×1920 (9:16), 1440×1440 (1:1), 1632×1248 (17:13), 1248×1632 (13:17).
  • Dimension behavior:
    • Text-to-video: Specify explicit width and height from the supported dimensions above.
    • Image-to-video: Two options available:
      • Specify width and height explicitly for precise control.
      • Use resolution parameter (720p or 1080p) to automatically match the aspect ratio from the first frame image.
  • Duration: 5, 10, or 15 seconds (default: 5).
  • Input image requirements: 360-2000 pixels, 10MB file size limit.
  • Reference video requirements: Maximum 30MB per video.
  • Audio requirements: WAV/MP3, 3-30 seconds duration, 15MB file size limit.

Reference videos cannot be used together with frame images. Choose either the image-to-video or the reference-to-video workflow.
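
The input rules for Wan2.6 can be expressed as a short client-side check. The Python sketch below covers the mutual exclusion of frame images and reference videos, the limit of 3 reference videos, the 1-1500 character prompt range, and the allowed durations. The function name check_wan26_inputs is hypothetical, and file-level limits such as the 30MB cap per reference video are not checked here.

```python
def check_wan26_inputs(task: dict) -> list[str]:
    """Return problems found in a Wan2.6 task, based on the limits listed on this page."""
    problems = []
    inputs = task.get("inputs", {})
    frames = inputs.get("frameImages") or []
    references = inputs.get("referenceVideos") or []

    if frames and references:
        problems.append("frameImages and referenceVideos cannot be combined")
    if len(references) > 3:
        problems.append("at most 3 reference videos are supported")
    if len(frames) > 1:
        problems.append("only a first frame image is supported")

    prompt = task.get("positivePrompt", "")
    if not 1 <= len(prompt) <= 1500:
        problems.append("positivePrompt must be 1-1500 characters")

    if task.get("duration", 5) not in (5, 10, 15):
        problems.append("duration must be 5, 10, or 15 seconds")
    return problems


if __name__ == "__main__":
    task = {
        "positivePrompt": "character1 walks through a forest",
        "inputs": {"frameImages": ["image-id"], "referenceVideos": ["video-id"]},
        "duration": 15,
    }
    print(check_wan26_inputs(task))
```

Collecting every problem in one pass, rather than stopping at the first, lets a caller fix a task in a single round.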

Provider-specific settings:

Parameters supported: promptExtend, audio, shotType.

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d7",
  "model": "alibaba:wan@2.6",
  "positivePrompt": "A cinematic chase through a rain-soaked city, opening with a wide street shot, cutting to a close-up of footsteps splashing through puddles, followed by an overhead tracking shot",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true,
      "shotType": "multi"
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b833-9dad-11d1-80b4-00c04fd430c8",
  "model": "alibaba:wan@2.6",
  "inputs": {
    "frameImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
  },
  "positivePrompt": "The scene comes alive with gentle movement and atmospheric effects",
  "duration": 5,
  "resolution": "720p",
  "providerSettings": {
    "alibaba": {
      "audio": true
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440015",
  "model": "alibaba:wan@2.6",
  "inputs": {
    "referenceVideos": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"
    ]
  },
  "positivePrompt": "character1 walks through a forest while character2 follows behind, maintaining their visual characteristics and movement styles",
  "duration": 15,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true,
      "shotType": "single"
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d489",
  "model": "alibaba:wan@2.6",
  "inputs": {
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  },
  "positivePrompt": "A dramatic scene with synchronized visuals matching the provided audio track",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}