Alibaba

Access Alibaba's AI models including Wan for video generation through Runware's unified API. Learn about Alibaba-specific parameters and multimodal capabilities.

Introduction

Alibaba Cloud's AI models are integrated into the Runware platform through our unified API, providing access to advanced generative capabilities across language, vision, and video domains. The Wan model family specializes in video generation with support for multi-shot sequencing, native audio, and strong temporal consistency.

Through the providerSettings.alibaba object, you can access Alibaba-specific features such as prompt extension, automatic audio generation, and multi-shot composition, while maintaining the consistency of Runware's standard API structure. This page documents the technical specifications, parameter requirements, and provider-specific settings for all Alibaba models available through our platform.

Image models

providerSettings » alibaba (object)

Configuration object for Alibaba-specific image generation settings. These parameters provide control over prompt enhancement for Wan image models.

Example

{
  "taskType": "imageInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
  "model": "alibaba:wan@2.5-image",
  "positivePrompt": "A cinematic still with rich detail",
  "width": 1280,
  "height": 1280,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}

Properties

providerSettings » alibaba » promptExtend (boolean, default: true): Enables LLM-based prompt rewriting to improve generation quality by expanding and clarifying the input prompt. When enabled, the system analyzes and enhances the prompt to produce more detailed and coherent image output.

Enabling prompt extension increases generation time but typically results in higher quality output with better scene composition and narrative flow.
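As an illustration of the request shape described above, the following minimal sketch assembles an imageInference task with the providerSettings.alibaba object. The helper name build_wan_image_task and its keyword arguments are hypothetical and not part of any Runware SDK; the field names are taken from the example above, and the task is only built and printed, not sent.

```python
import json
import uuid


def build_wan_image_task(prompt, width=1280, height=1280, prompt_extend=True):
    """Assemble an imageInference task dict (hypothetical helper, not an SDK call)."""
    return {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),  # fresh UUID for each task
        "model": "alibaba:wan@2.5-image",
        "positivePrompt": prompt,
        "width": width,
        "height": height,
        # Provider-specific settings live under providerSettings.alibaba
        "providerSettings": {"alibaba": {"promptExtend": prompt_extend}},
    }


# Turn prompt extension off for a latency-sensitive request
task = build_wan_image_task("A cinematic still with rich detail", prompt_extend=False)
print(json.dumps(task, indent=2))
```

Keeping promptExtend as an explicit argument makes the quality versus latency trade-off visible at the call site.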

Qwen-Image-2.0

Alibaba's Qwen-Image-2.0 is an advanced unified model for image generation and editing that produces high-quality images at native 2K resolution with professional-grade text rendering. This model excels at generating complex textual content within images, making it ideal for infographics, posters, and layout-driven visuals with strong semantic understanding and detailed prompt adherence.

Model AIR ID: runware:qwen-image@2.0.

Supported workflows: Text-to-image, image-to-image.

Technical specifications:

Positive prompt: 2-2000 characters.
Negative prompt: 2-500 characters (text-to-image only).
Reference images: Supports up to 3 images via referenceImages.
Supported dimensions: Freely customizable within total area limit of 2,097,152 pixels (2048×1024 equivalent), width and height in 1-pixel increments.

Advanced generation settings:

settings.promptExtend (boolean, default: true): Enables automatic prompt expansion to improve quality. Adds 3-5 seconds latency. Disable for detailed prompts or latency-sensitive workflows.

There is currently a temporary backend validation limiting width and height to a maximum value of 2048.

This restriction will be removed in the next deployment.

The model supports any dimensions within the total area limit of 2,097,152 pixels, as described above.
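The dimension rules above can be checked client-side before a task is submitted. This is a minimal sketch under stated assumptions: the function name check_qwen_dimensions and the constant names are hypothetical, and the per-side cap is modeled as a switch because the note above describes it as temporary.

```python
MAX_AREA = 2_097_152   # total pixel budget stated above (2048 x 1024 equivalent)
TEMP_MAX_SIDE = 2048   # temporary backend cap on width and height


def check_qwen_dimensions(width, height, enforce_temp_cap=True):
    """Return a list of problems; an empty list means the size passes these checks."""
    problems = []
    if width < 1 or height < 1:
        problems.append("width and height must be positive integers")
    if width * height > MAX_AREA:
        problems.append(f"area {width * height} exceeds {MAX_AREA} pixels")
    if enforce_temp_cap and max(width, height) > TEMP_MAX_SIDE:
        problems.append(f"side length exceeds temporary cap of {TEMP_MAX_SIDE}")
    return problems


print(check_qwen_dimensions(1920, 1080))  # inside both limits
print(check_qwen_dimensions(2048, 2048))  # area is over the budget
```

Passing enforce_temp_cap=False models the behavior expected once the temporary restriction is lifted.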

Text-to-image with typography

{
  "taskType": "imageInference",
  "taskUUID": "6ba7b834-9dad-11d1-80b4-00c04fd430c8",
  "model": "runware:qwen-image@2.0",
  "positivePrompt": "Create a professional business infographic titled 'QUARTERLY GROWTH 2025' with bar charts, percentage indicators, and clean typography on light background",
  "negativePrompt": "blurry text, distorted letters, low quality",
  "width": 1920,
  "height": 1080,
  "numberResults": 2,
  "settings": {
    "promptExtend": true
  }
}

Image editing

{
  "taskType": "imageInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440015",
  "model": "runware:qwen-image@2.0",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab"
    ]
  },
  "positivePrompt": "Change the background to a modern office environment while keeping the subject and text intact",
  "width": 1024,
  "height": 1024
}

Poster design

{
  "taskType": "imageInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35daa",
  "model": "runware:qwen-image@2.0",
  "positivePrompt": "Design a concert poster with bold text 'LIVE MUSIC FESTIVAL' featuring geometric patterns, vibrant colors, and professional layout with event details",
  "width": 1200,
  "height": 1600,
  "seed": 42,
  "settings": {
    "promptExtend": false
  }
}

Qwen-Image-2.0-Pro

Alibaba's Qwen-Image-2.0-Pro enhances the base model with optimized visual fidelity, improved layout and typography handling, and advanced editing control for professional creative workflows. This model delivers richer detail and more accurate text and iconography rendering, making it suitable for advertising, branding, design systems, and high-impact visual content.

Model AIR ID: runware:qwen-image@2.0-pro.

Supported workflows: Text-to-image, image-to-image.

Technical specifications:

Positive prompt: 2-2000 characters.
Negative prompt: 2-500 characters (text-to-image only).
Reference images: Supports up to 3 images via referenceImages.
Supported dimensions: Freely customizable within total area limit of 2,097,152 pixels (2048×1024 equivalent), width and height in 1-pixel increments.

Advanced generation settings:

settings.promptExtend (boolean, default: true): Enables automatic prompt expansion to improve quality. Adds 3-5 seconds latency. Disable for detailed prompts or latency-sensitive workflows.

There is currently a temporary backend validation limiting width and height to a maximum value of 2048.

This restriction will be removed in the next deployment.

The model supports any dimensions within the total area limit of 2,097,152 pixels, as described above.

Professional branding

{
  "taskType": "imageInference",
  "taskUUID": "6ba7b835-9dad-11d1-80b4-00c04fd430c8",
  "model": "runware:qwen-image@2.0-pro",
  "positivePrompt": "Create a luxury brand advertisement with elegant text 'PRESTIGE COLLECTION' featuring refined typography, gold accents, and sophisticated composition",
  "negativePrompt": "cluttered, poor spacing, amateur design",
  "width": 2048,
  "height": 1024,
  "numberResults": 3,
  "settings": {
    "promptExtend": true
  }
}

Advanced editing with multiple references

{
  "taskType": "imageInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440016",
  "model": "runware:qwen-image@2.0-pro",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f",
      "454639ca-4717-4f8b-a031-b593e96b8cd4"
    ]
  },
  "positivePrompt": "Blend these product shots into a unified catalog layout with consistent lighting and professional presentation",
  "width": 1600,
  "height": 1200
}

High-detail iconography

{
  "taskType": "imageInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35dab",
  "model": "runware:qwen-image@2.0-pro",
  "positivePrompt": "Design a mobile app interface mockup with crisp icons, clear navigation labels, and modern UI elements with perfect pixel alignment",
  "width": 1080,
  "height": 1920,
  "seed": 12345,
  "settings": {
    "promptExtend": false
  }
}

Wan2.5-Preview Image

Alibaba's Wan2.5-Preview Image delivers high-fidelity single-frame generation built on the Wan2.5 video architecture. This model focuses on detailed depth structure, strong prompt following, multilingual text rendering, and video-grade visual quality for production-ready stills.

Model AIR ID: alibaba:wan@2.5-image.

Supported workflows: Text-to-image.

Technical specifications:

Positive prompt: 1-2000 characters (supports English and Chinese).
Negative prompt: 1-500 characters (optional).
Supported dimensions: Minimum 768×768 total pixels (589,824), maximum 1440×1440 total pixels (2,073,600), aspect ratio between 1:4 and 4:1 (default: 1280×1280).

Provider-specific settings:

Parameters supported: promptExtend.

{
  "taskType": "imageInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
  "model": "alibaba:wan@2.5-image",
  "positivePrompt": "A cinematic still of a dramatic landscape with detailed depth structure and rich atmospheric lighting",
  "width": 1280,
  "height": 1280,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}
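The Wan2.5-Preview Image size rules combine a total-pixel range with an aspect-ratio range, so a size can fail either test independently. The sketch below encodes both limits as listed in the technical specifications; the function name wan25_image_size_ok is hypothetical, and it assumes both bounds are inclusive.

```python
MIN_PIXELS = 768 * 768     # 589,824 total pixels
MAX_PIXELS = 1440 * 1440   # 2,073,600 total pixels


def wan25_image_size_ok(width, height):
    """True when the size is inside the pixel range and between 1:4 and 4:1."""
    pixels = width * height
    ratio = width / height
    return MIN_PIXELS <= pixels <= MAX_PIXELS and 0.25 <= ratio <= 4.0


for size in [(1280, 1280), (1920, 1080), (512, 512), (3000, 600)]:
    print(size, wan25_image_size_ok(*size))
```

Here 512×512 fails on total pixels, while 3000×600 has an acceptable pixel count but a 5:1 aspect ratio.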

Wan2.6 Image

Alibaba's Wan2.6 Image is a single-frame image generation model derived from the Wan2.6 multimodal video architecture. It focuses on strong prompt adherence, clean spatial structure, and visually coherent results, delivering video-grade image quality for creative, editorial, and product-oriented workflows.

Model AIR ID: alibaba:wan@2.6-image.

Supported workflows: Text-to-image, image-to-image, reference-to-image, image editing.

Technical specifications:

Positive prompt: 1-2100 characters (supports English and Chinese).
Negative prompt: 1-500 characters (optional).
Supported dimensions: Minimum 1280×1280 total pixels (1,638,400), maximum 1440×1440 total pixels (2,073,600), aspect ratio between 1:4 and 4:1 (default: 1280×1280).
- Recommended resolutions: 1280×1280 (1:1), 1280×720 (16:9), 720×1280 (9:16), 1280×960 (4:3), 960×1280 (3:4), 1200×800 (3:2), 800×1200 (2:3), 1344×576 (21:9).
Reference images: Supports up to 4 images via inputs.referenceImages.

Provider-specific settings:

Parameters supported: promptExtend.

Text-to-image

{
  "taskType": "imageInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d9",
  "model": "alibaba:wan@2.6-image",
  "positivePrompt": "A professional product photograph with clean spatial composition and precise lighting for editorial use",
  "width": 1280,
  "height": 1280,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}

Image-to-image

{
  "taskType": "imageInference",
  "taskUUID": "6ba7b833-9dad-11d1-80b4-00c04fd430c8",
  "model": "alibaba:wan@2.6-image",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
  },
  "positivePrompt": "Transform this image while maintaining strong spatial structure and visual coherence",
  "width": 1280,
  "height": 720
}

Multi-reference composition

{
  "taskType": "imageInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440015",
  "model": "alibaba:wan@2.6-image",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f",
      "e8f9a0b1-3c6d-4e8f-b2e4-0d9e8f7c6b5a"
    ]
  },
  "positivePrompt": "Combine these elements with video-grade quality and coherent visual structure",
  "width": 960,
  "height": 1280,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}

Video models

advancedFeatures » wanAnimate (object)

Configuration object for Wan2.2 Animate character animation and replacement features. These parameters control animation strategies, pose retargeting, and temporal consistency for character-focused video generation.

Example

{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
  "model": "runware:200@8",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"],
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "advancedFeatures": {
    "wanAnimate": {
      "mode": "animate",
      "retargetPose": true,
      "prevSegCondFrames": 3
    }
  }
}

Properties

advancedFeatures » wanAnimate » mode ("animate" | "replace", default: animate)

Selects the animation strategy for character generation and integration into video footage.

Available values:

animate: Uses pose detection from the reference image and video, with optional skeleton retargeting to adjust the reference pose to match video movements. Ideal for bringing static characters to life with natural motion.
replace: Uses pose detection and segmentation models to determine character pose and shape, then replaces the character in the video while preserving background and motion. Best for character substitution in existing footage.

advancedFeatures » wanAnimate » retargetPose (boolean, default: false): Retargets the pose of the video to match the reference image's initial pose, with bone positions (notably hands) adjusted according to video movements. This creates more natural alignment between the reference character's pose and the video motion.

This parameter is only supported in animate mode and has no effect when using replace mode.

advancedFeatures » wanAnimate » prevSegCondFrames (integer, min: 1, max: 5, default: 1): Number of frames taken from the previous segment to maintain temporal consistency across video segments. Higher values improve visual continuity between segments but increase inference time, while lower values reduce consistency but generate faster.
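The three wanAnimate properties interact: retargetPose only applies in animate mode, and prevSegCondFrames has a fixed range. A client could check a settings object against those rules before submission, as in this minimal sketch. The function name check_wan_animate is hypothetical, and the defaults it falls back to are the ones listed above.

```python
def check_wan_animate(settings):
    """Validate a wanAnimate dict against the documented rules; return problems."""
    problems = []
    mode = settings.get("mode", "animate")           # documented default
    if mode not in ("animate", "replace"):
        problems.append(f"unknown mode: {mode!r}")
    if settings.get("retargetPose", False) and mode == "replace":
        problems.append("retargetPose has no effect in replace mode")
    frames = settings.get("prevSegCondFrames", 1)    # documented default
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(frames, bool) or not isinstance(frames, int) or not 1 <= frames <= 5:
        problems.append("prevSegCondFrames must be an integer from 1 to 5")
    return problems


print(check_wan_animate({"mode": "animate", "retargetPose": True, "prevSegCondFrames": 3}))
print(check_wan_animate({"mode": "replace", "retargetPose": True}))
```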

providerSettings » alibaba (object)

Configuration object for Alibaba-specific video generation settings. These parameters provide control over prompt enhancement, audio generation, and shot composition for Wan video models.

Example

{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
  "model": "alibaba:wan@2.6",
  "positivePrompt": "A cinematic scene with multiple shots",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true,
      "shotType": "multi"
    }
  }
}

Properties

providerSettings » alibaba » promptExtend (boolean, default: true): Enables LLM-based prompt rewriting to improve generation quality by expanding and clarifying the input prompt. When enabled, the system analyzes and enhances the prompt to produce more detailed and coherent video output.

Enabling prompt extension increases generation time but typically results in higher quality output with better scene composition and narrative flow.

providerSettings » alibaba » audio (boolean, default: true)

Controls automatic audio generation for the video. When enabled, the model generates native audio that aligns with the visual content and scene progression.

This parameter is ignored if custom audio is provided via inputs.audio.

providerSettings » alibaba » shotType ("single" | "multi", default: single)

Determines the shot composition style for the generated video. This parameter controls whether the video is generated as a continuous single shot or as multiple shots with transitions.

Available values:

single: Generate video as a continuous single shot.
multi: Generate video with multiple shots and transitions between them.

This parameter only takes effect when promptExtend is set to true. Multi-shot composition works best with prompts that explicitly describe shot changes or scene transitions.
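Two of the video settings can be silently overridden: audio is ignored when inputs.audio is supplied, and shotType only takes effect when promptExtend is true. The following sketch flags those combinations in a task dict before it is sent. The function name review_alibaba_video_settings is hypothetical, and it only reports the multi case for shotType because single is already the default.

```python
def review_alibaba_video_settings(task):
    """Return notes about provider settings the documented rules say are ignored."""
    settings = task.get("providerSettings", {}).get("alibaba", {})
    notes = []
    prompt_extend = settings.get("promptExtend", True)  # documented default
    if settings.get("shotType", "single") == "multi" and not prompt_extend:
        notes.append("shotType only takes effect when promptExtend is true")
    if "audio" in settings and "audio" in task.get("inputs", {}):
        notes.append("audio setting is ignored because inputs.audio is provided")
    return notes


task = {
    "inputs": {"audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"},
    "providerSettings": {
        "alibaba": {"promptExtend": False, "shotType": "multi", "audio": True}
    },
}
for note in review_alibaba_video_settings(task):
    print(note)
```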

Wan2.2 Animate

Alibaba's Wan2.2 Animate is a unified video model that produces character-focused animations from static images and reference videos or replaces characters in existing footage while preserving motion, expressions, and scene consistency. Built on the Wan2.2 mixture-of-experts architecture, this model generates coherent character movement and seamless integration with background video.

Model AIR ID: runware:200@8.

Supported workflows: Image-to-video, video-to-video.

Technical specifications:

Positive prompt: 1-2000 characters (default: 视频中的人在做动作, which translates to "The person in the video is performing actions").
Reference images: Supports inputs.referenceImages with 1 image (required).
Reference videos: Supports inputs.referenceVideos with 1 video (required).
Supported dimensions:
- 480p: 480×480 (1:1), 480×704 (≈2:3), 704×480 (≈3:2), 480×832 (≈4:7), 832×480 (≈7:4), 480×1280 (3:8), 1280×480 (8:3).
- 580p: 704×704 (1:1), 704×832 (≈6:7), 832×704 (≈7:6), 704×1280 (11:20), 1280×704 (20:11).
- 720p: 832×832 (1:1), 832×1280 (13:20), 1280×832 (20:13), 1280×1280 (1:1).
Dimension behavior:
- Specify explicit width and height from the supported dimensions above.
- Use resolution parameter (480p, 580p, or 720p) to automatically match the aspect ratio from the reference video.
- Omit both width/height and resolution to automatically determine dimensions from the reference video.
- Cannot use width/height and resolution together.
Steps: 2-50 (default: 30).
Frame rate: 4-60 FPS (default: 16).
LoRA: Supports LoRA configurations via lora parameter.

The output video duration matches the reference video length. The model automatically resizes the reference video to match the requested output resolution while preserving aspect ratio.
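The dimension behavior for Wan2.2 Animate offers three mutually exclusive options: explicit width and height, a resolution tier, or neither. A minimal sketch of that decision follows. The name check_animate_dimensions and the SUPPORTED table layout are hypothetical, and the rule that width and height must be given as a pair is an assumption rather than something stated above.

```python
# Sizes copied from the supported dimensions list above
SUPPORTED = {
    "480p": {(480, 480), (480, 704), (704, 480), (480, 832),
             (832, 480), (480, 1280), (1280, 480)},
    "580p": {(704, 704), (704, 832), (832, 704), (704, 1280), (1280, 704)},
    "720p": {(832, 832), (832, 1280), (1280, 832), (1280, 1280)},
}


def check_animate_dimensions(width=None, height=None, resolution=None):
    """Return an error string, or None when the combination is acceptable."""
    has_size = width is not None or height is not None
    if has_size and resolution is not None:
        return "width/height and resolution cannot be used together"
    if has_size:
        if width is None or height is None:
            return "width and height must be given together"  # assumption
        if not any((width, height) in sizes for sizes in SUPPORTED.values()):
            return f"{width}x{height} is not a supported size"
    elif resolution is not None and resolution not in SUPPORTED:
        return f"unknown resolution: {resolution}"
    return None  # valid, including the case where everything is omitted


print(check_animate_dimensions(832, 480))
print(check_animate_dimensions(832, 480, resolution="480p"))
```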

Faster generation with Turbo

Wan2.2 Animate Turbo (runware:200@9) provides optimized performance with pre-configured acceleration (high), reduced steps (6), and specialized LoRA optimizations for faster generation times while maintaining visual quality.

Advanced features:

Parameters supported: wanAnimate.mode, wanAnimate.retargetPose, wanAnimate.prevSegCondFrames.

Animate mode

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d9",
  "model": "runware:200@8",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"],
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "steps": 6,
  "advancedFeatures": {
    "wanAnimate": {
      "mode": "animate",
      "retargetPose": false,
      "prevSegCondFrames": 1
    }
  }
}

Replace mode

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b835-9dad-11d1-80b4-00c04fd430c8",
  "model": "runware:200@8",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"],
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "steps": 6,
  "advancedFeatures": {
    "wanAnimate": {
      "mode": "replace"
    }
  }
}

Pose retargeting

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440017",
  "model": "runware:200@8",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"],
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "steps": 6,
  "advancedFeatures": {
    "wanAnimate": {
      "mode": "animate",
      "retargetPose": true,
      "prevSegCondFrames": 3
    }
  }
}

Wan2.5-Preview

Alibaba's Wan2.5-Preview model represents a research preview of multimodal video generation with native audio support. This model offers strong prompt adherence, smooth motion, and multilingual audio capabilities for narrative scenes up to 10 seconds, making it suitable for short-form storytelling and creative video workflows.

Model AIR ID: alibaba:wan@2.5-preview.

Supported workflows: Text-to-video, image-to-video, audio-to-video.

Technical specifications:

Positive prompt: 1-2000 characters (supports English and Chinese).
Negative prompt: 1-500 characters (optional).
Frame images: Supports first frame via inputs.frameImages (image-to-video only).
Audio input: Supports custom audio via inputs.audio.
Supported dimensions:
- 480p: 854×480 (16:9), 480×854 (9:16), 640×640 (1:1).
- 720p: 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), 1088×832 (17:13), 832×1088 (13:17).
- 1080p: 1920×1080 (16:9), 1080×1920 (9:16), 1440×1440 (1:1), 1632×1248 (17:13), 1248×1632 (13:17).
Dimension behavior:
- Text-to-video: Specify explicit width and height from the supported dimensions above.
- Image-to-video: Two options available:
  - Specify width and height explicitly for precise control.
  - Use resolution parameter (480p, 720p, or 1080p) to automatically match the aspect ratio from the first frame image.
Duration: 5 or 10 seconds (default: 5).
Input image requirements: 360-2000 pixels, 10MB file size limit.
Audio requirements: WAV/MP3, 3-30 seconds duration, 15MB file size limit.
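With the resolution parameter, the service matches the aspect ratio of the first frame image. How that matching is done server-side is not described here; the sketch below is only a client-side approximation that picks the listed size in a tier whose aspect ratio is nearest to a given one. The names WAN25_SIZES, closest_size, and duration_ok are hypothetical.

```python
# Sizes copied from the supported dimensions list above
WAN25_SIZES = {
    "480p": [(854, 480), (480, 854), (640, 640)],
    "720p": [(1280, 720), (720, 1280), (960, 960), (1088, 832), (832, 1088)],
    "1080p": [(1920, 1080), (1080, 1920), (1440, 1440), (1632, 1248), (1248, 1632)],
}


def closest_size(resolution, aspect):
    """Pick the listed size in a tier whose width/height ratio is nearest to aspect."""
    return min(WAN25_SIZES[resolution], key=lambda wh: abs(wh[0] / wh[1] - aspect))


def duration_ok(seconds):
    """Wan2.5-Preview accepts 5 or 10 second durations."""
    return seconds in (5, 10)


print(closest_size("720p", 16 / 9))  # (1280, 720)
print(closest_size("720p", 4 / 3))   # (1088, 832)
print(duration_ok(7))                # False
```

A lookup like this is mainly useful for previewing the output size a first frame is likely to produce, or for choosing explicit width and height in text-to-video requests.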

Provider-specific settings:

Parameters supported: promptExtend, audio.

Text-to-video

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
  "model": "alibaba:wan@2.5-preview",
  "positivePrompt": "A cinematic narrative scene with smooth character movement and atmospheric storytelling",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true
    }
  }
}

Image-to-video

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b834-9dad-11d1-80b4-00c04fd430c8",
  "model": "alibaba:wan@2.5-preview",
  "inputs": {
    "frameImages": [
      {
        "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      }
    ]
  },
  "positivePrompt": "The character begins to move naturally through the scene with smooth motion",
  "duration": 5,
  "resolution": "720p",
  "providerSettings": {
    "alibaba": {
      "audio": true
    }
  }
}

Audio-to-video with custom audio

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440016",
  "model": "alibaba:wan@2.5-preview",
  "positivePrompt": "Visual narrative synchronized with the provided audio track",
  "inputs": {
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  },
  "duration": 10,
  "width": 1280,
  "height": 720,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}

Wan2.6

Alibaba's Wan2.6 model delivers multimodal video generation with native audio support and multi-shot sequencing capabilities. This model emphasizes temporal stability, consistent visual structure across shots, and reliable alignment between visuals and audio for short-form narrative video production.

Model AIR ID: alibaba:wan@2.6.

Supported workflows: Text-to-video, image-to-video, reference-to-video.

Technical specifications:

Positive prompt: 1-1500 characters (supports English and Chinese).
Negative prompt: 1-500 characters (optional).
Frame images: Supports first frame via inputs.frameImages (image-to-video only).
Reference images: Supports up to 5 images via inputs.referenceImages (reference-to-video only, 10MB per image limit).
Reference videos: Supports up to 3 videos via inputs.referenceVideos (reference-to-video only, 100MB per video limit).
Audio input: Supports custom audio via inputs.audio.
Supported dimensions:
- 720p: 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), 1088×832 (17:13), 832×1088 (13:17).
- 1080p: 1920×1080 (16:9), 1080×1920 (9:16), 1440×1440 (1:1), 1632×1248 (17:13), 1248×1632 (13:17).
Dimension behavior:
- Text-to-video and reference-to-video: Specify explicit width and height from the supported dimensions above.
- Image-to-video: Two options available:
  - Specify width and height explicitly for precise control.
  - Use resolution parameter (720p or 1080p) to automatically match the aspect ratio from the first frame image.
Duration:
- Text-to-video and image-to-video: 5, 10, or 15 seconds (default: 5).
- Reference-to-video: 2-10 seconds (default: 5).
Input image requirements: 360-2000 pixels, 10MB file size limit.
Reference video requirements: Maximum 100MB per video.
Audio requirements: WAV/MP3, 3-30 seconds duration, 15MB file size limit.

Reference images and reference videos cannot be used together with frame images. Choose either image-to-video or reference-to-video workflow.
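Because the allowed duration depends on the workflow, and the workflow is implied by which inputs are present, a client may want to derive one from the other. The sketch below does this for Wan2.6 using the rules listed above; the function names infer_wan26_workflow and duration_allowed are hypothetical.

```python
def infer_wan26_workflow(inputs):
    """Derive the workflow from the inputs object; reject mixed input types."""
    has_frames = bool(inputs.get("frameImages"))
    has_refs = bool(inputs.get("referenceImages") or inputs.get("referenceVideos"))
    if has_frames and has_refs:
        raise ValueError("frame images cannot be combined with reference images or videos")
    if has_frames:
        return "image-to-video"
    if has_refs:
        return "reference-to-video"
    return "text-to-video"


def duration_allowed(workflow, duration):
    """Apply the per-workflow duration rules listed for Wan2.6."""
    if workflow == "reference-to-video":
        return 2 <= duration <= 10
    return duration in (5, 10, 15)


inputs = {"referenceVideos": ["c64351d5-4c59-42f7-95e1-eace013eddab"]}
workflow = infer_wan26_workflow(inputs)
print(workflow, duration_allowed(workflow, 8))
```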

Provider-specific settings:

Parameters supported: promptExtend, audio, shotType.

Text-to-video

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d7",
  "model": "alibaba:wan@2.6",
  "positivePrompt": "A cinematic chase through a rain-soaked city, opening with a wide street shot, cutting to a close-up of footsteps splashing through puddles, followed by an overhead tracking shot",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true,
      "shotType": "multi"
    }
  }
}

Image-to-video

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b833-9dad-11d1-80b4-00c04fd430c8",
  "model": "alibaba:wan@2.6",
  "inputs": {
    "frameImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
  },
  "positivePrompt": "The scene comes alive with gentle movement and atmospheric effects",
  "duration": 5,
  "resolution": "720p",
  "providerSettings": {
    "alibaba": {
      "audio": true
    }
  }
}

Reference-to-video (videos)

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440015",
  "model": "alibaba:wan@2.6",
  "inputs": {
    "referenceVideos": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"
    ]
  },
  "positivePrompt": "character1 walks through a forest while character2 follows behind, maintaining their visual characteristics and movement styles",
  "duration": 8,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true,
      "shotType": "single"
    }
  }
}

Reference-to-video (images)

{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
  "model": "alibaba:wan@2.6",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f",
      "e8f9a0b1-3c6d-4e8f-b2e4-0d9e8f7c6b5a"
    ]
  },
  "positivePrompt": "Maintain consistent character appearance and style across the animated sequence",
  "duration": 5,
  "width": 1280,
  "height": 720,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "shotType": "single"
    }
  }
}

Custom audio

{
  "taskType": "videoInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d489",
  "model": "alibaba:wan@2.6",
  "inputs": {
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  },
  "positivePrompt": "A dramatic scene with synchronized visuals matching the provided audio track",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}

Wan2.6 Flash

Alibaba's Wan2.6 Flash is a distilled, low-latency variant optimized for rapid image-to-video and reference-to-video generation with fluid motion and visual stability. This fast model preserves subject structure and motion realism while producing HD clips, making it ideal for preview workflows, high-throughput creative pipelines, and scenarios requiring quick turnaround without sacrificing quality.

Model AIR ID: alibaba:wan@2.6-flash.

Supported workflows: Image-to-video, reference-to-video.

Technical specifications:

Positive prompt: 1-1500 characters (supports English and Chinese).
Negative prompt: 1-500 characters (optional).
Frame images: Supports first frame via inputs.frameImages (image-to-video only).
Reference images: Supports up to 5 images via inputs.referenceImages (reference-to-video only, 10MB per image limit).
Reference videos: Supports up to 3 videos via inputs.referenceVideos (reference-to-video only, 100MB per video limit).
Audio input: Supports custom audio via inputs.audio.
Supported dimensions:
- 720p: 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), 1088×832 (17:13), 832×1088 (13:17).
- 1080p: 1920×1080 (16:9), 1080×1920 (9:16), 1440×1440 (1:1), 1632×1248 (17:13), 1248×1632 (13:17).
Dimension behavior:
- Reference-to-video: Specify explicit width and height from the supported dimensions above.
- Image-to-video: Two options available:
  - Specify width and height explicitly for precise control.
  - Use resolution parameter (720p or 1080p) to automatically match the aspect ratio from the first frame image.
Duration:
- Image-to-video: 2-15 seconds (default: 5).
- Reference-to-video: 2-10 seconds (default: 5).
Input image requirements: 360-2000 pixels, 10MB file size limit.
Reference video requirements: Maximum 100MB per video.
Audio requirements: WAV/MP3, 3-30 seconds duration, 15MB file size limit.

Reference images and reference videos cannot be used together with frame images. Choose either image-to-video or reference-to-video workflow.

Provider-specific settings:

Parameters supported: promptExtend, audio, shotType.

Fast image-to-video

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
  "model": "alibaba:wan@2.6-flash",
  "inputs": {
    "frameImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
  },
  "positivePrompt": "Bring the scene to life with natural movement and atmospheric effects",
  "duration": 5,
  "resolution": "720p",
  "providerSettings": {
    "alibaba": {
      "audio": true
    }
  }
}

Reference-to-video (videos)

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b837-9dad-11d1-80b4-00c04fd430c8",
  "model": "alibaba:wan@2.6-flash",
  "inputs": {
    "referenceVideos": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"
    ]
  },
  "positivePrompt": "Quick preview maintaining character consistency and motion style from reference videos",
  "duration": 5,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "shotType": "single"
    }
  }
}

Reference-to-video (images)

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440016",
  "model": "alibaba:wan@2.6-flash",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"
    ]
  },
  "positivePrompt": "Rapid animation preserving character features and style consistency",
  "duration": 8,
  "width": 1280,
  "height": 720,
  "providerSettings": {
    "alibaba": {
      "audio": true
    }
  }
}

Short-form content

{
  "taskType": "videoInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d490",
  "model": "alibaba:wan@2.6-flash",
  "inputs": {
    "frameImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
  },
  "positivePrompt": "Quick animated preview for social media with engaging motion",
  "duration": 2,
  "width": 720,
  "height": 1280,
  "providerSettings": {
    "alibaba": {
      "audio": true
    }
  }
}