Alibaba
Introduction
Alibaba Cloud's AI models are integrated into the Runware platform through our unified API, providing access to advanced generative capabilities across language, vision, and video domains. The Wan model family specializes in video generation with support for multi-shot sequencing, native audio, and strong temporal consistency.
Through the providerSettings.alibaba object, you can access Alibaba-specific features such as prompt extension, automatic audio generation, and multi-shot composition, while maintaining the consistency of Runware's standard API structure. This page documents the technical specifications, parameter requirements, and provider-specific settings for all Alibaba models available through our platform.
Image models
providerSettings.alibaba (object)
Configuration object for Alibaba-specific image generation settings. These parameters provide control over prompt enhancement for Wan image models.
Example:
{ "taskType": "imageInference", "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6", "model": "alibaba:wan@2.5-image", "positivePrompt": "A cinematic still with rich detail", "width": 1280, "height": 1280, "providerSettings": { "alibaba": { "promptExtend": true } } }
Properties:
providerSettings.alibaba.promptExtend (boolean, default: true)
Enables LLM-based prompt rewriting to improve generation quality by expanding and clarifying the input prompt. When enabled, the system analyzes and enhances the prompt to produce more detailed and coherent image output.
Enabling prompt extension increases generation time but typically results in higher-quality output with better scene composition.
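As a rough illustration of where this setting goes, the Python sketch below assembles an imageInference task with promptExtend nested under providerSettings.alibaba. The helper name build_wan_image_task is hypothetical, and it only builds the request body; transport and authentication are outside the scope of this sketch.

```python
import json
import uuid


def build_wan_image_task(prompt: str, width: int, height: int,
                         prompt_extend: bool = True,
                         model: str = "alibaba:wan@2.5-image") -> dict:
    """Build one imageInference task body (hypothetical helper, no network I/O)."""
    return {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),  # a fresh random UUID for each task
        "model": model,
        "positivePrompt": prompt,
        "width": width,
        "height": height,
        # Alibaba-specific options are nested under providerSettings.alibaba
        "providerSettings": {"alibaba": {"promptExtend": prompt_extend}},
    }


if __name__ == "__main__":
    task = build_wan_image_task("A cinematic still with rich detail", 1280, 1280)
    # Only the request body is printed; sending it is not shown here
    print(json.dumps(task, indent=2))
```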
Qwen-Image-2.0
Alibaba's Qwen-Image-2.0 is an advanced unified model for image generation and editing that produces high-quality images at native 2K resolution with professional-grade text rendering. This model excels at generating complex textual content within images, making it ideal for infographics, posters, and layout-driven visuals with strong semantic understanding and detailed prompt adherence.
Model AIR ID: runware:qwen-image@2.0.
Supported workflows: Text-to-image, image-to-image.
Technical specifications:
- Positive prompt: 2-2000 characters.
- Negative prompt: 2-500 characters (text-to-image only).
- Reference images: Supports up to 3 images via inputs.referenceImages.
- Supported dimensions: Freely customizable within a total area limit of 2,097,152 pixels (equivalent to 2048×1024), with width and height in 1-pixel increments.
Advanced generation settings:
settings.promptExtend (boolean, default: true): Enables automatic prompt expansion to improve quality. Adds 3-5 seconds of latency. Disable it for detailed prompts or latency-sensitive workflows.
There is currently a temporary backend validation that limits width and height to a maximum of 2048 pixels each.
This restriction will be removed in the next deployment.
Once it is lifted, the model accepts any dimensions within the total area limit of 2,097,152 pixels described above.
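The dimension rules above can be checked client-side before submitting a task. This is a minimal sketch, assuming the area limit is inclusive and that the temporary 2048-pixel cap applies to each side independently; check_qwen_dimensions is a hypothetical helper, and no minimum size is checked because none is documented here.

```python
MAX_AREA = 2_097_152   # total pixel budget (2048 x 1024)
TEMP_MAX_SIDE = 2048   # temporary per-side backend cap


def check_qwen_dimensions(width: int, height: int,
                          enforce_temporary_cap: bool = True) -> list[str]:
    """Return a list of problems; an empty list means the size looks valid."""
    problems = []
    if width <= 0 or height <= 0:
        problems.append("width and height must be positive integers")
        return problems
    if width * height > MAX_AREA:
        problems.append(f"area {width * height} exceeds {MAX_AREA} pixels")
    # Remove this check once the temporary validation is lifted
    if enforce_temporary_cap and max(width, height) > TEMP_MAX_SIDE:
        problems.append(f"side length above {TEMP_MAX_SIDE} is currently rejected")
    return problems


if __name__ == "__main__":
    print(check_qwen_dimensions(1920, 1080))  # []
```

For example, 4000×500 fits the area budget but is rejected while the temporary cap is in place.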
{
"taskType": "imageInference",
"taskUUID": "6ba7b834-9dad-11d1-80b4-00c04fd430c8",
"model": "runware:qwen-image@2.0",
"positivePrompt": "Create a professional business infographic titled 'QUARTERLY GROWTH 2025' with bar charts, percentage indicators, and clean typography on light background",
"negativePrompt": "blurry text, distorted letters, low quality",
"width": 1920,
"height": 1080,
"numberResults": 2,
"settings": {
"promptExtend": true
}
}

{
"taskType": "imageInference",
"taskUUID": "550e8400-e29b-41d4-a716-446655440015",
"model": "runware:qwen-image@2.0",
"inputs": {
"referenceImages": [
"c64351d5-4c59-42f7-95e1-eace013eddab"
]
},
"positivePrompt": "Change the background to a modern office environment while keeping the subject and text intact",
"width": 1024,
"height": 1024
}

{
"taskType": "imageInference",
"taskUUID": "a770f077-f413-47de-9dac-be0b26a35daa",
"model": "runware:qwen-image@2.0",
"positivePrompt": "Design a concert poster with bold text 'LIVE MUSIC FESTIVAL' featuring geometric patterns, vibrant colors, and professional layout with event details",
"width": 1200,
"height": 1600,
"seed": 42,
"settings": {
"promptExtend": false
}
}

Qwen-Image-2.0-Pro
Alibaba's Qwen-Image-2.0-Pro enhances the base model with optimized visual fidelity, improved layout and typography handling, and advanced editing control for professional creative workflows. This model delivers richer detail and more accurate text and iconography rendering, making it suitable for advertising, branding, design systems, and high-impact visual content.
Model AIR ID: runware:qwen-image@2.0-pro.
Supported workflows: Text-to-image, image-to-image.
Technical specifications:
- Positive prompt: 2-2000 characters.
- Negative prompt: 2-500 characters (text-to-image only).
- Reference images: Supports up to 3 images via inputs.referenceImages.
- Supported dimensions: Freely customizable within a total area limit of 2,097,152 pixels (equivalent to 2048×1024), with width and height in 1-pixel increments.
Advanced generation settings:
settings.promptExtend (boolean, default: true): Enables automatic prompt expansion to improve quality. Adds 3-5 seconds of latency. Disable it for detailed prompts or latency-sensitive workflows.
There is currently a temporary backend validation that limits width and height to a maximum of 2048 pixels each.
This restriction will be removed in the next deployment.
Once it is lifted, the model accepts any dimensions within the total area limit of 2,097,152 pixels described above.
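The prompt and reference-image limits shared by both Qwen-Image-2.0 variants can be validated the same way. This sketch assumes character counts match Python's len() (Unicode code points) and flags a negative prompt on an image-to-image request as a problem; the API may simply ignore it instead. check_qwen_request is a hypothetical helper.

```python
def check_qwen_request(task: dict) -> list[str]:
    """Return a list of problems for a Qwen-Image-2.0 / 2.0-Pro task (empty means OK)."""
    problems = []
    prompt = task.get("positivePrompt", "")
    # Assumption: limits are counted in Unicode code points, as len() does
    if not 2 <= len(prompt) <= 2000:
        problems.append("positivePrompt must be 2-2000 characters")
    refs = task.get("inputs", {}).get("referenceImages", [])
    if len(refs) > 3:
        problems.append("at most 3 reference images are supported")
    negative = task.get("negativePrompt")
    if negative is not None:
        if refs:
            # Assumption: treated as an error here; the API may ignore it instead
            problems.append("negativePrompt is supported for text-to-image only")
        elif not 2 <= len(negative) <= 500:
            problems.append("negativePrompt must be 2-500 characters")
    return problems
```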
{
"taskType": "imageInference",
"taskUUID": "6ba7b835-9dad-11d1-80b4-00c04fd430c8",
"model": "runware:qwen-image@2.0-pro",
"positivePrompt": "Create a luxury brand advertisement with elegant text 'PRESTIGE COLLECTION' featuring refined typography, gold accents, and sophisticated composition",
"negativePrompt": "cluttered, poor spacing, amateur design",
"width": 2048,
"height": 1024,
"numberResults": 3,
"settings": {
"promptExtend": true
}
}

{
"taskType": "imageInference",
"taskUUID": "550e8400-e29b-41d4-a716-446655440016",
"model": "runware:qwen-image@2.0-pro",
"inputs": {
"referenceImages": [
"c64351d5-4c59-42f7-95e1-eace013eddab",
"d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f",
"454639ca-4717-4f8b-a031-b593e96b8cd4"
]
},
"positivePrompt": "Blend these product shots into a unified catalog layout with consistent lighting and professional presentation",
"width": 1600,
"height": 1200
}

{
"taskType": "imageInference",
"taskUUID": "a770f077-f413-47de-9dac-be0b26a35dab",
"model": "runware:qwen-image@2.0-pro",
"positivePrompt": "Design a mobile app interface mockup with crisp icons, clear navigation labels, and modern UI elements with perfect pixel alignment",
"width": 1080,
"height": 1920,
"seed": 12345,
"settings": {
"promptExtend": false
}
}

Wan2.5-Preview Image
Alibaba's Wan2.5-Preview Image delivers high-fidelity single-frame generation built on the Wan2.5 video architecture. This model focuses on detailed depth structure, strong prompt following, multilingual text rendering, and video-grade visual quality for production-ready stills.
Model AIR ID: alibaba:wan@2.5-image.
Supported workflows: Text-to-image.
Technical specifications:
- Positive prompt: 1-2000 characters (supports English and Chinese).
- Negative prompt: 1-500 characters (optional).
- Supported dimensions: Minimum 768×768 total pixels (589,824), maximum 1440×1440 total pixels (2,073,600), aspect ratio between 1:4 and 4:1 (default: 1280×1280).
Provider-specific settings:
Parameters supported: promptExtend.
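The Wan2.5-Preview Image size constraints combine a pixel budget with an aspect-ratio range. A minimal sketch, assuming both bounds are inclusive; wan25_image_size_ok is a hypothetical name.

```python
MIN_PIXELS = 768 * 768     # 589,824
MAX_PIXELS = 1440 * 1440   # 2,073,600


def wan25_image_size_ok(width: int, height: int) -> bool:
    """Check the total-pixel and aspect-ratio limits listed for Wan2.5-Preview Image."""
    if width <= 0 or height <= 0:
        return False
    area_ok = MIN_PIXELS <= width * height <= MAX_PIXELS
    # 1:4 <= width:height <= 4:1, compared with integers to avoid float rounding
    ratio_ok = width <= 4 * height and height <= 4 * width
    return area_ok and ratio_ok
```

Comparing `width <= 4 * height` rather than a float ratio keeps boundary sizes such as 1536×384 (exactly 4:1) from being misjudged by rounding.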
{
"taskType": "imageInference",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
"model": "alibaba:wan@2.5-image",
"positivePrompt": "A cinematic still of a dramatic landscape with detailed depth structure and rich atmospheric lighting",
"width": 1280,
"height": 1280,
"providerSettings": {
"alibaba": {
"promptExtend": true
}
}
}

Wan2.6 Image
Alibaba's Wan2.6 Image is a single-frame image generation model derived from the Wan2.6 multimodal video architecture. It focuses on strong prompt adherence, clean spatial structure, and visually coherent results, delivering video-grade image quality for creative, editorial, and product-oriented workflows.
Model AIR ID: alibaba:wan@2.6-image.
Supported workflows: Text-to-image, image-to-image, reference-to-image, image-editing.
Technical specifications:
- Positive prompt: 1-2100 characters (supports English and Chinese).
- Negative prompt: 1-500 characters (optional).
- Supported dimensions: Minimum 1280×1280 total pixels (1,638,400), maximum 1440×1440 total pixels (2,073,600), aspect ratio between 1:4 and 4:1 (default: 1280×1280).
- Recommended resolutions: 1280×1280 (1:1), 1280×720 (16:9), 720×1280 (9:16), 1280×960 (4:3), 960×1280 (3:4), 1200×800 (3:2), 800×1200 (2:3), 1344×576 (21:9).
- Reference images: Supports up to 4 images via inputs.referenceImages.
Provider-specific settings:
Parameters supported: promptExtend.
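When targeting a particular aspect ratio, it can help to snap to the closest entry in the recommended resolution list. The sketch below compares ratios on a log scale so that, for example, 2:1 and 1:2 count as equally far from 1:1. nearest_recommended is a hypothetical helper, and it does not re-check the total-pixel limits.

```python
import math

# Recommended Wan2.6 Image sizes as (width, height)
RECOMMENDED_SIZES = [
    (1280, 1280), (1280, 720), (720, 1280), (1280, 960),
    (960, 1280), (1200, 800), (800, 1200), (1344, 576),
]


def nearest_recommended(target_ratio: float) -> tuple[int, int]:
    """Return the recommended size whose width/height ratio is closest to target_ratio."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    target = math.log(target_ratio)
    # Log-scale distance treats 2:1 and 1:2 as equally far from 1:1
    return min(RECOMMENDED_SIZES,
               key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


if __name__ == "__main__":
    print(nearest_recommended(16 / 9))  # (1280, 720)
```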
{
"taskType": "imageInference",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d9",
"model": "alibaba:wan@2.6-image",
"positivePrompt": "A professional product photograph with clean spatial composition and precise lighting for editorial use",
"width": 1280,
"height": 1280,
"providerSettings": {
"alibaba": {
"promptExtend": true
}
}
}

{
"taskType": "imageInference",
"taskUUID": "6ba7b833-9dad-11d1-80b4-00c04fd430c8",
"model": "alibaba:wan@2.6-image",
"inputs": {
"referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
},
"positivePrompt": "Transform this image while maintaining strong spatial structure and visual coherence",
"width": 1280,
"height": 720
}

{
"taskType": "imageInference",
"taskUUID": "550e8400-e29b-41d4-a716-446655440015",
"model": "alibaba:wan@2.6-image",
"inputs": {
"referenceImages": [
"c64351d5-4c59-42f7-95e1-eace013eddab",
"d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f",
"e8f9a0b1-3c6d-4e8f-b2e4-0d9e8f7c6b5a"
]
},
"positivePrompt": "Combine these elements with video-grade quality and coherent visual structure",
"width": 960,
"height": 1280,
"providerSettings": {
"alibaba": {
"promptExtend": true
}
}
}

Video models
advancedFeatures.wanAnimate (object)
Configuration object for Wan2.2 Animate character animation and replacement features. These parameters control animation strategies, pose retargeting, and temporal consistency for character-focused video generation.
Example:
{ "taskType": "videoInference", "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6", "model": "runware:200@8", "inputs": { "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"], "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"] }, "advancedFeatures": { "wanAnimate": { "mode": "animate", "retargetPose": true, "prevSegCondFrames": 3 } } }
Properties:
advancedFeatures.wanAnimate.mode ("animate" | "replace", default: animate)
Selects the animation strategy for character generation and integration into video footage.
Available values:
- animate: Uses pose detection from the reference image and video, with optional skeleton retargeting to adjust the reference pose to match video movements. Ideal for bringing static characters to life with natural motion.
- replace: Uses pose detection and segmentation models to determine character pose and shape, then replaces the character in the video while preserving background and motion. Best for character substitution in existing footage.
advancedFeatures.wanAnimate.retargetPose (boolean, default: false)
Retargets the pose of the video to match the reference image's initial pose, with bone positions (notably hands) adjusted according to video movements. This creates more natural alignment between the reference character's pose and the video motion.
This parameter is only supported in animate mode and has no effect in replace mode.
advancedFeatures.wanAnimate.prevSegCondFrames (integer, min: 1, max: 5, default: 1)
Number of frames taken from the previous segment to maintain temporal consistency across video segments. Higher values improve visual continuity between segments but increase inference time; lower values generate faster at the cost of consistency.
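The three wanAnimate properties can be bundled and checked together before building a request. This is a sketch under stated assumptions: WanAnimateSettings is a hypothetical class, its to_payload() output is meant to sit under advancedFeatures, and retargetPose in replace mode is reported as a message rather than treated as fatal, because the documentation says it has no effect rather than that it is rejected.

```python
from dataclasses import dataclass


@dataclass
class WanAnimateSettings:
    """Hypothetical client-side model of advancedFeatures.wanAnimate."""
    mode: str = "animate"
    retarget_pose: bool = False
    prev_seg_cond_frames: int = 1

    def validate(self) -> list[str]:
        """Return messages for out-of-range or ineffective values."""
        issues = []
        if self.mode not in ("animate", "replace"):
            issues.append("mode must be 'animate' or 'replace'")
        if not 1 <= self.prev_seg_cond_frames <= 5:
            issues.append("prevSegCondFrames must be between 1 and 5")
        if self.retarget_pose and self.mode == "replace":
            issues.append("retargetPose has no effect in replace mode")
        return issues

    def to_payload(self) -> dict:
        """Serialize using the API's camelCase property names."""
        return {
            "wanAnimate": {
                "mode": self.mode,
                "retargetPose": self.retarget_pose,
                "prevSegCondFrames": self.prev_seg_cond_frames,
            }
        }
```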
providerSettings.alibaba (object)
Configuration object for Alibaba-specific video generation settings. These parameters provide control over prompt enhancement, audio generation, and shot composition for Wan video models.
Example:
{ "taskType": "videoInference", "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6", "model": "alibaba:wan@2.6", "positivePrompt": "A cinematic scene with multiple shots", "duration": 10, "width": 1920, "height": 1080, "providerSettings": { "alibaba": { "promptExtend": true, "audio": true, "shotType": "multi" } } }
Properties:
providerSettings.alibaba.promptExtend (boolean, default: true)
Enables LLM-based prompt rewriting to improve generation quality by expanding and clarifying the input prompt. When enabled, the system analyzes and enhances the prompt to produce more detailed and coherent video output.
Enabling prompt extension increases generation time but typically results in higher quality output with better scene composition and narrative flow.
providerSettings.alibaba.audio (boolean, default: true)
Controls automatic audio generation for the video. When enabled, the model generates native audio that aligns with the visual content and scene progression.
This parameter is ignored if custom audio is provided via inputs.audio.
providerSettings.alibaba.shotType ("single" | "multi", default: single)
Determines the shot composition style for the generated video. This parameter controls whether the video is generated as a continuous single shot or as multiple shots with transitions.
Available values:
- single: Generate the video as a continuous single shot.
- multi: Generate the video with multiple shots and transitions between them.
This parameter only takes effect when promptExtend is set to true. Multi-shot composition works best with prompts that explicitly describe shot changes or scene transitions.
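The interactions between these settings (the defaults, audio being ignored when inputs.audio is present, and shotType depending on promptExtend) can be summarized in a small resolver. A minimal sketch: effective_alibaba_settings and its generateAudio output key are hypothetical, and returning None for shotType is only this sketch's way of marking the value as not applied.

```python
# Documented defaults for providerSettings.alibaba on Wan video models
DEFAULTS = {"promptExtend": True, "audio": True, "shotType": "single"}


def effective_alibaba_settings(task: dict) -> dict:
    """Summarize which providerSettings.alibaba values actually apply."""
    requested = task.get("providerSettings", {}).get("alibaba", {})
    merged = {**DEFAULTS, **requested}
    has_custom_audio = "audio" in task.get("inputs", {})
    return {
        "promptExtend": merged["promptExtend"],
        # audio is ignored when a custom track is supplied via inputs.audio
        "generateAudio": merged["audio"] and not has_custom_audio,
        # shotType only takes effect with promptExtend; None marks "not applied"
        "shotType": merged["shotType"] if merged["promptExtend"] else None,
    }
```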
Wan2.2 Animate
Alibaba's Wan2.2 Animate is a unified video model that produces character-focused animations from static images and reference videos or replaces characters in existing footage while preserving motion, expressions, and scene consistency. Built on the Wan2.2 mixture-of-experts architecture, this model generates coherent character movement and seamless integration with background video.
Model AIR ID: runware:200@8.
Supported workflows: Image-to-video, video-to-video.
Technical specifications:
- Positive prompt: 1-2000 characters (default: 视频中的人在做动作, meaning "the person in the video is performing movements").
- Reference images: Supports inputs.referenceImages with 1 image (required).
- Reference videos: Supports inputs.referenceVideos with 1 video (required).
- Supported dimensions:
  - 480p: 480×480 (1:1), 480×704 (≈2:3), 704×480 (≈3:2), 480×832 (≈4:7), 832×480 (≈7:4), 480×1280 (≈3:8), 1280×480 (≈8:3).
  - 580p: 704×704 (1:1), 704×832 (≈6:7), 832×704 (≈7:6), 704×1280 (≈11:20), 1280×704 (≈20:11).
  - 720p: 832×832 (1:1), 832×1280 (≈13:20), 1280×832 (≈20:13), 1280×1280 (1:1).
- Dimension behavior:
  - Specify explicit width and height from the supported dimensions above.
  - Use the resolution parameter (480p, 580p, or 720p) to automatically match the aspect ratio of the reference video.
  - Omit both width/height and resolution to determine dimensions automatically from the reference video.
  - width/height and resolution cannot be used together.
- Steps: 2-50 (default: 30).
- Frame rate: 4-60 FPS (default: 16).
- LoRA: Supports LoRA configurations via the lora parameter.
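The dimension behavior for Wan2.2 Animate can be sketched as a resolver that enforces the mutual exclusion between width/height and resolution. How the server picks a size within a resolution tier is not documented here; this sketch assumes it chooses the supported size whose aspect ratio is closest to the reference video's. resolve_animate_size is a hypothetical helper, and the reference video's dimensions are assumed to be known client-side.

```python
import math

# Supported Wan2.2 Animate sizes per resolution tier, as (width, height)
SUPPORTED_SIZES = {
    "480p": [(480, 480), (480, 704), (704, 480), (480, 832),
             (832, 480), (480, 1280), (1280, 480)],
    "580p": [(704, 704), (704, 832), (832, 704), (704, 1280), (1280, 704)],
    "720p": [(832, 832), (832, 1280), (1280, 832), (1280, 1280)],
}
ALL_SIZES = {size for sizes in SUPPORTED_SIZES.values() for size in sizes}


def resolve_animate_size(task: dict, video_width: int, video_height: int):
    """Return (width, height), or None when the size is left to the server."""
    has_explicit = "width" in task or "height" in task
    resolution = task.get("resolution")
    if has_explicit and resolution is not None:
        raise ValueError("width/height and resolution cannot be used together")
    if has_explicit:
        size = (task.get("width"), task.get("height"))
        if size not in ALL_SIZES:
            raise ValueError(f"unsupported size: {size}")
        return size
    if resolution is not None:
        if resolution not in SUPPORTED_SIZES:
            raise ValueError(f"unsupported resolution: {resolution}")
        target = math.log(video_width / video_height)
        # Assumption: choose the tier entry with the closest aspect ratio
        return min(SUPPORTED_SIZES[resolution],
                   key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))
    # Neither given: dimensions are derived from the reference video
    return None
```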
The output video duration matches the reference video length. The model automatically resizes the reference video to match the requested output resolution while preserving aspect ratio.
Wan2.2 Animate Turbo (runware:200@9) provides optimized performance with pre-configured acceleration (high), reduced steps (6), and specialized LoRA optimizations for faster generation times while maintaining visual quality.
Advanced features:
Parameters supported: wanAnimate.mode, wanAnimate.retargetPose, wanAnimate.prevSegCondFrames.
{
"taskType": "videoInference",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d9",
"model": "runware:200@8",
"inputs": {
"referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"],
"referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
},
"steps": 6,
"advancedFeatures": {
"wanAnimate": {
"mode": "animate",
"retargetPose": false,
"prevSegCondFrames": 1
}
}
}

{
"taskType": "videoInference",
"taskUUID": "6ba7b835-9dad-11d1-80b4-00c04fd430c8",
"model": "runware:200@8",
"inputs": {
"referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"],
"referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
},
"steps": 6,
"advancedFeatures": {
"wanAnimate": {
"mode": "replace"
}
}
}

{
"taskType": "videoInference",
"taskUUID": "550e8400-e29b-41d4-a716-446655440017",
"model": "runware:200@8",
"inputs": {
"referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"],
"referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
},
"steps": 6,
"advancedFeatures": {
"wanAnimate": {
"mode": "animate",
"retargetPose": true,
"prevSegCondFrames": 3
}
}
}

Wan2.5-Preview
Alibaba's Wan2.5-Preview model represents a research preview of multimodal video generation with native audio support. This model offers strong prompt adherence, smooth motion, and multilingual audio capabilities for narrative scenes up to 10 seconds, making it suitable for short-form storytelling and creative video workflows.
Model AIR ID: alibaba:wan@2.5-preview.
Supported workflows: Text-to-video, image-to-video, audio-to-video.
Technical specifications:
- Positive prompt: 1-2000 characters (supports English and Chinese).
- Negative prompt: 1-500 characters (optional).
- Frame images: Supports a first frame via inputs.frameImages (image-to-video only).
- Audio input: Supports custom audio via inputs.audio.
- Supported dimensions:
  - 480p: 854×480 (16:9), 480×854 (9:16), 640×640 (1:1).
  - 720p: 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), 1088×832 (17:13), 832×1088 (13:17).
  - 1080p: 1920×1080 (16:9), 1080×1920 (9:16), 1440×1440 (1:1), 1632×1248 (17:13), 1248×1632 (13:17).
- Dimension behavior:
  - Text-to-video: Specify explicit width and height from the supported dimensions above.
  - Image-to-video: Two options are available:
    - Specify width and height explicitly for precise control.
    - Use the resolution parameter (480p, 720p, or 1080p) to automatically match the aspect ratio of the first frame image.
- Duration: 5 or 10 seconds (default: 5).
- Input image requirements: Width and height between 360 and 2000 pixels, 10MB file size limit.
- Audio requirements: WAV/MP3, 3-30 seconds duration, 15MB file size limit.
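The duration and audio requirements for Wan2.5-Preview can be pre-checked before uploading. This is a sketch, assuming the audio format can be inferred from the file extension and that 15MB means 15 × 1024 × 1024 bytes; check_wan25_preview is a hypothetical helper, and the caller must supply the audio length and size.

```python
from typing import Optional

AUDIO_EXTENSIONS = (".wav", ".mp3")
MAX_AUDIO_BYTES = 15 * 1024 * 1024  # assumption: "15MB" read as 15 MiB


def check_wan25_preview(duration: int,
                        audio_name: Optional[str] = None,
                        audio_seconds: Optional[float] = None,
                        audio_bytes: Optional[int] = None) -> list[str]:
    """Return a list of problems for a Wan2.5-Preview request (empty means OK)."""
    problems = []
    if duration not in (5, 10):
        problems.append("duration must be 5 or 10 seconds")
    if audio_name is not None:
        # Assumption: the format is inferred from the file extension
        if not audio_name.lower().endswith(AUDIO_EXTENSIONS):
            problems.append("audio must be WAV or MP3")
        if audio_seconds is not None and not 3 <= audio_seconds <= 30:
            problems.append("audio must be 3-30 seconds long")
        if audio_bytes is not None and audio_bytes > MAX_AUDIO_BYTES:
            problems.append("audio file exceeds 15MB")
    return problems
```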
Provider-specific settings:
Parameters supported: promptExtend, audio.
{
"taskType": "videoInference",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
"model": "alibaba:wan@2.5-preview",
"positivePrompt": "A cinematic narrative scene with smooth character movement and atmospheric storytelling",
"duration": 10,
"width": 1920,
"height": 1080,
"providerSettings": {
"alibaba": {
"promptExtend": true,
"audio": true
}
}
}

{
"taskType": "videoInference",
"taskUUID": "6ba7b834-9dad-11d1-80b4-00c04fd430c8",
"model": "alibaba:wan@2.5-preview",
"inputs": {
"frameImages": [
{
"inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
"frame": "first"
}
]
},
"positivePrompt": "The character begins to move naturally through the scene with smooth motion",
"duration": 5,
"resolution": "720p",
"providerSettings": {
"alibaba": {
"audio": true
}
}
}

{
"taskType": "videoInference",
"taskUUID": "550e8400-e29b-41d4-a716-446655440016",
"model": "alibaba:wan@2.5-preview",
"positivePrompt": "Visual narrative synchronized with the provided audio track",
"inputs": {
"audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
},
"duration": 10,
"width": 1280,
"height": 720,
"providerSettings": {
"alibaba": {
"promptExtend": true
}
}
}

Wan2.6
Alibaba's Wan2.6 model delivers multimodal video generation with native audio support and multi-shot sequencing capabilities. This model emphasizes temporal stability, consistent visual structure across shots, and reliable alignment between visuals and audio for short-form narrative video production.
Model AIR ID: alibaba:wan@2.6.
Supported workflows: Text-to-video, image-to-video, reference-to-video.
Technical specifications:
- Positive prompt: 1-1500 characters (supports English and Chinese).
- Negative prompt: 1-500 characters (optional).
- Frame images: Supports a first frame via inputs.frameImages (image-to-video only).
- Reference images: Supports up to 5 images via inputs.referenceImages (reference-to-video only, 10MB per image limit).
- Reference videos: Supports up to 3 videos via inputs.referenceVideos (reference-to-video only, 100MB per video limit).
- Audio input: Supports custom audio via inputs.audio.
- Supported dimensions:
  - 720p: 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), 1088×832 (17:13), 832×1088 (13:17).
  - 1080p: 1920×1080 (16:9), 1080×1920 (9:16), 1440×1440 (1:1), 1632×1248 (17:13), 1248×1632 (13:17).
- Dimension behavior:
  - Text-to-video and reference-to-video: Specify explicit width and height from the supported dimensions above.
  - Image-to-video: Two options are available:
    - Specify width and height explicitly for precise control.
    - Use the resolution parameter (720p or 1080p) to automatically match the aspect ratio of the first frame image.
- Duration:
- Text-to-video and image-to-video: 5, 10, or 15 seconds (default: 5).
- Reference-to-video: 2-10 seconds (default: 5).
- Input image requirements: Width and height between 360 and 2000 pixels, 10MB file size limit.
- Reference video requirements: Maximum 100MB per video.
- Audio requirements: WAV/MP3, 3-30 seconds duration, 15MB file size limit.
Reference images and reference videos cannot be used together with frame images. Choose either image-to-video or reference-to-video workflow.
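Because frame images and reference inputs are mutually exclusive, the workflow for a Wan2.6 request is determined by which inputs are present. The sketch below infers the workflow and checks the matching duration rule. Both function names are hypothetical, durations are assumed to be whole seconds, and a request with only inputs.audio is treated as text-to-video, which is an assumption.

```python
def classify_wan26_workflow(inputs: dict) -> str:
    """Infer the Wan2.6 workflow from the inputs present (hypothetical helper)."""
    frames = inputs.get("frameImages", [])
    ref_images = inputs.get("referenceImages", [])
    ref_videos = inputs.get("referenceVideos", [])
    if frames and (ref_images or ref_videos):
        raise ValueError("frame images cannot be combined with reference images or videos")
    if len(ref_images) > 5:
        raise ValueError("at most 5 reference images are supported")
    if len(ref_videos) > 3:
        raise ValueError("at most 3 reference videos are supported")
    if frames:
        return "image-to-video"
    if ref_images or ref_videos:
        return "reference-to-video"
    # Assumption: inputs.audio alone does not change the workflow
    return "text-to-video"


def wan26_duration_ok(workflow: str, duration: int) -> bool:
    """Apply the per-workflow duration rules listed above."""
    if workflow == "reference-to-video":
        return 2 <= duration <= 10
    return duration in (5, 10, 15)
```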
Provider-specific settings:
Parameters supported: promptExtend, audio, shotType.
{
"taskType": "videoInference",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d7",
"model": "alibaba:wan@2.6",
"positivePrompt": "A cinematic chase through a rain-soaked city, opening with a wide street shot, cutting to a close-up of footsteps splashing through puddles, followed by an overhead tracking shot",
"duration": 10,
"width": 1920,
"height": 1080,
"providerSettings": {
"alibaba": {
"promptExtend": true,
"audio": true,
"shotType": "multi"
}
}
}

{
"taskType": "videoInference",
"taskUUID": "6ba7b833-9dad-11d1-80b4-00c04fd430c8",
"model": "alibaba:wan@2.6",
"inputs": {
"frameImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
},
"positivePrompt": "The scene comes alive with gentle movement and atmospheric effects",
"duration": 5,
"resolution": "720p",
"providerSettings": {
"alibaba": {
"audio": true
}
}
}

{
"taskType": "videoInference",
"taskUUID": "550e8400-e29b-41d4-a716-446655440015",
"model": "alibaba:wan@2.6",
"inputs": {
"referenceVideos": [
"c64351d5-4c59-42f7-95e1-eace013eddab",
"d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"
]
},
"positivePrompt": "character1 walks through a forest while character2 follows behind, maintaining their visual characteristics and movement styles",
"duration": 8,
"width": 1920,
"height": 1080,
"providerSettings": {
"alibaba": {
"promptExtend": true,
"audio": true,
"shotType": "single"
}
}
}

{
"taskType": "videoInference",
"taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
"model": "alibaba:wan@2.6",
"inputs": {
"referenceImages": [
"c64351d5-4c59-42f7-95e1-eace013eddab",
"d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f",
"e8f9a0b1-3c6d-4e8f-b2e4-0d9e8f7c6b5a"
]
},
"positivePrompt": "Maintain consistent character appearance and style across the animated sequence",
"duration": 5,
"width": 1280,
"height": 720,
"providerSettings": {
"alibaba": {
"promptExtend": true,
"shotType": "single"
}
}
}

{
"taskType": "videoInference",
"taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d489",
"model": "alibaba:wan@2.6",
"inputs": {
"audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
},
"positivePrompt": "A dramatic scene with synchronized visuals matching the provided audio track",
"duration": 10,
"width": 1920,
"height": 1080,
"providerSettings": {
"alibaba": {
"promptExtend": true
}
}
}

Wan2.6 Flash
Alibaba's Wan2.6 Flash is a distilled, low-latency variant optimized for rapid image-to-video and reference-to-video generation with fluid motion and visual stability. This fast model preserves subject structure and motion realism while producing HD clips, making it ideal for preview workflows, high-throughput creative pipelines, and scenarios requiring quick turnaround without sacrificing quality.
Model AIR ID: alibaba:wan@2.6-flash.
Supported workflows: Image-to-video, reference-to-video.
Technical specifications:
- Positive prompt: 1-1500 characters (supports English and Chinese).
- Negative prompt: 1-500 characters (optional).
- Frame images: Supports a first frame via inputs.frameImages (image-to-video only).
- Reference images: Supports up to 5 images via inputs.referenceImages (reference-to-video only, 10MB per image limit).
- Reference videos: Supports up to 3 videos via inputs.referenceVideos (reference-to-video only, 100MB per video limit).
- Audio input: Supports custom audio via inputs.audio.
- Supported dimensions:
  - 720p: 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), 1088×832 (17:13), 832×1088 (13:17).
  - 1080p: 1920×1080 (16:9), 1080×1920 (9:16), 1440×1440 (1:1), 1632×1248 (17:13), 1248×1632 (13:17).
- Dimension behavior:
  - Reference-to-video: Specify explicit width and height from the supported dimensions above.
  - Image-to-video: Two options are available:
    - Specify width and height explicitly for precise control.
    - Use the resolution parameter (720p or 1080p) to automatically match the aspect ratio of the first frame image.
- Duration:
- Image-to-video: 2-15 seconds (default: 5).
- Reference-to-video: 2-10 seconds (default: 5).
- Input image requirements: Width and height between 360 and 2000 pixels, 10MB file size limit.
- Reference video requirements: Maximum 100MB per video.
- Audio requirements: WAV/MP3, 3-30 seconds duration, 15MB file size limit.
Reference images and reference videos cannot be used together with frame images. Choose either image-to-video or reference-to-video workflow.
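Wan2.6 Flash has no text-to-video workflow, and its image-to-video duration range differs from Wan2.6. A minimal sketch of a pre-flight check, assuming integer durations and inclusive ranges; check_flash_request is a hypothetical helper.

```python
# Duration ranges (seconds, inclusive) per Wan2.6 Flash workflow
FLASH_DURATION_RANGES = {
    "image-to-video": (2, 15),
    "reference-to-video": (2, 10),
}


def check_flash_request(inputs: dict, duration: int) -> list[str]:
    """Return a list of problems for a Wan2.6 Flash request (empty means OK)."""
    has_frame = bool(inputs.get("frameImages"))
    has_refs = bool(inputs.get("referenceImages") or inputs.get("referenceVideos"))
    if has_frame and has_refs:
        return ["frameImages cannot be combined with reference images or videos"]
    if not (has_frame or has_refs):
        return ["Wan2.6 Flash needs frameImages or reference inputs (no text-to-video)"]
    workflow = "image-to-video" if has_frame else "reference-to-video"
    low, high = FLASH_DURATION_RANGES[workflow]
    if not low <= duration <= high:
        return [f"{workflow} duration must be {low}-{high} seconds"]
    return []
```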
Provider-specific settings:
Parameters supported: promptExtend, audio, shotType.
{
"taskType": "videoInference",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
"model": "alibaba:wan@2.6-flash",
"inputs": {
"frameImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
},
"positivePrompt": "Bring the scene to life with natural movement and atmospheric effects",
"duration": 5,
"resolution": "720p",
"providerSettings": {
"alibaba": {
"audio": true
}
}
}

{
"taskType": "videoInference",
"taskUUID": "6ba7b837-9dad-11d1-80b4-00c04fd430c8",
"model": "alibaba:wan@2.6-flash",
"inputs": {
"referenceVideos": [
"c64351d5-4c59-42f7-95e1-eace013eddab",
"d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"
]
},
"positivePrompt": "Quick preview maintaining character consistency and motion style from reference videos",
"duration": 5,
"width": 1920,
"height": 1080,
"providerSettings": {
"alibaba": {
"promptExtend": true,
"shotType": "single"
}
}
}

{
"taskType": "videoInference",
"taskUUID": "550e8400-e29b-41d4-a716-446655440016",
"model": "alibaba:wan@2.6-flash",
"inputs": {
"referenceImages": [
"c64351d5-4c59-42f7-95e1-eace013eddab",
"d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"
]
},
"positivePrompt": "Rapid animation preserving character features and style consistency",
"duration": 8,
"width": 1280,
"height": 720,
"providerSettings": {
"alibaba": {
"audio": true
}
}
}

{
"taskType": "videoInference",
"taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d490",
"model": "alibaba:wan@2.6-flash",
"inputs": {
"frameImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
},
"positivePrompt": "Quick animated preview for social media with engaging motion",
"duration": 2,
"width": 720,
"height": 1280,
"providerSettings": {
"alibaba": {
"audio": true
}
}
}