Alibaba
Introduction
Alibaba Cloud's AI models are integrated into the Runware platform through our unified API, providing access to advanced generative capabilities across language, vision, and video domains. The Wan model family specializes in video generation with support for multi-shot sequencing, native audio, and strong temporal consistency.
Through the providerSettings.alibaba object, you can access Alibaba-specific features such as prompt extension, automatic audio generation, and multi-shot composition, while maintaining the consistency of Runware's standard API structure. This page documents the technical specifications, parameter requirements, and provider-specific settings for all Alibaba models available through our platform.
Image models
- providerSettings.alibaba (object)
Configuration object for Alibaba-specific image generation settings. These parameters provide control over prompt enhancement for Wan image models.
Example:
{
  "taskType": "imageInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
  "model": "alibaba:wan@2.5-image",
  "positivePrompt": "A cinematic still with rich detail",
  "width": 1280,
  "height": 1280,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}
Properties:
- providerSettings.alibaba.promptExtend (boolean, default: true)
Enables LLM-based prompt rewriting to improve generation quality by expanding and clarifying the input prompt. When enabled, the system analyzes and enhances the prompt to produce more detailed and coherent image output.
Enabling prompt extension increases generation time but typically results in higher quality output with better scene composition.
Wan2.5-Preview Image
Alibaba's Wan2.5-Preview Image delivers high-fidelity single-frame generation built on the Wan2.5 video architecture. This model focuses on detailed depth structure, strong prompt following, multilingual text rendering, and video-grade visual quality for production-ready stills.
Model AIR ID: alibaba:wan@2.5-image.
Supported workflows: Text-to-image.
Technical specifications:
- Positive prompt: 1-2000 characters (supports English and Chinese).
- Negative prompt: 1-500 characters (optional).
- Supported dimensions: Total pixel count between 589,824 (768×768) and 2,073,600 (1440×1440), with an aspect ratio between 1:4 and 4:1 (default: 1280×1280).
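The dimension limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Runware API: the function name `check_image_dimensions` is hypothetical, and it assumes both bounds are inclusive, which the specification does not state explicitly.

```python
from fractions import Fraction

# Limits taken from the Wan2.5-Preview Image specification above.
MIN_PIXELS = 768 * 768      # 589,824
MAX_PIXELS = 1440 * 1440    # 2,073,600
MIN_RATIO = Fraction(1, 4)  # width:height of 1:4
MAX_RATIO = Fraction(4, 1)  # width:height of 4:1


def check_image_dimensions(width: int, height: int) -> list[str]:
    """Return a list of problems; an empty list means the size passes these checks.

    Assumption: the pixel and aspect ratio bounds are treated as inclusive.
    """
    if width <= 0 or height <= 0:
        return ["width and height must be positive integers"]

    problems = []
    pixels = width * height
    if not MIN_PIXELS <= pixels <= MAX_PIXELS:
        problems.append(f"total pixels {pixels} outside {MIN_PIXELS}-{MAX_PIXELS}")

    # Fraction keeps the ratio comparison exact (no floating point rounding).
    ratio = Fraction(width, height)
    if not MIN_RATIO <= ratio <= MAX_RATIO:
        problems.append(f"aspect ratio {width}:{height} outside 1:4 to 4:1")
    return problems
```

For example, `check_image_dimensions(1280, 1280)` returns an empty list, while `check_image_dimensions(512, 512)` reports that the pixel count is too low.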
Provider-specific settings:
Parameters supported: promptExtend.
{
  "taskType": "imageInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
  "model": "alibaba:wan@2.5-image",
  "positivePrompt": "A cinematic still of a dramatic landscape with detailed depth structure and rich atmospheric lighting",
  "width": 1280,
  "height": 1280,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}
Video models
- advancedFeatures.wanAnimate (object)
Configuration object for Wan2.2 Animate character animation and replacement features. These parameters control animation strategies, pose retargeting, and temporal consistency for character-focused video generation.
Example:
{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
  "model": "alibaba:wan@2.2-animate",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"],
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "advancedFeatures": {
    "wanAnimate": {
      "mode": "animate",
      "retargetPose": true,
      "prevSegCondFrames": 3
    }
  }
}
Properties:
- advancedFeatures.wanAnimate.mode ("animate" | "replace", default: animate)
Selects the animation strategy for character generation and integration into video footage.
Available values:
- animate: Uses pose detection from the reference image and video, with optional skeleton retargeting to adjust the reference pose to match video movements. Ideal for bringing static characters to life with natural motion.
- replace: Uses pose detection and segmentation models to determine character pose and shape, then replaces the character in the video while preserving background and motion. Best for character substitution in existing footage.
- advancedFeatures.wanAnimate.retargetPose (boolean, default: false)
Retargets the pose of the video to match the reference image's initial pose, with bone positions (notably hands) adjusted according to video movements. This creates more natural alignment between the reference character's pose and the video motion.
This parameter is only supported in animate mode and has no effect when using replace mode.
- advancedFeatures.wanAnimate.prevSegCondFrames (integer, min: 1, max: 5, default: 1)
Number of frames taken from the previous segment to maintain temporal consistency across video segments. Higher values improve visual continuity between segments but increase inference time, while lower values reduce consistency but generate faster.
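The three wanAnimate constraints described above (allowed modes, retargetPose applying only in animate mode, and the 1-5 range for prevSegCondFrames) can be sketched as a small builder. This is an illustrative helper under stated assumptions: the name `build_wan_animate` is hypothetical, and omitting retargetPose in replace mode is a design choice of this sketch, not documented API behavior.

```python
def build_wan_animate(mode: str = "animate",
                      retarget_pose: bool = False,
                      prev_seg_cond_frames: int = 1) -> dict:
    """Build an advancedFeatures object for Wan2.2 Animate (hypothetical helper)."""
    if mode not in ("animate", "replace"):
        raise ValueError(f"mode must be 'animate' or 'replace', got {mode!r}")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(prev_seg_cond_frames, bool) or not isinstance(prev_seg_cond_frames, int):
        raise ValueError("prevSegCondFrames must be an integer")
    if not 1 <= prev_seg_cond_frames <= 5:
        raise ValueError("prevSegCondFrames must be between 1 and 5")

    settings = {"mode": mode, "prevSegCondFrames": prev_seg_cond_frames}
    if mode == "animate":
        settings["retargetPose"] = retarget_pose
    # In replace mode retargetPose is documented as having no effect,
    # so this sketch leaves it out of the payload entirely.
    return {"wanAnimate": settings}
```

The returned dictionary can be assigned to the `advancedFeatures` key of a videoInference task.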
- providerSettings.alibaba (object)
Configuration object for Alibaba-specific video generation settings. These parameters provide control over prompt enhancement, audio generation, and shot composition for Wan video models.
Example:
{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
  "model": "alibaba:wan@2.6",
  "positivePrompt": "A cinematic scene with multiple shots",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true,
      "shotType": "multi"
    }
  }
}
Properties:
- providerSettings.alibaba.promptExtend (boolean, default: true)
Enables LLM-based prompt rewriting to improve generation quality by expanding and clarifying the input prompt. When enabled, the system analyzes and enhances the prompt to produce more detailed and coherent video output.
Enabling prompt extension increases generation time but typically results in higher quality output with better scene composition and narrative flow.
- providerSettings.alibaba.audio (boolean, default: true)
Controls automatic audio generation for the video. When enabled, the model generates native audio that aligns with the visual content and scene progression.
This parameter is ignored if custom audio is provided via inputs.audio.
- providerSettings.alibaba.shotType ("single" | "multi", default: single)
Determines the shot composition style for the generated video. This parameter controls whether the video is generated as a continuous single shot or as multiple shots with transitions.
Available values:
- single: Generate video as a continuous single shot.
- multi: Generate video with multiple shots and transitions between them.
This parameter only takes effect when promptExtend is set to true. Multi-shot composition works best with prompts that explicitly describe shot changes or scene transitions.
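Two of the settings above can be silently inert: audio is ignored when custom audio is supplied through inputs.audio, and shotType has no effect unless promptExtend is true. A minimal sketch of a client-side check for these interactions follows; the function name `inert_alibaba_settings` is hypothetical and the check only reflects the rules stated on this page.

```python
def inert_alibaba_settings(alibaba: dict, has_input_audio: bool = False) -> list[str]:
    """List the provider settings that would have no effect for this request."""
    inert = []

    # promptExtend defaults to true when it is not sent.
    prompt_extend = alibaba.get("promptExtend", True)

    # shotType only takes effect when promptExtend is true.
    if "shotType" in alibaba and not prompt_extend:
        inert.append("shotType")

    # audio is ignored when custom audio is provided via inputs.audio.
    if "audio" in alibaba and has_input_audio:
        inert.append("audio")

    return inert
```

Calling `inert_alibaba_settings({"promptExtend": False, "shotType": "multi"})` returns `["shotType"]`, which signals that the multi-shot request would fall back to the behavior without shot control.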
Wan2.2 Animate
Alibaba's Wan2.2 Animate is a unified video model that produces character-focused animations from static images and reference videos or replaces characters in existing footage while preserving motion, expressions, and scene consistency. Built on the Wan2.2 mixture-of-experts architecture, this model generates coherent character movement and seamless integration with background video.
Model AIR ID: runware:200@8.
Supported workflows: Image-to-video, video-to-video.
Technical specifications:
- Positive prompt: 1-2000 characters (default: 视频中的人在做动作, roughly "the person in the video is performing an action").
- Reference images: Supports inputs.referenceImages with 1 image (required).
- Reference videos: Supports inputs.referenceVideos with 1 video (required).
- Supported dimensions:
  - 480p: 480×480 (1:1), 480×704 (±2:3), 704×480 (±3:2), 480×832 (±4:7), 832×480 (±7:4), 480×1280 (±3:8), 1280×480 (±8:3).
  - 580p: 704×704 (1:1), 704×832 (±6:7), 832×704 (±7:6), 704×1280 (±11:20), 1280×704 (±20:11).
  - 720p: 832×832 (1:1), 832×1280 (±13:20), 1280×832 (±20:13), 1280×1280 (1:1).
- Dimension behavior:
  - Specify explicit width and height from the supported dimensions above.
  - Use the resolution parameter (480p, 580p, or 720p) to automatically match the aspect ratio from the reference video.
  - Omit both width/height and resolution to automatically determine dimensions from the reference video.
  - width/height and resolution cannot be used together.
- Steps: 2-50 (default: 30).
- Frame rate: 4-60 FPS (default: 16).
- LoRA: Supports LoRA configurations via the lora parameter.
The output video duration matches the reference video length. The model automatically resizes the reference video to match the requested output resolution while preserving aspect ratio.
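The dimension behavior for Wan2.2 Animate amounts to three mutually exclusive choices: explicit width and height, a resolution tier, or neither. The sketch below illustrates that selection logic; `dimension_fields` and `ANIMATE_SIZES` are hypothetical names, and the size table is copied from the specification above.

```python
# Supported sizes per resolution tier, as listed in the specification.
ANIMATE_SIZES = {
    "480p": {(480, 480), (480, 704), (704, 480), (480, 832),
             (832, 480), (480, 1280), (1280, 480)},
    "580p": {(704, 704), (704, 832), (832, 704), (704, 1280), (1280, 704)},
    "720p": {(832, 832), (832, 1280), (1280, 832), (1280, 1280)},
}


def dimension_fields(width=None, height=None, resolution=None) -> dict:
    """Return the dimension-related task fields for one of the three options."""
    has_size = width is not None or height is not None

    if has_size and resolution is not None:
        raise ValueError("width/height and resolution cannot be used together")

    if has_size:
        if width is None or height is None:
            raise ValueError("width and height must be given together")
        if not any((width, height) in sizes for sizes in ANIMATE_SIZES.values()):
            raise ValueError(f"{width}x{height} is not a supported size")
        return {"width": width, "height": height}

    if resolution is not None:
        if resolution not in ANIMATE_SIZES:
            raise ValueError(f"unknown resolution {resolution!r}")
        return {"resolution": resolution}

    # Neither given: dimensions are determined from the reference video.
    return {}
```

The result can be merged into a task dictionary, for example `task.update(dimension_fields(resolution="580p"))`.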
Wan2.2 Animate Turbo (runware:200@9) provides optimized performance with pre-configured acceleration (high), reduced steps (6), and specialized LoRA optimizations for faster generation times while maintaining visual quality.
Advanced features:
Parameters supported: wanAnimate.mode, wanAnimate.retargetPose, wanAnimate.prevSegCondFrames.
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d9",
  "model": "alibaba:wan@2.2-animate",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"],
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "steps": 6,
  "advancedFeatures": {
    "wanAnimate": {
      "mode": "animate",
      "retargetPose": false,
      "prevSegCondFrames": 1
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b835-9dad-11d1-80b4-00c04fd430c8",
  "model": "alibaba:wan@2.2-animate",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"],
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "steps": 6,
  "advancedFeatures": {
    "wanAnimate": {
      "mode": "replace"
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440017",
  "model": "alibaba:wan@2.2-animate",
  "inputs": {
    "referenceImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"],
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "steps": 6,
  "advancedFeatures": {
    "wanAnimate": {
      "mode": "animate",
      "retargetPose": true,
      "prevSegCondFrames": 3
    }
  }
}
Wan2.5-Preview
Alibaba's Wan2.5-Preview model represents a research preview of multimodal video generation with native audio support. This model offers strong prompt adherence, smooth motion, and multilingual audio capabilities for narrative scenes up to 10 seconds, making it suitable for short-form storytelling and creative video workflows.
Model AIR ID: alibaba:wan@2.5-preview.
Supported workflows: Text-to-video, image-to-video, audio-to-video.
Technical specifications:
- Positive prompt: 1-2000 characters (supports English and Chinese).
- Negative prompt: 1-500 characters (optional).
- Frame images: Supports first frame via inputs.frameImages (image-to-video only).
- Audio input: Supports custom audio via inputs.audio.
- Supported dimensions:
  - 480p: 854×480 (16:9), 480×854 (9:16), 640×640 (1:1).
  - 720p: 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), 1088×832 (17:13), 832×1088 (13:17).
  - 1080p: 1920×1080 (16:9), 1080×1920 (9:16), 1440×1440 (1:1), 1632×1248 (17:13), 1248×1632 (13:17).
- Dimension behavior:
  - Text-to-video: Specify explicit width and height from the supported dimensions above.
  - Image-to-video: Two options available:
    - Specify width and height explicitly for precise control.
    - Use the resolution parameter (480p, 720p, or 1080p) to automatically match the aspect ratio from the first frame image.
- Duration: 5 or 10 seconds (default: 5).
- Input image requirements: 360-2000 pixels, 10MB file size limit.
- Audio requirements: WAV/MP3, 3-30 seconds duration, 15MB file size limit.
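Because text-to-video requests need an explicit width and height from the table above, a lookup keyed by resolution tier and aspect ratio can save some transcription errors. The following is a sketch only: `WAN25_PREVIEW_SIZES` and `text_to_video_fields` are hypothetical names, with the values copied from the specification.

```python
# Supported Wan2.5-Preview sizes, keyed by tier and aspect ratio label.
WAN25_PREVIEW_SIZES = {
    "480p": {"16:9": (854, 480), "9:16": (480, 854), "1:1": (640, 640)},
    "720p": {"16:9": (1280, 720), "9:16": (720, 1280), "1:1": (960, 960),
             "17:13": (1088, 832), "13:17": (832, 1088)},
    "1080p": {"16:9": (1920, 1080), "9:16": (1080, 1920), "1:1": (1440, 1440),
              "17:13": (1632, 1248), "13:17": (1248, 1632)},
}
ALLOWED_DURATIONS = (5, 10)  # seconds


def text_to_video_fields(resolution: str, aspect: str, duration: int = 5) -> dict:
    """Return width, height, and duration fields for a text-to-video task."""
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    try:
        width, height = WAN25_PREVIEW_SIZES[resolution][aspect]
    except KeyError:
        raise ValueError(f"no supported size for {resolution} at {aspect}") from None
    return {"width": width, "height": height, "duration": duration}
```

For instance, `text_to_video_fields("1080p", "16:9", 10)` yields the same width, height, and duration used in the first request example for this model.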
Provider-specific settings:
Parameters supported: promptExtend, audio.
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
  "model": "alibaba:wan@2.5-preview",
  "positivePrompt": "A cinematic narrative scene with smooth character movement and atmospheric storytelling",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b834-9dad-11d1-80b4-00c04fd430c8",
  "model": "alibaba:wan@2.5-preview",
  "inputs": {
    "frameImages": [
      {
        "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      }
    ]
  },
  "positivePrompt": "The character begins to move naturally through the scene with smooth motion",
  "duration": 5,
  "resolution": "720p",
  "providerSettings": {
    "alibaba": {
      "audio": true
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440016",
  "model": "alibaba:wan@2.5-preview",
  "positivePrompt": "Visual narrative synchronized with the provided audio track",
  "inputs": {
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  },
  "duration": 10,
  "width": 1280,
  "height": 720,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}
Wan2.6
Alibaba's Wan2.6 model delivers multimodal video generation with native audio support and multi-shot sequencing capabilities. This model emphasizes temporal stability, consistent visual structure across shots, and reliable alignment between visuals and audio for short-form narrative video production.
Model AIR ID: alibaba:wan@2.6.
Supported workflows: Text-to-video, image-to-video, reference-to-video.
Technical specifications:
- Positive prompt: 1-1500 characters (supports English and Chinese).
- Negative prompt: 1-500 characters (optional).
- Frame images: Supports first frame via inputs.frameImages (image-to-video only).
- Reference videos: Supports up to 3 videos via inputs.referenceVideos (reference-to-video only).
- Audio input: Supports custom audio via inputs.audio.
- Supported dimensions:
  - 720p: 1280×720 (16:9), 720×1280 (9:16), 960×960 (1:1), 1088×832 (17:13), 832×1088 (13:17).
  - 1080p: 1920×1080 (16:9), 1080×1920 (9:16), 1440×1440 (1:1), 1632×1248 (17:13), 1248×1632 (13:17).
- Dimension behavior:
  - Text-to-video: Specify explicit width and height from the supported dimensions above.
  - Image-to-video: Two options available:
    - Specify width and height explicitly for precise control.
    - Use the resolution parameter (720p or 1080p) to automatically match the aspect ratio from the first frame image.
- Duration: 5, 10, or 15 seconds (default: 5).
- Input image requirements: 360-2000 pixels, 10MB file size limit.
- Reference video requirements: Maximum 30MB per video.
- Audio requirements: WAV/MP3, 3-30 seconds duration, 15MB file size limit.
Reference videos cannot be used together with frame images. Choose either image-to-video or reference-to-video workflow.
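The workflow rules for Wan2.6 (frame images and reference videos are mutually exclusive, at most 3 reference videos, duration of 5, 10, or 15 seconds, prompt up to 1500 characters) can be sketched as a pre-flight check that also reports which workflow a task would use. The function name `wan26_workflow` is hypothetical, and validating the prompt only when it is present is an assumption of this sketch.

```python
MAX_REFERENCE_VIDEOS = 3
MAX_PROMPT_CHARS = 1500
ALLOWED_DURATIONS = (5, 10, 15)  # seconds


def wan26_workflow(task: dict) -> str:
    """Validate a Wan2.6 task dictionary and return the workflow it selects."""
    inputs = task.get("inputs", {})
    frames = inputs.get("frameImages") or []
    videos = inputs.get("referenceVideos") or []

    if frames and videos:
        raise ValueError("referenceVideos cannot be combined with frameImages")
    if len(videos) > MAX_REFERENCE_VIDEOS:
        raise ValueError(f"at most {MAX_REFERENCE_VIDEOS} reference videos are supported")

    # duration defaults to 5 seconds when omitted.
    if task.get("duration", 5) not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")

    # Assumption: the length limit is only checked when a prompt is supplied.
    prompt = task.get("positivePrompt")
    if prompt is not None and not 1 <= len(prompt) <= MAX_PROMPT_CHARS:
        raise ValueError(f"positivePrompt must be 1-{MAX_PROMPT_CHARS} characters")

    if frames:
        return "image-to-video"
    if videos:
        return "reference-to-video"
    return "text-to-video"
```

A task with only inputs.audio and a prompt is reported as text-to-video here, since custom audio does not change which visual inputs drive the generation.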
Provider-specific settings:
Parameters supported: promptExtend, audio, shotType.
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d7",
  "model": "alibaba:wan@2.6",
  "positivePrompt": "A cinematic chase through a rain-soaked city, opening with a wide street shot, cutting to a close-up of footsteps splashing through puddles, followed by an overhead tracking shot",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true,
      "shotType": "multi"
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "6ba7b833-9dad-11d1-80b4-00c04fd430c8",
  "model": "alibaba:wan@2.6",
  "inputs": {
    "frameImages": ["c64351d5-4c59-42f7-95e1-eace013eddab"]
  },
  "positivePrompt": "The scene comes alive with gentle movement and atmospheric effects",
  "duration": 5,
  "resolution": "720p",
  "providerSettings": {
    "alibaba": {
      "audio": true
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440015",
  "model": "alibaba:wan@2.6",
  "inputs": {
    "referenceVideos": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"
    ]
  },
  "positivePrompt": "character1 walks through a forest while character2 follows behind, maintaining their visual characteristics and movement styles",
  "duration": 15,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true,
      "audio": true,
      "shotType": "single"
    }
  }
}

{
  "taskType": "videoInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d489",
  "model": "alibaba:wan@2.6",
  "inputs": {
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  },
  "positivePrompt": "A dramatic scene with synchronized visuals matching the provided audio track",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "alibaba": {
      "promptExtend": true
    }
  }
}