Video Inference API
Introduction
Video inference enables video generation and transformation. This page is the complete API reference for video inference tasks. All workflows and operations use the single videoInference task type, differentiated through parameter combinations.
Core operations
- Text-to-video: Generate videos from text descriptions.
- Image-to-video: Generate videos using images to guide content or constrain specific frames.
- Video-to-video: Transform existing videos based on prompts.
Advanced features
- Style and control: Camera movements with cinematic lens effects, keyframe positioning.
- Content generation: Video extension capabilities, multi-shot storytelling with scene transitions.
- Visual effects: Effect templates and stylized filters.
- Identity and character: Character lip-sync, reference-based generation.
- Audio: Native audio generation with synchronized dialogue and effects.
Each feature includes detailed parameter documentation below.
Video generation uses asynchronous processing due to longer processing times. Setting "deliveryMethod": "async" queues your task and returns an immediate acknowledgment. Use the getResponse task to poll for status updates and retrieve the final video when processing completes.
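For example, a status check is itself a small task object. The sketch below assumes the getResponse task accepts the taskUUID returned by your videoInference submission; see the getResponse documentation for the authoritative parameter list.
{
  "taskType": "getResponse",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1"
}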
Request
Our API always accepts an array of objects as input, where each object represents a specific task to be performed. The structure varies depending on the workflow and features used.
The following examples demonstrate how different parameter combinations create specific workflows.
{
"taskType": "videoInference",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1",
"positivePrompt": "A cat playing with a ball of yarn, high quality, detailed",
"model": "klingai:5@3",
"duration": 10,
"width": 1920,
"height": 1080,
"seed": 42,
"numberResults": 1
}
{
"taskType": "videoInference",
"taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d59",
"positivePrompt": "smooth animation, natural movement, cinematic quality",
"model": "klingai:3@2",
"duration": 10,
"width": 1920,
"height": 1080,
"frameImages": [
{
"inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
"frame": "first"
},
{
"inputImage": "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f",
"frame": "last"
}
],
"numberResults": 1
}
taskType
string required
The type of task to be performed. For this task, the value should be videoInference.
taskUUID
string required UUID v4
When a task is sent to the API you must include a random UUID v4 string using the taskUUID parameter. This string is used to match the async responses to their corresponding tasks. If you send multiple tasks at the same time, the taskUUID will help you match the responses to the correct tasks. The taskUUID must be unique for each task you send to the API.
outputType
"URL" Default: URL
Specifies the output type in which the video is returned. Currently, only URL delivery is supported for video outputs.
- URL: The video is returned as a URL string using the videoURL parameter in the response object.
outputFormat
"MP4" | "WEBM" Default: MP4
Specifies the format of the output video. Supported formats are MP4 and WEBM.
- MP4: MPEG-4 video format, widely compatible and recommended for most use cases.
- WEBM: WebM video format, optimized for web delivery and smaller file sizes.
outputQuality
integer Min: 20 Max: 99 Default: 95
Sets the compression quality of the output video. Higher values preserve more quality but increase file size; lower values reduce file size at the cost of quality.
deliveryMethod
"async" required
Determines how the video generation results are delivered. Currently, video inference only supports asynchronous processing due to the computational intensity of video generation.
When set to "async", the task is queued for background processing and you receive an immediate acknowledgment. Use the getResponse task to poll for status updates and retrieve the final video when processing completes.
Asynchronous delivery is essential for video generation, as processing times can range from several seconds to minutes depending on the complexity and length of the requested video.
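As an illustration, the parameter sits alongside the other task fields in the request object (a minimal sketch reusing values from the first request example above; other parameters are omitted for brevity):
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1",
  "positivePrompt": "A cat playing with a ball of yarn, high quality, detailed",
  "model": "klingai:5@3",
  "duration": 10,
  "deliveryMethod": "async"
}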
uploadEndpoint
string
This parameter allows you to specify a URL to which the generated video will be uploaded as binary video data using the HTTP PUT method. For example, an S3 bucket URL can be used as the upload endpoint.
When the video is ready, it will be uploaded to the specified URL.
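For instance, a pre-signed object-storage URL could be supplied as the destination (a short fragment; the bucket and path below are hypothetical):
"outputFormat": "MP4",
"uploadEndpoint": "https://my-bucket.s3.amazonaws.com/videos/result.mp4"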
includeCost
boolean Default: false
If set to true, the cost to perform the task will be included in the response object.
positivePrompt
string required Min: 2
The text description that guides the video generation process. This prompt defines what you want to see in the video, including subject matter, visual style, actions, and atmosphere.
The model processes this text to understand the desired content and creates a video that matches your description. More detailed and specific prompts typically produce better results.
For optimal results, describe the motion, scene composition, and visual characteristics you want to see in the generated video.
negativePrompt
string
Specifies what you want to avoid in the generated video. This parameter helps steer the generation away from undesired visual elements, styles, or characteristics.
Common negative prompts for video include terms like "blurry", "low quality", "distorted", "static", "flickering", or specific content you want to exclude.
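For example, the two prompt fields can be combined in a single request (a short fragment using the prompt from the first example and common negative terms):
"positivePrompt": "A cat playing with a ball of yarn, high quality, detailed",
"negativePrompt": "blurry, low quality, distorted, flickering"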
frameImages
object[]
An array of objects that define key frames to guide video generation. Each object specifies an input image and optionally its position within the video timeline.
The frameImages parameter allows you to constrain specific frames within the video sequence, ensuring that particular visual content appears at designated points. This is different from referenceImages, which provide overall visual guidance without constraining specific timeline positions.
When the frame parameter is omitted from frameImages objects, automatic distribution rules apply:
- 1 image: Used as the first frame.
- 2 images: First and last frames.
- 3+ images: First and last frames, with intermediate images evenly spaced between.
Single frame (automatic positioning): When only one image is provided, it automatically becomes the first frame of the video.
{ "taskType": "videoInference", "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6", "positivePrompt": "a beautiful woman walking through a garden with flowers blooming", "model": "klingai:5@3", "duration": 5, "width": 1920, "height": 1080, "frameImages": [ { "inputImage": "aac49721-1964-481a-ae78-8a4e29b91402" } ] }
First and last frames: With two images, they automatically become the first and last frames of the video sequence.
{ "taskType": "videoInference", "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6", "positivePrompt": "cinematic shot of a cat playing with a ball of yarn, smooth motion", "model": "klingai:5@3", "duration": 5, "width": 1920, "height": 1080, "frameImages": [ { "inputImage": "aac49721-1964-481a-ae78-8a4e29b91402", "frame": "first" }, { "inputImage": "3ad204c3-a9de-4963-8a1a-c3911e3afafe", "frame": "last" } ] }
Mixed positioning: You can combine automatic distribution with explicit frame positioning using either numeric values or named positions.
{ "taskType": "videoInference", "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6", "positivePrompt": "time-lapse of clouds moving across a sunset sky, natural lighting", "model": "klingai:5@3", "duration": 5, "width": 1920, "height": 1080, "frameImages": [ { "inputImage": "aac49721-1964-481a-ae78-8a4e29b91402", "frame": 0 }, { "inputImage": "c00abf5f-6cdb-4642-a01d-1bfff7bc3cf7", "frame": 48 }, { "inputImage": "3ad204c3-a9de-4963-8a1a-c3911e3afafe", "frame": "last" } ] }
frameImages[i] » inputImage
string required
Specifies the input image that will be used to constrain the video content at the specified frame position. The image can be specified in one of the following formats:
- A UUID v4 string of a previously uploaded image or a generated image.
- A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: data:image/png;base64,iVBORw0KGgo...
- A base64-encoded image without the data URI prefix. For example: iVBORw0KGgo...
- A URL pointing to the image. The image must be publicly accessible.
Supported formats are: PNG, JPG and WEBP.
frameImages[i] » frame
string | integer
Specifies the position of this frame constraint within the video timeline.
Named positions:
- "first": Places the image at the beginning of the video.
- "last": Places the image at the end of the video.
Numeric positions:
- 0: First frame (equivalent to "first").
- Any positive integer: Specific frame number. Must be within the total frame count (duration × fps).
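As a worked example, with "duration": 5 and the default "fps": 24 the video contains 120 frames, so valid numeric positions run from 0 to 120 and the following constraint (taken from the mixed-positioning example above) pins its image two seconds into the clip:
{
  "inputImage": "c00abf5f-6cdb-4642-a01d-1bfff7bc3cf7",
  "frame": 48
}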
referenceImages
string[]
An array containing reference images used to condition the generation process. These images provide visual guidance to help the model generate content that aligns with the style, composition, or characteristics of the reference materials.
Unlike frameImages, which constrain specific timeline positions, reference images guide the general appearance that should remain consistent across the video.
Reference images work in combination with your text prompt to provide both textual and visual guidance for the generation process.
Each image can be specified in one of the following formats:
- A UUID v4 string of a previously uploaded image or a generated image.
- A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: data:image/png;base64,iVBORw0KGgo...
- A base64-encoded image without the data URI prefix. For example: iVBORw0KGgo...
- A URL pointing to the image. The image must be publicly accessible.
Supported formats are: PNG, JPG and WEBP.
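For example, two previously uploaded images can be passed by UUID (a short fragment; the UUIDs are reused from the frameImages examples above):
"referenceImages": [
  "aac49721-1964-481a-ae78-8a4e29b91402",
  "3ad204c3-a9de-4963-8a1a-c3911e3afafe"
]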
width
integer Min: 256 Max: 1920
The width of the generated video in pixels. Must be a multiple of 8 for compatibility with video encoding standards.
Higher resolutions produce more detailed videos but require significantly more processing time and computational resources. Consider your intended use case and quality requirements when selecting dimensions.
Work within your model's supported resolution range for optimal results. Some models may have specific aspect ratio recommendations.
height
integer Min: 256 Max: 1080
The height of the generated video in pixels. Must be a multiple of 8 for compatibility with video encoding standards.
Higher resolutions produce more detailed videos but require significantly more processing time and computational resources. Consider your intended use case and quality requirements when selecting dimensions.
Work within your model's supported resolution range for optimal results. Some models may have specific aspect ratio recommendations.
model
string required
The AI model to use for video generation. Different models excel at different types of content, styles, and quality levels.
Models are identified by their AIR (Artificial Intelligence Resource) identifier in the format provider:id@version. Use the Model Search utility to discover available video models and their capabilities.
Choose models based on your desired output quality and supported features like resolution or duration limits.
duration
float required Min: 1 Max: 10
The length of the generated video in seconds. This parameter directly affects the total number of frames produced based on the specified frame rate.
Total frames are calculated as duration × fps. For example, a 5-second video at 24 fps will contain 120 frames.
Longer durations require significantly more processing time and computational resources. Consider your specific use case when choosing duration length.
fps
integer Min: 15 Max: 60 Default: 24
The frame rate (frames per second) of the generated video. Higher frame rates create smoother motion but require more processing time and result in larger file sizes.
Common frame rates:
- 24 fps: Standard cinematic frame rate, natural motion feel.
- 30 fps: Common for web video, smooth motion.
- 60 fps: High frame rate, very smooth motion for action content.
Note that using the same duration with higher frame rates creates smoother motion by generating more intermediate frames. The frame rate combines with duration to determine total frame count: duration × fps = total frames.
steps
integer Min: 10 Max: 50
The number of denoising steps the model performs during video generation. More steps typically result in higher quality output but require longer processing time.
Each step refines the entire sequence, improving temporal consistency and visual quality. Higher step counts are particularly important for achieving smooth motion and reducing visual artifacts.
Most video models work well with 20-40 steps. Values below 20 may produce lower quality results, while values above 40 provide diminishing returns for most use cases.
seed
integer Min: 1 Max: 9223372036854776000 Default: Random
A seed is a value used to randomize the video generation. If you want to make videos reproducible (generate the same video multiple times), you can use the same seed value.
When requesting multiple videos with the same seed, the seed will be incremented by 1 (+1) for each video generated.
CFGScale
float Min: 0 Max: 50
Controls how closely the video generation follows your prompt. Higher values make the model adhere more strictly to your text description, while lower values allow more creative freedom.
CFGScale affects both visual content and temporal consistency. The recommended range is 6.0-10.0 for most video models. Values above 12 may cause over-guidance artifacts or unnatural motion patterns.
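For instance, a fragment with both guidance-related values in the middle of the recommended ranges (a sketch; optimal settings depend on the model):
"steps": 30,
"CFGScale": 7.5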
numberResults
integer Min: 1 Max: 4 Default: 1
Specifies how many videos to generate for the given parameters. Each video will have the same parameters but different seeds, resulting in variations of the same concept.
If seed is set, it will be incremented by 1 (+1) for each video generated.
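For example, combining a fixed seed with multiple results (per the increment rule above, the three videos would use seeds 42, 43, and 44):
"seed": 42,
"numberResults": 3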
providerSettings
object
Contains provider-specific configuration settings that customize the behavior of different AI models and services. Each provider has its own set of parameters that control various aspects of the generation process.
Currently supported providers:
- google: Settings for Veo 2 and Veo 3 models.
- bytedance: Settings for ByteDance's Seedance models.
- minimax: Settings for MiniMax 01, MiniMax 01 Director, MiniMax 01 Live and MiniMax Hailuo 02 models.
- klingai: Settings for the KlingAI 1.0, KlingAI 1.5, KlingAI 1.6, KlingAI 2.0 and KlingAI 2.1 model families.
The providerSettings parameter is an object that contains nested objects for each supported provider, as in the following example:
{ "taskType": "videoInference", "taskUUID": "991e641a-d2a8-4aa3-9883-9d6fe230fff8", "positivePrompt": "a beautiful landscape with mountains", "model": "google:3@0", "providerSettings": { "google": { "generateAudio": true } } }
providerSettings » google
object
Configuration settings specific to Google's video generation models (Veo 2 and Veo 3). These settings control various aspects of the generation process, including prompt enhancement and audio generation capabilities. For example:
"providerSettings": {
  "google": {
    "enhancePrompt": true,
    "generateAudio": false
  }
}
providerSettings » google » enhancePrompt
boolean Default: true
Controls whether the input prompt is automatically enhanced and expanded to improve generation quality. When enabled, the system optimizes the prompt for better results by adding relevant details and context.
This setting cannot be disabled when using the Veo 3 model, as prompt enhancement is always active. For the Veo 2 model, this setting can be controlled and disabled if needed.
Enhanced prompts typically result in more detailed and higher-quality video generation by providing the model with richer context and clearer instructions.
When prompt enhancement is enabled, reproducibility is not guaranteed even when using the same seed value. The enhancement process may introduce variability that affects the deterministic nature of generation.
providerSettings » google » generateAudio
boolean Default: false
Controls whether the generated video includes audio content. When enabled, the system creates appropriate audio that matches the visual content and scene context within the video.
This feature is only available for the Veo 3 model. Audio generation is not supported in Veo 2.
Generated audio can include ambient sounds, music, or other audio elements that enhance the video experience and provide a more immersive result.
providerSettings » bytedance
object
Configuration settings specific to ByteDance's video generation models. These settings control camera behavior and movement during video generation. For example:
"providerSettings": {
  "bytedance": {
    "cameraFixed": false
  }
}
providerSettings » bytedance » cameraFixed
boolean Default: false
Controls whether the camera remains stationary during video generation. When enabled, the camera position and angle are fixed, preventing any camera movement, panning, or zooming effects.
When disabled (the default), the model can incorporate dynamic camera movements such as pans, tilts, zooms, or tracking shots to create more cinematic and engaging video content.
This setting is useful when you need static shots or want to avoid camera motion that might distract from the main subject or action in the video.
providerSettings » minimax
object
Configuration settings specific to MiniMax's video generation models. These settings control prompt processing and optimization features. For example:
"providerSettings": {
  "miniMax": {
    "promptOptimizer": false
  }
}
providerSettings » minimax » promptOptimizer
boolean Default: false
Controls whether the input prompt is automatically optimized and refined to improve generation quality. When enabled, the system analyzes and enhances the prompt by adding relevant details, improving clarity, and optimizing structure for better video generation results.
The prompt optimizer can help transform simple or basic prompts into more detailed and effective instructions, potentially leading to higher-quality video outputs with better adherence to the intended creative vision.
When disabled, the original prompt is used as-is without any modifications or enhancements.
When the prompt optimizer is enabled, reproducibility is not guaranteed even when using the same seed value. The optimization process may introduce variability that affects the deterministic nature of generation.
Response
Video inference operations require polling to retrieve results due to asynchronous processing. You'll need to use the getResponse task to check status and retrieve the final video.
When you submit a video task, you receive immediate confirmation that your request was accepted and processing has started, or an error response if validation fails.
{
"data": [
{
"taskType": "videoInference",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1"
}
]
}
{
"errors": [
{
"code": "unsupportedDuration",
"message": "Invalid value for duration parameter. This duration is not supported by the model architecture.",
"parameter": "duration",
"type": "float",
"documentation": "https://runware.ai/docs/en/video-inference/video-reference#request-duration",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1",
"allowedValues": [6]
}
]
}
To retrieve the actual video results, use the getResponse task with the returned taskUUID. The response format depends on the current processing status:
{
"data": [
{
"taskType": "videoInference",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1",
"status": "pending",
}
]
}
{
"data": [
{
"taskType": "videoInference",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1",
"status": "success",
"videoUUID": "b7db282d-2943-4f12-992f-77df3ad3ec71",
"videoURL": "https://im.runware.ai/video/ws/0.5/vi/b7db282d-2943-4f12-992f-77df3ad3ec71.mp4",
"cost": 0.18
}
]
}
{
"errors": [
{
"code": "timeoutProvider",
"status": "error",
"message": "The external provider did not respond within the timeout window. The request was automatically terminated.",
"documentation": "https://runware.ai/docs/en/video-inference/api-reference",
"taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1"
}
]
}
taskType
string
The API will return the taskType you sent in the request. In this case, it will be videoInference. This helps match the responses to the correct task type.
taskUUID
string UUID v4
The API will return the taskUUID you sent in the request. This way you can match the responses to the correct request tasks.
videoUUID
string UUID v4
A unique identifier for the generated video. This UUID can be used to reference the video in subsequent operations or for tracking purposes.
The videoUUID is different from the taskUUID. While taskUUID identifies the generation request, videoUUID identifies the specific video output.
videoURL
string
If outputType is set to URL, this parameter contains the URL of the video to be downloaded.
seed
integer
The seed value that was used to generate this video. This value can be used to reproduce the same video when using identical parameters in another request.
cost
float
If includeCost is set to true, the response will include a cost field for each task object. This field indicates the cost of the request in USD.