MODEL ID klingai:kling-video@o3-pro

Kling VIDEO O3 Pro

by Kling AI February 5, 2026

Kling VIDEO O3 Pro is a unified multimodal video model that generates HD clips from text or images with native audio output. It prioritizes detail, motion realism, and stable subject identity, and it supports reference-driven generation plus prompt-based video editing with strong temporal consistency.

API Options

Platform-level options for task execution and delivery.

taskType string required value: videoInference: Identifier for the type of task being performed.

taskUUID string required UUID v4: UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.

outputType string default: URL: Video output type.

Allowed values: `URL`

outputFormat string default: MP4

Specifies the file format of the generated output. The available values depend on the task type and the specific model's capabilities.

Allowed values:

`MP4`: Widely supported video container (H.264), recommended for general use.
`WEBM`: Optimized for web delivery.
`MOV`: QuickTime format, common in professional workflows (Apple ecosystem).

outputQuality integer min: 20 max: 99 default: 95: Compression quality of the output. Higher values preserve quality but increase file size.

webhookURL string URI

Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.

deliveryMethod string default: async

Determines how the API delivers task results.

Allowed values:

`async`: Returns an immediate acknowledgment with the task UUID. Poll for results using getResponse. Required for long-running tasks such as video generation.
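Because results arrive asynchronously, a client typically polls until the task finishes. The sketch below shows only the retry loop. `fetch_result` is a hypothetical callable standing in for whatever getResponse call your client makes (the getResponse request format is not described on this page), and the 5 s interval and 600 s timeout are arbitrary choices.

```python
import time
from typing import Callable, Optional

# Hypothetical: fetch_result(task_uuid) returns the finished result dict,
# or None while the task is still processing. Wire it to getResponse yourself.
FetchResult = Callable[[str], Optional[dict]]


def poll_for_result(
    fetch_result: FetchResult,
    task_uuid: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_result repeatedly until it returns a result or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_result(task_uuid)
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_uuid} still pending after {timeout_s} s")
        sleep(interval_s)  # injectable so tests can skip real waiting
```

Injecting `sleep` keeps the loop testable without real delays; in production the default `time.sleep` is used.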

uploadEndpoint string URI

Specifies a URL where the generated content will be automatically uploaded using the HTTP PUT method. The raw binary data of the media file is sent directly as the request body. For secure uploads to cloud storage, use presigned URLs that include temporary authentication credentials.

Common use cases:

Cloud storage: Upload directly to S3 buckets, Google Cloud Storage, or Azure Blob Storage using presigned URLs.
CDN integration: Upload to content delivery networks for immediate distribution.

// S3 presigned URL for secure upload
https://your-bucket.s3.amazonaws.com/generated/content.mp4?X-Amz-Signature=abc123&X-Amz-Expires=3600

// Google Cloud Storage presigned URL
https://storage.googleapis.com/your-bucket/content.mp4?X-Goog-Signature=xyz789

// Custom storage endpoint
https://storage.example.com/uploads/generated-video.mp4

The upload is performed once generation is complete.

safety object

Content safety checking configuration for video generation.

Properties:

safety » checkContent checkContent boolean default: false: Enable or disable content safety checking. When enabled, defaults to fast mode.

safety » mode mode string default: none

Safety checking mode for video generation.

Allowed values (3):

`none`: Disables checking.
: Checks key frames.
: Checks all frames.

ttl integer min: 60: Time-to-live (TTL) in seconds for generated content. Only applies when outputType is URL.

includeCost boolean default: false: Include task cost in the response.

numberResults integer min: 1 max: 20 default: 1: Number of results to generate. Each result uses a different seed, producing variations of the same parameters.
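The platform-level options above can be assembled and range-checked before a request is sent. This is a sketch, not an official client: `platform_options` is a hypothetical helper, and it hard-codes `outputType` and `deliveryMethod` to the single values listed for this model.

```python
import uuid
from typing import Optional

ALLOWED_FORMATS = {"MP4", "WEBM", "MOV"}


def platform_options(
    output_format: str = "MP4",
    output_quality: int = 95,
    number_results: int = 1,
    ttl: Optional[int] = None,
    include_cost: bool = False,
    webhook_url: Optional[str] = None,
) -> dict:
    """Build the platform-level fields of a videoInference task, enforcing the documented ranges."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported outputFormat: {output_format}")
    if not 20 <= output_quality <= 99:
        raise ValueError("outputQuality must be between 20 and 99")
    if not 1 <= number_results <= 20:
        raise ValueError("numberResults must be between 1 and 20")
    if ttl is not None and ttl < 60:
        raise ValueError("ttl must be at least 60 seconds")
    opts = {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),  # must be unique per task
        "outputType": "URL",  # the only value listed for this model
        "outputFormat": output_format,
        "outputQuality": output_quality,
        "numberResults": number_results,
        "includeCost": include_cost,
        "deliveryMethod": "async",  # the only value listed for this model
    }
    if ttl is not None:
        opts["ttl"] = ttl  # applies because outputType is URL
    if webhook_url is not None:
        opts["webhookURL"] = webhook_url
    return opts
```

Merging the returned dict with the generation parameters gives a complete task object.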

Inputs

Input resources for the task (images, videos, audio, etc.). These must be nested inside the inputs object.

inputs » referenceImages referenceImages array of strings min items: 1, max items: 7: List of reference images (UUID, URL, Data URI, or Base64).

inputs » frameImages frameImages array of objects min items: 1, max items: 2

An array of objects that define key frames to guide video generation. Each object specifies an input image and optionally its position within the video timeline.

The frameImages parameter allows you to constrain specific frames within the video sequence, ensuring that particular visual content appears at designated points. This is different from referenceImages, which provide overall visual guidance without constraining specific timeline positions.

When the frame parameter is omitted from objects, automatic distribution rules apply:

1 image: Used as the first frame.
2 images: First and last frames.

Examples:

Single frame (automatic positioning): When only one image is provided, it automatically becomes the first frame of the video.

"frameImages": [
  {
    "image": "aac49721-1964-481a-ae78-8a4e29b91402"
  }
]

First and last frames: With two images, they automatically become the first and last frames of the video sequence.

"frameImages": [
  {
    "image": "aac49721-1964-481a-ae78-8a4e29b91402",
    "frame": "first"
  },
  {
    "image": "3ad204c3-a9de-4963-8a1a-c3911e3afafe",
    "frame": "last"
  }
]

Properties:

inputs » frameImages » image image string required: Image input (UUID, URL, Data URI, or Base64).

inputs » frameImages » frame frame object

Target frame position for the image. Only the first and last frames are supported, addressed either by name or by index.

Allowed values:

`first`: First frame of the video.
`last`: Last frame of the video.
`0`: Frame index 0 (first frame).
`-1`: Frame index -1 (last frame).
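The automatic positioning rules and the name/index aliases can be normalized client-side before a request is built. This sketch (`resolve_frames` is a hypothetical helper) maps `0`/`-1` to `first`/`last` and fills omitted positions by list order. How the API treats a mix of explicit and omitted `frame` values is not documented here, so the positional fallback for that case is an assumption.

```python
from typing import List, Optional, Union

Frame = Union[str, int]
# Documented frame values: "first", "last", 0 (first frame) and -1 (last frame).
_ALIASES = {"first": "first", 0: "first", "last": "last", -1: "last"}


def resolve_frames(images: List[dict]) -> List[dict]:
    """Return frameImages entries with an explicit "first"/"last" position.

    Mirrors the documented automatic rules: one image -> first frame,
    two images -> first and last frames.
    """
    if not 1 <= len(images) <= 2:
        raise ValueError("frameImages accepts 1 or 2 items")
    defaults = ["first", "last"]
    resolved = []
    for i, item in enumerate(images):
        frame: Optional[Frame] = item.get("frame")
        position = defaults[i] if frame is None else _ALIASES.get(frame)
        if position is None:
            raise ValueError(f"unsupported frame value: {frame!r}")
        resolved.append({"image": item["image"], "frame": position})
    if len({r["frame"] for r in resolved}) != len(resolved):
        raise ValueError("each frame position may be used only once")
    return resolved
```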

inputs » video video string: Source video for prompt-based editing (UUID or URL). The output's dimensions and duration match the input video.

inputs » referenceVideos referenceVideos array of strings items: 1: Reference video for motion control (UUID or URL). Exactly one video is accepted.

inputs » elements elements array of objects min items: 1, max items: 3

Elements let you include reusable assets (images, videos, or voices) in your video generation. Each element is identified by an id and is referenced in the prompt by its position in the elements array: <<<element_1>>> for the first entry, <<<element_2>>> for the second, and so on.

An element can contain:

Images via frontalImage and optionally images (up to 3 additional angles).
Videos via videos (cannot be combined with images).
Voices via voices (can only be combined with images, not videos).

Examples:

Create a new element with an image:

"positivePrompt": "A video of <<<element_1>>> walking through a futuristic city",
"inputs": {
  "elements": [
    {
      "id": "my-character-id",
      "description": "A young woman with red hair",
      "frontalImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "tags": ["Character"]
    }
  ]
}

Reuse a previously created element by ID:

"positivePrompt": "A video of <<<element_1>>> sitting in a coffee shop, reading a book",
"inputs": {
  "elements": [
    {
      "id": "my-character-id"
    }
  ]
}
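Because placeholders refer to elements by position rather than by id, a prompt can silently point at a missing entry. The sketch below (`check_element_placeholders` is a hypothetical helper) assumes `<<<element_N>>>` is 1-based and maps to the N-th entry of `inputs.elements`, as in the examples above, and rejects any placeholder without a matching entry.

```python
import re
from typing import List

PLACEHOLDER = re.compile(r"<<<element_(\d+)>>>")


def check_element_placeholders(prompt: str, elements: List[dict]) -> List[int]:
    """Return the sorted element indices used in the prompt, validating each one.

    Assumes <<<element_N>>> refers to the N-th (1-based) entry of inputs.elements.
    """
    used = sorted({int(n) for n in PLACEHOLDER.findall(prompt)})
    for n in used:
        if not 1 <= n <= len(elements):
            raise ValueError(f"<<<element_{n}>>> has no matching entry in inputs.elements")
    return used
```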

Properties:

inputs » elements » id id string required: Unique identifier for this element. Use to create a new element or reference a previously created one.

inputs » elements » description description string: Description of the element.

inputs » elements » frontalImage frontalImage string: Frontal reference image for the element. Required when using image-based elements.

inputs » elements » images images array of strings min items: 1, max items: 3: Additional reference images for the element (for example, other angles), up to 3. Requires frontalImage.

inputs » elements » videos videos array of strings items: 1: Reference video for the element. Cannot be combined with images or voices.

inputs » elements » voices voices array of strings items: 1: Voice audio for the element. Can only be combined with images, not videos.

inputs » elements » tags tags array of strings min items: 1: Classification tags for the element.
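The composition rules for elements can be checked before submission. This is a sketch under stated assumptions: `validate_elements` is a hypothetical helper, it reads `items: 1` on `videos` and `voices` as "exactly one entry", and it treats an entry carrying only an `id` as a reuse of a previously created element, which passes.

```python
from typing import List


def validate_elements(elements: List[dict]) -> None:
    """Check inputs.elements against the documented composition rules."""
    if not 1 <= len(elements) <= 3:
        raise ValueError("elements must contain 1 to 3 entries")
    for el in elements:
        if not el.get("id"):
            raise ValueError("every element needs an id")
        images = el.get("images")
        videos = el.get("videos")
        voices = el.get("voices")
        has_images = "frontalImage" in el or images is not None
        if images is not None:
            if "frontalImage" not in el:
                raise ValueError(f"{el['id']}: images requires frontalImage")
            if not 1 <= len(images) <= 3:
                raise ValueError(f"{el['id']}: images must hold 1 to 3 entries")
        if videos is not None:
            if len(videos) != 1:
                raise ValueError(f"{el['id']}: videos must hold exactly one entry")
            # Videos cannot be combined with images or voices.
            if has_images or voices is not None:
                raise ValueError(f"{el['id']}: videos cannot be combined with images or voices")
        if voices is not None and len(voices) != 1:
            raise ValueError(f"{el['id']}: voices must hold exactly one entry")
```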

Generation Parameters

Core parameters for controlling the generated content.

model string required value: klingai:kling-video@o3-pro

Identifier of the model to use for generation.

positivePrompt string required min: 2 max: 2500

Text prompt describing elements to include in the generated output.

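width integer required* paired with height

Width of the generated media in pixels. Required for text-to-video and must be set together with height. It cannot be set for image-to-video or video editing, where dimensions come from the input.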

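height integer required* paired with width

Height of the generated media in pixels. Required for text-to-video and must be set together with width. It cannot be set for image-to-video or video editing, where dimensions come from the input.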

duration integer min: 3 max: 15 step: 1 default: 5: Duration of the generated video in seconds. Total frames = duration × fps.
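A text-to-video request combines the required generation parameters with the supported dimension pairs from the Dimensions table further down. The sketch below (`text_to_video_request` is a hypothetical helper) builds the request body as a dict; it assumes the prompt limits are measured in characters and does not send anything over the network.

```python
import uuid

# Supported width x height pairs, taken from the Dimensions table on this page.
SUPPORTED_DIMENSIONS = {(1920, 1080), (1440, 1440), (1080, 1920)}


def text_to_video_request(
    prompt: str, width: int, height: int, duration: int = 5, sound: bool = False
) -> dict:
    """Build a text-to-video task body after checking the documented limits."""
    if not 2 <= len(prompt) <= 2500:  # assumes limits are in characters
        raise ValueError("positivePrompt must be 2 to 2500 characters")
    if (width, height) not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"unsupported dimensions {width}x{height}")
    if not isinstance(duration, int) or not 3 <= duration <= 15:
        raise ValueError("duration must be an integer from 3 to 15 seconds")
    return {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),
        "model": "klingai:kling-video@o3-pro",
        "positivePrompt": prompt,
        "width": width,
        "height": height,
        "duration": duration,
        "providerSettings": {"klingai": {"sound": sound}},
    }
```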

Provider Settings

Parameters specific to this model provider. These must be nested inside the providerSettings.klingai object.

providerSettings » klingai » characterOrientation characterOrientation string

Source for character orientation reference.

Allowed values 2 values

: Match orientation from the reference image.
: Match orientation from the reference video.

providerSettings » klingai » keepOriginalSound keepOriginalSound boolean default: false: Maintain the original sound from the reference video.

providerSettings » klingai » multiPrompt multiPrompt array of objects max items: 6

Sequential prompt segments for multi-shot generation. The sum of all segment durations must equal the root-level duration.

Properties:

providerSettings » klingai » multiPrompt » prompt prompt string required min: 2 max: 2500: Text prompt describing the content for this segment.

providerSettings » klingai » multiPrompt » duration duration integer required min: 1: Duration in seconds for this segment.

providerSettings » klingai » sound sound boolean default: false: Enable native audio generation.
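The multiPrompt constraints (at most 6 segments, each with a 2 to 2500 character prompt and a duration of at least 1 s, summing to the root-level duration) can be verified up front. `validate_multi_prompt` is a hypothetical helper; this is a client-side sketch, not the service's own validation.

```python
from typing import List


def validate_multi_prompt(segments: List[dict], total_duration: int) -> None:
    """Check multiPrompt segments against the documented constraints."""
    if len(segments) > 6:
        raise ValueError("multiPrompt accepts at most 6 segments")
    for i, seg in enumerate(segments, start=1):
        if not 2 <= len(seg["prompt"]) <= 2500:
            raise ValueError(f"segment {i}: prompt must be 2 to 2500 characters")
        if seg["duration"] < 1:
            raise ValueError(f"segment {i}: duration must be at least 1 second")
    total = sum(seg["duration"] for seg in segments)
    if total != total_duration:
        raise ValueError(f"segment durations sum to {total} s but duration is {total_duration} s")
```

For the multi-shot example below, three 3 s segments sum to the root-level `duration` of 9, so the check passes.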

Response Fields

Fields returned in each task result.

taskType string required value: videoInference: Type of the task.

taskUUID string required UUID v4: UUID of the task.

videoUUID string required UUID v4: UUID of the output video.

videoURL string URI: URL of the output video.

videoBase64Data string: Base64-encoded video data.

videoDataURI string URI: Data URI of the output video.

seed integer: The seed used for generation. If no seed was provided, this is the randomly generated seed.

NSFWContent boolean: Flag indicating if NSFW content was detected.

cost float: Task cost in USD. Present when includeCost is set to true in the request.
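A result carries the video in one of three fields. The sketch below (`video_from_result` is a hypothetical helper) prefers `videoURL` and otherwise decodes the Base64 or data-URI form; since `URL` is the only `outputType` listed for this model, the other two branches may never trigger here, and they are included only to show how those documented fields would be handled.

```python
import base64
from typing import Union


def video_from_result(result: dict) -> Union[str, bytes]:
    """Return the video URL if present, otherwise the decoded video bytes."""
    if result.get("videoURL"):
        return result["videoURL"]
    if result.get("videoBase64Data"):
        return base64.b64decode(result["videoBase64Data"])
    data_uri = result.get("videoDataURI")
    if data_uri:
        # Data URI form: data:<mime type>;base64,<payload>
        header, _, payload = data_uri.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("expected a base64-encoded data URI")
        return base64.b64decode(payload)
    raise ValueError("result contains no video field")
```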

Text to Video

Windblown Tundra Signal Camp

Request

{
  "taskType": "videoInference",
  "taskUUID": "f92fd87a-7215-449a-8281-7be52ed8f447",
  "model": "klingai:kling-video@o3-pro",
  "positivePrompt": "A cinematic arctic field camp at blue-hour twilight on a vast frozen plain, a lone polar researcher in a crimson expedition parka stands beside a tripod radio mast and a small tracked snow vehicle, distant ice ridges and drifting ground snow, nylon flags snapping hard in the wind, frost vapor from breathing, the researcher adjusts a headset and aims a directional antenna toward the horizon, then a faint green aurora ribbon slowly appears overhead while the machine idles and indicator lights blink, ultra-detailed realistic textures, natural body movement, believable wind interaction, dramatic wide shot with a slow lateral camera glide, high motion realism, crisp atmospheric depth, immersive native environmental audio with wind, engine hum, fabric flapping, radio static",
  "width": 1920,
  "height": 1080,
  "duration": 8,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}

Response

{
  "taskType": "videoInference",
  "taskUUID": "f92fd87a-7215-449a-8281-7be52ed8f447",
  "videoUUID": "0d284b5f-9714-4731-a03f-5dd1afeeb4f6",
  "videoURL": "https://vm.runware.ai/video/os/a19d05/ws/5/vi/0d284b5f-9714-4731-a03f-5dd1afeeb4f6.mp4",
  "cost": 1.12
}

Image to Video

Flooded Opera Rooftop Finale

Request

{
  "taskType": "videoInference",
  "taskUUID": "7068c964-9ad3-4d46-a286-70b83434122e",
  "model": "klingai:kling-video@o3-pro",
  "positivePrompt": "A cinematic rooftop dance sequence on a flooded art deco opera house high above a sprawling metropolis. Begin from the first frame image and animate the same dancer with precise, realistic footwork splashing through shallow water, fringe costume swaying naturally, reflections rippling across the rooftop. Camera starts low and glides in a smooth arc around her as she moves with urgency and grace. Gusts push puddles into swirling patterns, loose sheet music skims across the surface, neon reflections smear and reform, gulls wheel overhead, and the storm gradually clears. Transition toward the last frame image as warm sunrise light breaks through the clouds, emphasizing continuity of character, costume, architecture, and rooftop layout. Highly detailed cinematic realism, strong temporal consistency, expressive environment motion, believable water physics, and immersive city ambience with generated audio of splashes, distant traffic, gull calls, and soft rooftop wind.",
  "duration": 8,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  },
  "inputs": {
    "frameImages": [
      {
        "image": "https://assets.runware.ai/assets/inputs/2afbb41d-078a-4474-9d18-ec4c7f5b59a1.jpg",
        "frame": "first"
      },
      {
        "image": "https://assets.runware.ai/assets/inputs/a2e70976-126c-44fa-b064-30372326269a.jpg",
        "frame": "last"
      }
    ]
  }
}

Response

{
  "taskType": "videoInference",
  "taskUUID": "7068c964-9ad3-4d46-a286-70b83434122e",
  "videoUUID": "bd90c6b4-6700-466c-bda4-5f4582cc80dc",
  "videoURL": "https://vm.runware.ai/video/os/a03d21/ws/5/vi/bd90c6b4-6700-466c-bda4-5f4582cc80dc.mp4",
  "cost": 1.12
}

Video Editing

Subway Busker Metamorphosis Sequence

Request

{
  "taskType": "videoInference",
  "taskUUID": "bff47908-c37f-40da-af84-222f7cb38710",
  "model": "klingai:kling-video@o3-pro",
  "positivePrompt": "Transform the source video into a surreal retro-futurist subway performance tableau. Keep the same performer position, camera motion, pacing, and overall action rhythm from the original clip, but restyle the scene with polished brass tile walls, glowing route diagrams, drifting paper ticket confetti, and a gradually appearing audience of silent mannequins in mismatched formalwear. The musician's instrument becomes an unusual hybrid of saxophone and mechanical coral, emitting visible pulses of colored vapor that ripple through the station air. Reflections shimmer across the floor, overhead signage flickers with invented symbols, and passing trains become sleek chrome streaks. Maintain strong temporal consistency, realistic body motion, coherent shadows, and cinematic detail, with natural ambient station sound evolving into echoing experimental jazz textures.",
  "inputs": {
    "video": "https://assets.runware.ai/assets/inputs/64792608-38ea-49bd-9e90-4353bcfb1d45.mp4"
  }
}

Response

{
  "taskType": "videoInference",
  "taskUUID": "bff47908-c37f-40da-af84-222f7cb38710",
  "videoUUID": "811f9722-b43d-40dc-9223-8505c4e65c9c",
  "videoURL": "https://vm.runware.ai/video/os/a14d18/ws/5/vi/811f9722-b43d-40dc-9223-8505c4e65c9c.mp4",
  "cost": 1.008
}

Image to Video

Orbital Greenhouse Dawn Ballet

Request

{
  "taskType": "videoInference",
  "taskUUID": "c59a73fe-f81f-46eb-bf0b-4fc6054cdfea",
  "model": "klingai:kling-video@o3-pro",
  "positivePrompt": "A cinematic scene in a rotating orbital greenhouse ring at dawn. Begin from the supplied first frame and animate it into a serene, high-detail sequence: a young dancer slowly performs a weightless-inspired contemporary routine on a circular brass platform while the camera makes a gentle floating arc around her. Hanging vines sway subtly with the station rotation, tiny pollen particles drift through warm peach sunlight, reflections slide across curved glass, and Earth glides majestically beyond the panoramic windows. Preserve the subject's identity and costume, maintain elegant realism, rich environmental detail, soft atmospheric depth, smooth physically plausible motion, stable composition, and natural cinematic audio with faint mechanical hum, quiet footwork, fabric movement, and distant greenhouse ambience.",
  "duration": 8,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  },
  "inputs": {
    "frameImages": [
      {
        "image": "https://assets.runware.ai/assets/inputs/32d50dea-cd64-4ff0-b682-e600af2c94a7.jpg",
        "frame": "first"
      }
    ]
  }
}

Response

{
  "taskType": "videoInference",
  "taskUUID": "c59a73fe-f81f-46eb-bf0b-4fc6054cdfea",
  "videoUUID": "fd2c1f7b-1ae2-48f3-9708-36a53afb87b4",
  "videoURL": "https://vm.runware.ai/video/os/a13d12/ws/5/vi/fd2c1f7b-1ae2-48f3-9708-36a53afb87b4.mp4",
  "cost": 1.12
}

Text to Video

Subway Busker Time Shift

Request

{
  "taskType": "videoInference",
  "taskUUID": "0b06a81c-dd0b-435c-918a-5603e69d0a3d",
  "model": "klingai:kling-video@o3-pro",
  "width": 1920,
  "height": 1080,
  "duration": 9,
  "providerSettings": {
    "klingai": {
      "sound": true,
      "multiPrompt": [
        {
          "prompt": "A cinematic wide shot inside a tiled subway platform at rush hour, a young street drummer performing with improvised buckets and metal cans, commuters flowing past in waves, fluorescent reflections on polished floor, subtle handheld camera drift, realistic crowd motion, detailed clothing, echoing percussion and footsteps",
          "duration": 3
        },
        {
          "prompt": "The same subway platform and same drummer, camera pushes closer as the beat intensifies, a ring of commuters begins clapping in sync, one child spins in delight, a train rushes through the background without stopping, paper flyers flutter from the air displacement, highly realistic motion, crisp urban ambience and rhythmic drumming",
          "duration": 3
        },
        {
          "prompt": "The same drummer and same platform moments later, crowd now fully engaged, a few strangers dance while others record on phones, the performer finishes with a rapid finale and throws both sticks upward, camera tilts up slightly as the sticks spin and fall back into his hands, triumphant natural audio, rich detail, stable subject identity",
          "duration": 3
        }
      ]
    }
  }
}

Response

{
  "taskType": "videoInference",
  "taskUUID": "0b06a81c-dd0b-435c-918a-5603e69d0a3d",
  "videoUUID": "3832a366-b2db-4d9d-a9fa-86d2888828f5",
  "videoURL": "https://vm.runware.ai/video/os/a12d13/ws/5/vi/3832a366-b2db-4d9d-a9fa-86d2888828f5.mp4",
  "cost": 1.26
}

Notes

Kling O3 Pro supports multiple generation modes:

Text-to-Video: Provide a prompt with width, height, and duration.
Image-to-Video: Provide frame images via inputs.frameImages (up to 2, first/last). Dimensions are inherited from the image and cannot be set manually.
Video Editing: Provide a source video via inputs.video for prompt-based editing. Dimensions and duration match the input video.
Motion Control: Provide a reference video via inputs.referenceVideos (max 1). Can be combined with inputs.referenceImages for additional visual guidance.
Reference-to-Video: Provide reference images via inputs.referenceImages to guide the visual style. Can be combined with other modes.

Elements let you include reusable assets such as characters, objects, or voices in your video by referencing them in the prompt (e.g. <<<element_1>>>). Elements can be used with Text-to-Video, Image-to-Video, Motion Control, and Reference-to-Video modes, but are not available in Video Editing mode (inputs.video).

The combined total of reference images and elements must not exceed 7 items. When inputs.referenceVideos is provided, this limit drops to 4 items.
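These combination limits can be enforced on the `inputs` object before submission. `check_input_limits` is a hypothetical helper; it counts each element as one item toward the shared limit, which is how the sentence above reads but is not otherwise confirmed.

```python
def check_input_limits(inputs: dict) -> None:
    """Apply the documented reference and element limits to an inputs object."""
    total = len(inputs.get("referenceImages", [])) + len(inputs.get("elements", []))
    # The shared limit is 7, or 4 when a reference video is supplied.
    limit = 4 if inputs.get("referenceVideos") else 7
    if total > limit:
        raise ValueError(f"{total} reference images + elements exceeds the limit of {limit}")
    if inputs.get("video") and inputs.get("elements"):
        raise ValueError("elements are not available in Video Editing mode (inputs.video)")
```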

Parameter Dependencies

Dimensions

The following dimension combinations are supported:

Configuration	Dimensions
`1080p (16:9)`	`1920x1080`
`1080p (1:1)`	`1440x1440`
`1080p (9:16)`	`1080x1920`

Pricing

Pricing starts from $0.084/s without audio.

720p · 1s · (no input + no audio) $0.112

720p · 1s · (video input + no audio) $0.168

720p · 1s · (audio + no input) $0.14
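Cost scales linearly with duration at the per-second rates above. The sketch below (`estimate_cost` is a hypothetical helper) covers only the three listed combinations and uses `Decimal` to avoid floating-point rounding. It reproduces the cost values reported in the two text-to-video examples (8 s and 9 s with sound give 1.12 and 1.26), but it is an estimate, and `includeCost` remains the authoritative figure.

```python
from decimal import Decimal

# Per-second rates listed above (720p). Other input/audio combinations are not listed.
RATES_PER_SECOND = {
    ("none", False): Decimal("0.112"),
    ("video", False): Decimal("0.168"),
    ("none", True): Decimal("0.14"),
}


def estimate_cost(duration_s: int, input_kind: str = "none", audio: bool = False) -> Decimal:
    """Estimate task cost in USD as per-second rate x duration."""
    try:
        rate = RATES_PER_SECOND[(input_kind, audio)]
    except KeyError:
        raise ValueError(f"no listed rate for input={input_kind!r}, audio={audio}") from None
    return rate * duration_s
```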