MODEL ID klingai:kling-video@3-pro

live

Kling VIDEO 3.0 Pro

by Kling AI · February 5, 2026

Kling VIDEO 3.0 Pro is a unified multimodal video model that generates high-quality video with synchronized audio from text or images. It supports reference-guided generation, prompt-based editing, fine control over motion and pacing, and stable temporal coherence for cinematic and narrative clips. Native audio output includes dialogue, ambient sound, and effects aligned to the visuals.

API Options

Platform-level options for task execution and delivery.

taskType string required value: videoInference: Identifier for the type of task being performed.

taskUUID string required UUID v4: UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.
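
As a minimal sketch of the two required platform fields, the Python snippet below builds a task object with a fresh UUID v4 for each task. The helper name `new_video_task` is hypothetical; the field names and the model identifier are the ones listed in this reference.

```python
import uuid


def new_video_task(prompt: str) -> dict:
    """Build a bare videoInference task (hypothetical helper, not part of the API)."""
    return {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),  # fresh UUID v4, unique per task
        "model": "klingai:kling-video@3-pro",
        "positivePrompt": prompt,
    }
```

Keeping the returned `taskUUID` around makes it possible to match the asynchronous response to the request that produced it.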

outputType string default: URL: Video output type.

Allowed values: `URL`

outputFormat string default: MP4

Specifies the file format of the generated output. The available values depend on the task type and the specific model's capabilities.

Allowed values 3 values

`MP4`: Widely supported video container (H.264), recommended for general use.
`WEBM`: Optimized for web delivery.
`MOV`: QuickTime format, common in professional workflows (Apple ecosystem).

outputQuality integer min: 20 max: 99 default: 95: Compression quality of the output. Higher values preserve quality but increase file size.

webhookURL string URI

Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.

deliveryMethod string default: async

Determines how the API delivers task results.

Allowed values 1 value

`async`: Returns an immediate acknowledgment with the task UUID. Poll for results using getResponse. Required for long-running tasks like video generation.
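
The polling flow can be sketched as a small loop. This is an illustration under assumptions, not the platform's client: `fetch` is a hypothetical callable that sends a task to the API and returns the finished result (or `None` while the video is still pending), and the `getResponse` request shape shown in the comment is assumed rather than taken from this page.

```python
import time
from typing import Callable, Optional


def poll_for_result(
    fetch: Callable[[dict], Optional[dict]],
    task_uuid: str,
    interval_s: float = 2.0,
    max_attempts: int = 60,
) -> dict:
    """Call `fetch` repeatedly until it returns a result or attempts run out."""
    # Assumed request shape: a getResponse task keyed by the original taskUUID.
    request = {"taskType": "getResponse", "taskUUID": task_uuid}
    for _ in range(max_attempts):
        result = fetch(request)
        if result is not None:
            return result
        time.sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"task {task_uuid} did not finish after {max_attempts} polls")
```

Injecting `fetch` keeps the loop independent of any particular HTTP or WebSocket client and makes it easy to exercise with a stub.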

uploadEndpoint string URI

Specifies a URL where the generated content will be automatically uploaded using the HTTP PUT method. The raw binary data of the media file is sent directly as the request body. For secure uploads to cloud storage, use presigned URLs that include temporary authentication credentials.

Common use cases:

Cloud storage: Upload directly to S3 buckets, Google Cloud Storage, or Azure Blob Storage using presigned URLs.
CDN integration: Upload to content delivery networks for immediate distribution.

// S3 presigned URL for secure upload
https://your-bucket.s3.amazonaws.com/generated/content.mp4?X-Amz-Signature=abc123&X-Amz-Expires=3600

// Google Cloud Storage presigned URL
https://storage.googleapis.com/your-bucket/content.jpg?X-Goog-Signature=xyz789

// Custom storage endpoint
https://storage.example.com/uploads/generated-image.jpg

The content data will be sent as the request body to the specified URL when generation is complete.
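
To check a presigned URL before passing it as uploadEndpoint, the upload can be reproduced by hand. The sketch below assumes the platform's upload is equivalent to a plain HTTP PUT with the file bytes as the body, as described above; the function name and the example URL are illustrative only.

```python
import urllib.request


def build_upload_request(upload_endpoint: str, media_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying the raw media bytes."""
    return urllib.request.Request(
        upload_endpoint,
        data=media_bytes,  # raw binary body, no multipart wrapping
        method="PUT",
    )


req = build_upload_request("https://storage.example.com/uploads/test.mp4", b"\x00\x01")
```

Sending `req` with `urllib.request.urlopen(req)` against your own storage is one way to confirm that the URL accepts PUT uploads before a generation task depends on it.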

safety object

Content safety checking configuration for video generation.

Properties 2 properties

safety » checkContent boolean default: false: Enable or disable content safety checking. When enabled, defaults to fast mode.

safety » mode string default: none

Safety checking mode for video generation.

Allowed values 3 values

`none`: Disables checking.
`fast`: Checks key frames.
: Checks all frames.

ttl integer min: 60: Time-to-live (TTL) in seconds for generated content. Only applies when outputType is URL.

includeCost boolean default: false: Include task cost in the response.

numberResults integer min: 1 max: 20 default: 1: Number of results to generate. Each result uses a different seed, producing variations of the same parameters.

Inputs

Input resources for the task (images, audio, etc.). These must be nested inside the inputs object.

inputs » referenceImages array of strings items: 1: List of reference images (UUID, URL, Data URI, or Base64).

inputs » frameImages array of objects min items: 1 max items: 2

An array of objects that define key frames to guide video generation. Each object specifies an input image and optionally its position within the video timeline.

The frameImages parameter allows you to constrain specific frames within the video sequence, ensuring that particular visual content appears at designated points. This is different from referenceImages, which provide overall visual guidance without constraining specific timeline positions.

When the frame parameter is omitted from objects, automatic distribution rules apply:

1 image: Used as the first frame.
2 images: First and last frames.

Examples 2 examples

Single frame (automatic positioning): When only one image is provided, it automatically becomes the first frame of the video.

"frameImages": [
  {
    "image": "aac49721-1964-481a-ae78-8a4e29b91402"
  }
]

First and last frames: With two images, they become the first and last frames of the video sequence. Here the positions are also stated explicitly with the frame parameter.

"frameImages": [
  {
    "image": "aac49721-1964-481a-ae78-8a4e29b91402",
    "frame": "first"
  },
  {
    "image": "3ad204c3-a9de-4963-8a1a-c3911e3afafe",
    "frame": "last"
  }
]
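
A small helper can assemble this array for the one-image and two-image cases. This is a sketch only: `build_frame_images` is a hypothetical name, and it always writes the frame position explicitly rather than relying on automatic distribution.

```python
from typing import List, Optional


def build_frame_images(first: str, last: Optional[str] = None) -> List[dict]:
    """Return a frameImages array with explicit first/last positions."""
    frames = [{"image": first, "frame": "first"}]
    if last is not None:
        frames.append({"image": last, "frame": "last"})
    return frames  # never more than 2 entries, matching the documented limit


inputs = {
    "frameImages": build_frame_images(
        "aac49721-1964-481a-ae78-8a4e29b91402",
        "3ad204c3-a9de-4963-8a1a-c3911e3afafe",
    )
}
```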

Properties 2 properties

inputs » frameImages » image string required: Image input (UUID, URL, Data URI, or Base64).

inputs » frameImages » frame object

Target frame position for the image. Supports first and last frame.

Allowed values 4 values

`first`: First frame of the video.
`last`: Last frame of the video.
`0`: Frame index 0 (first frame).
`-1`: Frame index -1 (last frame).

inputs » referenceVideos array of strings items: 1: List of reference videos (UUID, URL).

inputs » elements array of objects min items: 1 max items: 3

Elements allow you to include reusable assets (images, videos, or voices) in your video generation. Each element is identified by an id and can be referenced in the prompt using <<<element_1>>>, <<<element_2>>>, etc. in order of appearance.

An element can contain:

Images via frontalImage and optionally images (up to 3 additional angles).
Videos via videos (cannot be combined with images).
Voices via voices (can only be combined with images, not videos).

Examples 2 examples

Create a new element with an image:

"positivePrompt": "A video of <<<element_1>>> walking through a futuristic city",
"inputs": {
  "elements": [
    {
      "id": "my-character-id",
      "description": "A young woman with red hair",
      "frontalImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "tags": ["Character"]
    }
  ]
}

Reuse a previously created element by ID:

"positivePrompt": "A video of <<<element_1>>> sitting in a coffee shop, reading a book",
"inputs": {
  "elements": [
    {
      "id": "my-character-id"
    }
  ]
}
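
The placeholder numbering can be derived from the elements array instead of being typed by hand. The sketch below assumes that "order of appearance" means the 1-based position of the element in the elements array; the helper name and the second element ID (`my-prop-id`) are hypothetical.

```python
from typing import List


def element_placeholder(elements: List[dict], element_id: str) -> str:
    """Return the prompt token for an element, based on its 1-based array position."""
    if not 1 <= len(elements) <= 3:
        raise ValueError("elements must contain between 1 and 3 items")
    for position, element in enumerate(elements, start=1):
        if element["id"] == element_id:
            return f"<<<element_{position}>>>"
    raise KeyError(element_id)


elements = [{"id": "my-character-id"}, {"id": "my-prop-id"}]
prompt = (
    f"A video of {element_placeholder(elements, 'my-character-id')} "
    f"holding {element_placeholder(elements, 'my-prop-id')}"
)
```

Generating the tokens this way keeps the prompt and the array in step if elements are reordered.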

Properties 7 properties

inputs » elements » id string required: Unique identifier for this element. Use to create a new element or reference a previously created one.

inputs » elements » description string: Description of the element.

inputs » elements » frontalImage string: Frontal reference image for the element. Required when using image-based elements.

inputs » elements » images array of strings min items: 1 max items: 3: Reference images for the element. Up to 3 images. Requires frontalImage.

inputs » elements » videos array of strings items: 1: Reference video for the element. Cannot be combined with images or voices.

inputs » elements » voices array of strings items: 1: Voice audio for the element. Can only be combined with images, not videos.

inputs » elements » tags array of strings min items: 1: Classification tags for the element.

Generation Parameters

Core parameters for controlling the generated content.

model string required value: klingai:kling-video@3-pro

Identifier of the model to use for generation.

positivePrompt string required min: 2 max: 2500

Text prompt describing elements to include in the generated output.

negativePrompt string min: 2 max: 2500

Prompt to guide what to exclude from generation. Ignored when guidance is disabled (CFGScale ≤ 1).

width integer required* paired with height

Width of the generated media in pixels.

height integer required* paired with width

Height of the generated media in pixels.

duration integer min: 3 max: 15 step: 1 default: 5: Duration of the generation in seconds. Total frames = duration × fps.

Provider Settings

Parameters specific to this model provider. These must be nested inside the providerSettings.klingai object.

providerSettings » klingai » characterOrientation string

Source for character orientation reference.

Allowed values 2 values

: Match orientation from the reference image.
: Match orientation from the reference video.

providerSettings » klingai » keepOriginalSound boolean default: false: Maintain the original sound from the reference video.

Response

Fields returned for each generated video.

taskType string required value: videoInference: Type of the task.

taskUUID string required UUID v4: UUID of the task.

videoUUID string required UUID v4: UUID of the output video.

videoURL string URI: URL of the output video.

videoBase64Data string: Base64-encoded video data.

videoDataURI string URI: Data URI of the output video.

seed integer: The seed used for generation. If none was provided, shows the randomly generated seed.

NSFWContent boolean: Flag indicating if NSFW content was detected.

cost float: Task cost in USD. Present when includeCost is set to true in the request.
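
The response fields above can be read into a small typed record. This is a sketch, not an official client type: the `VideoResult` class and `parse_video_result` function are hypothetical names, and the Base64 and Data URI variants are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoResult:
    task_uuid: str
    video_uuid: str
    video_url: Optional[str] = None
    seed: Optional[int] = None
    nsfw_content: Optional[bool] = None
    cost: Optional[float] = None  # only present when includeCost was true


def parse_video_result(item: dict) -> VideoResult:
    """Map one response object onto VideoResult; required fields must be present."""
    if item.get("taskType") != "videoInference":
        raise ValueError(f"unexpected taskType: {item.get('taskType')!r}")
    return VideoResult(
        task_uuid=item["taskUUID"],
        video_uuid=item["videoUUID"],
        video_url=item.get("videoURL"),
        seed=item.get("seed"),
        nsfw_content=item.get("NSFWContent"),
        cost=item.get("cost"),
    )
```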

Text to Video

Volcanic Glass Violin Recital

Request

{
  "taskType": "videoInference",
  "taskUUID": "08885c1c-3794-4bb7-adab-b8df47f93c80",
  "model": "klingai:kling-video@3-pro",
  "positivePrompt": "A cinematic wide shot on a black volcanic shoreline at blue hour: a solitary violinist in a tailored copper-and-charcoal coat performs on a circular platform of dark glass while shallow waves slide over reflective obsidian sand. In the distance, slow lava veins glow through cracked rock formations, sending faint orange light into sea mist. The camera begins with a low tracking move around the performer, then eases into a gentle push-in as the bowing becomes more intense. Hair, coat hems, and drifting steam respond naturally to ocean gusts. Rich synchronized audio: expressive solo violin melody, soft surf, distant seabird cries, occasional hiss of hot stone meeting water, subtle foot movement on wet glass. Realistic body mechanics, detailed hands and bow contact, nuanced facial focus, high dynamic range lighting, elegant lens bloom on highlights, immersive atmosphere, polished cinematic color grading, stable motion, coherent reflections, no abrupt cuts.",
  "negativePrompt": "low detail, blurry hands, extra limbs, warped violin, duplicated person, jittery motion, flicker, broken reflections, overexposed highlights, text, watermark, logo, subtitle, frame artifacts, camera shake, cartoonish anatomy",
  "width": 1920,
  "height": 1080,
  "duration": 10
}

Response

{
  "taskType": "videoInference",
  "taskUUID": "08885c1c-3794-4bb7-adab-b8df47f93c80",
  "videoUUID": "baf38118-1b77-4a2b-9ec1-78a697bd135f",
  "videoURL": "https://vm.runware.ai/video/os/a18d05/ws/5/vi/baf38118-1b77-4a2b-9ec1-78a697bd135f.mp4",
  "cost": 1.12
}

Text to Video

Amber Observatory Ice Plain

Request

{
  "taskType": "videoInference",
  "taskUUID": "70b56006-db71-49e5-a2a1-acf3b771a505",
  "model": "klingai:kling-video@3-pro",
  "positivePrompt": "A cinematic wide shot of a remote polar observatory built on a vast cracked ice plain under a copper-gold sky. Massive parabolic antenna dishes slowly rotate while tiny maintenance drones skim over the surface leaving faint blue guide lights. In the foreground, a lone researcher in a reflective thermal suit walks toward the main dome, dragging a compact sled loaded with instruments. The camera begins low near textured ice, then glides forward and rises into a gentle sweeping reveal of the station and horizon. Far away, curtains of charged particles ripple across the sky in unusual geometric bands. Snow dust spirals lightly around metal structures, warning beacons pulse softly, and the observatory emits layered mechanical ambience. Native audio: distant antenna motors, crisp footsteps on ice, sled runners scraping, radio static bursts, low electrical hum, occasional wind gusts, and one short calm spoken line over headset: \"Signal lock confirmed.\" Ultra-detailed, realistic lighting, cinematic pacing, atmospheric depth, stable motion, coherent subject continuity, high-end science fiction grounded in physical realism.",
  "negativePrompt": "cartoon, low resolution, blurry, jittery motion, flicker, duplicated objects, warped anatomy, extra limbs, distorted face, unstable camera, oversaturated colors, text, watermark, logo, frame glitches, abrupt cuts, chaotic action",
  "width": 1920,
  "height": 1080,
  "duration": 10
}

Response

{
  "taskType": "videoInference",
  "taskUUID": "70b56006-db71-49e5-a2a1-acf3b771a505",
  "videoUUID": "673e3e9c-26a2-4259-8fc3-03b1a783fb72",
  "videoURL": "https://vm.runware.ai/video/os/a17d13/ws/5/vi/673e3e9c-26a2-4259-8fc3-03b1a783fb72.mp4",
  "cost": 1.12
}

Image to Video

Lantern Regatta at Daybreak

Request

{
  "taskType": "videoInference",
  "taskUUID": "b47e5dd4-5c7d-4c61-9341-071e00f3b06a",
  "model": "klingai:kling-video@3-pro",
  "positivePrompt": "Using the supplied first-frame image as the opening shot, create a cinematic video of a dawn river regatta. The camera begins with a calm wide composition, then slowly glides forward over the water as the lantern boats drift apart in elegant patterns. The violinist at the dock lifts the bow and begins to play softly. Nearby lanterns bob and rotate, tiny reflections trembling across the river surface. A few birds cross the brightening sky, reeds sway gently at the shoreline, and thin morning mist gradually thins as sunlight warms the scene. Maintain the character design and overall composition from the reference image while adding subtle, believable motion and rich environmental detail. Include synchronized natural audio: quiet river water, wood creaks from the dock, distant birdsong, soft fabric rustle, and a delicate solo violin melody that feels live and intimate.",
  "negativePrompt": "low quality, flicker, warped anatomy, duplicate people, extra limbs, distorted hands, abrupt camera shake, oversaturated colors, text, watermark, logo, heavy motion blur, noisy audio, robotic music, harsh cuts",
  "duration": 8,
  "inputs": {
    "frameImages": [
      {
        "image": "https://assets.runware.ai/assets/inputs/5e3433ef-b14e-4c65-8d25-54ecf737c589.jpg",
        "frame": "first"
      }
    ]
  }
}

Response

{
  "taskType": "videoInference",
  "taskUUID": "b47e5dd4-5c7d-4c61-9341-071e00f3b06a",
  "videoUUID": "b35955ce-800a-4a73-885c-a25bdff89cd4",
  "videoURL": "https://vm.runware.ai/video/os/a04d20/ws/5/vi/b35955ce-800a-4a73-885c-a25bdff89cd4.mp4",
  "cost": 0.896
}

Image to Video

Copper Aviary Dawn Tableau

Request

{
  "taskType": "videoInference",
  "taskUUID": "b8f22ea1-2020-4969-8d7c-caca97c55373",
  "model": "klingai:kling-video@3-pro",
  "positivePrompt": "Create a cinematic video that begins from the first guided frame and evolves naturally toward the last guided frame. The scene takes place in a grand glass aviary at sunrise, with elegant handcrafted mechanical birds gradually waking, tilting their heads, fluttering open articulated wings, hopping from perch to perch, then lifting into coordinated spirals above a calm caretaker in a teal coat. Preserve the architecture, lighting direction, and subject continuity between the guided frames. Use a slow opening with subtle ambient movement, then build into graceful layered flight with rich depth, drifting dust, soft lens bloom, realistic metal reflections, gentle cloth movement, and synchronized naturalistic sound design: creaking perches, light wing whirs, faint gear clicks, echoing flutter, glass resonance, and warm morning air. The pacing should feel lyrical and immersive, with smooth camera drift and strong temporal coherence.",
  "negativePrompt": "low detail, broken anatomy, extra limbs, warped birds, duplicated subjects, flicker, abrupt cuts, inconsistent architecture, oversaturated colors, text, watermark, logo, blurry caretaker, chaotic camera shake, horror tone, modern electronics, urban skyline",
  "duration": 8,
  "inputs": {
    "frameImages": [
      {
        "image": "https://assets.runware.ai/assets/inputs/5ce13305-faae-42ef-9a34-40150a2e3ae8.jpg",
        "frame": "first"
      },
      {
        "image": "https://assets.runware.ai/assets/inputs/49f83de5-ff24-442e-8e70-85ba7ed09e5a.jpg",
        "frame": "last"
      }
    ]
  }
}

Response

{
  "taskType": "videoInference",
  "taskUUID": "b8f22ea1-2020-4969-8d7c-caca97c55373",
  "videoUUID": "06452bee-c048-46c5-9be9-8ca67f161069",
  "videoURL": "https://vm.runware.ai/video/os/a04d20/ws/5/vi/06452bee-c048-46c5-9be9-8ca67f161069.mp4",
  "cost": 0.896
}

Notes

Kling 3.0 Pro supports multiple generation modes:

Text-to-Video: Provide a prompt with width, height, and duration.
Image-to-Video: Provide frame images via inputs.frameImages (up to 2, first/last). Dimensions are inherited from the image and cannot be set manually.
Motion Control: Provide a reference video via inputs.referenceVideos (max 1). Optionally provide a reference image via inputs.referenceImages (max 1).

Elements allows you to include reusable assets like characters, objects, or voices in your video by referencing them in the prompt (e.g. <<<element_1>>>). On Kling 3.0, Elements is only available in Image-to-Video mode.
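
The mode rules above can be checked on the client before a task is submitted. The sketch below is an assumption-laden illustration: the function name, the mode labels it returns, and the precedence (reference video first, then frame images) are not defined by this page.

```python
def detect_mode(task: dict) -> str:
    """Infer the generation mode from a task and enforce the constraints in the notes."""
    inputs = task.get("inputs", {})
    if inputs.get("referenceVideos"):
        mode = "motion-control"
    elif inputs.get("frameImages"):
        mode = "image-to-video"
    else:
        mode = "text-to-video"

    # Image-to-Video inherits its dimensions from the frame image.
    if mode == "image-to-video" and ("width" in task or "height" in task):
        raise ValueError("width/height cannot be set in image-to-video mode")
    # On Kling 3.0, Elements is only available in Image-to-Video mode.
    if inputs.get("elements") and mode != "image-to-video":
        raise ValueError("elements are only available in image-to-video mode")
    return mode
```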

Parameter Dependencies

Dimensions

The following dimension combinations are supported:

Configuration	Dimensions
`1080p (16:9)`	`1920x1080`
`1080p (1:1)`	`1440x1440`
`1080p (9:16)`	`1080x1920`
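
A lookup keyed on the table above is enough to reject unsupported sizes before sending a text-to-video request. The names `SUPPORTED_DIMENSIONS` and `check_dimensions` are hypothetical; the values are copied from the table.

```python
SUPPORTED_DIMENSIONS = {
    (1920, 1080): "1080p (16:9)",
    (1440, 1440): "1080p (1:1)",
    (1080, 1920): "1080p (9:16)",
}


def check_dimensions(width: int, height: int) -> str:
    """Return the configuration label for a supported size, or raise ValueError."""
    try:
        return SUPPORTED_DIMENSIONS[(width, height)]
    except KeyError:
        raise ValueError(f"unsupported dimensions: {width}x{height}") from None
```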

Pricing

Pricing starts at $0.112/s without audio and $0.168/s with audio.

1080p · 1s · (no audio) $0.112

1080p · 1s · (audio) $0.168
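
A rough estimate follows from multiplying the per-second rate by the duration. The sketch below assumes cost scales linearly with duration at the listed 1080p rates and ignores any other factor; since the page says pricing "starts at" these rates, treat the result as a lower bound. `Decimal` is used to avoid floating-point rounding in the totals.

```python
from decimal import Decimal

RATE_NO_AUDIO = Decimal("0.112")  # USD per second, 1080p
RATE_AUDIO = Decimal("0.168")     # USD per second, 1080p


def estimate_cost(duration_s: int, audio: bool) -> Decimal:
    """Estimate the cost in USD for one video of the given duration."""
    if not 3 <= duration_s <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    rate = RATE_AUDIO if audio else RATE_NO_AUDIO
    return rate * duration_s
```

Under these assumptions, the costs in the example responses above (1.12 for 10 seconds, 0.896 for 8 seconds) match the rate without audio.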