KlingAI provider

Access KlingAI's image and video models, including multiple generations of Standard, Pro, and Master video tiers, through Runware's unified API. Learn about KlingAI-specific parameters, limitations, and technical specifications.

Introduction

KlingAI's models are integrated into the Runware platform through our unified API, providing access to image generation and advanced video generation across multiple model generations and quality tiers.

KlingAI offers a comprehensive range of models from cost-effective Standard versions to high-end Master tiers, each optimized for different use cases and quality requirements. This page documents the technical specifications, parameter requirements, and model capabilities for all KlingAI models available through our platform.

Image models

Kling IMAGE O1

Kling IMAGE O1 is a high-control image model built for consistent character handling, precise modification, and strong stylization. It interprets inputs with high accuracy and supports detailed edits without structural drift, enabling stable creative workflows across complex compositions.

Model AIR ID: klingai:kling-image@o1.

Supported workflows: Text-to-image, image-to-image.

Technical specifications:

  • Positive prompt: 3-2500 characters.
  • Reference images: Supports up to 10 images via inputs.referenceImages.
  • Supported dimensions:
    • 1K: 1024×1024 (1:1), 1248×832 (3:2), 832×1248 (2:3), 1168×880 (4:3), 880×1168 (3:4), 768×1360 (9:16), 1360×768 (16:9), 1552×656 (21:9).
    • 2K: 2048×2048 (1:1), 2496×1664 (3:2), 1664×2496 (2:3), 2336×1760 (4:3), 1760×2336 (3:4), 1536×2720 (9:16), 2720×1536 (16:9), 3104×1312 (21:9).
  • Dimension behavior:
    • Text-to-image: Requires explicit width and height from the supported dimensions above.
    • Image-to-image: Two options available:
      • Specify width and height explicitly for precise control over output dimensions.
      • Omit width and height to automatically match the aspect ratio from the first reference image, then use the resolution parameter to control the output resolution tier (1k or 2k).
{
  "taskType": "imageInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d492",
  "model": "klingai:kling-image@o1",
  "positivePrompt": "A detailed character portrait with consistent features and strong stylization",
  "width": 1024,
  "height": 1024
}
{
  "taskType": "imageInference",
  "taskUUID": "6ba7b829-9dad-11d1-80b4-00c04fd430c8",
  "model": "klingai:kling-image@o1",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab"
    ]
  },
  "positivePrompt": "Enhance the character with more detailed clothing and refined features",
  "width": 2048,
  "height": 2048
}
{
  "taskType": "imageInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440012",
  "model": "klingai:kling-image@o1",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab"
    ]
  },
  "positivePrompt": "Apply artistic stylization while maintaining character consistency",
  "resolution": "2k"
}
{
  "taskType": "imageInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da8",
  "model": "klingai:kling-image@o1",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f",
      "e8f9a0b1-3c6d-5e8f-b2e4-0d9c8b7a6e5e"
    ]
  },
  "positivePrompt": "Blend these character references into a unified consistent design",
  "width": 2720,
  "height": 1536
}
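The O1 dimension rules can be checked on the client before a task is sent. Below is a minimal Python sketch that assembles a task as a plain dict. `build_o1_task` and `O1_SIZES` are hypothetical names (not part of a Runware SDK), and the checks only mirror the limits listed in this section.

```python
import uuid

# Supported Kling IMAGE O1 sizes, copied from the 1K and 2K lists above.
O1_SIZES = {
    (1024, 1024), (1248, 832), (832, 1248), (1168, 880),
    (880, 1168), (768, 1360), (1360, 768), (1552, 656),
    (2048, 2048), (2496, 1664), (1664, 2496), (2336, 1760),
    (1760, 2336), (1536, 2720), (2720, 1536), (3104, 1312),
}


def build_o1_task(prompt, reference_images=(), width=None, height=None, resolution=None):
    """Assemble an imageInference task for klingai:kling-image@o1 (hypothetical helper)."""
    refs = list(reference_images)
    if not 3 <= len(prompt) <= 2500:
        raise ValueError("positivePrompt must be 3-2500 characters")
    if len(refs) > 10:
        raise ValueError("O1 accepts at most 10 reference images")

    task = {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),
        "model": "klingai:kling-image@o1",
        "positivePrompt": prompt,
    }
    if refs:
        task["inputs"] = {"referenceImages": refs}

    if width is None and height is None:
        # Omitting both is only valid for image-to-image: the aspect ratio
        # then follows the first reference image.
        if not refs:
            raise ValueError("text-to-image requires explicit width and height")
        if resolution is not None:
            if resolution not in ("1k", "2k"):
                raise ValueError("resolution must be '1k' or '2k'")
            task["resolution"] = resolution
    elif (width, height) in O1_SIZES:
        task["width"], task["height"] = width, height
    else:
        raise ValueError(f"unsupported size {width}x{height}")
    return task
```

Raising early keeps unsupported sizes from reaching the API; the exact error the service would return for them is not covered on this page.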

Kling IMAGE 3.0

Kling IMAGE 3.0 targets professional-grade outputs with native 2K resolution and improved realism through enhanced texture, lighting, and material handling. This model supports practical image-to-image editing workflows for iterative refinement of subjects and layouts while maintaining consistency across edits.

Model AIR ID: klingai:kling-image@3.

Supported workflows: Text-to-image, image-to-image.

Technical specifications:

  • Positive prompt: 1-2500 characters.
  • Reference images: Supports 1 image via inputs.referenceImages.
  • Supported dimensions:
    • 1K: 1024×1024 (1:1), 1248×832 (3:2), 832×1248 (2:3), 1168×880 (4:3), 880×1168 (3:4), 768×1360 (9:16), 1360×768 (16:9), 1552×656 (21:9).
    • 2K: 2048×2048 (1:1), 2496×1664 (3:2), 1664×2496 (2:3), 2336×1760 (4:3), 1760×2336 (3:4), 1536×2720 (9:16), 2720×1536 (16:9), 3104×1312 (21:9).
{
  "taskType": "imageInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d493",
  "model": "klingai:kling-image@3",
  "positivePrompt": "Professional product photography with realistic materials and studio lighting at 2K resolution",
  "width": 2048,
  "height": 2048
}
{
  "taskType": "imageInference",
  "taskUUID": "6ba7b833-9dad-11d1-80b4-00c04fd430c8",
  "model": "klingai:kling-image@3",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab"
    ]
  },
  "positivePrompt": "Enhance texture detail and refine lighting while maintaining composition",
  "width": 2720,
  "height": 1536
}
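Because Kling IMAGE 3.0 shares the 1K and 2K size lists but accepts only one reference image, a pre-flight check can report which tier and aspect ratio a task resolves to. This is an illustrative Python sketch; `check_image3_task` is a hypothetical helper that works on the same dict shape as the JSON examples above.

```python
# Listed Kling IMAGE 3.0 sizes, in the same aspect-ratio order as the lists above.
ASPECT_RATIOS = ["1:1", "3:2", "2:3", "4:3", "3:4", "9:16", "16:9", "21:9"]
SIZES_BY_TIER = {
    "1k": [(1024, 1024), (1248, 832), (832, 1248), (1168, 880),
           (880, 1168), (768, 1360), (1360, 768), (1552, 656)],
    "2k": [(2048, 2048), (2496, 1664), (1664, 2496), (2336, 1760),
           (1760, 2336), (1536, 2720), (2720, 1536), (3104, 1312)],
}


def check_image3_task(task):
    """Return (tier, aspect ratio) for a klingai:kling-image@3 task, or raise ValueError."""
    prompt = task.get("positivePrompt", "")
    if not 1 <= len(prompt) <= 2500:
        raise ValueError("positivePrompt must be 1-2500 characters")
    refs = task.get("inputs", {}).get("referenceImages", [])
    if len(refs) > 1:
        raise ValueError("Kling IMAGE 3.0 accepts a single reference image")
    size = (task.get("width"), task.get("height"))
    for tier, sizes in SIZES_BY_TIER.items():
        if size in sizes:
            return tier, ASPECT_RATIOS[sizes.index(size)]
    raise ValueError(f"unsupported size {size[0]}x{size[1]}")
```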

Kling IMAGE O3

Kling IMAGE O3 is an Omni image model built for high-fidelity generation at up to 4K resolution with strong multi-reference consistency. This model supports series image generation for coherent variations and optional face-focused element control to maintain identity stability across outputs.

Model AIR ID: klingai:kling-image@o3.

Supported workflows: Text-to-image, image-to-image.

Technical specifications:

  • Positive prompt: 1-2500 characters.
  • Reference images: Supports up to 10 images via inputs.referenceImages.
  • Supported dimensions:
    • 1K: 1024×1024 (1:1), 1248×832 (3:2), 832×1248 (2:3), 1168×880 (4:3), 880×1168 (3:4), 768×1360 (9:16), 1360×768 (16:9), 1552×656 (21:9).
    • 2K: 2048×2048 (1:1), 2496×1664 (3:2), 1664×2496 (2:3), 2336×1760 (4:3), 1760×2336 (3:4), 1536×2720 (9:16), 2720×1536 (16:9), 3104×1312 (21:9).
    • 4K: 4096×4096 (1:1), 4992×3328 (3:2), 3328×4992 (2:3), 4736×3520 (4:3), 3520×4736 (3:4), 3072×5440 (9:16), 5440×3072 (16:9), 6272×2688 (21:9).
{
  "taskType": "imageInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d494",
  "model": "klingai:kling-image@o3",
  "positivePrompt": "Highly detailed character portrait with consistent facial features at 4K resolution",
  "width": 4096,
  "height": 4096
}
{
  "taskType": "imageInference",
  "taskUUID": "6ba7b834-9dad-11d1-80b4-00c04fd430c8",
  "model": "klingai:kling-image@o3",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f",
      "e8f9a0b1-3c6d-5e8f-b2e4-0d9c8b7a6e5e"
    ]
  },
  "positivePrompt": "Synthesize character identity from multiple reference angles with stable facial features",
  "width": 2048,
  "height": 2048
}
{
  "taskType": "imageInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440014",
  "model": "klingai:kling-image@o3",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab"
    ]
  },
  "positivePrompt": "Create coherent variations maintaining character identity across different poses",
  "width": 5440,
  "height": 3072
}
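For Kling IMAGE O3, it can be convenient to pick a size by tier and aspect ratio instead of memorizing pixel values. The sketch below, assuming tasks are built as Python dicts, looks up the listed sizes (including 4K); `o3_size` and `build_o3_task` are hypothetical names.

```python
import uuid

ASPECT_RATIOS = ["1:1", "3:2", "2:3", "4:3", "3:4", "9:16", "16:9", "21:9"]
# Kling IMAGE O3 sizes per tier, in the aspect-ratio order above.
O3_SIZES = {
    "1k": [(1024, 1024), (1248, 832), (832, 1248), (1168, 880),
           (880, 1168), (768, 1360), (1360, 768), (1552, 656)],
    "2k": [(2048, 2048), (2496, 1664), (1664, 2496), (2336, 1760),
           (1760, 2336), (1536, 2720), (2720, 1536), (3104, 1312)],
    "4k": [(4096, 4096), (4992, 3328), (3328, 4992), (4736, 3520),
           (3520, 4736), (3072, 5440), (5440, 3072), (6272, 2688)],
}


def o3_size(tier, aspect_ratio):
    """Listed (width, height) for a tier like '4k' and a ratio like '16:9'.

    An unknown tier raises KeyError; an unknown ratio raises ValueError.
    """
    return O3_SIZES[tier][ASPECT_RATIOS.index(aspect_ratio)]


def build_o3_task(prompt, tier, aspect_ratio, reference_images=()):
    refs = list(reference_images)
    if not 1 <= len(prompt) <= 2500:
        raise ValueError("positivePrompt must be 1-2500 characters")
    if len(refs) > 10:
        raise ValueError("O3 accepts at most 10 reference images")
    width, height = o3_size(tier, aspect_ratio)
    task = {"taskType": "imageInference", "taskUUID": str(uuid.uuid4()),
            "model": "klingai:kling-image@o3", "positivePrompt": prompt,
            "width": width, "height": height}
    if refs:
        task["inputs"] = {"referenceImages": refs}
    return task
```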

Video models

KlingAI 1.0 Standard

KlingAI's 1.0 Standard model provides cost-effective video generation with flexible controls and cinematic motion, ideal for simple scenes focused on a single character.

Model AIR ID: klingai:1@1.

Supported workflows: Text-to-video, image-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Negative prompt: 2-2500 characters (optional).
  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1280×720 (16:9), 720×720 (1:1), 720×1280 (9:16).
  • Frame rate: 30 FPS.
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1",
  "model": "klingai:1@1",
  "positivePrompt": "A person walking through a peaceful garden with soft sunlight filtering through trees",
  "duration": 5,
  "width": 1280,
  "height": 720
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d59",
  "model": "klingai:1@1",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "duration": 10,
  "width": 720,
  "height": 720
}
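The 1.0 constraints (three fixed sizes, 5 or 10 second durations, CFG Scale between 0 and 1, first frame only) can be enforced by a small builder. This Python sketch is illustrative only: `build_v1_video_task` is a hypothetical helper, and it uses the `CFGScale` key as it appears in the 1.6 Pro example on this page.

```python
import uuid

# Sizes listed for KlingAI 1.0 Standard (klingai:1@1) and 1.0 Pro (klingai:1@2).
V1_SIZES = {(1280, 720), (720, 720), (720, 1280)}


def build_v1_video_task(model, width, height, duration=5, prompt=None,
                        negative_prompt=None, first_frame=None, cfg_scale=None):
    """Assemble a videoInference task for a KlingAI 1.0 model (hypothetical helper)."""
    if (width, height) not in V1_SIZES:
        raise ValueError(f"unsupported size {width}x{height}")
    if duration not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    if prompt is None and first_frame is None:
        raise ValueError("provide a prompt (text-to-video) or a first frame (image-to-video)")

    task = {"taskType": "videoInference", "taskUUID": str(uuid.uuid4()),
            "model": model, "duration": duration, "width": width, "height": height}
    for key, text in (("positivePrompt", prompt), ("negativePrompt", negative_prompt)):
        if text is not None:
            if not 2 <= len(text) <= 2500:
                raise ValueError(f"{key} must be 2-2500 characters")
            task[key] = text
    if first_frame is not None:
        # 1.0 models accept only a first frame.
        task["frameImages"] = [{"inputImage": first_frame, "frame": "first"}]
    if cfg_scale is not None:
        if not 0 <= cfg_scale <= 1:
            raise ValueError("CFGScale must be between 0 and 1")
        task["CFGScale"] = cfg_scale
    return task
```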

KlingAI 1.0 Pro

KlingAI's 1.0 Pro model builds on the Standard version with higher fidelity, better prompt adherence, and enhanced stability for complex scenes.

Model AIR ID: klingai:1@2.

Supported workflows: Text-to-video, image-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Negative prompt: 2-2500 characters (optional).
  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1280×720 (16:9), 720×720 (1:1), 720×1280 (9:16).
  • Frame rate: 30 FPS.
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1",
  "model": "klingai:1@2",
  "positivePrompt": "A complex scene with multiple characters interacting in a bustling marketplace",
  "negativePrompt": "blurry, low quality",
  "duration": 10,
  "width": 720,
  "height": 1280
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d59",
  "model": "klingai:1@2",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "duration": 5,
  "width": 1280,
  "height": 720
}
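Most video models on this page share the input image rule of 300-2048 pixels per side and a 20MB limit. A minimal Python check is sketched below. It assumes the caller has already measured the image (for example with Pillow's `Image.open(path).size`) and that 20MB means 20 * 1024 * 1024 bytes, which is an assumption rather than a documented value; `check_input_image` is a hypothetical name.

```python
# Assumption: "20MB" is read as 20 * 1024 * 1024 bytes; use 20_000_000 if the
# service counts decimal megabytes.
MAX_INPUT_BYTES = 20 * 1024 * 1024


def check_input_image(width, height, size_bytes):
    """List problems with a frame image under the 300-2048 px / 20MB rule (empty if OK)."""
    problems = []
    for name, value in (("width", width), ("height", height)):
        if not 300 <= value <= 2048:
            problems.append(f"{name} {value}px is outside 300-2048px")
    if size_bytes > MAX_INPUT_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the 20MB limit")
    return problems
```

Returning a list instead of raising lets a caller report every problem with an image at once.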

KlingAI 1.5 Standard

KlingAI's 1.5 Standard model offers upgraded visuals with crisper output and fewer artifacts compared to 1.0, providing a good balance of quality and cost.

Model AIR ID: klingai:2@1.

Supported workflows: Image-to-video.

Technical specifications:

  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1280×720 (16:9), 720×720 (1:1), 720×1280 (9:16).
  • Frame rate: 30 FPS.
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.

KlingAI 1.5 Standard supports image-to-video generation only and does not support text-to-video workflows.

{
  "taskType": "videoInference",
  "taskUUID": "f3a2b8c9-1e47-4d3a-9b2f-8c7e6d5a4b3c",
  "model": "klingai:2@1",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "duration": 5,
  "width": 1280,
  "height": 720
}
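Several models (1.5 Standard, 1.5 Pro, 2.1 Standard, 2.1 Pro, and 2.5 Turbo Standard) accept image-to-video only. The Python sketch below encodes the "Supported workflows" lines of the 1.x to 2.5 models as a lookup table and rejects text-only tasks for image-only models. It assumes a task counts as image-to-video when it carries a top-level `frameImages` list, as in the examples on this page; `WORKFLOWS` and `check_workflow` are hypothetical names.

```python
# Workflows per AIR ID, transcribed from the "Supported workflows" lines on this page.
WORKFLOWS = {
    "klingai:1@1": {"text-to-video", "image-to-video"},
    "klingai:1@2": {"text-to-video", "image-to-video"},
    "klingai:2@1": {"image-to-video"},
    "klingai:2@2": {"image-to-video"},
    "klingai:3@1": {"text-to-video", "image-to-video"},
    "klingai:3@2": {"text-to-video", "image-to-video"},
    "klingai:4@3": {"text-to-video", "image-to-video"},
    "klingai:5@1": {"image-to-video"},
    "klingai:5@2": {"image-to-video"},
    "klingai:5@3": {"text-to-video", "image-to-video"},
    "klingai:6@0": {"image-to-video"},
    "klingai:6@1": {"text-to-video", "image-to-video"},
}


def workflow_of(task):
    """Classify a task by whether it has a non-empty top-level frameImages list."""
    return "image-to-video" if task.get("frameImages") else "text-to-video"


def check_workflow(task):
    """Return the task's workflow, or raise ValueError if the model does not support it."""
    wf = workflow_of(task)
    if wf not in WORKFLOWS[task["model"]]:
        raise ValueError(f"{task['model']} does not support {wf}")
    return wf
```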

KlingAI 1.5 Pro

KlingAI's 1.5 Pro model unlocks the full potential of version 1.5 with higher resolution support up to 1080p and enhanced motion controls.

Model AIR ID: klingai:2@2.

Supported workflows: Image-to-video.

Technical specifications:

  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1920×1080 (16:9), 1080×1080 (1:1), 1080×1920 (9:16).
  • Frame rate: 30 FPS.
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.

KlingAI 1.5 Pro supports image-to-video generation only and does not support text-to-video workflows.

{
  "taskType": "videoInference",
  "taskUUID": "f3a2b8c9-1e47-4d3a-9b2f-8c7e6d5a4b3c",
  "model": "klingai:2@2",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "duration": 10,
  "width": 1920,
  "height": 1080
}

KlingAI 1.6 Standard

KlingAI's 1.6 Standard model provides incremental improvements in motion smoothness and prompt handling over version 1.5.

Model AIR ID: klingai:3@1.

Supported workflows: Text-to-video, image-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Negative prompt: 2-2500 characters (optional).
  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1280×720 (16:9), 720×720 (1:1), 720×1280 (9:16).
  • Frame rate: 30 FPS (text-to-video), 24 FPS (image-to-video).
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1",
  "model": "klingai:3@1",
  "positivePrompt": "Smooth camera movement following a cyclist through a scenic mountain trail",
  "duration": 5,
  "width": 720,
  "height": 720
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d59",
  "model": "klingai:3@1",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "duration": 10,
  "width": 1280,
  "height": 720
}
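Because KlingAI 1.6 Standard uses a different frame rate per workflow, the same duration yields a different nominal frame count. The arithmetic is frame rate times duration, sketched here in Python; `nominal_frame_count` is a hypothetical helper, and the delivered file may not match this count exactly.

```python
# Frame rates listed for KlingAI 1.6 Standard (klingai:3@1).
FPS_BY_WORKFLOW = {"text-to-video": 30, "image-to-video": 24}


def nominal_frame_count(duration_s, workflow):
    """Frames implied by duration x frame rate; the delivered file may differ slightly."""
    return FPS_BY_WORKFLOW[workflow] * duration_s


print(nominal_frame_count(5, "text-to-video"))    # 150
print(nominal_frame_count(10, "image-to-video"))  # 240
```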

KlingAI 1.6 Pro

KlingAI's 1.6 Pro model elevates version 1.6 with enhanced visual quality, Full HD 1080p resolution support, and refined motion smoothness. This professional-tier model delivers cinematic output with improved detail rendering and stronger prompt adherence for high-quality video production workflows.

Model AIR ID: klingai:3@2.

Supported workflows: Text-to-video, image-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Negative prompt: 2-2500 characters (optional).
  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1920×1080 (16:9), 1080×1080 (1:1), 1080×1920 (9:16).
  • Frame rate: 24 FPS.
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d7",
  "model": "klingai:3@2",
  "positivePrompt": "Cinematic drone shot flying over misty mountain peaks during golden hour with dramatic lighting and smooth camera movement",
  "negativePrompt": "blurry, low quality, jerky motion",
  "duration": 10,
  "width": 1920,
  "height": 1080
}
{
  "taskType": "videoInference",
  "taskUUID": "f3a2b8c9-1e47-4d3a-9b2f-8c7e6d5a4b3c",
  "model": "klingai:3@2",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "positivePrompt": "The portrait subject begins to smile naturally with subtle facial animation and professional lighting",
  "duration": 5,
  "width": 1080,
  "height": 1920
}
{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440015",
  "model": "klingai:3@2",
  "positivePrompt": "Professional commercial shot of luxury car driving through city at night with reflections and detailed urban environment",
  "negativePrompt": "distorted, unrealistic, poor lighting",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "CFGScale": 0.7
}

KlingAI 2.0 Master

KlingAI's 2.0 Master model targets cinematic realism with high-end motion fidelity and strong prompt responsiveness for production-quality output.

Model AIR ID: klingai:4@3.

Supported workflows: Text-to-video, image-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Negative prompt: 2-2500 characters (optional).
  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1280×720 (16:9), 720×720 (1:1), 720×1280 (9:16).
  • Frame rate: 24 FPS.
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1",
  "model": "klingai:4@3",
  "positivePrompt": "Cinematic close-up of rain drops falling on a leaf with shallow depth of field",
  "negativePrompt": "blurry, unrealistic, low quality",
  "duration": 5,
  "width": 1280,
  "height": 720
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d59",
  "model": "klingai:4@3",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "duration": 10,
  "width": 720,
  "height": 1280
}

KlingAI 2.1 Standard

KlingAI's 2.1 Standard model refines the 2.0 generation with smoother animations while maintaining cost-effective access to advanced features.

Model AIR ID: klingai:5@1.

Supported workflows: Image-to-video.

Technical specifications:

  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1280×720 (16:9), 720×720 (1:1), 720×1280 (9:16).
  • Frame rate: 24 FPS.
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.

KlingAI 2.1 Standard supports image-to-video generation only and does not support text-to-video workflows.

{
  "taskType": "videoInference",
  "taskUUID": "f3a2b8c9-1e47-4d3a-9b2f-8c7e6d5a4b3c",
  "model": "klingai:5@1",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "duration": 10,
  "width": 720,
  "height": 720
}

KlingAI 2.1 Pro

KlingAI's 2.1 Pro model unlocks higher frame fidelity and Full HD output, providing a middle ground between Standard and Master tiers.

Model AIR ID: klingai:5@2.

Supported workflows: Image-to-video.

Technical specifications:

  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1920×1080 (16:9), 1080×1080 (1:1), 1080×1920 (9:16).
  • Frame rate: 24 FPS.
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first and last frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.

KlingAI 2.1 Pro supports image-to-video generation only and does not support text-to-video workflows.

{
  "taskType": "videoInference",
  "taskUUID": "f3a2b8c9-1e47-4d3a-9b2f-8c7e6d5a4b3c",
  "model": "klingai:5@2",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "duration": 5,
  "width": 1920,
  "height": 1080
}
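Only some models accept a last frame in addition to the first: among the 1.x to 2.5 models on this page, these are KlingAI 2.1 Pro and 2.5 Turbo Pro. The Python sketch below builds the top-level `frameImages` list in the shape used by those examples and refuses a last frame for other models; `frame_images` and `FIRST_AND_LAST` are hypothetical names.

```python
# 1.x-2.5 models whose frameImages accept a "last" entry in addition to "first"
# (KlingAI 2.1 Pro and 2.5 Turbo Pro); the others accept only "first".
FIRST_AND_LAST = {"klingai:5@2", "klingai:6@1"}


def frame_images(model, first, last=None):
    """Build a top-level frameImages list in the shape used by the 1.x-2.5 examples."""
    frames = [{"inputImage": first, "frame": "first"}]
    if last is not None:
        if model not in FIRST_AND_LAST:
            raise ValueError(f"{model} supports only a first frame")
        frames.append({"inputImage": last, "frame": "last"})
    return frames
```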

KlingAI 2.1 Master

KlingAI's 2.1 Master model is the top tier of the 2.1 generation, offering Full HD text-to-video and image-to-video, ultra-fluid motion, and exceptional prompt precision for VFX-grade output.

Model AIR ID: klingai:5@3.

Supported workflows: Text-to-video, image-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Negative prompt: 2-2500 characters (optional).
  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1920×1080 (16:9), 1080×1080 (1:1), 1080×1920 (9:16).
  • Frame rate: 24 FPS.
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d1",
  "model": "klingai:5@3",
  "positivePrompt": "Cinematic aerial shot of waves crashing against dramatic cliffs during golden hour",
  "negativePrompt": "blurry, low quality, distorted",
  "duration": 10,
  "width": 1920,
  "height": 1080
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d59",
  "model": "klingai:5@3",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "duration": 5,
  "width": 1080,
  "height": 1080
}

KlingAI 2.5 Turbo Standard

KlingAI's 2.5 Turbo Standard model is the efficient edition of the 2.5 Turbo series, designed for smooth, cinematic image-to-video generation. It delivers videos with strong motion control and dynamic composition, optimized for fast creative workflows.

Model AIR ID: klingai:6@0.

Supported workflows: Image-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Negative prompt: 2-2500 characters (optional).
  • CFG Scale: 0-1 (default: 0.5).
  • Frame rate: 30 FPS.
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.

The output dimensions are automatically inferred from the first frame image. The width and height parameters should not be specified.

{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d61",
  "model": "klingai:6@0",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "duration": 10
}
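For 2.5 Turbo Standard, width and height must be left out. A builder can enforce that rule; the following Python sketch is illustrative, with `build_turbo_standard_task` as a hypothetical helper that passes any extra keyword arguments (such as `negativePrompt`) through unchanged.

```python
import uuid


def build_turbo_standard_task(first_frame, duration=5, prompt=None, **extra):
    """Image-to-video task for klingai:6@0; output size comes from the first frame image."""
    if "width" in extra or "height" in extra:
        raise ValueError("klingai:6@0 infers dimensions from the first frame; omit width/height")
    if duration not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    task = {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),
        "model": "klingai:6@0",
        "frameImages": [{"inputImage": first_frame, "frame": "first"}],
        "duration": duration,
    }
    if prompt is not None:
        if not 2 <= len(prompt) <= 2500:
            raise ValueError("positivePrompt must be 2-2500 characters")
        task["positivePrompt"] = prompt
    # Any remaining keyword arguments are copied into the task as-is.
    task.update(extra)
    return task
```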

KlingAI 2.5 Turbo Pro

KlingAI's 2.5 Turbo Pro model delivers next-level creativity with turbocharged motion and cinematic visuals, featuring precise prompt adherence for both text-to-video and image-to-video workflows. This model combines enhanced motion fluidity with professional-grade cinematic capabilities.

Model AIR ID: klingai:6@1.

Supported workflows: Text-to-video, image-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Negative prompt: 2-2500 characters (optional).
  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1280×720 (16:9), 720×720 (1:1), 720×1280 (9:16), 1920×1080 (16:9), 1080×1080 (1:1), 1080×1920 (9:16).
  • Frame rate: 30 FPS.
  • Duration: 5 or 10 seconds.
  • Frame images: Supports first and last frame for frameImages.
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.
{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d2",
  "model": "klingai:6@1",
  "positivePrompt": "Cinematic aerial drone shot following a motorcycle racing through winding mountain roads at golden hour",
  "duration": 10,
  "width": 1280,
  "height": 720
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d60",
  "model": "klingai:6@1",
  "frameImages": [
    {
      "inputImage": "c64351d5-4c59-42f7-95e1-eace013eddab",
      "frame": "first"
    }
  ],
  "duration": 5
}

Kling VIDEO 2.6 Pro

Kling VIDEO 2.6 Pro is a next-generation video-and-audio AI model that delivers cinematic-quality visuals and native synchronized audio including dialogue, sound effects, and ambience. This model combines strong prompt fidelity with scene consistency, flexible artistic control, custom voice cloning, and motion control capabilities for professional video production workflows.

Model AIR ID: klingai:kling-video@2.6-pro.

Supported workflows: Text-to-video, image-to-video, motion-control.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Negative prompt: 2-2500 characters (optional).
  • CFG Scale: 0-1 (default: 0.5).
  • Supported dimensions: 1920×1080 (16:9), 1440×1440 (1:1), 1080×1920 (9:16).
  • Dimension behavior:
    • Text-to-video: Specify explicit width and height from the supported dimensions above.
    • Image-to-video: Dimensions are automatically inferred from the reference image.
  • Duration: 5 or 10 seconds (default: 5).
  • Frame images: Supports first and last frame for inputs.frameImages.
  • Reference voices: Supports multiple audio files via inputs.referenceVoices (MP3, WAV, MP4, MOV formats; 5-30 seconds duration; clean, single-speaker audio required).
  • Reference videos: Supports 1 video via inputs.referenceVideos for motion control (3-30 seconds duration; max 100MB; min 340px short edge, max 3850px long edge).
  • Input image requirements: Width and height between 300-2048 pixels, 20MB file size limit.

Provider-specific settings:

Parameters supported: sound, keepOriginalSound, characterOrientation.

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d4",
  "model": "klingai:kling-video@2.6-pro",
  "positivePrompt": "A bustling city street at night with people talking and traffic sounds",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d63",
  "model": "klingai:kling-video@2.6-pro",
  "inputs": {
    "frameImages": [
      {
        "image": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      },
      {
        "image": "450e8400-e29b-41d4-a716-446655440011",
        "frame": "last"
      }
    ]
  },
  "positivePrompt": "Bring this scene to life with natural motion and ambient sounds",
  "duration": 5,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440011",
  "model": "klingai:kling-video@2.6-pro",
  "positivePrompt": "The person on the left, in <<<voice_2>>>, says, 'Hello, James' and then the person on the right, in <<<voice_1>>>, says, 'Happy Christmas.'",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "inputs": {
    "referenceVoices": [
      "a1b2c3d4-5e6f-4a7b-8c9d-0e1f2a3b4c5d",
      "b2c3d4e5-6f7a-4b8c-9d0e-1f2a3b4c5d6e"
    ]
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
  "model": "klingai:kling-video@2.6-pro",
  "inputs": {
    "referenceVideos": ["d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"]
  },
  "positivePrompt": "Character performs the same movements from the reference video",
  "duration": 10,
  "providerSettings": {
    "klingai": {
      "characterOrientation": "image",
      "keepOriginalSound": false,
      "sound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d494",
  "model": "klingai:kling-video@2.6-pro",
  "positivePrompt": "Cinematic drone shot of a serene forest landscape",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "klingai": {
      "sound": false
    }
  }
}
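The reference media limits for Kling VIDEO 2.6 Pro can be checked locally before uploading. This Python sketch assumes you have already measured each file's duration, size, and dimensions with a tool of your choice; `MediaInfo`, `check_reference_voice`, and `check_reference_video` are hypothetical names, and 100MB is read as 100 * 1024 * 1024 bytes by assumption. The clean, single-speaker requirement cannot be verified this way.

```python
from dataclasses import dataclass

VOICE_FORMATS = {"mp3", "wav", "mp4", "mov"}
MAX_VIDEO_BYTES = 100 * 1024 * 1024  # assumption: 100MB as binary megabytes


@dataclass
class MediaInfo:
    """Metadata the caller has already measured (hypothetical structure, not an API type)."""
    duration_s: float
    size_bytes: int = 0
    width: int = 0
    height: int = 0
    extension: str = ""


def check_reference_voice(info):
    """Problems with a referenceVoices clip: format and 5-30 second duration."""
    problems = []
    if info.extension.lower().lstrip(".") not in VOICE_FORMATS:
        problems.append(f"format {info.extension!r} is not MP3, WAV, MP4 or MOV")
    if not 5 <= info.duration_s <= 30:
        problems.append("voice clips must be 5-30 seconds")
    return problems


def check_reference_video(info):
    """Problems with a referenceVideos clip used for motion control."""
    problems = []
    if not 3 <= info.duration_s <= 30:
        problems.append("reference video must be 3-30 seconds")
    if info.size_bytes > MAX_VIDEO_BYTES:
        problems.append("reference video exceeds 100MB")
    short_edge, long_edge = sorted((info.width, info.height))
    if short_edge < 340:
        problems.append(f"short edge {short_edge}px is below 340px")
    if long_edge > 3850:
        problems.append(f"long edge {long_edge}px is above 3850px")
    return problems
```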

Kling VIDEO 3.0 Standard

Kling VIDEO 3.0 Standard generates synchronized video and audio from text and images with a balanced approach to quality, speed, and cost. This model supports reference-based generation, prompt-driven edits, and multi-prompt control while maintaining temporal stability and clear motion. Native audio output includes dialogue and ambient sound that aligns with the visual content.

Model AIR ID: klingai:kling-video@3-standard.

Supported workflows: Text-to-video, image-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Negative prompt: 2-2500 characters (optional).
  • Supported dimensions: 1280×720 (16:9), 960×960 (1:1), 720×1280 (9:16).
  • Dimension behavior:
    • Text-to-video: Specify explicit width and height from the supported dimensions above.
    • Image-to-video: Dimensions are automatically inferred from the reference image.
  • Duration: 3-15 seconds (default: 5).
  • Multi-prompt: Supports up to 6 sequential prompt segments via providerSettings.klingai.multiPrompt.
  • Frame images: Supports first and last frames for inputs.frameImages.
  • Input image requirements: 300×300 minimum, 10MB file size limit, aspect ratio 1:2.5 to 2.5:1.

Provider-specific settings:

Parameters supported: sound, multiPrompt, shotType.

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d8",
  "model": "klingai:kling-video@3-standard",
  "positivePrompt": "A peaceful park scene with birds chirping and gentle wind sounds",
  "duration": 8,
  "width": 1280,
  "height": 720,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d65",
  "model": "klingai:kling-video@3-standard",
  "inputs": {
    "frameImages": [
      {
        "image": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      }
    ]
  },
  "positivePrompt": "Animate this scene with natural motion and ambient audio",
  "duration": 5,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440016",
  "model": "klingai:kling-video@3-standard",
  "width": 720,
  "height": 1280,
  "providerSettings": {
    "klingai": {
      "shotType": "customize",
      "sound": true,
      "multiPrompt": [
        {
          "prompt": "Character approaches door and reaches for handle",
          "duration": "5"
        },
        {
          "prompt": "Character opens door revealing bright room beyond",
          "duration": "5"
        }
      ]
    }
  }
}
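Kling VIDEO 3.0 uses a different input image rule (300×300 minimum, 10MB, aspect ratio from 1:2.5 to 2.5:1) and, for 3.0 Standard, a smaller set of text-to-video sizes. The Python sketch below checks both. It compares the aspect ratio with integer arithmetic (2 * width <= 5 * height and 2 * height <= 5 * width) so the 2.5:1 boundary is exact. `check_v3_input_image` and `V3_STANDARD_SIZES` are hypothetical names, and 10MB is read as 10 * 1024 * 1024 bytes by assumption.

```python
MAX_V3_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: 10MB as binary megabytes

# Text-to-video sizes listed for Kling VIDEO 3.0 Standard.
V3_STANDARD_SIZES = {(1280, 720), (960, 960), (720, 1280)}


def check_v3_input_image(width, height, size_bytes):
    """List problems with a Kling VIDEO 3.0 input image (empty if OK)."""
    problems = []
    if width < 300 or height < 300:
        problems.append("image must be at least 300x300")
    if size_bytes > MAX_V3_IMAGE_BYTES:
        problems.append("image exceeds 10MB")
    # width/height <= 2.5 and height/width <= 2.5, rewritten without division.
    if 2 * width > 5 * height or 2 * height > 5 * width:
        problems.append("aspect ratio outside 1:2.5 to 2.5:1")
    return problems
```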

Kling VIDEO 3.0 Pro

Kling VIDEO 3.0 Pro is a unified multimodal video model that generates high-fidelity video with synchronized native audio from text or images. This professional-grade model supports reference-guided generation, prompt-based editing, fine control over motion and pacing through multi-prompt segments, and stable temporal coherence for cinematic and narrative clips. Native audio output includes dialogue, ambient sound, and effects aligned to the visuals.

Model AIR ID: klingai:kling-video@3-pro.

Supported workflows: Text-to-video, image-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Negative prompt: 2-2500 characters (optional).
  • Supported dimensions: 1920×1080 (16:9), 1440×1440 (1:1), 1080×1920 (9:16).
  • Dimension behavior:
    • Text-to-video: Specify explicit width and height from the supported dimensions above.
    • Image-to-video: Dimensions are automatically inferred from the reference image.
  • Duration: 3-15 seconds (default: 5).
  • Multi-prompt: Supports up to 6 sequential prompt segments via providerSettings.klingai.multiPrompt.
  • Frame images: Supports first and last frames for inputs.frameImages.
  • Input image requirements: 300×300 minimum, 10MB file size limit, aspect ratio 1:2.5 to 2.5:1.

Provider-specific settings:

Parameters supported: sound, multiPrompt, shotType.

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d7",
  "model": "klingai:kling-video@3-pro",
  "positivePrompt": "A cinematic shot of a city street at golden hour with ambient sounds of traffic and conversation",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d64",
  "model": "klingai:kling-video@3-pro",
  "inputs": {
    "frameImages": [
      {
        "image": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      }
    ]
  },
  "positivePrompt": "Bring this portrait to life with subtle movement and synchronized dialogue",
  "duration": 8,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440015",
  "model": "klingai:kling-video@3-pro",
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "klingai": {
      "shotType": "customize",
      "sound": true,
      "multiPrompt": [
        {
          "prompt": "Wide shot establishing a quiet library interior",
          "duration": "3"
        },
        {
          "prompt": "Camera pushes in on a person reading intently",
          "duration": "4"
        },
        {
          "prompt": "Close-up of turning pages with rustling sound",
          "duration": "3"
        }
      ]
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da8",
  "model": "klingai:kling-video@3-pro",
  "inputs": {
    "frameImages": [
      {
        "image": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      },
      {
        "image": "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f",
        "frame": "last"
      }
    ]
  },
  "positivePrompt": "Smooth transition between these two scenes with natural motion",
  "duration": 12,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}
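The multiPrompt examples above pass up to 6 segments, with each duration given as a string. The following Python sketch builds the `providerSettings.klingai` object from (prompt, seconds) pairs. It assumes, without confirmation from this page, that the segment durations together must fall within the 3-15 second range; `build_multi_prompt_settings` is a hypothetical helper.

```python
def build_multi_prompt_settings(segments, sound=True):
    """segments: list of (prompt, seconds) pairs. Returns a dict for providerSettings["klingai"]."""
    if not 1 <= len(segments) <= 6:
        raise ValueError("multiPrompt supports up to 6 segments")
    total = sum(seconds for _, seconds in segments)
    # Assumption: the segments together should fit the 3-15 second duration range.
    if not 3 <= total <= 15:
        raise ValueError(f"segments total {total}s, outside 3-15 seconds")
    return {
        "shotType": "customize",
        "sound": sound,
        # The examples on this page pass each segment duration as a string.
        "multiPrompt": [{"prompt": p, "duration": str(s)} for p, s in segments],
    }
```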

KlingAI Lip-Sync

KlingAI's Lip-Sync model matches lip movements and facial expressions to new dialogue or audio, making speech look accurate and natural in any video. This specialized model is ideal for dubbing, voiceover replacement, and creating multilingual content.

Model AIR ID: klingai:7@1.

Supported workflows: Video-to-video with audio synchronization.

Technical specifications:

  • Input video: Supports inputs.video (required).
  • Input audio: Supports inputs.audio (required).
  • Input video requirements: 2-60 seconds duration, width and height between 512 and 2160 pixels, 20MB file size limit.
  • Audio input requirements: 2-60 seconds duration.
  • Supported resolutions: 720p or 1080p.
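
The input limits above can be checked locally before uploading, which avoids a round trip for a clip that will be rejected. The following Python sketch assumes you have already measured each file's duration, dimensions, and size with a tool of your choice; `MediaInfo` and `lipsync_problems` are hypothetical helpers, not part of Runware's API, and reading "20MB" as 20,000,000 bytes is a conservative assumption.

```python
from dataclasses import dataclass
from typing import List

# Limits copied from the Lip-Sync specification. "20MB" is read as 20,000,000
# bytes, a conservative decimal interpretation (assumption).
MIN_SECONDS, MAX_SECONDS = 2.0, 60.0
MIN_SIDE, MAX_SIDE = 512, 2160
MAX_VIDEO_BYTES = 20_000_000


@dataclass
class MediaInfo:
    """Hypothetical holder for metadata measured locally before upload."""
    duration_s: float
    width: int = 0
    height: int = 0
    size_bytes: int = 0


def lipsync_problems(video: MediaInfo, audio: MediaInfo) -> List[str]:
    """Return a list of problems; an empty list means the inputs fit the limits."""
    problems = []
    if not MIN_SECONDS <= video.duration_s <= MAX_SECONDS:
        problems.append(f"video duration {video.duration_s}s is outside 2-60s")
    for name, side in (("width", video.width), ("height", video.height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"video {name} {side}px is outside 512-2160px")
    if video.size_bytes > MAX_VIDEO_BYTES:
        problems.append(f"video size {video.size_bytes} bytes exceeds 20MB")
    if not MIN_SECONDS <= audio.duration_s <= MAX_SECONDS:
        problems.append(f"audio duration {audio.duration_s}s is outside 2-60s")
    return problems


print(lipsync_problems(MediaInfo(12.5, 1920, 1080, 8_000_000), MediaInfo(12.0)))  # → []
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem with a file at once.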

Provider-specific settings:

Parameters supported: soundVolume, originalAudioVolume.

{
  "taskType": "videoInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d491",
  "model": "klingai:7@1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "6ba7b828-9dad-11d1-80b4-00c04fd430c8",
  "model": "klingai:7@1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  },
  "providerSettings": {
    "klingai": {
      "soundVolume": 0.8,
      "originalAudioVolume": 0.3
    }
  }
}

Kling VIDEO O1 Standard

Kling VIDEO O1 Standard is a unified multimodal video model for controllable generation and instruction-based editing. It supports text prompts, image references, and video input to enable precise control over motion, transitions, object changes, and visual adjustments within short-form video workflows.

Model AIR ID: klingai:kling@o1-standard.

Supported workflows: Text-to-video, image-to-video, reference-to-video, video-edit.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Supported dimensions (text-to-video only): 1280×720 (16:9), 960×960 (1:1), 720×1280 (9:16).
  • Duration:
    • Text-to-video: 5 or 10 seconds (default: 5).
    • Image-to-video: 5 or 10 seconds (default: 5).
    • Reference-to-video: 3-10 seconds (default: 5).
    • Video-edit: Matches input video duration (3-10 seconds, 6-20 seconds for fast mode).
  • Frame images: Supports first and last frame for inputs.frameImages (image-to-video only).
  • Reference images:
    • Reference-to-video: 1-7 images via inputs.referenceImages (at least 1 required).
    • Video-edit: Supports up to 4 images via inputs.referenceImages (optional).
  • Reference videos: Supports inputs.referenceVideos with 1 video (reference-to-video only).
  • Input video (video-edit only): Supports inputs.video with 3-10 second clips (6-20 seconds in fast mode).
  • Input requirements: 32MB file size limit.
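
The per-workflow duration rules above are easy to mix up, since text-to-video and image-to-video accept only 5 or 10 seconds while reference-to-video accepts any value from 3 to 10. A minimal Python sketch of a client-side check follows; the workflow labels and the `o1_duration` helper are hypothetical, and it assumes whole-second durations as in the JSON examples. Video editing is deliberately left out because its output length follows the input clip.

```python
from typing import Optional

# Allowed output durations (seconds) per O1 workflow, copied from the list above.
# The workflow names are plain labels for this sketch, not API values.
O1_DURATIONS = {
    "text-to-video": {5, 10},
    "image-to-video": {5, 10},
    "reference-to-video": set(range(3, 11)),  # 3-10 seconds inclusive
}
O1_DEFAULT_DURATION = 5


def o1_duration(workflow: str, duration: Optional[int] = None) -> int:
    """Return the duration to send, applying the 5-second default, or raise ValueError."""
    if workflow not in O1_DURATIONS:
        # Video editing is excluded: its output length follows the input clip.
        raise ValueError(f"no fixed duration rule for workflow {workflow!r}")
    value = O1_DEFAULT_DURATION if duration is None else duration
    if value not in O1_DURATIONS[workflow]:
        allowed = sorted(O1_DURATIONS[workflow])
        raise ValueError(f"{workflow} accepts {allowed}, got {value}")
    return value
```

For example, `o1_duration("reference-to-video", 7)` returns 7, while `o1_duration("image-to-video", 7)` raises because image-to-video only accepts 5 or 10 seconds.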

Provider-specific settings:

Parameters supported: keepOriginalSound.

Kling O1 automatically determines the workflow from the provided inputs: inputs.video triggers video editing (optionally combined with referenceImages), referenceImages or referenceVideos on their own trigger reference-to-video, frameImages triggers image-to-video, and a request with no inputs triggers text-to-video generation.
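
The routing rule can be mirrored client-side, for example to log or validate which workflow a request will hit before sending it. This is a hypothetical helper, not Runware code. Because the video-edit specification also accepts referenceImages, the sketch checks inputs.video first; its ordering for other combinations (such as frameImages together with referenceImages) is an assumption.

```python
from typing import Any, Dict, Optional


def infer_o1_workflow(inputs: Optional[Dict[str, Any]]) -> str:
    """Guess which O1 workflow a request's "inputs" object selects (sketch, not the API's own logic)."""
    inputs = inputs or {}
    if inputs.get("video"):
        # Video editing also accepts up to 4 referenceImages, so test it first.
        return "video-edit"
    if inputs.get("referenceImages") or inputs.get("referenceVideos"):
        return "reference-to-video"
    if inputs.get("frameImages"):
        return "image-to-video"
    return "text-to-video"


print(infer_o1_workflow({"video": "c64351d5", "referenceImages": ["d7e8f9a0"]}))  # → video-edit
```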

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d3",
  "model": "klingai:kling@o1-standard",
  "positivePrompt": "A serene mountain landscape at sunrise with mist rolling through valleys",
  "duration": 5,
  "width": 1280,
  "height": 720
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d62",
  "model": "klingai:kling@o1-standard",
  "inputs": {
    "frameImages": [
      {
        "image": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      }
    ]
  },
  "positivePrompt": "Camera slowly zooms into the scene with smooth motion",
  "duration": 10,
  "width": 960,
  "height": 960
}
{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440010",
  "model": "klingai:kling@o1-standard",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"
    ]
  },
  "positivePrompt": "Create a smooth transition between these reference styles",
  "duration": 7
}
{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
  "model": "klingai:kling@o1-standard",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab"
  },
  "positivePrompt": "Change the scene to nighttime with moonlight",
  "providerSettings": {
    "klingai": {
      "keepOriginalSound": true
    }
  }
}

Kling VIDEO O1 Pro

Kling VIDEO O1 Pro is a unified multimodal video foundation model for controllable generation and instruction-based editing. It supports text prompts, visual references, and video input, so developers can build high-control pipelines for pacing, transitions, object changes, and style revisions, with higher quality and output resolution than O1 Standard.

Model AIR ID: klingai:kling@o1.

Supported workflows: Text-to-video, image-to-video, reference-to-video, video-edit.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Supported dimensions (text-to-video only): 1920×1080 (16:9), 1440×1440 (1:1), 1080×1920 (9:16).
  • Duration:
    • Text-to-video: 5 or 10 seconds (default: 5).
    • Image-to-video: 5 or 10 seconds (default: 5).
    • Reference-to-video: 3-10 seconds (default: 5).
    • Video-edit: Matches input video duration (3-10 seconds, 6-20 seconds for fast mode).
  • Frame images: Supports first and last frame for inputs.frameImages (image-to-video only).
  • Reference images:
    • Reference-to-video: 1-7 images via inputs.referenceImages (at least 1 required).
    • Video-edit: Supports up to 4 images via inputs.referenceImages (optional).
  • Reference videos: Supports inputs.referenceVideos with 1 video (reference-to-video only).
  • Input video (video-edit only): Supports inputs.video with 3-10 second clips (6-20 seconds in fast mode).
  • Input requirements: 32MB file size limit.
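
O1 Standard and O1 Pro share the same three aspect ratios for text-to-video but at different sizes. The Python sketch below looks up the documented width and height by model and aspect ratio and assembles a task shaped like the JSON examples on this page. The `O1_SIZES` table, both helper functions, and the use of a random `uuid4` for `taskUUID` are illustrative assumptions, not part of any Runware SDK.

```python
import uuid
from typing import Any, Dict, Tuple

# Text-to-video sizes per O1 tier, keyed by aspect ratio (from the two spec lists).
O1_SIZES = {
    "klingai:kling@o1-standard": {"16:9": (1280, 720), "1:1": (960, 960), "9:16": (720, 1280)},
    "klingai:kling@o1": {"16:9": (1920, 1080), "1:1": (1440, 1440), "9:16": (1080, 1920)},
}


def o1_size(model: str, aspect: str) -> Tuple[int, int]:
    """Look up the documented width and height for a model and aspect ratio."""
    try:
        return O1_SIZES[model][aspect]
    except KeyError:
        raise ValueError(f"no documented text-to-video size for {model} at {aspect}") from None


def o1_text_to_video_task(model: str, aspect: str, prompt: str, duration: int = 5) -> Dict[str, Any]:
    """Build a text-to-video task shaped like the JSON examples on this page."""
    if not 2 <= len(prompt) <= 2500:
        raise ValueError("positivePrompt must be 2-2500 characters")
    width, height = o1_size(model, aspect)
    return {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),  # fresh identifier per request
        "model": model,
        "positivePrompt": prompt,
        "duration": duration,
        "width": width,
        "height": height,
    }
```

Keying by aspect ratio lets the same calling code switch between Standard and Pro by changing only the model ID.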

Provider-specific settings:

Parameters supported: keepOriginalSound, fast.
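
Enabling fast changes which input clip lengths are accepted for video editing (3-10 seconds normally, 6-20 seconds in fast mode), so the mode and the clip length are best validated together. The following is a hypothetical Python helper for O1 Pro video-edit tasks; it also enforces the documented limit of 4 reference images for video editing. Omitting `fast` when it is disabled, rather than sending `false`, is a stylistic choice of this sketch, not a documented requirement.

```python
import uuid
from typing import Any, Dict, List, Optional


def o1_pro_edit_task(video: str, video_seconds: float, prompt: str, fast: bool = False,
                     keep_original_sound: bool = True,
                     reference_images: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build an O1 Pro video-edit task after checking the clip length for the chosen mode."""
    low, high = (6.0, 20.0) if fast else (3.0, 10.0)  # fast mode accepts 6-20 s clips
    if not low <= video_seconds <= high:
        raise ValueError(f"input video must be {low:g}-{high:g} s in this mode, got {video_seconds:g}")
    refs = list(reference_images or [])
    if len(refs) > 4:
        raise ValueError("video-edit accepts at most 4 reference images")
    inputs: Dict[str, Any] = {"video": video}
    if refs:
        inputs["referenceImages"] = refs
    settings: Dict[str, Any] = {"keepOriginalSound": keep_original_sound}
    if fast:
        settings["fast"] = True  # only sent when enabled
    return {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),
        "model": "klingai:kling@o1",
        "inputs": inputs,
        "positivePrompt": prompt,
        "providerSettings": {"klingai": settings},
    }
```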

Kling O1 automatically determines the workflow from the provided inputs: inputs.video triggers video editing (optionally combined with referenceImages), referenceImages or referenceVideos on their own trigger reference-to-video, frameImages triggers image-to-video, and a request with no inputs triggers text-to-video generation.

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d3",
  "model": "klingai:kling@o1",
  "positivePrompt": "A serene mountain landscape at sunrise with mist rolling through valleys",
  "duration": 5,
  "width": 1920,
  "height": 1080
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d62",
  "model": "klingai:kling@o1",
  "inputs": {
    "frameImages": [
      {
        "image": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      }
    ]
  },
  "positivePrompt": "Camera slowly zooms into the scene with smooth motion",
  "duration": 10,
  "width": 1440,
  "height": 1440
}
{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440010",
  "model": "klingai:kling@o1",
  "inputs": {
    "referenceImages": [
      "c64351d5-4c59-42f7-95e1-eace013eddab",
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"
    ]
  },
  "positivePrompt": "Create a smooth transition between these reference styles",
  "duration": 7
}
{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
  "model": "klingai:kling@o1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab"
  },
  "positivePrompt": "Change the scene to nighttime with moonlight",
  "providerSettings": {
    "klingai": {
      "keepOriginalSound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d489",
  "model": "klingai:kling@o1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "referenceImages": [
      "d7e8f9a0-2b5c-4e7f-a1d3-9c8b7a6e5d4f"
    ]
  },
  "positivePrompt": "Apply this artistic style to the entire video",
  "providerSettings": {
    "klingai": {
      "fast": true,
      "keepOriginalSound": true
    }
  }
}

Kling VIDEO O3 Standard

Kling VIDEO O3 Standard is a cost-efficient multimodal video generation model that produces HD video from text or images with native audio output. This model balances quality with speed and price, supporting reference-based generation and prompt-based video edits while maintaining temporal stability across clips. Native audio includes dialogue and ambient sound synchronized to the visual content.

Model AIR ID: klingai:kling-video@o3-standard.

Supported workflows: Text-to-video, image-to-video, reference-to-video, video-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Supported dimensions: 1280×720 (16:9), 960×960 (1:1), 720×1280 (9:16).
  • Dimension behavior:
    • Text-to-video: Specify explicit width and height from the supported dimensions above.
    • Image-to-video: Dimensions are automatically inferred from the input frame image.
    • Video-to-video: Dimensions match the input video, custom dimensions not supported.
  • Duration: 3-15 seconds (default: 5).
  • Duration behavior:
    • Video-to-video: Output duration matches input video duration (parameter ignored).
  • Multi-prompt: Supports up to 6 sequential prompt segments via providerSettings.klingai.multiPrompt (not supported with reference videos).
  • Frame images: Supports first and last frames for inputs.frameImages.
  • Reference images: Supports up to 7 images via inputs.referenceImages (or 4 if reference video is also used).
  • Reference videos: Supports 1 video via inputs.referenceVideos for feature reference.
  • Video editing: Supports 1 video via inputs.video for prompt-based editing.
  • Input requirements:
    • Images: 300×300 minimum, 10MB file size limit, aspect ratio 1:2.5 to 2.5:1.
    • Videos: 3-10 seconds duration, 720-2160 pixels.
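
The image requirements and the reference-image cap (7, or 4 when a reference video is also used) can be checked before upload. The Python sketch below takes already-measured `(width, height, size_bytes)` tuples; both helpers are hypothetical, and reading "10MB" as 10,000,000 bytes is a conservative assumption. The aspect-ratio test uses integer cross-multiplication (`2 * width <= 5 * height` is equivalent to `width / height <= 2.5`) to avoid floating-point edge cases at exactly 2.5:1.

```python
from typing import List, Sequence, Tuple

# Image limits from the O3 spec. "10MB" is read as 10,000,000 bytes (conservative assumption).
MIN_SIDE = 300
MAX_IMAGE_BYTES = 10_000_000


def o3_image_problems(width: int, height: int, size_bytes: int) -> List[str]:
    """List the ways one image violates the O3 input requirements."""
    problems = []
    if min(width, height) < MIN_SIDE:
        problems.append(f"{width}x{height} is below the 300x300 minimum")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"{size_bytes} bytes exceeds 10MB")
    # Aspect ratio must lie between 1:2.5 and 2.5:1; integer math avoids float edge cases.
    if 2 * width > 5 * height or 2 * height > 5 * width:
        problems.append(f"aspect ratio {width}:{height} is outside 1:2.5 to 2.5:1")
    return problems


def reference_image_problems(images: Sequence[Tuple[int, int, int]],
                             has_reference_video: bool) -> List[str]:
    """Check the reference-image count (7, or 4 with a reference video) and each image."""
    limit = 4 if has_reference_video else 7
    problems = []
    if len(images) > limit:
        problems.append(f"{len(images)} reference images exceed the limit of {limit}")
    for index, (width, height, size) in enumerate(images):
        problems.extend(f"image {index}: {p}" for p in o3_image_problems(width, height, size))
    return problems
```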

Provider-specific settings:

Parameters supported: sound (not supported with reference videos), multiPrompt (not supported with reference videos), keepOriginalSound (only with reference videos).

When using reference videos (inputs.referenceVideos or inputs.video), only the keepOriginalSound parameter is supported. The sound and multiPrompt parameters cannot be used in these workflows.
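
These compatibility rules can be enforced before a request is sent. The hypothetical Python check below treats the presence of either inputs.referenceVideos or inputs.video as a reference-video workflow, then flags sound or multiPrompt in that case and keepOriginalSound otherwise. It only inspects keys; it does not validate their values.

```python
from typing import Any, Dict, List, Optional


def o3_settings_problems(inputs: Optional[Dict[str, Any]], klingai: Dict[str, Any]) -> List[str]:
    """Flag providerSettings.klingai keys that clash with the chosen O3 inputs."""
    inputs = inputs or {}
    uses_video = bool(inputs.get("referenceVideos") or inputs.get("video"))
    problems = []
    if uses_video:
        for key in ("sound", "multiPrompt"):
            if key in klingai:
                problems.append(f"{key} is not supported with reference videos")
    elif "keepOriginalSound" in klingai:
        problems.append("keepOriginalSound only applies with reference videos")
    return problems


print(o3_settings_problems({"video": "c64351d5"}, {"sound": True}))
# → ['sound is not supported with reference videos']
```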

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9d9",
  "model": "klingai:kling-video@o3-standard",
  "positivePrompt": "A bustling marketplace with vendor calls and ambient crowd noise",
  "duration": 8,
  "width": 1280,
  "height": 720,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d66",
  "model": "klingai:kling-video@o3-standard",
  "inputs": {
    "frameImages": [
      {
        "image": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      }
    ]
  },
  "positivePrompt": "Animate this scene with natural motion and environmental sounds",
  "duration": 5,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440017",
  "model": "klingai:kling-video@o3-standard",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab"
  },
  "positivePrompt": "Change the lighting to golden hour while maintaining the scene composition",
  "providerSettings": {
    "klingai": {
      "keepOriginalSound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da9",
  "model": "klingai:kling-video@o3-standard",
  "width": 720,
  "height": 1280,
  "providerSettings": {
    "klingai": {
      "sound": true,
      "multiPrompt": [
        {
          "prompt": "Character walks through a quiet forest path",
          "duration": "5"
        },
        {
          "prompt": "Character discovers a hidden waterfall with rushing water sounds",
          "duration": "5"
        }
      ]
    }
  }
}

Kling VIDEO O3 Pro

Kling VIDEO O3 Pro is a unified multimodal video model that generates HD clips from text or images with native audio output, prioritizing detail, motion realism, and stable subject identity. This professional-grade model supports reference-driven generation and prompt-based video editing with strong temporal consistency, delivering higher-fidelity renders for production-quality content. Native audio includes synchronized dialogue, ambient sound, and effects.

Model AIR ID: klingai:kling-video@o3-pro.

Supported workflows: Text-to-video, image-to-video, reference-to-video, video-to-video.

Technical specifications:

  • Positive prompt: 2-2500 characters.
  • Supported dimensions: 1920×1080 (16:9), 1440×1440 (1:1), 1080×1920 (9:16).
  • Dimension behavior:
    • Text-to-video: Specify explicit width and height from the supported dimensions above.
    • Image-to-video: Dimensions are automatically inferred from the input frame image.
    • Video-to-video: Dimensions match the input video, custom dimensions not supported.
  • Duration: 3-15 seconds (default: 5).
  • Duration behavior:
    • Video-to-video: Output duration matches input video duration (parameter ignored).
  • Multi-prompt: Supports up to 6 sequential prompt segments via providerSettings.klingai.multiPrompt (not supported with reference videos).
  • Frame images: Supports first and last frames for inputs.frameImages.
  • Reference images: Supports up to 7 images via inputs.referenceImages (or 4 if reference video is also used).
  • Reference videos: Supports 1 video via inputs.referenceVideos for feature reference.
  • Video editing: Supports 1 video via inputs.video for prompt-based editing.
  • Input requirements:
    • Images: 300×300 minimum, 10MB file size limit, aspect ratio 1:2.5 to 2.5:1.
    • Videos: 3-10 seconds duration, 720-2160 pixels.
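
Multi-prompt requests are lists of `{prompt, duration}` segments, with each duration passed as a string in the examples on this page. The Python sketch below builds such a list from `(prompt, seconds)` pairs and enforces the documented cap of 6 segments. It additionally assumes that the segment durations must sum to a value inside the 3-15 second duration range; that constraint is an assumption of this sketch and is not stated explicitly in the specification.

```python
from typing import Dict, List, Sequence, Tuple

MAX_SEGMENTS = 6
MIN_TOTAL_S, MAX_TOTAL_S = 3, 15  # assumption: segments must sum into the 3-15 s duration range


def build_multi_prompt(segments: Sequence[Tuple[str, int]]) -> List[Dict[str, str]]:
    """Turn (prompt, seconds) pairs into a multiPrompt list shaped like the examples."""
    if not 1 <= len(segments) <= MAX_SEGMENTS:
        raise ValueError(f"multiPrompt takes 1-{MAX_SEGMENTS} segments, got {len(segments)}")
    total = sum(seconds for _, seconds in segments)
    if not MIN_TOTAL_S <= total <= MAX_TOTAL_S:
        raise ValueError(f"segments total {total} s, outside {MIN_TOTAL_S}-{MAX_TOTAL_S} s")
    # The examples pass each duration as a string, so the sketch does the same.
    return [{"prompt": prompt, "duration": str(seconds)} for prompt, seconds in segments]
```

The result can be placed under providerSettings.klingai.multiPrompt, provided no reference video is used.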

Provider-specific settings:

Parameters supported: sound (not supported with reference videos), multiPrompt (not supported with reference videos), keepOriginalSound (only with reference videos).

When using reference videos (inputs.referenceVideos or inputs.video), only the keepOriginalSound parameter is supported. The sound and multiPrompt parameters cannot be used in these workflows.

{
  "taskType": "videoInference",
  "taskUUID": "24cd5dff-cb81-4db5-8506-b72a9425f9da",
  "model": "klingai:kling-video@o3-pro",
  "positivePrompt": "Cinematic shot of waves crashing against rocky cliffs with dramatic ocean sounds",
  "duration": 10,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d67",
  "model": "klingai:kling-video@o3-pro",
  "inputs": {
    "frameImages": [
      {
        "image": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "frame": "first"
      }
    ]
  },
  "positivePrompt": "Bring this character to life with subtle facial expressions and synchronized dialogue",
  "duration": 8,
  "providerSettings": {
    "klingai": {
      "sound": true
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440018",
  "model": "klingai:kling-video@o3-pro",
  "inputs": {
    "referenceVideos": [
      {
        "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
        "referType": "feature"
      }
    ]
  },
  "positivePrompt": "Generate a new scene maintaining the character and motion style from the reference",
  "duration": 12,
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "klingai": {
      "keepOriginalSound": false
    }
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35daa",
  "model": "klingai:kling-video@o3-pro",
  "width": 1920,
  "height": 1080,
  "providerSettings": {
    "klingai": {
      "sound": true,
      "multiPrompt": [
        {
          "prompt": "Establishing shot of a modern laboratory with equipment humming",
          "duration": "4"
        },
        {
          "prompt": "Scientist examines samples under microscope with focused concentration",
          "duration": "6"
        },
        {
          "prompt": "Close-up of breakthrough discovery moment with triumphant music",
          "duration": "5"
        }
      ]
    }
  }
}

KlingAI Avatar 2.0 Standard

KlingAI's Avatar 2.0 Standard generates expressive talking avatar videos from a single portrait image and audio input, preserving identity while producing natural lip synchronization and expressive motion. This model supports up to five minutes of video with multilingual control and gesture clarity for both human and cartoon characters.

Model AIR ID: klingai:avatar@2.0-standard.

Supported workflows: Image-to-video with audio.

Technical specifications:

  • Image input: Via inputs.image (required).
  • Audio input: Via inputs.audio (required).
  • Positive prompt: 1-2500 characters (optional).
  • Input image requirements: Width and height minimum 300 pixels, aspect ratio between 1:2.5 and 2.5:1, 10MB file size limit.
  • Audio input requirements: Duration between 2 and 300 seconds, 5MB file size limit.

{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da7",
  "model": "klingai:avatar@2.0-standard",
  "inputs": {
    "image": "aac49721-1964-481a-ae78-8a4e29b91402",
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d63",
  "model": "klingai:avatar@2.0-standard",
  "inputs": {
    "image": "aac49721-1964-481a-ae78-8a4e29b91402",
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  },
  "positivePrompt": "Professional speaking with confident gestures and natural expressions"
}
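
Avatar requests differ from the other models only in their inputs (one image and one audio file) and an optional prompt. The hypothetical Python builder below checks the documented audio length (2-300 seconds) and prompt length (1-2500 characters) and accepts either the Standard or the Pro model ID. It assumes the audio duration has been measured beforehand, since that value is not part of the request itself, and it uses a random `uuid4` for `taskUUID` as an illustrative choice.

```python
import uuid
from typing import Any, Dict, Optional

AVATAR_MODELS = {"klingai:avatar@2.0-standard", "klingai:avatar@2.0-pro"}


def avatar_task(image: str, audio: str, audio_seconds: float,
                model: str = "klingai:avatar@2.0-standard",
                prompt: Optional[str] = None) -> Dict[str, Any]:
    """Build an Avatar 2.0 task after checking the documented audio and prompt limits."""
    if model not in AVATAR_MODELS:
        raise ValueError(f"unknown avatar model {model!r}")
    if not 2 <= audio_seconds <= 300:
        raise ValueError("audio must be 2-300 seconds long")
    task: Dict[str, Any] = {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),
        "model": model,
        "inputs": {"image": image, "audio": audio},
    }
    if prompt is not None:
        if not 1 <= len(prompt) <= 2500:
            raise ValueError("positivePrompt must be 1-2500 characters")
        task["positivePrompt"] = prompt
    return task
```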

KlingAI Avatar 2.0 Pro

KlingAI's Avatar 2.0 Pro builds on the Standard version with higher visual fidelity, smoother motion, and improved expressivity. It generates avatar videos up to five minutes long with finer detail and more consistent motion across varied character types.

Model AIR ID: klingai:avatar@2.0-pro.

Supported workflows: Image-to-video with audio.

Technical specifications:

  • Image input: Via inputs.image (required).
  • Audio input: Via inputs.audio (required).
  • Positive prompt: 1-2500 characters (optional).
  • Input image requirements: Width and height minimum 300 pixels, aspect ratio between 1:2.5 and 2.5:1, 10MB file size limit.
  • Audio input requirements: Duration between 2 and 300 seconds, 5MB file size limit.

{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da8",
  "model": "klingai:avatar@2.0-pro",
  "inputs": {
    "image": "aac49721-1964-481a-ae78-8a4e29b91402",
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  }
}
{
  "taskType": "videoInference",
  "taskUUID": "b8c4d952-7f27-4a6e-bc9a-83f01d1c6d64",
  "model": "klingai:avatar@2.0-pro",
  "inputs": {
    "image": "aac49721-1964-481a-ae78-8a4e29b91402",
    "audio": "b4c57832-2075-492b-bf89-9b5e3ac02503"
  },
  "positivePrompt": "Professional presentation with enhanced detail and smooth expressive motion"
}