Sync
Introduction
Sync's AI models are integrated into the Runware platform through our unified API, providing access to advanced video performance editing and lip synchronization technology. The platform enables creators to modify spoken audio in existing videos while preserving speaker identity, style, and natural motion.
Through the providerSettings.sync object, you can access Sync's unique features such as synchronization modes, active speaker detection, and segment-based control, while maintaining the consistency of Runware's standard API structure. This page documents the technical specifications, parameter requirements, and provider-specific settings for all Sync models available through our platform.
providerSettings » sync (object)
Configuration object for Sync.so-specific video synchronization and performance editing features. These settings control how audio is synchronized with video, speaker detection, and segment-based editing.
Example:
{ "taskType": "videoInference", "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6", "model": "sync:lipsync-2@1", "inputs": { "video": "c64351d5-4c59-42f7-95e1-eace013eddab", "audios": [ { "id": "main-audio", "source": "b4c57832-2075-492b-bf89-9b5e3ac02503" } ] }, "providerSettings": { "sync": { "syncMode": "bounce", "temperature": 0.5, "activeSpeakerDetection": true } } }Properties 7 properties
providerSettings » sync » syncMode (string) Default: bounce
Specifies the synchronization strategy when audio duration doesn't match video duration.
Available values:
- bounce: Audio bounces back and forth to fill video duration.
- loop: Audio repeats from the beginning when it ends.
- cut_off: Audio is cut when video ends.
- silence: Remaining video plays with silence after audio ends.
- remap: Audio is time-stretched or compressed to match video duration exactly.
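For example, a minimal providerSettings fragment (values illustrative) selecting the remap strategy so a shorter voiceover is stretched to cover the full clip:

"providerSettings": {
  "sync": {
    "syncMode": "remap"
  }
}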
providerSettings » sync » temperature (float) Min: 0, Max: 1, Default: 0.5
Controls the expressiveness and variation in the generated lip sync and facial movements. Lower values produce more conservative, precise movements, while higher values allow more expressive and varied animations.
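As a sketch, a low value keeps mouth movements tight and precise, which can suit formal or documentary footage (the 0.2 shown is illustrative; any value from 0 to 1 is accepted):

"providerSettings": {
  "sync": {
    "temperature": 0.2
  }
}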
providerSettings » sync » activeSpeakerDetection (boolean) Default: false
Enables automatic detection of the active speaker in the video. When enabled, the model identifies which person is speaking and applies lip sync only to the detected speaker, leaving other faces unmodified.
This is useful for multi-person scenes where only one person should have their lips synchronized to the audio.
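For instance, an illustrative fragment enabling detection so that only the active speaker in a two-person interview is re-synced:

"providerSettings": {
  "sync": {
    "activeSpeakerDetection": true
  }
}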
providerSettings » sync » occlusionDetectionEnabled (boolean) Default: false
Enables detection and handling of facial occlusions such as hands covering the mouth, objects in front of the face, or partial visibility. When enabled, the model adapts the lip sync to account for occluded regions. This helps maintain natural appearance when faces are partially hidden or obstructed during the video.
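A minimal fragment (illustrative) turning on occlusion handling for footage where, say, the speaker's hands pass in front of their mouth:

"providerSettings": {
  "sync": {
    "occlusionDetectionEnabled": true
  }
}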
providerSettings » sync » segments (array)
Defines specific time segments in the video where different audio inputs should be applied. This enables precise control over which audio is synchronized to which portion of the video.
Each segment object specifies the video time range, which audio input to use (via reference ID), and optionally which portion of the source audio to use.
The ref field must match an id specified in the inputs.audio or inputs.speech array objects. This links each segment to its corresponding audio source.
Segment object properties:
- startTime (float, required): Start time in seconds for the segment in the video timeline.
- endTime (float, required): End time in seconds for the segment in the video timeline. Must be greater than startTime.
- ref (string, required): Reference ID linking to an audio input defined in inputs.audio or inputs.speech.
- audioStartTime (float, optional): Start time in seconds within the source audio file. Defaults to 0.
- audioEndTime (float, optional): End time in seconds within the source audio file. Defaults to the end of the audio.
"providerSettings": { "sync": { "segments": [ { "startTime": 0, "endTime": 5, "ref": "audio-1", "audioStartTime": 0, "audioEndTime": 5 }, { "startTime": 5, "endTime": 10, "ref": "audio-2", "audioStartTime": 2, "audioEndTime": 7 } ] } }
providerSettings » sync » editRegion (string) Default: face
Specifies which region of the subject should be modified during performance re-animation. This controls the scope of facial changes and movement generation.
Available values:
- lips: Modifies only lip movements for synchronization.
- face: Affects lip sync and emotional expressions in the face region.
- head: Generates natural talking head movements along with emotions and lip sync for full performance animation.
This parameter is only available for the react-1 model (sync:react-1@1).
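For example, an illustrative fragment limiting react-1 edits strictly to mouth movements:

"providerSettings": {
  "sync": {
    "editRegion": "lips"
  }
}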
providerSettings » sync » emotionPrompt (string)
Guides the emotional tone and delivery style for the performance re-animation. This allows you to modify the acting interpretation without reshooting.
Available values:
happy, sad, angry, disgusted, surprised, neutral.
This parameter is only available for the react-1 model (sync:react-1@1).
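As an illustrative sketch, both react-1-only parameters can be combined to steer a full talking-head performance toward a surprised delivery:

"providerSettings": {
  "sync": {
    "editRegion": "head",
    "emotionPrompt": "surprised"
  }
}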
Video models
lipsync-2
Sync's lipsync-2 is a zero-shot lip-sync model that synchronizes spoken audio to existing video without training or fine-tuning. It preserves the speaker's unique speaking style and works across live-action and AI-generated content.
Model AIR ID: sync:lipsync-2@1.
Supported workflows: Video-to-video with audio replacement.
Technical specifications:
- Input video: Required via inputs.video.
- Input audio: Supports audio files via inputs.audio or text-to-speech via inputs.speech.
- Advanced control: Use segments to map different audio/speech inputs to specific time ranges (requires IDs on inputs).
Provider-specific settings:
Parameters supported: syncMode, temperature, activeSpeakerDetection, occlusionDetectionEnabled, segments.
{
"taskType": "videoInference",
"taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d490",
"model": "sync:lipsync-2@1",
"inputs": {
"video": "c64351d5-4c59-42f7-95e1-eace013eddab",
"audio": [
{
"id": "main-audio",
"source": "b4c57832-2075-492b-bf89-9b5e3ac02503"
}
]
},
"providerSettings": {
"sync": {
"syncMode": "bounce",
"temperature": 0.5
}
}
}

{
"taskType": "videoInference",
"taskUUID": "6ba7b827-9dad-11d1-80b4-00c04fd430c9",
"model": "sync:lipsync-2@1",
"inputs": {
"video": "c64351d5-4c59-42f7-95e1-eace013eddab",
"speech": [
{
"id": "dialogue-1",
"provider": {
"name": "elevenlabs",
"voiceId": "21m00Tcm4TlvDq8ikWAM"
},
"text": "Welcome to our presentation about artificial intelligence."
}
]
},
"providerSettings": {
"sync": {
"syncMode": "bounce"
}
}
}

{
"taskType": "videoInference",
"taskUUID": "550e8400-e29b-41d4-a716-446655440010",
"model": "sync:lipsync-2@1",
"inputs": {
"video": "c64351d5-4c59-42f7-95e1-eace013eddab",
"audio": [
{
"id": "part-1",
"source": "32754c50-4506-4b37-87a7-fdb75a7a55df"
},
{
"id": "part-2",
"source": "b4f87ba1-df22-4606-a60f-c1d5467e5bf0"
}
]
},
"providerSettings": {
"sync": {
"segments": [
{
"startTime": 0,
"endTime": 5,
"ref": "part-1"
},
{
"startTime": 5,
"endTime": 10,
"ref": "part-2"
}
]
}
}
}
lipsync-2-pro
Sync's lipsync-2-pro extends lipsync-2 with diffusion-based super-resolution to deliver studio-grade lip-sync editing. It preserves facial details including teeth, beards, and subtle expressions, and supports high-resolution output up to 4K for professional production workflows.
Model AIR ID: sync:lipsync-2-pro@1.
Supported workflows: Video-to-video with audio replacement.
Technical specifications:
- Input video: Required via inputs.video.
- Input audio: Supports audio files via inputs.audio or text-to-speech via inputs.speech.
- Advanced control: Use segments to map different audio/speech inputs to specific time ranges (requires IDs on inputs).
- Output resolution: Supports up to 4K.
Provider-specific settings:
Parameters supported: syncMode, temperature, activeSpeakerDetection, occlusionDetectionEnabled, segments.
{
"taskType": "videoInference",
"taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d491",
"model": "sync:lipsync-2-pro@1",
"inputs": {
"video": "c64351d5-4c59-42f7-95e1-eace013eddab",
"audio": [
{
"id": "professional-audio",
"source": "b4c57832-2075-492b-bf89-9b5e3ac02503"
}
]
},
"providerSettings": {
"sync": {
"syncMode": "bounce",
"temperature": 0.3,
"occlusionDetectionEnabled": true
}
}
}

{
"taskType": "videoInference",
"taskUUID": "6ba7b828-9dad-11d1-80b4-00c04fd430c9",
"model": "sync:lipsync-2-pro@1",
"inputs": {
"video": "c64351d5-4c59-42f7-95e1-eace013eddab",
"speech": [
{
"id": "cinematic-dialogue",
"provider": {
"name": "elevenlabs",
"voiceId": "21m00Tcm4TlvDq8ikWAM"
},
"text": "This is a cinematic close-up with preserved facial details."
}
]
},
"providerSettings": {
"sync": {
"syncMode": "bounce",
"temperature": 0.4
}
}
}
react-1
Sync's react-1 extends beyond lip synchronization to re-animate emotional delivery, micro-expressions, and facial performance. It enables directors to modify acting after the fact without reshooting, allowing dialogue to be reinterpreted with a different emotional style.
Model AIR ID: sync:react-1@1.
Supported workflows: Video-to-video with performance re-animation.
Technical specifications:
- Input video: Required via inputs.video.
- Input audio: Supports audio files via inputs.audio or text-to-speech via inputs.speech.
- Edit regions: Control the scope of facial modifications (lips, face, or head).
- Emotion guidance: Direct the emotional performance with single-word emotion prompts.
- Output resolution: Supports up to 4K.
Provider-specific settings:
Parameters supported: syncMode, temperature, activeSpeakerDetection, occlusionDetectionEnabled, editRegion, emotionPrompt.
Segment-based control is coming soon for react-1, which will enable precise mapping of different audio inputs and emotion prompts to specific time ranges within the video.
{
"taskType": "videoInference",
"taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d492",
"model": "sync:react-1@1",
"inputs": {
"video": "c64351d5-4c59-42f7-95e1-eace013eddab",
"audio": [
{
"id": "new-performance",
"source": "b4c57832-2075-492b-bf89-9b5e3ac02503"
}
]
},
"providerSettings": {
"sync": {
"syncMode": "bounce",
"temperature": 0.7,
"editRegion": "face",
"emotionPrompt": "happy"
}
}
}

{
"taskType": "videoInference",
"taskUUID": "6ba7b829-9dad-11d1-80b4-00c04fd430c9",
"model": "sync:react-1@1",
"inputs": {
"video": "c64351d5-4c59-42f7-95e1-eace013eddab",
"speech": [
{
"id": "dramatic-reading",
"provider": {
"name": "elevenlabs",
"voiceId": "21m00Tcm4TlvDq8ikWAM"
},
"text": "This is a dramatic performance with natural head movements."
}
]
},
"providerSettings": {
"sync": {
"syncMode": "remap",
"temperature": 0.8,
"editRegion": "head",
"emotionPrompt": "surprised",
"activeSpeakerDetection": true
}
}
}

{
"taskType": "videoInference",
"taskUUID": "550e8400-e29b-41d4-a716-446655440011",
"model": "sync:react-1@1",
"inputs": {
"video": "c64351d5-4c59-42f7-95e1-eace013eddab",
"audio": [
{
"id": "localized-audio",
"source": "b4c57832-2075-492b-bf89-9b5e3ac02503"
}
]
},
"providerSettings": {
"sync": {
"syncMode": "bounce",
"temperature": 0.6,
"editRegion": "face",
"emotionPrompt": "sad"
}
}
}