Sync

Access Sync's AI models including lipsync-2, lipsync-2-pro, and react-1 for video performance editing and lip synchronization through Runware's unified API. Learn about Sync-specific parameters, segments, and synchronization modes.

Introduction

Sync's AI models are integrated into the Runware platform through our unified API, providing access to advanced video performance editing and lip synchronization technology. The platform enables creators to modify spoken audio in existing videos while preserving speaker identity, style, and natural motion.

Through the providerSettings.sync object, you can access Sync's unique features such as synchronization modes, active speaker detection, and segment-based control, while maintaining the consistency of Runware's standard API structure. This page documents the technical specifications, parameter requirements, and provider-specific settings for all Sync models available through our platform.

providerSettings » sync
sync
object

Configuration object for Sync.so-specific video synchronization and performance editing features. These settings control how audio is synchronized with video, speaker detection, and segment-based editing.

Example:
{
  "taskType": "videoInference",
  "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
  "model": "sync:lipsync-2@1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "audios": [
      {
        "id": "main-audio",
        "source": "b4c57832-2075-492b-bf89-9b5e3ac02503"
      }
    ]
  },
  "providerSettings": {
    "sync": {
      "syncMode": "bounce",
      "temperature": 0.5,
      "activeSpeakerDetection": true
    }
  }
}
Properties (7)
providerSettings » sync » syncMode
syncMode
string Default: bounce

Specifies the synchronization strategy when audio duration doesn't match video duration.

Available values:

  • bounce: Audio bounces back and forth to fill video duration.
  • loop: Audio repeats from the beginning when it ends.
  • cut_off: Audio is cut when video ends.
  • silence: Remaining video plays with silence after audio ends.
  • remap: Audio is time-stretched or compressed to match video duration exactly.
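
For example, a minimal providerSettings fragment that stretches or compresses the audio to match the video duration exactly:

"providerSettings": {
  "sync": {
    "syncMode": "remap"
  }
}
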
providerSettings » sync » temperature
temperature
float Min: 0 Max: 1 Default: 0.5

Controls the expressiveness and variation in the generated lip sync and facial movements. Lower values produce more conservative, precise movements, while higher values allow more expressive and varied animations.

providerSettings » sync » activeSpeakerDetection
activeSpeakerDetection
boolean Default: false

Enables automatic detection of the active speaker in the video. When enabled, the model identifies which person is speaking and applies lip sync only to the detected speaker, leaving other faces unmodified.

This is useful for multi-person scenes where only one person should have their lips synchronized to the audio.
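
For example, a fragment enabling detection for a two-person interview where only the on-camera speaker should be re-animated:

"providerSettings": {
  "sync": {
    "activeSpeakerDetection": true
  }
}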

providerSettings » sync » occlusionDetectionEnabled
occlusionDetectionEnabled
boolean Default: false

Enables detection and handling of facial occlusions such as hands covering the mouth, objects in front of the face, or partial visibility. When enabled, the model adapts the lip sync to account for occluded regions. This helps maintain natural appearance when faces are partially hidden or obstructed during the video.
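
For example, for footage where the speaker's hand occasionally covers their mouth, occlusion handling can be combined with speaker detection:

"providerSettings": {
  "sync": {
    "occlusionDetectionEnabled": true,
    "activeSpeakerDetection": true
  }
}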

providerSettings » sync » segments
segments
array

Defines specific time segments in the video where different audio inputs should be applied. This enables precise control over which audio is synchronized to which portion of the video.

Each segment object specifies the video time range, which audio input to use (via reference ID), and optionally which portion of the source audio to use.

The ref field must match an id specified in the inputs.audio or inputs.speech arrays. This links each segment to its corresponding audio source.

Segment object properties:

  • startTime (float, required): Start time in seconds for the segment in the video timeline.
  • endTime (float, required): End time in seconds for the segment in the video timeline. Must be greater than startTime.
  • ref (string, required): Reference ID linking to an audio input defined in inputs.audio or inputs.speech.
  • audioStartTime (float, optional): Start time in seconds within the source audio file. Defaults to 0.
  • audioEndTime (float, optional): End time in seconds within the source audio file. Defaults to end of audio.
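
For example, the following maps two audio inputs to consecutive five-second segments of the video, with the second segment using seconds 2 through 7 of its source audio:
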
"providerSettings": {
  "sync": {
    "segments": [
      {
        "startTime": 0,
        "endTime": 5,
        "ref": "audio-1",
        "audioStartTime": 0,
        "audioEndTime": 5
      },
      {
        "startTime": 5,
        "endTime": 10,
        "ref": "audio-2",
        "audioStartTime": 2,
        "audioEndTime": 7
      }
    ]
  }
}
providerSettings » sync » editRegion
editRegion
string Default: face

Specifies which region of the subject should be modified during performance re-animation. This controls the scope of facial changes and movement generation.

Available values:

  • lips: Modifies only lip movements for synchronization.
  • face: Affects lip sync and emotional expressions in the face region.
  • head: Generates natural talking head movements along with emotions and lip sync for full performance animation.

This parameter is only available for the react-1 model (sync:react-1@1).
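
For example, a fragment requesting full talking-head motion on react-1:

"providerSettings": {
  "sync": {
    "editRegion": "head"
  }
}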

providerSettings » sync » emotionPrompt
emotionPrompt
string

Guides the emotional tone and delivery style for the performance re-animation. This allows you to modify the acting interpretation without reshooting.

Available values: happy, sad, angry, disgusted, surprised, neutral.

This parameter is only available for the react-1 model (sync:react-1@1).
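
For example, a fragment directing a more surprised delivery:

"providerSettings": {
  "sync": {
    "emotionPrompt": "surprised"
  }
}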

Video models

lipsync-2

Sync's lipsync-2 is a zero-shot lip-sync model that synchronizes spoken audio to existing video without training or fine-tuning. It preserves the speaker's unique speaking style and works across live-action and AI-generated content.

Model AIR ID: sync:lipsync-2@1.

Supported workflows: Video-to-video with audio replacement.

Technical specifications:

  • Input video: Required via inputs.video.
  • Input audio: Supports audio files via inputs.audio or text-to-speech via inputs.speech.
  • Advanced control: Use segments to map different audio/speech inputs to specific time ranges (requires IDs on inputs).

Provider-specific settings:

Parameters supported: syncMode, temperature, activeSpeakerDetection, occlusionDetectionEnabled, segments.
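
Basic lip sync with an uploaded audio file: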

{
  "taskType": "videoInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d490",
  "model": "sync:lipsync-2@1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "audio": [
      {
        "id": "main-audio",
        "source": "b4c57832-2075-492b-bf89-9b5e3ac02503"
      }
    ]
  },
  "providerSettings": {
    "sync": {
      "syncMode": "bounce",
      "temperature": 0.5
    }
  }
}
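
Lip sync driven by text-to-speech:
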
{
  "taskType": "videoInference",
  "taskUUID": "6ba7b827-9dad-11d1-80b4-00c04fd430c9",
  "model": "sync:lipsync-2@1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "speech": [
      {
        "id": "dialogue-1",
        "provider": {
          "name": "elevenlabs",
          "voiceId": "21m00Tcm4TlvDq8ikWAM"
        },
        "text": "Welcome to our presentation about artificial intelligence."
      }
    ]
  },
  "providerSettings": {
    "sync": {
      "syncMode": "bounce"
    }
  }
}
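
Segment-based control mapping two audio files to consecutive time ranges:
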
{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440010",
  "model": "sync:lipsync-2@1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "audio": [
      {
        "id": "part-1",
        "source": "32754c50-4506-4b37-87a7-fdb75a7a55df"
      },
      {
        "id": "part-2",
        "source": "b4f87ba1-df22-4606-a60f-c1d5467e5bf0"
      }
    ]
  },
  "providerSettings": {
    "sync": {
      "segments": [
        {
          "startTime": 0,
          "endTime": 5,
          "ref": "part-1"
        },
        {
          "startTime": 5,
          "endTime": 10,
          "ref": "part-2"
        }
      ]
    }
  }
}

lipsync-2-pro

Sync's lipsync-2-pro extends lipsync-2 with diffusion-based super-resolution to deliver studio-grade lip-sync editing. It preserves facial details including teeth, beards, and subtle expressions, and supports high-resolution output up to 4K for professional production workflows.

Model AIR ID: sync:lipsync-2-pro@1.

Supported workflows: Video-to-video with audio replacement.

Technical specifications:

  • Input video: Required via inputs.video.
  • Input audio: Supports audio files via inputs.audio or text-to-speech via inputs.speech.
  • Advanced control: Use segments to map different audio/speech inputs to specific time ranges (requires IDs on inputs).
  • Output resolution: Up to 4K.

Provider-specific settings:

Parameters supported: syncMode, temperature, activeSpeakerDetection, occlusionDetectionEnabled, segments.
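
High-fidelity lip sync with occlusion handling: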

{
  "taskType": "videoInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d491",
  "model": "sync:lipsync-2-pro@1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "audio": [
      {
        "id": "professional-audio",
        "source": "b4c57832-2075-492b-bf89-9b5e3ac02503"
      }
    ]
  },
  "providerSettings": {
    "sync": {
      "syncMode": "bounce",
      "temperature": 0.3,
      "occlusionDetectionEnabled": true
    }
  }
}
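
Text-to-speech lip sync for a cinematic close-up:
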
{
  "taskType": "videoInference",
  "taskUUID": "6ba7b828-9dad-11d1-80b4-00c04fd430c9",
  "model": "sync:lipsync-2-pro@1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "speech": [
      {
        "id": "cinematic-dialogue",
        "provider": {
          "name": "elevenlabs",
          "voiceId": "21m00Tcm4TlvDq8ikWAM"
        },
        "text": "This is a cinematic close-up with preserved facial details."
      }
    ]
  },
  "providerSettings": {
    "sync": {
      "syncMode": "bounce",
      "temperature": 0.4
    }
  }
}

react-1

Sync's react-1 extends beyond lip synchronization to re-animate emotional delivery, micro-expressions, and facial performance. It enables directors to modify acting after the fact without reshooting, allowing dialogue to be reinterpreted with a different emotional style.

Model AIR ID: sync:react-1@1.

Supported workflows: Video-to-video with performance re-animation.

Technical specifications:

  • Input video: Required via inputs.video.
  • Input audio: Supports audio files via inputs.audio or text-to-speech via inputs.speech.
  • Edit regions: Control scope of facial modifications (lips, face, or head).
  • Emotion guidance: Direct emotional performance with single-word emotion prompts.
  • Output resolution: Up to 4K.

Provider-specific settings:

Parameters supported: syncMode, temperature, activeSpeakerDetection, occlusionDetectionEnabled, editRegion, emotionPrompt.

Segment-based control is coming soon for react-1, which will enable precise mapping of different audio inputs and emotion prompts to specific time ranges within the video.
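
Performance re-animation with a happy emotional delivery: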

{
  "taskType": "videoInference",
  "taskUUID": "f47ac10b-58cc-4372-a567-0e02b2c3d492",
  "model": "sync:react-1@1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "audio": [
      {
        "id": "new-performance",
        "source": "b4c57832-2075-492b-bf89-9b5e3ac02503"
      }
    ]
  },
  "providerSettings": {
    "sync": {
      "syncMode": "bounce",
      "temperature": 0.7,
      "editRegion": "face",
      "emotionPrompt": "happy"
    }
  }
}
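
Full talking-head re-animation from text-to-speech, with a surprised delivery and active speaker detection:
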
{
  "taskType": "videoInference",
  "taskUUID": "6ba7b829-9dad-11d1-80b4-00c04fd430c9",
  "model": "sync:react-1@1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "speech": [
      {
        "id": "dramatic-reading",
        "provider": {
          "name": "elevenlabs",
          "voiceId": "21m00Tcm4TlvDq8ikWAM"
        },
        "text": "This is a dramatic performance with natural head movements."
      }
    ]
  },
  "providerSettings": {
    "sync": {
      "syncMode": "remap",
      "temperature": 0.8,
      "editRegion": "head",
      "emotionPrompt": "surprised",
      "activeSpeakerDetection": true
    }
  }
}
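
Re-animating localized audio with a sad emotional tone:
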
{
  "taskType": "videoInference",
  "taskUUID": "550e8400-e29b-41d4-a716-446655440011",
  "model": "sync:react-1@1",
  "inputs": {
    "video": "c64351d5-4c59-42f7-95e1-eace013eddab",
    "audio": [
      {
        "id": "localized-audio",
        "source": "b4c57832-2075-492b-bf89-9b5e3ac02503"
      }
    ]
  },
  "providerSettings": {
    "sync": {
      "syncMode": "bounce",
      "temperature": 0.6,
      "editRegion": "face",
      "emotionPrompt": "sad"
    }
  }
}