sync.

sync-3

Full-scene lip synchronization with global face understanding and obstruction handling

Video to Video · Audio to Video

sync-3 Overview

sync-3 is a lip synchronization model that processes entire shots as a single generation rather than stitching independent segments. It builds a global understanding of the speaker across all frames, enabling consistent output on close-ups, extreme face angles, partially obscured faces, and obstructed mouths. The model preserves the original speaker's style, cadence, and emotional expression across 95+ languages.

From $0.133 per second of video

Commercial use
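Because pricing is per second, the cost of a job is just the clip duration times the listed rate. A quick sketch (the $0.133/sec rate is from the pricing above; the helper name is illustrative):

```python
RATE_PER_SEC = 0.133  # listed sync-3 price per second of video

def estimate_cost(duration_sec: float) -> float:
    """Estimate job cost in USD for a clip of the given length."""
    return round(duration_sec * RATE_PER_SEC, 2)

estimate_cost(30)  # cost of a 30-second clip
```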

How to Use sync-3

Overview

sync-3 is a lip sync model built for full-scene generation. Instead of stitching clips together, it processes the entire shot in one go. That means more consistent results and fewer visual glitches.

It keeps the speaker’s style and emotion intact across 95+ languages. It also handles tricky situations like close-ups, extreme angles, and partially hidden faces or mouths.

How it works

Understanding the input

sync-3 works with video and audio, not text prompts. It looks at the whole sequence to understand the speaker across every frame. This helps it keep expression and movement consistent from start to finish.

Because it reads the full scene, it knows what to change and what to leave alone.

Lip sync

The model matches mouth movements to your audio. It processes the full shot at once, so the result feels smooth and natural without awkward transitions.

Scene awareness

sync-3 sees more of the scene than earlier models. That wider view helps it handle angles and obstructions around the face while keeping movement believable. You’ll spend less time fixing issues and rerunning outputs.

Key features

  • Full-scene lip sync

    Processes the entire shot in one pass for consistent results.

  • Global face understanding

    Keeps identity, expression, and movement stable across frames.

  • Obstruction handling

    Works well even when parts of the face or mouth are hidden.

  • Multi-language support

    Maintains style and expression across 95+ languages.

  • Wider scene context

    Uses more visual context to improve accuracy.

How to use

  1. Upload your video (MP4) and the audio you want to sync (WAV or MP3).
  2. Choose the model and endpoint.
  3. Run the sync.
  4. Download your finished video.

For best results, use clear audio and high-quality video with a visible face.
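The upload-and-run flow above boils down to a single job request with a video input and an audio input. A minimal sketch in Python; the payload field names (`model`, `videoURL`, `audioURL`) and the builder function are illustrative assumptions, not the documented API — see the docs link below for the real parameters:

```python
from pathlib import PurePosixPath

# Formats accepted per the steps above: MP4 video, WAV or MP3 audio.
ALLOWED_VIDEO = {".mp4"}
ALLOWED_AUDIO = {".wav", ".mp3"}

def build_sync_request(video_url: str, audio_url: str, model: str = "sync-3") -> dict:
    """Validate input formats and assemble a JSON-serializable job payload.

    Field names are hypothetical; consult the sync-3 docs for the real schema.
    """
    if PurePosixPath(video_url).suffix.lower() not in ALLOWED_VIDEO:
        raise ValueError("video must be MP4")
    if PurePosixPath(audio_url).suffix.lower() not in ALLOWED_AUDIO:
        raise ValueError("audio must be WAV or MP3")
    return {
        "model": model,         # model identifier (assumed)
        "videoURL": video_url,  # source footage with a visible face
        "audioURL": audio_url,  # speech to sync the lips to
    }

payload = build_sync_request(
    "https://example.com/shot.mp4",
    "https://example.com/line.wav",
)
```

Sending `payload` to the sync endpoint and downloading the result corresponds to steps 3 and 4 above.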

Documentation

Full docs, parameters, and examples: https://runware.ai/docs/models/sync-3

More models from sync.

react-1

API Only

react-1 is a video performance editing model designed for post-production direction without reshoots. It modifies acting and emotional delivery within existing footage while preserving identity and visual continuity, enabling directors to reshape performances using audio or written guidance.

lipsync-2-pro

API Only

lipsync-2-pro extends lipsync-2 with diffusion-based enhancement for studio-grade lip synchronization. It preserves fine facial details such as teeth, facial hair, and micro-expressions while supporting high-resolution output suitable for professional post-production workflows.

lipsync-2

API Only

lipsync-2 is a zero-shot lip synchronization model that aligns spoken audio to existing video while preserving the speaker’s identity and natural speaking style. It works across live-action, animation, and AI-generated footage without training or fine-tuning.