
sync.
Production-grade AI lip sync and video dubbing
Sync.so develops lip synchronization models that generate realistic mouth movement in video from a supplied audio track. The technology aligns spoken dialogue with facial motion in existing footage, supporting automated dubbing and localization workflows. The models integrate with image, video, and audio generation systems to add lip sync capabilities without manual animation or custom facial rigs.
Models by sync.
sync-3 is a lip synchronization model that processes entire shots as a single generation rather than stitching independent segments. It builds a global understanding of the speaker across all frames, enabling consistent output on close-ups, extreme face angles, partially obscured faces, and obstructed mouths. The model preserves the original speaker's style, cadence, and emotional expression across 95+ languages.
react-1 is a video performance editing model designed for post-production direction without reshoots. It modifies acting and emotional delivery within existing footage while preserving identity and visual continuity, enabling directors to reshape performances using audio or written guidance.
lipsync-2-pro extends lipsync-2 with diffusion-based enhancement for studio-grade lip synchronization. It preserves fine facial details such as teeth, facial hair, and micro-expressions while supporting high-resolution output suitable for professional post-production workflows.
lipsync-2 is a zero-shot lip synchronization model that aligns spoken audio to existing video while preserving the speaker’s identity and natural speaking style. It works across live-action, animation, and AI-generated footage without training or fine-tuning.
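To make the dubbing workflow concrete, the sketch below builds a request payload that pairs existing footage with a dubbed audio track and selects one of the models above. The endpoint shape, field names, and the `build_lipsync_job` helper are illustrative assumptions for this sketch, not sync.so's actual API.

```python
import json


def build_lipsync_job(video_url: str, audio_url: str,
                      model: str = "lipsync-2") -> str:
    """Return a JSON body for a hypothetical lip-sync generation endpoint.

    The payload structure here is an assumption used for illustration;
    consult the provider's API reference for the real request format.
    """
    if not video_url or not audio_url:
        raise ValueError("both a video and an audio track are required")
    payload = {
        "model": model,  # e.g. "lipsync-2", "lipsync-2-pro", or "sync-3"
        "input": [
            {"type": "video", "url": video_url},  # footage to re-sync
            {"type": "audio", "url": audio_url},  # dubbed dialogue track
        ],
    }
    return json.dumps(payload)


# Pair a shot with its French dub; the model regenerates mouth movement
# to match the new audio while preserving the speaker's identity.
body = build_lipsync_job("https://example.com/shot.mp4",
                         "https://example.com/dub_fr.wav")
```

In a real pipeline this body would be POSTed to the provider's generation endpoint and the job polled until the re-synced video is ready.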



