lipsync-2

Zero-shot audio-driven lip synchronization for video

lipsync-2 is a zero-shot lip synchronization model that aligns spoken audio to existing video while preserving the speaker’s identity and natural speaking style. It works across live-action, animation, and AI-generated footage without training or fine-tuning.

sync.
Commercial use
video-to-video, audio-to-video, text-to-audio, audio-to-audio
$0.0440 per second

Average savings vs typical market rates

Per second of audio: $0.0440 (save ~12%)
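
As a rough illustration of the per-second rate listed above, the sketch below estimates the cost of a job from its audio duration. It assumes billing is strictly linear in audio seconds, with no minimum charge and no rounding to whole seconds; the page does not confirm either point. The names `PRICE_PER_SECOND` and `estimate_cost` are hypothetical and not part of any official SDK.

```python
from decimal import Decimal

# Listed rate in USD per second of audio (taken from the pricing above).
PRICE_PER_SECOND = Decimal("0.0440")


def estimate_cost(audio_seconds):
    """Estimate the USD cost for a clip of the given audio duration.

    Assumption: cost scales linearly with duration, with no minimum
    charge and no rounding of the duration.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    # Convert via str() so floats such as 1.5 do not carry binary noise.
    cost = Decimal(str(audio_seconds)) * PRICE_PER_SECOND
    return cost.quantize(Decimal("0.0001"))


if __name__ == "__main__":
    for seconds in (10, 60, 300):
        print(f"{seconds:>4} s -> ${estimate_cost(seconds)}")
```

Under these assumptions, a 60-second clip comes to $2.6400. Decimal arithmetic is used rather than floats so that the four-decimal price is carried exactly.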