lipsync-2

Zero-shot audio-driven lip synchronization for video

lipsync-2 is a zero-shot lip synchronization model that aligns spoken audio to existing video while preserving the speaker’s identity and natural speaking style. It works across live-action, animation, and AI-generated footage without training or fine-tuning.

Commercial use

Video to Video, Audio to Video, Text to Audio, Audio to Audio

$0.0440 per second

Average savings vs typical market rates

Per second of audio: $0.0440 (save ~12%)
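
As a rough illustration of the per-second pricing, the sketch below estimates the cost of a job from the length of its audio. It assumes billing is strictly linear at the listed rate of $0.0440 per second, with no minimum charge and no rounding of the duration; the listing does not confirm either point. The function name `estimate_cost` is hypothetical and not part of any published API.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed rate in USD per second of audio (taken from the listing).
PRICE_PER_SECOND = Decimal("0.0440")


def estimate_cost(audio_seconds) -> Decimal:
    """Return the estimated cost in USD, rounded to the nearest cent.

    Assumption: cost scales linearly with audio duration, with no
    minimum charge and no rounding of the duration itself.
    """
    seconds = Decimal(str(audio_seconds))
    if seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    cost = seconds * PRICE_PER_SECOND
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for duration in (10, 60, 90.5):
        print(f"{duration} s -> ${estimate_cost(duration)}")
```

Decimal arithmetic is used here instead of floats so that small per-second amounts do not pick up binary rounding error when multiplied by long durations.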

More models from this creator

API only

react-1

react-1 is a video performance editing model designed for post-production direction without reshoots. It modifies acting and emotional delivery within existing footage while preserving identity and visual continuity, enabling directors to reshape performances using audio or written guidance.

API only

lipsync-2-pro

lipsync-2-pro extends lipsync-2 with diffusion-based enhancement for studio-grade lip synchronization. It preserves fine facial details such as teeth, facial hair, and micro-expressions while supporting high-resolution output suitable for professional post-production workflows.