Ovi

Unified text-to-video generation with synchronized cinematic audio

Commercial use
Text To Video, Image To Video, Text To Audio, Video To Audio

Ovi is a unified audio-video diffusion model that treats sound and visuals as one generative process. It uses twin DiT (diffusion transformer) backbones with blockwise cross-modal fusion to generate synchronized speech, sound effects, and motion in a single pass, from either a text prompt or a text prompt plus an image.
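To make the idea of blockwise cross-modal fusion more concrete, here is a minimal sketch in plain Python. It is not Ovi's implementation: the function names (`cross_attend`, `fused_block`, `run_twin_backbone`), the token counts, and the vector size are all hypothetical, and a real DiT block would also include self-attention, feed-forward layers, learned projections, and timestep conditioning, which are omitted here. The sketch only shows the general pattern of two parallel token streams that read from each other at every block.

```python
import math
import random


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Each query vector takes a softmax-weighted average of the context vectors."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        out.append([sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)])
    return out


def add(xs, ys):
    # Element-wise residual sum of two equally shaped token lists.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def fused_block(video, audio):
    """One hypothetical fusion block: each stream reads from the other, plus a residual."""
    video_from_audio = cross_attend(video, audio)
    audio_from_video = cross_attend(audio, video)
    return add(video, video_from_audio), add(audio, audio_from_video)


def run_twin_backbone(video, audio, num_blocks=3):
    # Fusion happens at every block, so the two streams stay coupled throughout.
    for _ in range(num_blocks):
        video, audio = fused_block(video, audio)
    return video, audio


rng = random.Random(0)
video_tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]    # assumed: 6 video tokens
audio_tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]   # assumed: 10 audio tokens
video_out, audio_out = run_twin_backbone(video_tokens, audio_tokens)
print(len(video_out), len(audio_out))  # prints: 6 10
```

Each stream keeps its own length and representation, yet both are updated together in the same pass. That joint update is the property the description attributes to generating audio and video as one process, as opposed to producing one modality first and conditioning the other on it afterwards.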