Ovi
Unified text-to-video generation with synchronized cinematic audio
Ovi is a unified audio-video diffusion model that treats sound and visuals as a single generative process. It pairs twin DiT backbones, one for audio and one for video, and fuses them block by block through cross-modal connections. This lets it generate synchronized speech, sound effects, and motion in a single pass, conditioned either on a text prompt or on text plus an image.
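The idea of "blockwise cross-modal fusion" can be illustrated with a toy sketch. It assumes that fusion means a symmetric cross-attention exchange at every block: video tokens attend to audio tokens and audio tokens attend to video tokens, and each result is added back residually. The functions `cross_attend`, `fusion_block`, and `twin_forward` are hypothetical names for illustration only. This is not Ovi's actual code, and a real DiT block would also include self-attention, text conditioning, timestep modulation, and an MLP.

```python
import math
from typing import List, Tuple

Tokens = List[List[float]]  # a sequence of token vectors, all with the same width


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Tokens, context: Tokens) -> Tokens:
    """Single-head scaled dot-product attention from `queries` into `context`.

    For brevity the context tokens act as both keys and values (no learned projections).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * tok[j] for w, tok in zip(weights, context)) for j in range(d)])
    return out


def add(xs: Tokens, ys: Tokens) -> Tokens:
    # Residual connection: element-wise sum of two equally shaped token sequences.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def fusion_block(video: Tokens, audio: Tokens) -> Tuple[Tokens, Tokens]:
    # Both directions read the other stream's state from *before* this block,
    # so neither modality gets to "see" the other's update first.
    video_ctx = cross_attend(video, audio)
    audio_ctx = cross_attend(audio, video)
    return add(video, video_ctx), add(audio, audio_ctx)


def twin_forward(video: Tokens, audio: Tokens, num_blocks: int) -> Tuple[Tokens, Tokens]:
    # Two parallel towers that exchange information at every block.
    # Per-modality self-attention and MLP layers are omitted in this sketch.
    for _ in range(num_blocks):
        video, audio = fusion_block(video, audio)
    return video, audio


if __name__ == "__main__":
    video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy video tokens
    audio = [[0.0, 0.0] for _ in range(4)]        # 4 toy audio tokens
    v, a = twin_forward(video, audio, num_blocks=2)
    print(len(v), len(a))  # each stream keeps its own sequence length
```

Each stream keeps its own sequence length, so audio and video can run at different temporal resolutions and still exchange information at every depth. This is the property that makes per-block fusion attractive for keeping lip motion and sound aligned, compared with fusing only once at the input or output.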
