OmniHuman-1.5

Cognitive avatar video from image, audio, and text

OmniHuman-1.5 generates high-fidelity avatar video from a single image and an audio track, with optional text prompts. It fuses multimodal reasoning with diffusion-based motion to keep identity stable, lip sync accurate, and gestures context-aware across long, multi-subject clips.

Commercial use

Image to Video | Audio to Video | Text to Video

Average savings vs typical market rates

10 seconds | Save ~17% | 1.3247