OmniHuman-1.5

Cognitive avatar video from image, audio, and text

Commercial use
10 seconds
1.3247
Image To Video, Audio To Video, Text To Video

OmniHuman-1.5 generates high-fidelity avatar video from a single image and an audio track, with an optional text prompt. It combines multimodal reasoning with diffusion-based motion generation to keep identity stable, lip sync accurate, and gestures context-aware in long, multi-subject clips.