P-Video-Avatar

High-fidelity talking avatar video generation from a portrait with script or audio control

Text to Video, Image to Video, Audio to Video

P-Video-Avatar Overview

P-Video-Avatar is a portrait-driven avatar video model that turns a single image into a speaking video using either an uploaded audio track or a voice generated from a script. It is built for production avatar workflows, with strong lip sync, selectable voices and languages, optional speaking-style control, seeded generation, and 720p or 1080p output for talking-head video creation at scale.

From $0.025 / video

720p, 1s: $0.025

1080p, 1s: $0.045

Commercial use

How to Use P-Video-Avatar

Overview

P-Video-Avatar is a talking avatar video model that generates speaking portrait videos from a single image.

It is best suited to workflows where a portrait should speak from either a supplied audio track or a voice generated from a script, with strong lip sync, fast iteration, and production-ready cost efficiency.

Strengths

Script-Driven and Audio-Driven Avatar Generation

P-Video-Avatar supports two main operating modes: avatar generation from a written script with built-in voice generation, and avatar generation from an uploaded audio file. This makes it flexible for both synthetic narration workflows and pre-recorded voice performances.
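
The choice between the two modes can be sketched as a small helper. This is an illustration only, not the model's actual request format: voice_script is the parameter named on this page, while the audio field name, the mode label, and the rule that exactly one of the two must be given are assumptions.

```python
from typing import Optional


def speech_input(voice_script: Optional[str] = None,
                 audio: Optional[str] = None) -> dict:
    """Return the speech part of a request for exactly one operating mode.

    `voice_script` is named on this page; `audio` and the "mode" label are
    hypothetical names used only for this sketch.
    """
    has_script = bool(voice_script and voice_script.strip())
    has_audio = bool(audio and audio.strip())
    # Assumption: the two modes are mutually exclusive.
    if has_script == has_audio:
        raise ValueError("provide exactly one of voice_script or audio")
    if has_script:
        return {"mode": "script", "voice_script": voice_script.strip()}
    return {"mode": "audio", "audio": audio.strip()}
```

Rejecting the ambiguous cases up front (both inputs, or neither) keeps a batch job from submitting requests whose speech source is unclear.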

Strong Lip Sync and Audio-Visual Alignment

The model is designed for talking-head generation with close alignment between speech and facial motion. It is a good fit for presenter-style videos, avatar explainers, and dialogue-driven portrait clips where timing accuracy matters.

Built-In Voice Selection

When using script-driven generation, the model supports a large set of selectable voices and multiple output languages. This helps teams create localized or stylistically varied avatar videos without needing a separate TTS system.

Speaking Style and Atmosphere Control

P-Video-Avatar exposes both voice_prompt and video_prompt controls. This makes it possible to steer delivery style, pacing, tone, and the surrounding visual mood rather than only generating a neutral talking head.

Resolution Options for Scaled Production

The model supports both 720p and 1080p output. This gives teams a practical path for balancing iteration cost against final delivery quality.
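
To make the cost trade-off concrete, here is a minimal sketch that estimates spend from the per-second prices listed on this page. It assumes the price scales linearly with clip length, which the page does not state explicitly; the function and constant names are made up for the example.

```python
from decimal import Decimal

# Per-second prices in USD, as listed on this page.
PRICE_PER_SECOND = {"720p": Decimal("0.025"), "1080p": Decimal("0.045")}


def estimate_cost(resolution: str, seconds: int) -> Decimal:
    """Estimate the cost of one clip, ASSUMING linear per-second pricing."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return PRICE_PER_SECOND[resolution] * seconds
```

Under that assumption, a 30-second clip would come to $0.75 at 720p and $1.35 at 1080p, so drafting at 720p and rendering only the approved take at 1080p keeps iteration cheap.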

Capabilities

Portrait-to-Avatar Video

P-Video-Avatar accepts a single portrait image as the visual source and generates a speaking video from that image.

Script-to-Video

With voice_script, the model can generate avatar video directly from written speech using a selected synthetic voice.

Audio-to-Video

With an uploaded audio file, the model can animate the portrait to match the supplied voice performance.

Seeded Generation

The model supports a seed parameter for more reproducible generations when iterating on the same avatar setup.
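
One way to use the seed when iterating is to hold it fixed and change a single control at a time, so differences between outputs are easier to attribute. The sketch below builds such a set of request variants. seed and voice_prompt are parameter names from this page; the helper itself and the dictionary layout are hypothetical.

```python
from typing import Iterable, List


def style_variants(base: dict, voice_prompts: Iterable[str], seed: int) -> List[dict]:
    """Copy `base` once per speaking style, pinning the same seed in each copy."""
    variants = []
    for prompt in voice_prompts:
        payload = dict(base)            # shallow copy so `base` stays unchanged
        payload["seed"] = seed          # same seed for every variant
        payload["voice_prompt"] = prompt
        variants.append(payload)
    return variants
```

For example, style_variants({"voice_script": "Welcome back."}, ["calm", "energetic"], seed=7) yields two payloads that differ only in voice_prompt.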

Input and Output

AIR ID: prunaai:p-video@avatar
Input: one portrait image, plus either a script or an audio file
Output: talking avatar video
Resolution: 720p or 1080p
Voice controls: selectable voices, language selection, optional speaking-style prompt
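
The inputs listed above can be gathered into a single request object, as in the following sketch. The AIR ID, the two resolutions, and the voice_script, voice_prompt, video_prompt, and seed names come from this page. The model, image, audio, and resolution keys and the overall payload shape are assumptions, so check the provider's API reference for the real schema.

```python
from dataclasses import dataclass
from typing import Optional

MODEL_AIR_ID = "prunaai:p-video@avatar"   # AIR ID as listed on this page
RESOLUTIONS = ("720p", "1080p")           # resolutions listed on this page


@dataclass
class AvatarRequest:
    image: str                            # hypothetical key: portrait image reference
    voice_script: Optional[str] = None
    audio: Optional[str] = None           # hypothetical key: uploaded audio reference
    resolution: str = "720p"              # hypothetical key name
    voice_prompt: Optional[str] = None
    video_prompt: Optional[str] = None
    seed: Optional[int] = None

    def to_payload(self) -> dict:
        """Validate the inputs and return a dict without unset optional fields."""
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {RESOLUTIONS}")
        # Assumption: a request carries either a script or an audio file, not both.
        if (self.voice_script is None) == (self.audio is None):
            raise ValueError("set exactly one of voice_script or audio")
        fields = {
            "model": MODEL_AIR_ID,
            "image": self.image,
            "voice_script": self.voice_script,
            "audio": self.audio,
            "resolution": self.resolution,
            "voice_prompt": self.voice_prompt,
            "video_prompt": self.video_prompt,
            "seed": self.seed,
        }
        return {key: value for key, value in fields.items() if value is not None}
```

Dropping unset optional fields keeps the payload limited to what the caller actually chose.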

Best Fit

Presenter and spokesperson videos
Avatar explainers and product demos
Localized talking-head content
Scripted social and marketing videos
High-volume portrait avatar generation