MODEL ID prunaai:p-video@animate
P-Video-Animate

P-Video-Animate is a motion-transfer video model that animates a single reference image using a source video as the motion driver. It preserves the original acting, timing, camera movement, and scene structure from the driving clip while restyling the output around the supplied image. It is well suited to UGC ad variations, meme remixes, character or avatar recasting, and other high-volume creative workflows that need fast, repeatable image-to-video animation with strong motion fidelity.

Animating images with a source video

How to use P-Video-Animate to animate a reference image with the motion, timing, and camera movement from a source video. Covers the pairing rule, optional prompt steering, and four concrete patterns.

Introduction

Reusing the same motion across different visuals is awkward in most video pipelines. You can describe what you want in a text prompt and hope the model generates something close, but the timing, hand position, and expression come back different every time. General-purpose video editors can rework one clip, but they can't take a video's motion and apply it to a static image of a different subject.

P-Video-Animate makes that workflow direct. You pass one reference image and one reference video. The image controls who is on screen, and the video controls what happens. The model returns a new video that animates the image's character using the video's exact motion, timing, camera movement, and scene structure.

Example prompt: "Animate the content creator in the reference image using the source video. He speaks with confident creator energy directly to the camera, brief warm smile, slight head nod, keeping the same casual posture and the vibrant neon-lit creator studio glowing behind him."

This guide covers the request shape, how to pair an image and video so the model has what it needs, when to add an optional prompt to refine specifics, and four concrete patterns to start from.

Request shape

A P-Video-Animate request takes one reference image, one reference video, and a small set of optional parameters:

[
  {
    "taskType": "videoInference",
    "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "model": "prunaai:p-video@animate",
    "inputs": {
      "referenceImages": ["https://example.com/portrait.jpg"],
      "referenceVideos": ["https://example.com/source-motion.mp4"]
    },
    "resolution": "720p",
    "settings": { "preserveAudio": true }
  }
]

The response returns the task's UUID along with the generated video's UUID and URL:

{
  "data": [
    {
      "taskType": "videoInference",
      "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "videoUUID": "f1e2d3c4-b5a6-7890-1234-567890abcdef",
      "videoURL": "https://vm.runware.ai/video/os/a14d18/ws/2/vi/f1e2d3c4-b5a6-7890-1234-567890abcdef.mp4"
    }
  ]
}
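
The response shape above can be read with a few lines of Python. This is a minimal sketch that only parses a response body you already have as text; how the request is sent and how the result is delivered is outside its scope. The helper name `extract_video_urls` is hypothetical.

```python
import json


def extract_video_urls(response_text: str) -> list[str]:
    """Return every videoURL found in the "data" array of a response body."""
    body = json.loads(response_text)
    return [
        item["videoURL"]
        for item in body.get("data", [])
        # Only keep video results that actually carry a URL.
        if item.get("taskType") == "videoInference" and "videoURL" in item
    ]


# Sample body copied from the response shown above.
sample_response = """
{
  "data": [
    {
      "taskType": "videoInference",
      "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "videoUUID": "f1e2d3c4-b5a6-7890-1234-567890abcdef",
      "videoURL": "https://vm.runware.ai/video/os/a14d18/ws/2/vi/f1e2d3c4-b5a6-7890-1234-567890abcdef.mp4"
    }
  ]
}
"""

for url in extract_video_urls(sample_response):
    print(url)
```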

The two inputs fields are required; the remaining parameters are optional:

  • inputs.referenceImages takes exactly one image. Accepts a public URL, base64 string, data URI, or a UUID from a previous generation or the Image Upload API. The image's character is what gets animated.
  • inputs.referenceVideos takes exactly one video. Accepts a public URL or a UUID from a previous generation. The video supplies the motion.
  • positivePrompt is optional. Use it to override or refine specifics from the source motion. See Steering with a prompt below.
  • resolution is "720p" (default) or "1080p". The output aspect ratio is inferred from the source video.
  • fps is 24 or 48. Omit to preserve the source video's frame rate. Higher values render smoother motion at a higher cost.
  • seed is an integer for reproducibility.
  • settings.preserveAudio keeps the source video's audio track in the output. Defaults to true. Set to false for muted output.
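
The parameter rules above can be captured in a small builder that validates values before a request is assembled. This is a sketch under stated assumptions: the function name `build_animate_task` is hypothetical, and placing `positivePrompt`, `fps`, and `seed` at the top level of the task is inferred from how the fields are named in the list, not confirmed by the request example.

```python
import json
import uuid
from typing import Optional

ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_FPS = {24, 48}


def build_animate_task(
    image: str,
    video: str,
    *,
    prompt: Optional[str] = None,
    resolution: str = "720p",
    fps: Optional[int] = None,
    seed: Optional[int] = None,
    preserve_audio: bool = True,
) -> dict:
    """Build one P-Video-Animate task dict (hypothetical helper)."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if fps is not None and fps not in ALLOWED_FPS:
        raise ValueError(f"fps must be one of {sorted(ALLOWED_FPS)} or omitted")

    task = {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),
        "model": "prunaai:p-video@animate",
        # Exactly one image and one video, each wrapped in a list.
        "inputs": {"referenceImages": [image], "referenceVideos": [video]},
        "resolution": resolution,
        "settings": {"preserveAudio": preserve_audio},
    }
    # Optional fields are left out entirely when unset (assumed top-level).
    if prompt:
        task["positivePrompt"] = prompt
    if fps is not None:
        task["fps"] = fps  # omitted: the source video's frame rate is preserved
    if seed is not None:
        task["seed"] = seed
    return task


payload = [
    build_animate_task(
        "https://example.com/portrait.jpg",
        "https://example.com/source-motion.mp4",
        resolution="1080p",
        seed=42,
    )
]
print(json.dumps(payload, indent=2))
```

Rejecting unsupported `resolution` and `fps` values locally is a design choice that surfaces mistakes before any request is made.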

Pairing the image and video

The single largest factor in result quality is how well the reference image matches the first frame of the source video. The model can absorb small differences, but a large mismatch in framing or subject visibility produces visible distortion.

For each pair, ask three questions:

  1. Framing. Does the image show the same body region as the video (head-and-shoulders, medium shot, or full body)?
  2. Pose. Is the subject in roughly the same position (facing the camera, arms in roughly the same place)?
  3. Subject visibility. Is the subject's body visible in the same way, without occlusions or cropping that the video doesn't have?

When all three line up, the model has a clean starting point and the motion transfers without artifacts.
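
If you pair many images and videos, the three questions can be recorded as a simple checklist in code. The sketch below is purely illustrative: the `Shot` descriptor and its labels are hypothetical values you would fill in by hand after looking at the image and the video's first frame; nothing here inspects actual media.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """Hand-written description of an image or of a video's first frame."""
    framing: str    # e.g. "head-and-shoulders", "medium", "full-body"
    pose: str       # e.g. "facing-camera", "profile-left"
    occluded: bool  # True if part of the subject is hidden or cropped


def pairing_issues(image: Shot, first_frame: Shot) -> list[str]:
    """Return the checklist items on which the pair does not line up."""
    issues = []
    if image.framing != first_frame.framing:
        issues.append(f"framing: image is {image.framing}, video is {first_frame.framing}")
    if image.pose != first_frame.pose:
        issues.append(f"pose: image is {image.pose}, video is {first_frame.pose}")
    # Only occlusion or cropping that the video does not share is a problem.
    if image.occluded and not first_frame.occluded:
        issues.append("subject visibility: image has occlusion or cropping the video lacks")
    return issues


portrait = Shot(framing="head-and-shoulders", pose="facing-camera", occluded=False)
dance_clip = Shot(framing="full-body", pose="facing-camera", occluded=False)

for issue in pairing_issues(portrait, dance_clip):
    print(issue)
```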

Both the image and the first frame of the video show a centered head-and-shoulders portrait facing the camera. The model transfers the talking-head motion onto the businesswoman without warping the framing.

When the reference image and reference video share a clean subject and matching framing, the model also picks up secondary motion present in the source beyond the main action: subtle camera drift, hair movement, soft shadows shifting with the body. Mismatched pairs lose all of that.

When the pair doesn't match

Pairing a head-and-shoulders image with a full-body source video gives the model no body to map the choreography onto. Rather than distorting the subject or hallucinating limbs, the model falls back to barely animating what it has, and most of the source motion is lost.

Same image, but the source video is now a full-body dance shot. The image has nothing below the shoulders for the model to map the choreography onto. The subject still looks at the camera and the head drifts subtly, but the dance itself doesn't transfer. If the motion you want needs a full body, generate a full-body reference image to match.

When you already have a reference image you like but the framing or pose doesn't quite line up with the source video, you can edit the image to match using P-Image-Edit before passing it here. Reposition the subject, adjust the framing, or change the pose, then animate the edited image with this model.

Steering with a prompt

positivePrompt is optional. Without one, the model transfers the source video's motion as-is. With one, you can override or refine specific behaviors: an expression, a hand position, a moment of emphasis, the words being lip-synced.

Reach for a prompt when:

  • The source motion is mostly right but one detail needs to change (raise an eyebrow, hold a smile longer, keep both hands up)
  • You want lip-sync to specific words rather than a generic mouth shape
  • A small action needs to be added or removed (a nod at the end, a head turn, a wave)

Leave it blank when the source video already does exactly what you want.

Both outputs use the same image and video. The prompted version added a thumbs-up gesture at the end of the clip that the source video doesn't contain. The rest of the body motion still comes from the source.

What to write

Describe the specific behavior you want to override. The character and setting are already locked by the image. Write what the subject does, when, and how.

Useful (used for the video above):

"At the very end of the clip, just after her last gesture, she gives a clear thumbs-up directly toward the camera. Keep the source motion otherwise."

Less useful:

"A confident woman in a charcoal blazer speaks to the camera in a modern office."

The second prompt repeats what the image already shows. The first prompt adds one specific action on top of the source motion, which is what the prompt actually controls.
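
In request terms, steering means adding `positivePrompt` to an otherwise unchanged task and leaving the field out when the source motion is already right. The sketch below assumes `positivePrompt` sits at the top level of the task object, which is inferred from the field list rather than shown in the request example; `with_prompt` is a hypothetical helper.

```python
import copy
import json
from typing import Optional


def with_prompt(task: dict, prompt: Optional[str]) -> dict:
    """Return a copy of the task with positivePrompt set, or absent if blank."""
    steered = copy.deepcopy(task)
    steered.pop("positivePrompt", None)
    if prompt and prompt.strip():
        steered["positivePrompt"] = prompt.strip()
    return steered


base_task = {
    "taskType": "videoInference",
    "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "model": "prunaai:p-video@animate",
    "inputs": {
        "referenceImages": ["https://example.com/portrait.jpg"],
        "referenceVideos": ["https://example.com/source-motion.mp4"],
    },
}

# Behavior-only prompt: what the subject does, when, and how.
thumbs_up = (
    "At the very end of the clip, just after her last gesture, she gives a "
    "clear thumbs-up directly toward the camera. Keep the source motion otherwise."
)

print(json.dumps([with_prompt(base_task, thumbs_up)], indent=2))
```

Sending the same image, video, and seed with and without the prompt is one way to see what the prompt changed, as in the comparison described above.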

Patterns

The model is style-agnostic: as long as the pairing rule holds, the visual style of the reference image carries directly into the output. The same model handles photorealistic portraits, 2D cartoons, stylized 3D characters, and brand mascots without changing approach.

Photorealistic: skin, hair, and lighting carry through
Cartoon: line work and flat colors preserved
3D character: rendered look and stylized proportions intact
Mascot: silhouette, color, and rendering style locked

Tips

  1. Match the first frame. Pose, framing, and subject visibility in the reference image should line up with the first frame of the source video.

  2. Pick source videos with clear, readable motion. Sharp, well-lit motion transfers cleanly. Blurry, fast-cut, or low-contrast source clips produce blurry, less coherent results.

  3. Use the prompt for behavior, not framing. The image already controls who is on screen. The prompt should describe what the character does, when, and how. Repeating the image description doesn't add anything.

  4. Plan around generation time. The model produces output at roughly five seconds of compute per one second of video. For longer clips, split the source video and animate the segments separately.

  5. Use a dedicated tool when motion transfer isn't the goal. Object removal, background replacement, scene rewriting, and frame-exact edits aren't what this model does. Pair it with the right tool for those workflows rather than trying to force it.
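
The arithmetic behind tip 4 can be sketched as follows. The five-to-one ratio is the rough figure quoted in that tip, not a guarantee, and the maximum segment length is a hypothetical parameter you would choose yourself; the guide does not state a clip-length limit.

```python
import math

# Rough ratio from tip 4: about five seconds of compute per second of video.
COMPUTE_SECONDS_PER_VIDEO_SECOND = 5.0


def estimate_compute_seconds(video_seconds: float) -> float:
    """Rough generation-time estimate for a clip of the given length."""
    return video_seconds * COMPUTE_SECONDS_PER_VIDEO_SECOND


def split_into_segments(total_seconds: float, max_segment_seconds: float) -> list:
    """Split [0, total_seconds] into equal segments no longer than the maximum."""
    if total_seconds <= 0 or max_segment_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / max_segment_seconds)
    length = total_seconds / count
    segments = []
    for i in range(count):
        start = i * length
        # Pin the final boundary to the exact total to avoid float drift.
        end = total_seconds if i == count - 1 else (i + 1) * length
        segments.append((start, end))
    return segments


for start, end in split_into_segments(30, 10):
    estimate = estimate_compute_seconds(end - start)
    print(f"{start:.1f}s to {end:.1f}s: about {estimate:.0f}s of compute")
```

Because each segment is animated separately, the pairing rule applies per segment: the reference image should line up with the first frame of that segment, not just the first frame of the full clip.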