---
title: "Driving the avatar: text to speech or your own audio — HeyGen Avatar V | Runware Docs"
url: https://runware.ai/docs/models/heygen-avatar-v/guides/script-vs-audio
description: How to choose between the TTS path and the audio-input path when generating Avatar V videos. Covers avatar selection, voice swapping, speed tuning, and multilingual delivery from a single script.
---
### [Introduction](https://runware.ai/docs/models/heygen-avatar-v/guides/script-vs-audio#introduction)

Avatar V can be driven two ways: you write a script and the model speaks it (text-to-speech), or you provide your own audio file and the model lip-syncs the avatar to it. **You pick one or the other, not both**. The choice determines which parameters are available and how fast you can iterate.

[Watch video](https://runware.ai/docs/assets/hero.4ZsyhoQD.mp4)

> **Prompt**: This is Avatar V. One brief becomes a video presenter that speaks your script, in any voice, on any background, in any language.

```json
{
  "taskType": "videoInference",
  "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "This is Avatar V. One brief becomes a video presenter that speaks your script, in any voice, on any background, in any language.",
    "voice": "chill_brian_male_english"
  },
  "width": 1280,
  "height": 720
}
```

This guide covers the avatar selection that every request starts with, both input modes for driving the speech, and the tuning parameters (speed, pitch, volume, language) that come with the TTS path.

### [Two input modes](https://runware.ai/docs/models/heygen-avatar-v/guides/script-vs-audio#two-input-modes)

Every Avatar V request needs an avatar. After that, you supply **either** a `speech` block (with `text` and `voice`) **or** an `inputs.audio` reference. Sending both produces a validation error.

```json
// TTS path
{
  "inputs": { "avatar": "..." },
  "speech": { "text": "...", "voice": "..." }
}

// Audio path
{
  "inputs": { "avatar": "...", "audio": "..." }
}
```

Use the **TTS path** when you want fast iteration on copy and easy localization. Use the **audio path** when you already have the exact voice you want. Voice tuning parameters (`speed`, `pitch`, `volume`) only apply on the TTS path.
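
The either/or rule can be checked on the client before a request is sent. The sketch below is a minimal illustration, not part of any SDK: `input_mode` is a hypothetical helper, and it only mirrors the rules stated above (an avatar is always required, and exactly one of `speech` or `inputs.audio` must be present). The API's own validation remains the source of truth.

```python
def input_mode(request: dict) -> str:
    """Return "tts" or "audio", or raise ValueError if the request is ambiguous.

    Hypothetical client-side pre-check that mirrors the rules in this guide.
    """
    inputs = request.get("inputs", {})
    if not inputs.get("avatar"):
        raise ValueError("inputs.avatar is required on both paths")

    has_speech = "speech" in request
    has_audio = "audio" in inputs
    if has_speech and has_audio:
        raise ValueError("send either speech or inputs.audio, not both")
    if not has_speech and not has_audio:
        raise ValueError("send one of speech or inputs.audio")

    if has_speech:
        speech = request["speech"]
        # text and voice are required together on the TTS path
        if not speech.get("text") or not speech.get("voice"):
            raise ValueError("speech.text and speech.voice are required together")
        return "tts"
    return "audio"
```

Calling it on a payload before sending turns a round trip to the API into a local error when both blocks are present by mistake.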

### [Picking an avatar](https://runware.ai/docs/models/heygen-avatar-v/guides/script-vs-audio#picking-an-avatar)

The `inputs.avatar` parameter is a string ID from a fixed catalog of registered looks. Each avatar looks and moves differently. Same script, same voice, four different presenters:

**1**:

[Watch video](https://runware.ai/docs/assets/avatar-swap-1.b_k8watj.mp4)

*man_casual_young_adult*

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "jenny_female_english"
  },
  "width": 1280,
  "height": 720
}
```

**2**:

[Watch video](https://runware.ai/docs/assets/avatar-swap-2.Cj_GIDCi.mp4)

*woman_business_office*

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "c3d4e5f6-a7b8-9012-cdef-345678901234",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "woman_business_office"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "jenny_female_english"
  },
  "width": 1280,
  "height": 720
}
```

**3**:

[Watch video](https://runware.ai/docs/assets/avatar-swap-3.DRoE9j0j.mp4)

*woman_middle_aged_sitting*

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "d4e5f6a7-b8c9-0123-def0-456789012345",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "woman_middle_aged_sitting"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "jenny_female_english"
  },
  "width": 1280,
  "height": 720
}
```

**4**:

[Watch video](https://runware.ai/docs/assets/avatar-swap-4.C3Iso89g.mp4)

*casual_sitting_young_adult*

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "e5f6a7b8-c9d0-1234-ef01-567890123456",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "casual_sitting_young_adult"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "jenny_female_english"
  },
  "width": 1280,
  "height": 720
}
```

All four videos use the same script and the same voice (`jenny_female_english`). On the male avatar this produces an intentional **voice/face mismatch**, which is exactly the point: lip sync adapts to whatever face you pair the audio with. Voice and avatar are **independent parameters**, picked separately.

> [!NOTE]
> Avatar IDs are validated against an enum at request time. Check the [API reference](https://runware.ai/docs/models/heygen-avatar-v) for the full list of available avatars. A registered avatar look is required for every request, including audio-path calls.
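
A comparison like the one above can be generated programmatically. This is a minimal sketch that only builds the request payloads; `avatar_sweep` is a hypothetical helper name, and sending the tasks to the API is left out. The avatar IDs, voice ID, model ID, and dimensions are taken from the examples in this section.

```python
import uuid

# Avatar IDs used in the four examples above
AVATARS = [
    "man_casual_young_adult",
    "woman_business_office",
    "woman_middle_aged_sitting",
    "casual_sitting_young_adult",
]


def avatar_sweep(text: str, voice: str, avatars: list[str]) -> list[dict]:
    """Build one TTS request per avatar, keeping the script and voice fixed."""
    return [
        {
            "taskType": "videoInference",
            "taskUUID": str(uuid.uuid4()),  # fresh ID per task
            "model": "heygen:avatar@5",
            "inputs": {"avatar": avatar},
            "speech": {"text": text, "voice": voice},
            "width": 1280,
            "height": 720,
        }
        for avatar in avatars
    ]


tasks = avatar_sweep(
    "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "jenny_female_english",
    AVATARS,
)
```

Because the avatar is the only field that varies, any difference between the resulting videos comes from the presenter rather than the script or voice.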

### [Driving with text](https://runware.ai/docs/models/heygen-avatar-v/guides/script-vs-audio#driving-with-text)

The TTS path takes `speech.text` (the literal script) and `speech.voice` (the voice that reads it). Both are required together. The optional `speech.language` parameter sets the voice's pronunciation to the target locale.

```json
[
  {
    "taskType": "videoInference",
    "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "model": "heygen:avatar@5",
    "inputs": {
      "avatar": "man_casual_young_adult"
    },
    "speech": {
      "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
      "voice": "chill_brian_male_english"
    },
    "width": 1280,
    "height": 720
  }
]
```

#### [Swapping voices](https://runware.ai/docs/models/heygen-avatar-v/guides/script-vs-audio#swapping-voices)

The `voice` parameter has the largest impact on how the delivery comes across. Same script, same avatar, four different voices:

[Watch video](https://runware.ai/docs/assets/voice-swap-1.B3GjTZJW.mp4)

*chill_brian_male_english*

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "f6a7b8c9-d0e1-2345-f012-678901234567",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "chill_brian_male_english"
  },
  "width": 1280,
  "height": 720
}
```

[Watch video](https://runware.ai/docs/assets/voice-swap-2.DkQnWCn5.mp4)

*baritone_ben_male_english*

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "a7b8c9d0-e1f2-3456-0123-789012345678",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "baritone_ben_male_english"
  },
  "width": 1280,
  "height": 720
}
```

[Watch video](https://runware.ai/docs/assets/voice-swap-3.CI8af_Oh.mp4)

*expressive_evan_male_english_6638ff*

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "b8c9d0e1-f2a3-4567-1234-890123456789",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "expressive_evan_male_english_6638ff"
  },
  "width": 1280,
  "height": 720
}
```

[Watch video](https://runware.ai/docs/assets/voice-swap-4.CugGyc5d.mp4)

*professor_dean_male_english*

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "c9d0e1f2-a3b4-5678-2345-901234567890",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "professor_dean_male_english"
  },
  "width": 1280,
  "height": 720
}
```

The voice catalog covers a wide range of registers and styles. Pick the one that fits the persona, then keep it fixed while you iterate on copy.

> [!WARNING]
> Voice IDs aren't tied to a specific avatar. A masculine-named voice on a feminine-presenting avatar will sync correctly, but the audio/visual mismatch is usually jarring for viewers. Pair voices and avatars deliberately.

### [Driving with audio](https://runware.ai/docs/models/heygen-avatar-v/guides/script-vs-audio#driving-with-audio)

When you already have the voice you want, send the audio directly. The `inputs.audio` parameter accepts a public URL or a UUID from any previously uploaded asset. The model **extracts phonemes** from the audio and animates the avatar to match.

The clip below was generated separately by [Inworld TTS-2](https://runware.ai/docs/models/inworld-tts-2), then passed straight into Avatar V via its returned URL:

[Listen to audio](https://runware.ai/docs/assets/_source-audio.Bd2CvP--.mp3)

*Source audio, generated separately*

[Watch video](https://runware.ai/docs/assets/audio-path.CK9EbVMh.mp4)

*Avatar V driven by the audio above*

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "d0e1f2a3-b4c5-6789-3456-012345678901",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult",
    "audio": "https://example.com/audio.mp3"
  },
  "width": 1280,
  "height": 720
}
```

The audio-path request omits the `speech` block entirely:

```json
[
  {
    "taskType": "videoInference",
    "taskUUID": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
    "model": "heygen:avatar@5",
    "inputs": {
      "avatar": "man_casual_young_adult",
      "audio": "https://example.com/audio.mp3"
    },
    "width": 1280,
    "height": 720
  }
]
```

The audio path is the right call when:

- You have a recording of a real human voice or output from a **voice clone** and want that delivery preserved exactly.
- The audio is the source of truth and the visual is the wrapper (podcasts, interviews, narration, dubbed content).

You lose access to `speech.speed`, `speech.pitch`, `speech.volume`, and `speech.language` on this path. Adjust those upstream, in whatever produced the audio.
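
As a sketch of how an audio-path request might be assembled, the helper below checks that the reference looks like either an HTTP(S) URL or a UUID, then builds a payload with no `speech` block. Both function names are hypothetical. The check only inspects the shape of the string: it cannot tell whether a URL is publicly reachable or whether a UUID refers to an uploaded asset.

```python
import uuid
from urllib.parse import urlparse


def classify_audio_ref(ref: str) -> str:
    """Return "url" or "uuid" based on the shape of the reference string."""
    parsed = urlparse(ref)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    try:
        uuid.UUID(ref)  # raises ValueError if the string is not UUID-shaped
    except ValueError:
        raise ValueError(f"not a URL or UUID: {ref!r}") from None
    return "uuid"


def build_audio_request(avatar: str, audio: str) -> dict:
    """Build an audio-path request; there is deliberately no speech block."""
    classify_audio_ref(audio)  # fail early on malformed references
    return {
        "taskType": "videoInference",
        "taskUUID": str(uuid.uuid4()),
        "model": "heygen:avatar@5",
        "inputs": {"avatar": avatar, "audio": audio},
        "width": 1280,
        "height": 720,
    }


request = build_audio_request("man_casual_young_adult", "https://example.com/audio.mp3")
```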

### [Voice tuning](https://runware.ai/docs/models/heygen-avatar-v/guides/script-vs-audio#voice-tuning)

When you're on the TTS path, three numeric parameters reshape delivery without changing the voice or script.

- `speech.speed` ranges from `0.5` to `1.5`, default `1.0`. Below `0.85` the read feels deliberate. Above `1.15` it starts to feel rushed.
- `speech.pitch` ranges from `-50` to `+50`, default `0`. Small adjustments (±5 to ±15) shift the voice's age and tone without making it sound processed. Larger values quickly cross into chipmunk or robot territory.
- `speech.volume` ranges from `0.0` to `1.0`, default `1.0`. Most useful when you're mixing the avatar's voice into a track with background music or effects, where lowering the avatar's volume creates room for the mix.
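
The ranges above are easy to check before sending a request. The following is a minimal sketch, assuming the documented bounds are inclusive; `check_tuning` and `RANGES` are hypothetical names, and the API's own validation is what actually decides.

```python
# Documented ranges from the list above (assumed inclusive at both ends)
RANGES = {
    "speed": (0.5, 1.5),
    "pitch": (-50, 50),
    "volume": (0.0, 1.0),
}


def check_tuning(speech: dict) -> list[str]:
    """Return a list of out-of-range problems; an empty list means all is fine."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in speech:
            continue  # omitted parameters fall back to their defaults
        value = speech[name]
        if not low <= value <= high:
            problems.append(f"speech.{name}={value} is outside [{low}, {high}]")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report every out-of-range value in one pass.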

Speed is the lever you'll reach for most often:

**×0.7**:

[Watch video](https://runware.ai/docs/assets/speed-slow.Dx6YW6ct.mp4)

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "e1f2a3b4-c5d6-7890-4567-123456789012",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "chill_brian_male_english",
    "speed": 0.7
  },
  "width": 1280,
  "height": 720
}
```

**×1.0**:

[Watch video](https://runware.ai/docs/assets/speed-normal.DA4DZ8UE.mp4)

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "f2a3b4c5-d6e7-8901-5678-234567890123",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "chill_brian_male_english",
    "speed": 1.0
  },
  "width": 1280,
  "height": 720
}
```

**×1.3**:

[Watch video](https://runware.ai/docs/assets/speed-fast.Dsj2y6Uu.mp4)

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "a3b4c5d6-e7f8-9012-6789-345678901234",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "chill_brian_male_english",
    "speed": 1.3
  },
  "width": 1280,
  "height": 720
}
```

The slower read suits training content where every word matters. The faster read fits social cuts where attention drops off after a few seconds. The default is calibrated for general-purpose narration.

### [Multilingual delivery](https://runware.ai/docs/models/heygen-avatar-v/guides/script-vs-audio#multilingual-delivery)

The `speech.language` parameter accepts a **BCP 47 locale code** (`en-US`, `es-ES`, `fr-FR`, `ja-JP`, and roughly 180 others). When set, the same voice adapts its pronunciation to the target language. Translate the script once, set the language, send the request:

**EN**:

[Watch video](https://runware.ai/docs/assets/multilang-en.B3AmjWGb.mp4)

*en-US*

> **Prompt**: Welcome to our team. Here's a quick overview of what you'll cover this week.

```json
{
  "taskType": "videoInference",
  "taskUUID": "b4c5d6e7-f8a9-0123-7890-456789012345",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "voice": "jenny_female_english",
    "language": "en-US"
  },
  "width": 1280,
  "height": 720
}
```

**ES**:

[Watch video](https://runware.ai/docs/assets/multilang-es.BhqPu9lK.mp4)

*es-ES*

> **Prompt**: Bienvenido a nuestro equipo. Aquí tienes un breve resumen de lo que verás esta semana.

```json
{
  "taskType": "videoInference",
  "taskUUID": "c5d6e7f8-a9b0-1234-8901-567890123456",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "Bienvenido a nuestro equipo. Aquí tienes un breve resumen de lo que verás esta semana.",
    "voice": "jenny_female_english",
    "language": "es-ES"
  },
  "width": 1280,
  "height": 720
}
```

**FR**:

[Watch video](https://runware.ai/docs/assets/multilang-fr.aHwk_mGn.mp4)

*fr-FR*

> **Prompt**: Bienvenue dans notre équipe. Voici un bref aperçu de ce que vous découvrirez cette semaine.

```json
{
  "taskType": "videoInference",
  "taskUUID": "d6e7f8a9-b0c1-2345-9012-678901234567",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "Bienvenue dans notre équipe. Voici un bref aperçu de ce que vous découvrirez cette semaine.",
    "voice": "jenny_female_english",
    "language": "fr-FR"
  },
  "width": 1280,
  "height": 720
}
```

**JA**:

[Watch video](https://runware.ai/docs/assets/multilang-ja.D-2yYp9w.mp4)

*ja-JP*

> **Prompt**: 私たちのチームへようこそ。今週学ぶ内容の概要を簡単にご紹介します。

```json
{
  "taskType": "videoInference",
  "taskUUID": "e7f8a9b0-c1d2-3456-0123-789012345678",
  "model": "heygen:avatar@5",
  "inputs": {
    "avatar": "man_casual_young_adult"
  },
  "speech": {
    "text": "私たちのチームへようこそ。今週学ぶ内容の概要を簡単にご紹介します。",
    "voice": "jenny_female_english",
    "language": "ja-JP"
  },
  "width": 1280,
  "height": 720
}
```

All four videos use the **same avatar** and the **same voice** (`jenny_female_english`). The only differences are the translated script and the locale code:

```json
// English
"speech": { "text": "Welcome to our team...", "voice": "jenny_female_english", "language": "en-US" }

// Spanish
"speech": { "text": "Bienvenido a nuestro equipo...", "voice": "jenny_female_english", "language": "es-ES" }
```

This is the cheapest way to localize a video at scale. One script, one voice, one avatar, looped over a list of locale codes and translations.
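
The loop described above could look like the following sketch. It only builds the payloads; `localized_requests` is a hypothetical helper, and the translations are the ones used in the examples in this section.

```python
import uuid

# Locale code -> translated script, taken from the examples above
TRANSLATIONS = {
    "en-US": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
    "es-ES": "Bienvenido a nuestro equipo. Aquí tienes un breve resumen de lo que verás esta semana.",
    "fr-FR": "Bienvenue dans notre équipe. Voici un bref aperçu de ce que vous découvrirez cette semaine.",
    "ja-JP": "私たちのチームへようこそ。今週学ぶ内容の概要を簡単にご紹介します。",
}


def localized_requests(avatar: str, voice: str, translations: dict[str, str]) -> list[dict]:
    """Build one TTS request per locale, keeping the avatar and voice fixed."""
    return [
        {
            "taskType": "videoInference",
            "taskUUID": str(uuid.uuid4()),
            "model": "heygen:avatar@5",
            "inputs": {"avatar": avatar},
            "speech": {"text": text, "voice": voice, "language": locale},
            "width": 1280,
            "height": 720,
        }
        for locale, text in translations.items()
    ]


tasks = localized_requests("man_casual_young_adult", "jenny_female_english", TRANSLATIONS)
```

Adding a locale then means adding one entry to the translation table, with no change to the request-building code.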

> [!NOTE]
> The voice catalog is largely English-named, but each voice can speak any supported language. If you need a voice built for a specific language, check the catalog for entries whose name reflects that language. Otherwise, let the voice adapt via the `language` parameter.

### [Tips](https://runware.ai/docs/models/heygen-avatar-v/guides/script-vs-audio#tips)

1. **Lock the avatar and voice before iterating on copy.** Both choices show up directly in the output. Changing them mid-iteration resets your sense of what the read sounds like and slows you down.
    
2. **Use the TTS path for A/B testing copy.** Two requests with different `speech.text` produce two videos in minutes. The same iteration on the audio path requires re-recording or re-generating audio first.
    
3. **Use the audio path for brand voices.** If your brand has a specific human voice associated with it, produce the audio upstream (clone, recording, separate TTS provider) and feed it to Avatar V. The lip sync handles the rest.
    
4. **Test the avatar/voice pairing on a short script first.** A 10-second take renders faster and surfaces any avatar/voice mismatch just as clearly as a full minute. Once the pairing feels right, send the full script.
    
5. **Translate, don't transliterate.** When localizing, get a translation that reads naturally in the target language, then send the translated string as `speech.text`. The language code alone won't fix awkward source copy.
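
For tip 2, a copy A/B test amounts to building requests that differ only in `speech.text`. The sketch below is one way to keep track of which result belongs to which variant. `ab_requests` is a hypothetical helper, and matching results back by `taskUUID` assumes the response carries the same ID you sent. The variant scripts in the usage lines are illustrative only.

```python
import uuid


def ab_requests(avatar: str, voice: str, variants: dict[str, str]) -> tuple[list[dict], dict[str, str]]:
    """Return (tasks, labels), where labels maps each taskUUID to its variant name."""
    tasks, labels = [], {}
    for label, text in variants.items():
        task_id = str(uuid.uuid4())
        labels[task_id] = label  # lets you match a finished video to its copy variant
        tasks.append(
            {
                "taskType": "videoInference",
                "taskUUID": task_id,
                "model": "heygen:avatar@5",
                "inputs": {"avatar": avatar},
                "speech": {"text": text, "voice": voice},
                "width": 1280,
                "height": 720,
            }
        )
    return tasks, labels


tasks, labels = ab_requests(
    "man_casual_young_adult",
    "chill_brian_male_english",
    {
        "A": "Welcome to our team. Here's a quick overview of what you'll cover this week.",
        "B": "Welcome aboard. Here's what your first week looks like.",
    },
)
```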