Dia 1.6B

Dialogue TTS with multi-speaker generation, voice cloning, and non-verbal cues

Text to Audio

Dia 1.6B Overview

Dia 1.6B is a 1.6-billion-parameter text-to-speech model from Nari Labs that generates realistic dialogue from a transcript in a single pass. It supports multi-speaker generation via speaker tags, voice cloning from 5 to 10 seconds of reference audio, and non-verbal cues such as laughter, sighs, coughs, and throat clearing. The model supports English only. It is released under the Apache 2.0 license, which permits commercial use.
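To make the speaker-tag and non-verbal-cue input concrete, here is a minimal sketch of how a tagged transcript might be assembled before being passed to the model. It assumes speaker tags take the form `[S1]`, `[S2]` and that cues are written as parenthesized words such as `(laughs)`; neither format is specified in the overview above, so check the model's own documentation. The helper `build_transcript` is hypothetical and not part of any Dia API, and the sketch only builds the text; it does not call the model.

```python
from typing import Iterable, Tuple


def build_transcript(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, text) turns into one tagged transcript string.

    Assumption: speakers are marked as [S1], [S2], ... and non-verbal cues
    are embedded in the text as parenthesized words, e.g. "(laughs)".
    """
    parts = []
    for speaker, text in turns:
        if speaker < 1:
            raise ValueError(f"speaker number must be >= 1, got {speaker}")
        # Prefix each turn with its speaker tag and trim stray whitespace.
        parts.append(f"[S{speaker}] {text.strip()}")
    return " ".join(parts)


script = build_transcript([
    (1, "Did you hear the news?"),
    (2, "I did. (laughs) Nobody expected that."),
    (1, "(sighs) Tell me about it."),
])
print(script)
```

Keeping the transcript as a list of turns until the last moment makes it easy to reorder lines or swap speakers without hand-editing tags.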