Fish Audio S2.1 Pro

by Fish Audio, June 1, 2026

Fish Audio S2.1 Pro is a flagship text-to-speech model built for highly expressive, low-latency speech generation. It supports natural-language bracket cues for emotion and delivery control, multi-speaker dialogue in a single generation, 80+ languages with automatic language detection, and realtime streaming with very fast time to first audio.
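Time to first audio is the delay between issuing a request and receiving the first non-empty audio chunk. The sketch below measures it for any iterable of byte chunks. It assumes the caller passes a lazy iterator (for example, a generator that opens the streaming request on its first iteration) so that request setup is counted. No Fish Audio client library is assumed, and `time_to_first_audio` is a hypothetical helper, not part of any official SDK.

```python
import time
from typing import Iterable, Optional, Tuple


def time_to_first_audio(chunks: Iterable[bytes]) -> Tuple[Optional[float], int]:
    """Consume a stream of audio chunks.

    Returns (seconds until the first non-empty chunk, total bytes received).
    The first value is None if the stream produced no audio at all.
    """
    start = time.perf_counter()  # assumes the request starts lazily inside the iterator
    first: Optional[float] = None
    total = 0
    for chunk in chunks:
        if chunk and first is None:
            first = time.perf_counter() - start
        total += len(chunk)
    return first, total
```

With a real streaming client you would wrap its chunk iterator in a generator and pass that in, keeping playback or file writes inside the same loop if you also need the audio.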

Emotion and expression control
How to control vocal delivery in Fish Audio S2-Pro with bracket tags. The tag system steers emotion, expression, paralanguage, and phoneme-level pronunciation in one inline syntax.
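Because the cues are written inline, a script mixes spoken text with bracketed instructions. The sketch below shows one way to work with such scripts locally: it splits a script into text spans, attaches each run of cues to the text that follows it, and can also produce a cue-free transcript (useful for captions). It assumes, hypothetically, that any `[...]` span is a cue, including free-form phrases such as `[speaking softly, almost a whisper]`; the exact cue vocabulary and placement rules the model accepts should be taken from the official tag documentation. `parse_cues` and `strip_cues` are illustrative names, not Fish Audio APIs.

```python
import re
from dataclasses import dataclass
from typing import List

# ASSUMPTION: every "[...]" span (no nested brackets) is treated as a delivery cue.
CUE_PATTERN = re.compile(r"\[([^\[\]]+)\]")


@dataclass
class Segment:
    cues: List[str]  # cues that precede this span, in order
    text: str        # spoken text the cues apply to


def parse_cues(script: str) -> List[Segment]:
    """Split a script into spans, attaching pending cues to the next text."""
    segments: List[Segment] = []
    pending: List[str] = []
    pos = 0
    for match in CUE_PATTERN.finditer(script):
        text = script[pos:match.start()].strip()
        if text:
            segments.append(Segment(pending, text))
            pending = []
        pending.append(match.group(1).strip())
        pos = match.end()
    tail = script[pos:].strip()
    if tail or pending:  # keep trailing cues even with no text after them
        segments.append(Segment(pending, tail))
    return segments


def strip_cues(script: str) -> str:
    """Return the plain transcript with all bracket cues removed."""
    return " ".join(CUE_PATTERN.sub(" ", script).split())
```

For example, `parse_cues("[excited] We won! [whispering] Don't tell anyone yet.")` yields one segment with cue `excited` and one with cue `whispering`.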
Multi-speaker dialogue
How to generate two-speaker dialogue audio in a single request to Fish Audio S2-Pro using inline speaker tags. One call, two voices, full per-speaker emotion control.
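A two-speaker request is a single script in which speaker tags mark each change of voice, with per-speaker cues placed inside each turn. The sketch below assembles such a script from a list of `(speaker_index, text)` turns and emits a new tag only when the speaker changes. The tag format is a hypothetical placeholder (`<|speaker:N|>`), exposed as the `tag_format` parameter so it can be replaced with whatever the official dialogue documentation specifies; `build_dialogue` is an illustrative helper, not a Fish Audio API.

```python
from typing import Sequence, Tuple


def build_dialogue(
    turns: Sequence[Tuple[int, str]],
    max_speakers: int = 2,
    tag_format: str = "<|speaker:{}|>",  # ASSUMPTION: placeholder tag syntax
) -> str:
    """Join (speaker_index, text) turns into one script for a single request.

    Consecutive turns by the same speaker share one tag.
    Raises ValueError for speaker indices outside 0..max_speakers-1.
    """
    parts = []
    previous = None
    for index, text in turns:
        if not 0 <= index < max_speakers:
            raise ValueError(f"speaker index {index} outside 0..{max_speakers - 1}")
        if index != previous:
            parts.append(tag_format.format(index))
            previous = index
        parts.append(text.strip())
    return " ".join(parts)
```

For example, `build_dialogue([(0, "[excited] We won!"), (1, "[laughing] No way."), (1, "Really?")])` returns `<|speaker:0|> [excited] We won! <|speaker:1|> [laughing] No way. Really?`, so each voice keeps its own cues within one call.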