Fish Audio

Expressive multilingual speech generation with realtime delivery and voice model workflows

Fish Audio develops speech generation models focused on naturalness, low-latency streaming, multilingual delivery, and flexible voice control. Its platform spans text-to-speech, voice cloning, and custom voice model creation for conversational products, narration, agents, and interactive audio systems.

Models by Fish Audio

Fish Audio S2.1 Pro is a flagship text-to-speech model built for highly expressive, low-latency speech generation. It supports natural-language bracket cues for emotion and delivery control, multi-speaker dialogue in a single generation, more than 80 languages with automatic language detection, and realtime streaming with a low time to first audio.
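
To make the bracket-cue and multi-speaker ideas concrete, here is a minimal sketch of how a dialogue script might be assembled as plain text before being sent to a text-to-speech model. Everything in it is an assumption for illustration: the `Line` and `render_script` names, the `Speaker N:` label format, and the cue wording are hypothetical and are not taken from Fish Audio's documentation. The sketch only builds a string and does not call any API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    speaker: str                # hypothetical speaker label, e.g. "Speaker 1"
    text: str                   # the words to be spoken
    cue: Optional[str] = None   # free-form delivery cue, e.g. "whispering"


def render_script(lines: list[Line]) -> str:
    """Render dialogue lines into one text prompt with bracket cues.

    The output format is an illustrative assumption, not a documented syntax.
    """
    rendered = []
    for line in lines:
        # Put the cue in square brackets directly before the text it applies to.
        cue = f"[{line.cue}] " if line.cue else ""
        rendered.append(f"{line.speaker}: {cue}{line.text}")
    return "\n".join(rendered)


script = render_script([
    Line("Speaker 1", "Did you hear that?", cue="whispering"),
    Line("Speaker 2", "It's just the wind.", cue="calm, reassuring"),
    Line("Speaker 1", "Okay."),
])
print(script)
```

Keeping cues as structured data until the last step makes it easy to swap in whatever speaker and cue syntax the provider actually documents.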