Fish Audio

Expressive multilingual speech generation with realtime delivery and voice model workflows

Fish Audio develops speech generation models focused on naturalness, low-latency streaming, multilingual delivery, and flexible voice control. Its platform covers text-to-speech, voice cloning, and custom voice model creation for conversational products, narration, agents, and interactive audio systems.

Models by Fish Audio

Fish Audio S2.1 Pro

Fish Audio S2.1 Pro is a flagship text-to-speech model built for highly expressive, low-latency speech generation. It supports natural-language bracket cues for controlling emotion and delivery, multi-speaker dialogue within a single generation, 80+ languages with automatic language detection, and realtime streaming with a short time to first audio.