
Fish Audio
Expressive multilingual speech generation with realtime delivery and voice model workflows
Fish Audio develops speech generation models focused on naturalness, low-latency streaming, multilingual delivery, and flexible voice control. Its platform spans text-to-speech, voice cloning, and custom voice model creation for conversational products, narration, agents, and interactive audio systems.
Models by Fish Audio
Fish Audio S2.1 Pro is a flagship text-to-speech model built for highly expressive, low-latency speech generation. It supports natural-language cues written in brackets to control emotion and delivery, multi-speaker dialogue in a single generation, 80+ languages with automatic language detection, and realtime streaming with a short time to first audio.
