Details
- Alibaba-backed Qwen introduced Qwen3-TTS-Flash on September 22, 2025, marketing it as the company’s next-generation text-to-speech engine.
- The model is optimized for Mandarin Chinese and English, with the development team touting “best-in-class stability” that minimizes stutters and mispronunciations in longer readings.
- Launch benchmarks report state-of-the-art word error rates (WER) in Chinese, English, Italian, and French, indicating better multilingual handling than earlier Qwen TTS models.
- A public demo, technical blog, and launch video were released alongside the announcement, and model checkpoints are expected soon under Qwen’s usual open-source license.
- The model features 17 built-in voice presets, allowing developers to switch speaker styles without extra training data.
- Qwen3-TTS-Flash follows the company’s naming scheme connecting speech tools to its Qwen3 language-model family, hinting at future multimodal assistant integrations.
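
For readers unfamiliar with the word-error-rate metric cited in the benchmarks, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference. In TTS evaluation the hypothesis is typically a speech-recognition transcript of the synthesized audio, compared against the input text. This is not Qwen's evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: tokenizes on whitespace, which does not suit
    languages written without spaces (for Chinese, a character-level
    error rate is commonly used instead).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A lower value is better, and because insertions count as errors the result can exceed 1.0 when the hypothesis is much longer than the reference.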
Impact
This launch puts new pressure on global voice AI offerings such as Google’s AudioLM, OpenAI’s TTS-1, and ElevenLabs, especially given Qwen’s superior support for Chinese, a market where overseas competitors have struggled. Its multilingual capabilities and regulatory compliance position Qwen well for rapid adoption in sectors such as call centers, audiobooks, and gaming, and bolster its role in shaping the next wave of real-time, multimodal AI assistants across Asia.