Details
- Qwen introduced Qwen3-LiveTranslate-Flash, a live translation engine that handles speech, text, lip-reading, and gestures at the same time.
- The system understands 18 languages and 6 regional dialects, and can produce spoken output in 10 languages entirely on-device, with no cloud round-trip.
- Its vision layer reads on-screen captions, signage, and facial cues to improve accuracy when audio quality is poor or background noise is high.
- Demos show end-to-end translation latency under 500 ms on a single NVIDIA L4 GPU, down from 700 ms with Qwen2.
- The SDK ships for Android, iOS, and WebRTC in October 2025; an enterprise on-premises appliance with privacy controls follows in early 2026.
- Cloud pricing is US$0.002 per translated word, with volume discounts above 5 million words per month (see the cost sketch after this list).
- A "Flash" mode can lower video frame rates to 12 fps, reducing compute expenses by 30 percent compared to the top-tier Qwen3-LiveTranslate-Pro.
- Beta testers include Alibaba’s DingTalk for multilingual conferencing and China Eastern Airlines for spoken in-flight announcements.
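
To make the cloud pricing concrete, here is a minimal cost sketch. The US$0.002 rate and the 5-million-word threshold come from the announcement; the 20% discount rate and the marginal-tier structure are hypothetical, since the actual discount schedule was not published.

```python
# Rough monthly-cost model for the announced cloud pricing.
# Announced: US$0.002 per translated word, discounts above 5M words/month.
# HYPOTHETICAL: the 20% discount and the marginal-tier structure below.

BASE_RATE = 0.002               # USD per translated word (announced)
DISCOUNT_THRESHOLD = 5_000_000  # words/month before discounts kick in (announced)
ASSUMED_DISCOUNT = 0.20         # placeholder; the real tiers are unpublished

def monthly_cost(words: int) -> float:
    """Estimate USD cost, discounting only the words beyond the threshold."""
    base = min(words, DISCOUNT_THRESHOLD) * BASE_RATE
    extra = max(words - DISCOUNT_THRESHOLD, 0) * BASE_RATE * (1 - ASSUMED_DISCOUNT)
    return base + extra

for volume in (1_000_000, 5_000_000, 10_000_000):
    print(f"{volume:>12,} words -> ${monthly_cost(volume):>9,.2f}")
# ->    1,000,000 words -> $ 2,000.00
#       5,000,000 words -> $10,000.00
#      10,000,000 words -> $18,000.00
```

At the announced rate, 1 million words costs US$2,000; whether discounts apply to the whole bill or only to the volume past the threshold is an open question.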
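The Flash mode's 12 fps cap boils down to temporal downsampling: keeping only as many frames as the target rate requires. The sketch below shows the generic sampling logic; it illustrates the technique, not Qwen's actual implementation, which is not public.

```python
# Generic temporal downsampling: pick which frames of a 30 fps capture
# survive when the vision pipeline only needs 12 fps. Illustrative only;
# Qwen's Flash-mode internals are not public.

def downsample_indices(src_fps: float, dst_fps: float, n_frames: int) -> list[int]:
    """Return the source-frame indices to keep for the target frame rate."""
    step = src_fps / dst_fps  # e.g. 30 / 12 = 2.5 source frames per kept frame
    kept, cursor = [], 0.0
    while int(cursor) < n_frames:
        kept.append(int(cursor))
        cursor += step
    return kept

# One second of 30 fps video shrinks to 12 frames:
print(downsample_indices(src_fps=30, dst_fps=12, n_frames=30))
# -> [0, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27]
```

Going from 30 fps to 12 fps cuts vision-frame throughput by 60%, so a 30% overall saving is plausible if the audio and text stages account for roughly half the pipeline's compute.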
Impact
This rollout intensifies the race with Google's Interpreter Mode and Meta's SeamlessM4T, neither of which currently offers integrated lip-reading. With latency now below half a second, Qwen's system could unlock real-time translation in latency-sensitive arenas such as esports and legal transcription. Compliance-ready on-premises hardware and aggressive pricing position Qwen as a formidable challenger to Microsoft Azure Speech in the enterprise segment.