Details
- As of March 2025, Stability AI has teamed up with Arm to run its Stable Audio Open model natively on smartphones, allowing text-to-audio generation without a cloud connection.
- The collaboration uses Arm's KleidiAI libraries and model distillation to optimize the model for Armv9 CPUs.
- Inference time has dropped from 240 seconds to 8 seconds, a 30-fold speedup, while preserving 44.1 kHz stereo output.
- The work builds on established on-device AI runtimes that run on Arm hardware, including ExecuTorch and XNNPACK, along with Stability AI's open-source audio generation architecture.
- The optimized model is available on Hugging Face and targets creators who need fast sound-effect generation.
Impact
This partnership marks a significant step for edge-based generative AI: it reduces reliance on the cloud and cuts generation time enough to support interactive creative work on mobile devices. Making high-fidelity audio generation widely available on current hardware could spur new tools for content creators, and it signals a broader shift toward on-device AI across media applications.