Stability AI and Arm Bring Fast, On-Device Generative Audio to Smartphones

Details

In March 2025, Stability AI teamed up with Arm to run its Stable Audio Open model natively on smartphones, enabling text-to-audio generation without a cloud connection.
The collaboration uses Arm's KleidiAI libraries and model distillation to optimize the model for Armv9 CPUs.
Inference time dropped from 240 seconds to 8 seconds, a 30-fold speedup, while preserving 44.1 kHz stereo output.
The work builds on Arm-supported AI runtimes, including ExecuTorch and XNNPACK, along with Stability AI's open-source audio generation architecture.
The optimized model is available on Hugging Face and targets creators who need fast sound-effect generation.

Impact

This partnership is a notable step for edge-based generative AI: it reduces reliance on the cloud and brings fast audio generation to mobile devices. Wide availability of high-fidelity audio generation on current hardware could spur new tools for content creators, and it signals a broader movement toward on-device AI across media applications.
