Details

  • As of March 2025, Stability AI has teamed up with Arm to run its Stable Audio Open model natively on smartphones, allowing text-to-audio generation without a cloud connection.
  • The collaboration uses Arm's KleidiAI libraries and model distillation to optimize the model for Armv9 CPUs.
  • Inference time dropped from 240 seconds to 8 seconds, a 30-fold speedup, while preserving 44.1 kHz stereo output quality.
  • The work builds on established AI runtimes supported on Arm, including ExecuTorch and XNNPACK, along with Stability AI's open-source audio generation architecture.
  • The optimized model is available on Hugging Face and targets creators who need fast sound-effect generation.
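
The 30-fold figure follows directly from the two reported timings. A minimal sketch of the arithmetic, using only the numbers quoted in this summary (the variable names are illustrative and do not come from Stability AI or Arm):

```python
# Reported inference times from the summary, in seconds.
baseline_seconds = 240.0
optimized_seconds = 8.0

# Speedup is the ratio of the old time to the new time.
speedup = baseline_seconds / optimized_seconds

# Share of the original wait that was eliminated, as a percentage.
time_saved_pct = (1 - optimized_seconds / baseline_seconds) * 100

print(f"Speedup: {speedup:.0f}x")
print(f"Time saved: {time_saved_pct:.1f}%")
```

In other words, a 30-fold speedup means roughly 97% of the original generation time is gone.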

Impact

This partnership marks a significant step for edge-based generative AI, reducing reliance on the cloud and bringing fast audio generation to mobile devices. Wide availability of high-fidelity audio generation on current hardware could spur new tools for content creators, and it signals a broader shift toward on-device AI across media applications.