Details

  • Meta has released SAM Audio, a unified AI model for audio separation that lets users isolate distinct sounds using text, visual, and temporal prompts.
  • SAM Audio leverages Perception Encoder Audiovisual (PE-AV), extending Meta's earlier open-source vision AI for precise multimodal sound separation.
  • The model allows users to identify sounds by describing them in text, clicking on sources in a video, or selecting time spans, simplifying a previously manual and specialized task.
  • Alongside the model, Meta launched SAM Audio-Bench for real-world audio separation benchmarking and SAM Audio Judge, an automatic evaluation system reflecting human perceptual judgments.
  • SAM Audio runs faster than real time and has outperformed specialized tools at separating speech, music, and general sounds, with accessibility partnerships targeting hearing aids and inclusive technology.
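The three prompting modalities described above can be sketched as simple data types. This is a purely illustrative Python sketch; the names (`TextPrompt`, `ClickPrompt`, `SpanPrompt`, `describe`) and fields are hypothetical and do not reflect SAM Audio's actual API, which Meta has not been quoted on here.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical prompt types mirroring the three modalities in the bullets
# above; names and fields are illustrative, not Meta's actual API.

@dataclass
class TextPrompt:
    description: str    # e.g. "the barking dog"

@dataclass
class ClickPrompt:
    frame_index: int    # video frame containing the sound source
    x: float            # normalized click coordinates in [0, 1]
    y: float

@dataclass
class SpanPrompt:
    start_s: float      # time span (seconds) where the target sound occurs
    end_s: float

Prompt = Union[TextPrompt, ClickPrompt, SpanPrompt]

def describe(prompt: Prompt) -> str:
    """Render a prompt as a human-readable separation request."""
    if isinstance(prompt, TextPrompt):
        return f"isolate: {prompt.description}"
    if isinstance(prompt, ClickPrompt):
        return (f"isolate source at ({prompt.x:.2f}, {prompt.y:.2f}) "
                f"in frame {prompt.frame_index}")
    return f"isolate sounds between {prompt.start_s:.1f}s and {prompt.end_s:.1f}s"

print(describe(TextPrompt("the barking dog")))
print(describe(SpanPrompt(3.0, 7.5)))
```

Whatever the real interface looks like, the design point stands: each modality reduces to a small, typed payload, so a user can pick whichever is most natural for the sound they want isolated.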

Impact

With SAM Audio, Meta deepens its push into generative and multimodal AI, bringing professional-grade audio manipulation tools to a broader set of users. The combination of cutting-edge performance and intuitive multimodal prompting sets a new bar for audio processing, likely to shape workflows in content creation, accessibility, and beyond as Meta challenges incumbents in the creative and media technology space.