Meta launches SAM Audio
A unified model segments individual sounds from complex audio mixtures.
Meta has launched SAM Audio, a unified AI model that segments individual sounds from complex audio mixtures.
The model isolates audio using text, visual and time span prompts, the company said in a statement. Meta said users could, for example, extract a guitar or vocals from a recorded video, filter traffic noise from outdoor footage, or remove a dog barking from a podcast recording.
Meta described three prompting modes: text prompting, which specifies sounds such as "dog barking" or "singing voice"; visual prompting, which lets users click on the person or object producing the sound; and span prompting, which the company called an industry first and which lets users mark the time segments where the target audio occurs.
The company said prompts can be used alone or combined for precise control, and that SAM Audio supports use cases across music, podcasting, television, film, scientific research and accessibility. People can try SAM Audio in the Segment Anything Playground and download the model starting today, the company added.
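To make the three modes concrete, here is a minimal sketch of how combinable prompts might be modeled. Meta has not published this interface; all class and field names below are hypothetical illustrations, not the actual SAM Audio API.

```python
from dataclasses import dataclass, field

# Hypothetical prompt types mirroring the three modes Meta describes.
@dataclass
class TextPrompt:
    description: str          # e.g. "dog barking" or "singing voice"

@dataclass
class VisualPrompt:
    x: int                    # pixel coordinates of the clicked
    y: int                    # person or object producing the sound

@dataclass
class SpanPrompt:
    start_s: float            # time segment where the
    end_s: float              # target audio occurs

@dataclass
class SegmentationRequest:
    # Per the announcement, prompts can be used alone or combined.
    prompts: list = field(default_factory=list)

    def add(self, prompt):
        self.prompts.append(prompt)
        return self           # allow chaining

# Combine a text prompt with a time span for precise control.
req = (SegmentationRequest()
       .add(TextPrompt("dog barking"))
       .add(SpanPrompt(start_s=12.5, end_s=14.0)))
print(len(req.prompts))  # 2
```

The chaining pattern is purely illustrative; the point is that each prompt type carries a different kind of evidence (a label, a location, a time window) and a request can hold any mix of them.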
The Recap
- Meta introduced SAM Audio to segment sounds from audio mixtures.
- Supports text, visual and span prompts for audio extraction.
- Available to try in Segment Anything Playground today.