This AI uses sight to isolate the sound of specific instruments

If you’ve ever learnt an instrument, straining to hear what your favorite musician is playing under the din of other players is probably familiar to you.

Luckily for all the budding Hendrixes, Rachmaninoffs, and Harold Bishops (from the Australian soap opera Neighbours) out there, MIT has a solution. The university’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has created an AI system that can analyze a musical performance and isolate the sounds of specific instruments.

Named PixelPlayer, it uses something the researchers refer to as “self-supervised” deep learning. In other words, it learns without humans labelling the training data and, weirdly, the team involved don’t really know exactly what the AI is doing. Yay for the singularity!

What sets PixelPlayer apart from other attempts to isolate sound in this manner, though, is its use of vision. It locates the literal pixels in the video that are producing sound, hence its name. This allows it to ‘see’ where each sound is originating, letting it split the sounds accurately.
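
To make that idea a little more concrete, here is a toy Python sketch of how picking a pixel could select one instrument's share of a mixed recording. It is not the researchers' code: the function name, the tiny 2x2 "spectrogram", the two component maps, and the per-pixel weights are all invented for illustration, and it simply assumes that each pixel carries weights which are combined into a 0-to-1 mask over the mixed audio.

```python
import math


def sigmoid(x):
    """Squash a score into the range 0..1 so it can act as a mask value."""
    return 1.0 / (1.0 + math.exp(-x))


def separate_at_pixel(mixture, components, pixel_weights):
    """Return the part of `mixture` attributed to one pixel (hypothetical helper).

    mixture:       rows of magnitudes, indexed [frequency][time]
    components:    K maps with the same shape as `mixture` (assumed audio features)
    pixel_weights: K weights belonging to the chosen pixel (assumed visual features)
    """
    separated = []
    for f, row in enumerate(mixture):
        out_row = []
        for t, magnitude in enumerate(row):
            # Weighted sum of the component maps at this time-frequency cell.
            score = sum(w * comp[f][t] for w, comp in zip(pixel_weights, components))
            # The mask decides how much of the mixed sound this pixel keeps.
            out_row.append(sigmoid(score) * magnitude)
        separated.append(out_row)
    return separated


# Made-up data: row 0 is a "low" frequency band, row 1 a "high" one.
mixture = [[1.0, 2.0], [3.0, 4.0]]
components = [
    [[10.0, 10.0], [-10.0, -10.0]],  # favours the low band
    [[-10.0, -10.0], [10.0, 10.0]],  # favours the high band
]
guitar_pixel = [1.0, 0.0]  # a pixel that "listens" to the first component
violin_pixel = [0.0, 1.0]  # a pixel that "listens" to the second component

guitar = separate_at_pixel(mixture, components, guitar_pixel)
violin = separate_at_pixel(mixture, components, violin_pixel)
```

In this toy setup the "guitar" pixel keeps almost all of the low band and almost none of the high band, while the "violin" pixel does the opposite. In the real system, the weights and maps would be learned from video rather than written by hand.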

The system was trained on over 60 hours of video and can currently identify more than 20 common instruments. The researchers believe that if it’s given the chance to analyze more data, this number can increase. They did, however, raise doubts about its ability to separate similar-sounding instruments, such as a violin and a viola.

“We were surprised that we could actually spatially locate the instruments at the pixel level,” said Hang Zhao, lead author of the paper and a PhD student at CSAIL. “Being able to do that opens up a lot of possibilities, like being able to edit the audio of individual instruments by a single click on the video.”

The technology is in its infancy, but it has a whole load of potential. Whether it’s changing the mix, levels, or quality of old video recordings, improving how robots understand sound, or just letting people get better at their instrument, I’m excited to see where PixelPlayer goes.

Story by Callum Booth

Callum Booth is a freelance journalist with over a decade of experience. Previously, he was the Managing Editor of TNW, where his reporting was cited widely, including in VICE, the FT, and the BBC. Callum’s writing has appeared in The Verge, The Daily Telegraph, Time Out, and many more. He covers the full spectrum of technology, with a particular focus on how it shapes our daily lives. And a lot of regulation stuff too. Outside of work, Callum’s an avid bookworm, a Fisherman’s Friends addict, and resolutely unshaven. Follow him on Twitter @CallumBooth or visit www.callumbooth.net.
