Deezer’s Spleeter: Deconstructing music with AI
Deezer, an online music streaming service, recently released Spleeter, an ML tool used to deconstruct music into its constituent instrumental tracks.
Founded in August 2007, Deezer is an online music streaming service based in Paris, France. While Spotify has gained market leadership in the online music streaming space, Deezer continues to hold its own with 14 million monthly active users (MAU) in over 180 countries as of January 2019. Per Similarweb.com estimates, 35% of Deezer’s users are in France, 11% are in Brazil and the remainder are spread out across the world.
What is Spleeter?
On November 4th 2019, Deezer released Spleeter, a machine learning tool for source separation. Spleeter is a project from Deezer’s research division and is made available online as a Python library based on TensorFlow. Although source separation remains a relatively obscure topic, its applications in music information retrieval (MIR) have the potential to make a far-reaching impact on the way we produce and consume music.
Source separation 101
At its core, source separation is the separation of a desired signal from a set of mixed signals. In music, this means deconstructing a recorded musical piece into its components by isolating each instrumental layer on the track. For example, 4-stem source separation on Coldplay’s hit single, “Yellow”, would yield the following layers (aka “stems”) in isolation:
- Vocals by Chris Martin, singing about the stars and such
- Drums/percussion by Will Champion
- Bass guitar by Guy Berryman
- Electric guitar by Jonny Buckland
While it may seem like a relatively straightforward process, accurate source separation is difficult to accomplish. Today, most professionally recorded music is made by recording each instrument on a separate channel, and then the final combined track is produced in a step called the “mixdown”. In this final step, all the individual tracks are blended together for mastering and then digitally compressed for delivery. All the sound waveforms are meshed together in a process akin to an irreversible chemical reaction. Nonetheless, Spleeter has made the seemingly impossible task of source separation a lot easier using machine learning.
How does Spleeter work?
A common technique used in source separation is time-frequency (TF) masking. Different types of sounds in a musical track occupy different frequencies. For instance, the lead vocals would occupy different frequency bands from the drums. Using TF masking, the mixture of frequencies that make up a piece of music is filtered, allowing us to pick and choose which frequencies to keep at each moment in time. What remains after this process is the separated stem of the instrument that we want to isolate.
The tricky part of this process is being able to approximate which frequencies correspond to which instruments. Given that the audible range of frequencies for humans is 20 to 20,000 Hz, a lot of processing is needed to accurately classify the broad range of frequencies contained in a musical track. Traditionally, this step was done manually by using snippets of isolated vocals (which are hard to find) to approximate the frequencies that should be left unmasked, thereby making a “minus one” track commonly used for bootleg karaoke.
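The masking idea described above can be sketched in a few lines of Python with NumPy. This is only an illustration under strong assumptions: the "song" is two synthetic sine tones, the mask is a fixed frequency cutoff chosen by hand, and the function names (`stft`, `istft`, `apply_band_mask`) are hypothetical helpers written for this sketch. A real separator such as Spleeter has to estimate a different mask for every time-frequency cell, which is the hard part.

```python
import numpy as np

def stft(signal, frame_len=256, hop=128):
    """Short-time Fourier transform; returns an array of shape (frames, bins)."""
    # Periodic Hann window: at 50% overlap the shifted windows sum to 1,
    # so plain overlap-add reconstructs the interior of the signal.
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def istft(spectrogram, frame_len=256, hop=128):
    """Inverse STFT by overlap-add of the inverse-transformed frames."""
    frames = np.fft.irfft(spectrogram, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out

def apply_band_mask(mixture, sr, low_hz, high_hz, frame_len=256, hop=128):
    """Keep only the time-frequency cells whose frequency lies in [low_hz, high_hz)."""
    spec = stft(mixture, frame_len, hop)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    keep = (freqs >= low_hz) & (freqs < high_hz)
    # Binary mask with one entry per (frame, bin) cell. Here it is the same
    # for every frame; a learned mask would vary over time as well.
    mask = np.broadcast_to(keep, spec.shape)
    return istft(spec * mask, frame_len, hop)

if __name__ == "__main__":
    sr = 8000                                    # assumed sample rate in Hz
    t = np.arange(sr) / sr
    bass = np.sin(2 * np.pi * 220 * t)           # stand-in for a low instrument
    whistle = 0.5 * np.sin(2 * np.pi * 3000 * t) # stand-in for a high instrument
    recovered = apply_band_mask(bass + whistle, sr, 0, 1000)
    interior = slice(256, 7680)                  # skip the tapered edges
    print("max error vs. clean bass:",
          np.max(np.abs(recovered[interior] - bass[interior])))
```

The fixed cutoff only works here because the two tones do not overlap in frequency. Real instruments overlap heavily, so the mask has to be estimated cell by cell rather than set by a single threshold.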
Today, Spleeter does the heavy lifting as it comes with pretrained models for standard 2-, 4- and 5-stem separation. Using Spleeter is as simple as installing the package and running the separate command from a command line interface, which then creates a .wav file for each stem. In addition, Spleeter also allows users to train custom source separation models and evaluate the models against a benchmark dataset (MUSDB18, available on SigSep, another open source project).
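As a rough sketch of that workflow, the snippet below builds the `spleeter separate` command for a chosen pretrained model and lists the .wav files it is expected to write. The helper names (`build_command`, `expected_outputs`) and the file name `yellow.mp3` are hypothetical, and the argument order assumes a recent Spleeter release that takes the input file as a positional argument (early releases passed it with `-i`), so check `spleeter separate --help` for the installed version. Note that the pretrained 4-stem model labels its fourth stem "other" rather than naming a specific instrument.

```python
import shutil
import subprocess
from pathlib import Path

# Stem names written by each pretrained model, as listed in the Spleeter README.
STEMS = {
    2: ["vocals", "accompaniment"],
    4: ["vocals", "drums", "bass", "other"],
    5: ["vocals", "drums", "bass", "piano", "other"],
}

def build_command(audio_path, output_dir, n_stems=4):
    """Return the CLI invocation as an argument list (hypothetical helper)."""
    if n_stems not in STEMS:
        raise ValueError("pretrained models exist for 2, 4 and 5 stems only")
    return [
        "spleeter", "separate",
        "-p", f"spleeter:{n_stems}stems",  # which pretrained model to use
        "-o", str(output_dir),             # directory the stems are written to
        str(audio_path),
    ]

def expected_outputs(audio_path, output_dir, n_stems=4):
    """Paths of the .wav stems, assuming one sub-folder per input track."""
    track_dir = Path(output_dir) / Path(audio_path).stem
    return [track_dir / f"{name}.wav" for name in STEMS[n_stems]]

if __name__ == "__main__":
    audio = "yellow.mp3"  # placeholder file name
    cmd = build_command(audio, "output", n_stems=4)
    print(" ".join(cmd))
    # Only run the separation if Spleeter is installed and the file exists.
    if shutil.which("spleeter") and Path(audio).exists():
        subprocess.run(cmd, check=True)
        for path in expected_outputs(audio, "output", n_stems=4):
            print(path, "exists:", path.exists())
```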
Applications and challenges
Deezer is in a unique position to build a generalized source separation engine because it has access to a large catalog of music that few organizations do. However, a major challenge is that Deezer is not legally or ethically able to release the vast library of stems made from its own catalog because of copyright restrictions. So how does Deezer capture value?
Deezer can use its musical data assets to train its own machine learning models, which can then be deployed to improve its user value proposition. In addition, Spleeter enables Deezer to perform source separation at scale. On the GPU version, one can expect separation 100x faster than real time, which allows for rapid deployment on its growing catalog of music. As such, it is conceivable that Deezer can accomplish the following:
Make novel music recommendations. Deezer can use the finer data granularity to cluster music by similarity for a given instrument. For example, Deezer could identify bands whose lead singers have a vocal timbre similar to that of Coldplay’s lead singer, Chris Martin. Coldplay fans might hence get recommendations to give Blue Merle a listen.
Machine learning-enabled classification of music by genre. Most music genres have unique instrumental characteristics. For instance, music in the drum and bass genre has a distinctive style of complex percussive syncopation, whereas the house music genre almost always has a steady rhythm in 4/4 time. Using Spleeter, Deezer can isolate the drum stems of the music in its catalog and classify them along the axes of style and speed. Because it can deconstruct a mixdown and find similarities across different songs for any given instrument or combination of instruments, Deezer can use machine learning to classify music by genre automatically. Automatic classification would allow users to browse music by genre without requiring Deezer to manually label each track in its catalog.
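To make the "speed" axis concrete, here is one way the tempo of an isolated drum stem could be estimated, using the autocorrelation of its loudness envelope. This is a sketch only: nothing in the sources cited here says Deezer does this, the function names (`estimate_bpm`, `synthetic_kick_track`) are hypothetical, and the drum stem is a synthetic click track rather than real separated audio.

```python
import numpy as np

def estimate_bpm(drum_stem, sr, frame=80, bpm_range=(70, 200)):
    """Estimate tempo from the strongest periodicity of the loudness envelope."""
    n_frames = len(drum_stem) // frame
    frames = drum_stem[: n_frames * frame].reshape(n_frames, frame)
    envelope = np.sqrt(np.mean(frames ** 2, axis=1))   # RMS loudness per frame
    envelope = envelope - envelope.mean()
    # Autocorrelation at non-negative lags; a peak at lag k means the
    # envelope repeats every k frames.
    autocorr = np.correlate(envelope, envelope, mode="full")[n_frames - 1 :]
    frames_per_second = sr / frame
    min_lag = int(np.ceil(60.0 * frames_per_second / bpm_range[1]))
    max_lag = int(np.floor(60.0 * frames_per_second / bpm_range[0]))
    best_lag = min_lag + int(np.argmax(autocorr[min_lag : max_lag + 1]))
    return 60.0 * frames_per_second / best_lag

def synthetic_kick_track(sr=8000, bpm=120, seconds=4):
    """Stand-in for a separated drum stem: one decaying burst per beat."""
    t = np.arange(int(0.05 * sr)) / sr
    burst = np.sin(2 * np.pi * 200 * t) * np.exp(-t / 0.01)
    track = np.zeros(sr * seconds)
    step = int(round(sr * 60 / bpm))
    for start in range(0, len(track) - len(burst), step):
        track[start : start + len(burst)] += burst
    return track

if __name__ == "__main__":
    stem = synthetic_kick_track(bpm=120)
    print("estimated tempo (BPM):", estimate_bpm(stem, sr=8000))
```

Tempo alone would not separate genres, since many styles share similar speeds. A classifier would need additional features from the stems, such as the rhythmic pattern itself.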
Deezer should maintain Spleeter as an open source project that is kept autonomous from the consumer-facing business unit. In my opinion, the real value of a generalized source separation engine lies with applications in music production. Having access to high quality stems allows music producers to experiment, stretch the boundaries of musical genres and create new affective experiences for listeners. By becoming a leader in the open source music community, Deezer would cement its position as the center of gravity for digital innovation in music. In doing so, it will capture a fair portion of the value created as it gains brand recognition over its famous rival, Spotify.
 “About Us”. 2019. Deezer. https://www.deezer.com/us/company.
 Rafii, Zafar, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Mimilakis, Derry FitzGerald, and Bryan Pardo. 2019. “An Overview Of Lead And Accompaniment Separation In Music”. IEEE Transactions on Audio, Speech and Language Processing, no. 201: 3.
 “Releasing Spleeter: Deezer R&D Source Separation Engine”. 2019. Medium. https://deezer.io/releasing-spleeter-deezer-r-d-source-separation-engine-2b88985e797e.
Student comments on Deezer’s Spleeter: Deconstructing music with AI
I wonder how strong the potential is for music production / DJs. If this tool were able to fully deconstruct the building blocks of an audio track, it could split the pieces for producers to use and mix freely. E.g., have Chris Martin singing Yellow a cappella and then bring it into my own hip-hop or electro-pop masterpiece in the making. Boom!
Great article. I wonder what the artists’ perception of Spleeter is.
As a DJ, it is common to mix tracks and sounds (cf. Soundcloud). For a band, however, would the band agree to have its music/members’ contributions further digitally dissected? Are there different levels of copyright regulations?
As you mention, Deezer is avoiding this issue by not using its copyrighted catalog to train the model. However, by making it open source, aren’t they opening a new Pandora’s box?
The copyright issue is an interesting one, and as we know the music industry is concentrated in a number of very large record labels, who own the vast majority of the music catalog. While I agree that the company should keep its applications open source, I wonder if a viable path to value capture would be to partner with one of the major record labels. By deconstructing some of the more famous tracks, they could use some of this AI to understand where consumer preferences are going, and give these recommendations to artists to then create new music. In addition, it could work with DJs to create sounds that have never been heard before.
Since Spotify already has its own semi-AI recommendation algorithm, it may make sense to try and create differentiation through a different value-add, mainly on the musician side as opposed to the consumer side.
That’s a remarkable depth of analysis, cool!
With all my love for open source projects, monetization is still uncertain. Professional usage is very limited, and the idea of using almost anything to predict the music users like hasn’t really worked yet, in my opinion. Have you used any service that was particularly good at that?