Window into the soul?: Machine learning for song recommendations at Spotify
“Music is like a window to your soul”… “it tells people a lot about who you are and what you care about, whether you like it or not.”
-Christine Hung, Head of Data Solutions at Spotify [4][6]
If music is a window into the soul, one could argue that Spotify knows its customers better than any of their closest friends. The Swedish music-streaming company has collected extensive data on its 170 million active users [9] to develop a deep understanding of listening preferences. By combining this knowledge with machine learning, Spotify is able to shape new listening preferences by generating individualized song recommendations for its users [1].
Compelling song recommendations are critical to Spotify’s user engagement, a key metric for both of the company’s revenue sources [4]:
- Paid subscriptions (44% of users, 91% of revenue) [9]
- Ads from users who listen for free (56% of users, 8% of revenue) [9]
Paid subscribers have a variety of music streaming options available, with Apple Music being the company’s main competitor [10]. Since most streaming services offer nearly identical content catalogues, platforms must differentiate on user experience. By serving fresh, highly relevant song recommendations, Spotify is able to sustain user loyalty to its platform. Moreover, the use of machine learning creates a virtuous cycle: as Spotify’s user base grows, the company has more data with which to refine its recommendations, which in turn promote further user growth through retention and referrals [1][5][7].
Further, the profitability of ad-supported listeners is directly tied to the amount of time these users spend streaming: more listening means more opportunities to present advertisements. Providing users with consistently fresh and appealing content indirectly encourages them to spend more time listening to ads.
Spotify builds its recommendations using a number of data sources and machine learning approaches:
- Collaborative filtering: Generates recommendations based on the behavior of users with similar listening histories. If, for example, two users have generally similar song histories, and User 1 listens to a song that User 2 hasn’t heard, it’s likely that User 2 will like that song (see Image 1 for an example). However, a critical downside of collaborative filtering is the cold-start problem: new or unpopular songs that don’t yet have historical listening data cannot be recommended. [8]
- Audio signal models: Spotify has developed sophisticated models to analyze audio sources themselves. While this offers a direct way to evaluate the characteristics of a song, the techniques to do so are complex and not necessarily reliable. The execution of these models is complicated by the semantic gap, or the ambiguous connection between the audio itself and the traits that influence whether a user likes a song (mood of the song, lyrical themes, etc.). Spotify’s acquisition of Echo Nest, a music analytics platform founded at the MIT Media Lab, should help drive these types of analyses forward. [8]
- Natural Language Processing (NLP): Language from blogs, music review sites and social media can be scraped and analyzed using NLP to identify shared traits among songs based on the words used in association with them. [7]
- Producer Data: Additional sources of data can be gleaned from music producers directly (e.g., production year, lyrics, and album images) [1].
- User feedback: Spotify is able to create feedback loops to refine its models using two sources of user feedback. The first is manual “Thumbs Up” or “Thumbs Down” input by users during songs. The second, perhaps more reliable, signal is observed user behavior, such as users skipping songs or listening to songs repeatedly [5].
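To make the collaborative filtering idea concrete, here is a minimal sketch in Python. It is not Spotify's actual method: the user names, song names, play counts, and the choice of cosine similarity over raw play counts are all assumptions made for illustration. It scores each unheard song by the plays of other users, weighted by how similar their listening histories are, and it also shows the cold-start problem: a song nobody has played yet never gets a score.

```python
from math import sqrt

# Hypothetical play counts (user -> {song: plays}); all values are invented.
PLAYS = {
    "user_1": {"song_a": 5, "song_b": 3, "song_c": 4},
    "user_2": {"song_a": 4, "song_b": 2},
    "user_3": {"song_d": 6},
}
# Hypothetical catalogue; "song_new" has no listening history at all.
CATALOGUE = ["song_a", "song_b", "song_c", "song_d", "song_new"]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse play-count vectors."""
    dot = sum(a[song] * b[song] for song in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, plays, catalogue):
    """Rank songs the user has not played by similarity-weighted plays."""
    history = plays[user]
    # Start every unheard song in the catalogue at a score of zero.
    scores = {song: 0.0 for song in catalogue if song not in history}
    for other, other_history in plays.items():
        if other == user:
            continue
        sim = cosine_similarity(history, other_history)
        for song, count in other_history.items():
            if song in scores:
                scores[song] += sim * count
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Songs that no similar user has played keep a score of zero and are
    # dropped, which is the cold-start problem in miniature.
    return [song for song in ranked if scores[song] > 0]


print(recommend("user_2", PLAYS, CATALOGUE))  # ['song_c']
```

In this toy data, user_2 shares two songs with user_1, so user_1's other song is recommended. "song_d" is skipped because only a dissimilar user played it, and "song_new" is skipped because it has no plays at all. A content-based signal such as the audio or text models above would be needed to surface it.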
To improve its product offering and competitive position further, Spotify should consider addressing two additional issues:
- Cold-start problem for users: Spotify has ways to circumvent the cold-start problem for new songs, but it’s less clear how they tackle a lack of historical data for users. By accessing social media posts, networks, and demographic data, Spotify could better customize content to capture new users, driving retention or faster conversion to paid subscriptions. However, they should be cautious in considering which data sources to access given previous controversies related to overreaching privacy policies [3].
- Content generation: Though studies have shown mixed results on machine learning’s ability to predict highly successful songs, Spotify may consider selling data to music producers, who want more accurate insights into a song’s likely commercial viability. In a riskier move, Spotify might also benefit from launching its own record label and producing content informed by its wealth of customer and machine learning data.
Open questions
- Netflix has leveraged machine learning insights to inform in-house content development. Would the approach Netflix has taken with movies work for Spotify and music?
- Apple is moving into music subscription services, and has a much more robust set of data on users than Spotify. How worried should Spotify be about Apple’s ability to compete in music?
SOURCES
[1] Bernhardsson, E. (2014). Music Discovery at Spotify. In: MLConf. [online] Available at: https://www.slideshare.net/erikbern/music-recommendations-mlconf-2014 [Accessed 13 Nov. 2018].
[2] Dieleman, S. (2018). Recommending music on Spotify with deep learning. [online] GitHub. Available at: http://benanne.github.io/2014/08/05/spotify-cnns.html [Accessed 10 Nov. 2018].
[3] Hern, A. and Rankin, J. (2018). Spotify’s chief executive apologises after user backlash over new privacy policy. [online] the Guardian. Available at: https://www.theguardian.com/technology/2015/aug/21/spotify-faces-user-backlash-over-new-privacy-policy [Accessed 13 Nov. 2018].
[4] Hung, C. (2018). Strata Data Conference 2017 – New York, New York. [online] O’Reilly | Safari. Available at: https://www.oreilly.com/library/view/strata-data-conference/9781491976326/video314445.html?utm_source=oreilly&utm_medium=newsite&utm_campaign=20171206_data_show_christine_hung_related_resources_music_the_window_into_your_soul [Accessed 10 Nov. 2018].
[5] Johnson, C. (2014). Algorithmic Music Discovery at Spotify. [online] Available at: https://www.slideshare.net/MrChrisJohnson/algorithmic-music-recommendations-at-spotify [Accessed 13 Nov. 2018].
[6] Lorica, B. (2018). Machine learning at Spotify: You are what you stream. [online] O’Reilly Media. Available at: https://www.oreilly.com/ideas/machine-learning-at-spotify-you-are-what-you-stream [Accessed 10 Nov. 2018].
[7] Murali, V. (2016). Music Personalization at Spotify. In: RecSys.
[8] Oord, A., Dieleman, S. and Schrauwen, B. (2018). Deep content-based music recommendation. [online] Papers.nips.cc. Available at: https://papers.nips.cc/paper/5004-deep-content-based-music-recommendation [Accessed 10 Nov. 2018].
[9] Spotify (2018). Annual Report for the First Quarter of 2018. [online] Spotify. Available at: https://investors.spotify.com/financials/press-release-details/2018/Spotify-Technology-SA-Announces-Financial-Results-for-First-Quarter-2018/default.aspx [Accessed 10 Nov. 2018].
[10] Steele, A. (2018). Apple Music on Track to Overtake Spotify in U.S. Subscribers. [online] WSJ. Available at: https://www.wsj.com/articles/apple-music-on-track-to-overtake-spotify-in-u-s-subscribers-1517745720 [Accessed 10 Nov. 2018].
As a music lover, I have found that the music streaming industry has made my life much more enjoyable. Since the music is the same across different streaming providers, user experience is probably the only way (apart from a lifetime free subscription) to capture and retain customers. One of the biggest pain points for music lovers is picking songs that match their current mood, so using machine learning to help users pick songs sounds wonderful. I also agree that customer data analytics can help companies, whether Spotify, Apple, or Netflix, provide a far better customer experience, but at the same time letting these companies know me better than my best friends or even myself sounds a bit bizarre. I am not quite sure about the future of big data, given the prevailing tension between a customized consumer experience and the potential threat to privacy.
Great read! It does seem Spotify’s natural extension would be to start providing label or publisher services, but if one is to learn from Netflix’s approach, it is not without its risks. Labels with music catalogues that share rights with Spotify to stream the content will feel alienated if Spotify, as their distributor, starts wanting to eat their lunch. Making content can be extremely expensive (as we saw during the Netflix case!), although music is certainly less so than film/TV content. However, there are other areas Spotify can expand into that would not alienate its suppliers: providing a cloud platform for artists to create music online (Splice is the current market leader) or enabling easier dialogue between artists and consumers through live collaborative radio (Stationhead, for example). I also think they might even have a shot at becoming the go-to platform for podcasts (6% current market share), even though Apple, with its current ~65% market share, will be a domineering hindrance en route.
Great essay! Regarding your question around Apple’s competition – I do believe Spotify might also be worried for the following reasons:
(i) Apple Music has a larger song catalog (45 million vs. 35 million for Spotify)
(ii) Apple Music allows you to store songs in iCloud (vs. Spotify, which plays them only from local files)
(iii) Social sharing is easier on Apple Music
To the extent that machine learning helps detect trends in musical patterns, genres, etc. that can be dissected by age, gender, or other demographic characteristics, I wonder if there really is any natural edge for Spotify in going the route of Netflix and developing in-house content. Unlike the film industry, one could argue there is inherently more convergence within genres of music as well as within distinct artist styles (e.g., techno music having similar beat durations and sounds, or an artist like One Direction having the same style across songs). Given that large variations in data are necessary for a robust predictive model, does this hamper the ability of the algorithm to find truly significant correlations that would feed into stronger predictive models? One could argue that we look for a lot more diversity in TV and movies, in that new content has to be different enough, while Spotify history reveals that humans will repeat musical content a lot more. Curious if this in any way reduces the statistical power of their models on an individual level.
One thing I think the Spotify model doesn’t solve for is people who have aspirational music preferences. For example, for some people the music they listen to today is not representative of the music they want to discover. These people want Spotify to be able to introduce new music that does not overlap with their listening history. I imagine there is other data available on customers’ social media accounts or other accounts that could suggest what some of these aspirational music categories could be. Perhaps this will be an edge for a player in the music subscription business.
This is a really good point, Chris! Sometimes I get into a bit of a music rut where I’ll listen to a lot of top 40-ish music and want Spotify to redirect me to more interesting songs, but instead it gives me more Bieber, and I further disengage from the service 🙂 Definitely a real flaw.
This is so interesting! I am a massive Spotify loyalist for exactly this reason. Every week, I get to play my “Discover Weekly” playlist, and I am amazed by how well they understand my music preferences. This is certainly a reason I would never switch from Spotify to Apple Music. It was interesting learning about how they actually provide these recommendations. It really encompasses all aspects of a song: other consumer preferences, sound, and lyrics. For these points, I am not worried about Apple Music. I think Spotify has developed a loyal base of users who are unlikely to switch. Additionally, I see them as the experts in music – Apple can stick to the hardware!
So interesting! I converted from Apple over to Spotify because I actually think their predictive technology for which songs I would like is so much better than Apple’s (sorry Karishma). I don’t really think Netflix is a great comparable in the sense that I don’t really see Spotify getting into the music production business. I think live studio recordings of existing artists is one thing but actually managing artists is a completely different business altogether.
I think generally it’s such a hard question to determine why someone likes a particular song. I mean, I can’t even tell you why I like certain songs, I just do. Certain artists just have a sound to them that I enjoy. That’s what’s so cool about Spotify. I actually feel like they know what I like better than I do sometimes.
Looks like I will be having Spotify’s algorithm DJ my wedding. Seriously though, I wonder if there is an opportunity to allow users to provide additional feedback into the algorithm. The only direct feedback I’ve noticed is the thumbs up or thumbs down that you mentioned. Maybe I skip a new song that I like because I just happen to not be in the mood to listen to it. Or alternatively I listen to one song on repeat because I am hanging out with my cousin who really likes it. Am I doomed to listen to similar songs over and over? Is the metadata really enough to truly capture our preferences?