From Auto-tune to Auto-compose

Can composers see themselves outdone by data scientists and computer programmers?

The sound of music

In a September 2018 paper, Japanese data scientists Eita Nakamura and Kunihiko Kaneko, at Kyoto University and the University of Tokyo respectively, found that Western classical music over the past several centuries has followed statistical laws of evolution. This large-scale study may have implications for understanding other cultural phenomena such as the evolution of language, fashion, and science. Evolution is an algorithmic process applied to populations, in which certain traits are passed on to the next generation and others are culled. In music, as in art in general, this involves passing on past traditions while incorporating new features. Indeed, algorithmic music composition has a long-standing history, as in Chinese wind chimes, Greek wind-powered Aeolian harps, or the Japanese suikinkutsu. Iannis Xenakis used Markov chains, which choose each next fragment of music with a probability that depends only on the current one, for his 1958 Analogique compositions.
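To make the Markov-chain idea concrete, here is a minimal first-order sketch in Python. It is not a reconstruction of Xenakis's method: the training melody is invented, rhythm is ignored, and the function names are hypothetical. It only shows the core mechanism, namely learning which notes tend to follow which in an existing piece and then generating a new sequence in which each note depends solely on the one before it.

```python
import random
from collections import Counter, defaultdict


def learn_transitions(melody):
    """Count how often each note is followed by each other note."""
    counts = defaultdict(Counter)
    for current, nxt in zip(melody, melody[1:]):
        counts[current][nxt] += 1
    return counts


def generate(counts, start, length, seed=None):
    """Random walk: each next note is drawn in proportion to the observed counts."""
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        followers = counts.get(melody[-1])
        if not followers:  # dead end: this note never had a successor
            break
        notes = list(followers)
        weights = [followers[n] for n in notes]
        melody.append(rng.choices(notes, weights=weights)[0])
    return melody


# Hypothetical training melody (note names only).
source = ["C", "D", "E", "C", "E", "G", "E", "D", "C", "D", "E", "C"]
model = learn_transitions(source)
print(generate(model, "C", 12, seed=7))
```

A higher-order chain (conditioning on the last two or three notes) would copy longer fragments of the source; the first-order version keeps the example short.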

I see in Magenta

Google Magenta’s NSynth takes this a step further, using deep learning to aid music makers in their composition. Specifically, NSynth aims to use deep neural networks to give artists a vast array of instrument sounds (organized by instrument class, genre, and complexity). Any musician could then answer a question such as “What do you get when you cross a piano with a flute?” Being able to do so pushes the evolution of music further and gives artists a wider range of tools with which to expand the existing global discography. Algorithmic composition has also been proposed as a way to get past composer’s block, arguably every artist’s worst nightmare.
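One way to read “crossing a piano with a flute” is as interpolation between learned embeddings. NSynth is built on a WaveNet-style autoencoder that encodes a sound into a compact vector, and blending two such vectors before decoding yields a hybrid timbre. The sketch below shows only the blending step, on made-up four-dimensional vectors; the encoder and decoder are not reproduced, and the embedding values are hypothetical.

```python
def interpolate(z_a, z_b, alpha):
    """Linearly blend two embedding vectors: alpha=0 gives z_a, alpha=1 gives z_b."""
    if len(z_a) != len(z_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return [(1 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


# Hypothetical embeddings standing in for an encoder's output.
piano = [0.9, -0.2, 0.4, 0.0]
flute = [0.1, 0.6, -0.4, 0.8]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, interpolate(piano, flute, alpha))
```

In a real system each blended vector would be passed to the decoder to synthesize audio; sweeping `alpha` from 0 to 1 would move gradually from piano-like to flute-like sounds.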

The main challenge the Magenta team faces is that available datasets are small, biased, and often not freely available, largely because of copyright restrictions. To address this in the short term, the team is building on the Free Music Archive (FMA), a web-based repository of freely licensed music, and using the existing AudioSet concept ontology to classify specific sounds. This ontology (or knowledge management structure) was originally motivated by the lack of large-scale annotated audio data for scientific research and is derived from YouTube videos, with the goal of providing a testbed for identifying acoustic events. Lastly, the team has conducted crowd-sourced annotation on the CrowdFlower platform to correct and verify the labels of clips identified as likely positives. Control examples, clips whose correct label is already known, are especially important at this stage for weeding out contributors whose accuracy drops below a certain threshold. Contributors who consistently answer obvious questions wrongly (such as labelling a piano sample as a trumpet) are no longer eligible to take part in the exercise.
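The control-example mechanism can be sketched in a few lines of Python. This is not CrowdFlower's API; it is a minimal illustration of the filtering logic, and the clip IDs, labels, contributor names, and the 0.8 threshold are all hypothetical. Each contributor is scored only on control clips with known answers, and anyone below the threshold is excluded.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    contributor: str
    clip_id: str
    label: str


# Hypothetical control clips whose correct labels are known in advance.
GOLD = {"clip_01": "piano", "clip_02": "trumpet", "clip_03": "violin"}


def control_accuracy(answers):
    """Fraction of control clips each contributor labelled correctly."""
    correct, total = {}, {}
    for a in answers:
        if a.clip_id not in GOLD:
            continue  # only control clips count towards the score
        total[a.contributor] = total.get(a.contributor, 0) + 1
        if a.label == GOLD[a.clip_id]:
            correct[a.contributor] = correct.get(a.contributor, 0) + 1
    return {c: correct.get(c, 0) / n for c, n in total.items()}


def eligible(answers, threshold=0.8):
    """Contributors whose control accuracy is at or above the threshold."""
    return {c for c, acc in control_accuracy(answers).items() if acc >= threshold}


answers = [
    Answer("ana", "clip_01", "piano"),
    Answer("ana", "clip_02", "trumpet"),
    Answer("ana", "clip_03", "violin"),
    Answer("bo", "clip_01", "trumpet"),  # labels a piano sample as a trumpet
    Answer("bo", "clip_02", "trumpet"),
    Answer("bo", "clip_03", "cello"),
    Answer("bo", "clip_99", "guitar"),  # a real (non-control) clip, not scored
]
print(eligible(answers))
```

In this sketch a contributor who has not yet answered any control clip has no score and is therefore not eligible; a production pipeline might instead hold such contributors in a probation state.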

However, because the AudioSet collection is derived from YouTube videos, there are no guarantees about the legality of licensing, sharing, and archiving the content. As a result, its use is currently extremely limited. Moreover, most of the content comes from solo performances, which makes it difficult to model and evaluate ensemble performances. Nevertheless, this work will provide a baseline model for the team to build on. In particular, the team aims to focus on identifying the classes that correspond to musical instruments, a set of more than 70 relevant classes. For the sake of coverage, these are merged into broader instrument categories: for example, “Acoustic Guitar”, “Electric Guitar”, and “Tapping (guitar technique)” all become “guitar”, while “Cello” and “Violin” remain distinct. The team’s medium- to long-term goal is to refine these concepts iteratively as the dataset grows and the acoustic model improves. Going forward, the team also aims to extend the dataset to instruments outside the current ‘vocabulary’ through additional crowd-sourcing, semi-supervised learning, and incremental evaluation.
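The merging step amounts to a many-to-one mapping from fine-grained ontology labels to instrument categories. The sketch below includes only the example labels named in the paragraph above (the full list of more than 70 classes is not reproduced); the category names, the clip data, and the “Sitar” label used to illustrate an out-of-vocabulary instrument are hypothetical choices for illustration.

```python
from collections import defaultdict

# Fine-grained labels mapped to merged instrument categories.
# Only the examples from the text are included; a real mapping would be larger.
LABEL_TO_INSTRUMENT = {
    "Acoustic Guitar": "guitar",
    "Electric Guitar": "guitar",
    "Tapping (guitar technique)": "guitar",
    "Cello": "cello",  # kept distinct
    "Violin": "violin",  # kept distinct
}


def merge_labels(clip_labels):
    """Collapse each clip's fine-grained labels into a set of instrument categories.

    Labels outside the current vocabulary are collected separately, e.g. as
    candidates for later crowd-sourcing.
    """
    merged, unknown = {}, defaultdict(set)
    for clip, labels in clip_labels.items():
        merged[clip] = {LABEL_TO_INSTRUMENT[label] for label in labels if label in LABEL_TO_INSTRUMENT}
        for label in labels:
            if label not in LABEL_TO_INSTRUMENT:
                unknown[label].add(clip)
    return merged, dict(unknown)


clips = {
    "a": ["Acoustic Guitar", "Tapping (guitar technique)"],
    "b": ["Cello", "Violin"],
    "c": ["Electric Guitar", "Sitar"],
}
merged, unknown = merge_labels(clips)
print(merged)
print(unknown)
```

Keeping out-of-vocabulary labels rather than silently dropping them gives the team a ready list of instruments to target when the vocabulary is extended.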

50 shades of audio

In the medium term, I would suggest that management also look into deep learning techniques that could serve broader ethnomusicology. One risk of the current approach is that the dataset and instruments are limited to those dominant in the West. As Magenta thinks about crowdsourcing, it could look at other models such as Vocaloids, software voicebanks that have grown over the past few years through collaborative content creation. Hatsune Miku, one of the Vocaloid personas, has performed at sold-out concerts. In the same spirit of thinking outside the box, I believe products such as NSynth could be extended into other cultural fields. It has been argued that music could be older than language. Training a deep neural network on recordings of an endangered or dead language, for example, could be incredibly helpful in preserving the rich cultural diversity we currently enjoy, given that about half of the approximately 6,000 languages spoken in the world today are expected to become extinct.

Interesting questions remain for Magenta and artists in general. To what degree can “art” be drawn from algorithmic composition based on computing power? What will the role of human composers look like in the future? How will the industry approach copyright law and proprietary rights in this context?


[1] MIT Technology Review, “Data mining reveals the hidden laws of evolution behind classical music,” Sept. 28, 2018, at

[2] Nakamura, E. and Kaneko, K. (2018), “Statistical Evolutionary Laws in Music Styles,” available at

[3] Schulkin, J., and Raglan, G. (2014), “The Evolution of Music and Human Social Capability,” available at

[4] Giuseppe Bandiera, Oriol Romani Picas, Hiroshi Tokuda, Wataru Hariya, Koji Oishi, and Xavier Serra. Good-sounds.org: A framework to explore goodness in instrumental sounds. In Proceedings of the 17th International Society for Music Information Retrieval Conference, pages 414–419, 2016.

[5] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference, pages 591–596, 2011.

[6] Rachel M Bittner, Justin Salamon, Mike Tierney, Matthias Mauch, Chris Cannam, and Juan Pablo Bello. MedleyDB: A multitrack dataset for annotation intensive MIR research. In ISMIR, volume 14, pages 155–160, 2014.

[7] Juan J Bosch, Jordi Janer, Ferdinand Fuhrmann, and Perfecto Herrera. A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. In ISMIR, pages 559–564, 2012.

[8] Mark Cartwright, Ayanna Seals, Justin Salamon, Alex Williams, Stefanie Mikloska, Duncan MacConnell, E Law, J Bello, and O Nov. Seeing sound: Investigating the effects of visualizations and complexity on crowdsourced audio annotations. Proceedings of the ACM on Human-Computer Interaction, 1(1), 2017.

[9] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. The MIT Press, 1st edition, 2010.



Student comments on From Auto-tune to Auto-compose

  1. According to the Huffington Post article “10 Powerful Responses to the Question ‘What Is Art?’” [1], art has two distinct parts: tangible and intangible. The intangible part reflects personality, imagination and emotions [2]. These characteristics prevent art from becoming commoditized. As for your question about using machine learning to create art, as long as algorithms do not develop emotions, only the tangible part of art can be drawn from algorithmic composition. Without emotions, personality and imagination, this tangible part will be a commodity, no different from a barrel of oil, that anyone with access to the specific algorithms can produce. Thus, human composers will still have a role in the future, as they need to put their emotions into the tangible part to create “real art”. I imagine copyright would work much as it does today. As long as something can be publicly accessed (such as songs created by algorithms), there will not be any copyright. At the same time, modifications and additions may make these commoditized tangible parts the property of the individuals who contributed their personality, imagination and emotions.

    [2] Ibid

  2. Your question about what the role of human composers will look like in the future is an interesting one. Already, some artists are creating music entirely with AI. For example, YouTube star Taryn Southern recently released an album using an open source AI platform called Amper, which allowed her to select several parameters such as “genre, instrumentation, key and beats per minute” to create her songs [1]. She mentions that she does not have a traditional music background and that Amper was a low-cost option that allowed her to enter the music industry without recruiting a band. While it is admirable that AI allowed someone to enter the music industry who might not otherwise have been able to, it raises the question of how much a human composer must contribute in order to call a song his or her own. In this case, Southern generated backing music for her vocals using AI. If she had instead released the album with only the backing music (without vocals), could the music be considered her own? How much of a “composer” is Southern, given that she did not truly compose her own backing music? These questions will likely become more prevalent as more artists begin to use AI in their music.

    [1] Love, Tirhakah. “Do Androids Dream of Electric Beats? How AI Is Changing Music for Good.” The Guardian, Guardian News and Media, 22 Oct. 2018,

  3. What I find most striking about the use of algorithms in music creation is the way in which new features are incorporated. As you mention, evolution as an algorithmic process involves passing on traditions while incorporating new features. New features are more difficult to derive from algorithms than previous patterns are. However, in fine arts, we have seen artists create new works using algorithms. For example, the canvas print “Portrait of Edmond de Belamy” was the star of Christie’s autumn print sale this year. The Financial Times [1] discusses how artists are exploring AI’s potential for independent creativity, which would allow algorithms to come up with “new features” without human intervention.

    [1] “The world’s hottest new artist: an algorithm?” (2018). Financial Times.

  4. Although I do believe music follows the laws of evolution, it seems that the fundamentals of Western music have persisted across centuries of new “styles”. As an extreme example, popular mumble rap artists like Lil Pump and Migos often use triplets to change up the flow of a song. Mozart and Bach also commonly used the triplet to the same end in their concertos. Since many building blocks of Western music are shared across eras, I believe that machine learning is in a particularly interesting position to become a natural tool for artists. By leveraging crowd-sourced ontologies paired with listeners’ opinions of existing pieces, the evolutionary process could become more data-driven and lead to even better outcomes.

  5. This article really has me grappling with the concept and definition of creativity. If we are able to use machines to help us generate music, especially in helping composers get past writer’s block, then are they really composing? Isn’t the machine just generating an iteration of a pattern, based on previous patterns, which the composer then claims as their own?

    I think a lot of people have long felt that nothing is truly original: you could argue that everything is inspired by or based on something that already exists, and this really lends itself to that argument. And if machines can generate songs as well as any composer or music producer (which they can: we’re already generating new Christmas carols with them!), I’m not sure human composers and producers will have jobs in the near future! I’m very interested to see where the music industry goes over the next decade.
