Jukebox is a type of neural net – a network of artificial nodes which is ‘trained’ on a body of data, and can then use what it has learned to generate new material. These artificial intelligence networks have been used to create unique images, poetry, scripts, and music. Essentially, they work from one data-point to the next and try to work out what letter, pixel, or note should come next, based on their training. I first encountered them on the wonderful AI Weirdness blog, which is a rabbit-hole of the hilarious and surreal things that can now be done with this technology.
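To make the ‘what should come next’ idea concrete, here is a deliberately tiny sketch in Python. It is not how Jukebox works internally (Jukebox is a far larger neural net operating on audio); it simply counts which character tends to follow which in a training text and then generates by repeatedly picking the most common follower. The function names `train`, `predict_next` and `generate` are my own, chosen for illustration.

```python
from collections import Counter, defaultdict


def train(text):
    """Count, for each character, which characters follow it in the text."""
    follows = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, current):
    """Return the character most often seen after `current`, or None if unseen."""
    if current not in follows:
        return None
    return follows[current].most_common(1)[0][0]


def generate(follows, seed, length):
    """Extend `seed` one character at a time, up to `length` extra characters."""
    out = seed
    for _ in range(length):
        nxt = predict_next(follows, out[-1])
        if nxt is None:  # nothing in the training text ever followed this character
            break
        out += nxt
    return out


# Toy training data: the opening words of the piece used later in this post.
model = train("if ye love me, keep my commandments")
print(generate(model, "k", 10))
```

Even this toy shows the basic limitation: the model only knows what it has seen, so its output quickly falls into loops or nonsense. A real neural net looks much further back than one character and has vastly more training data, but the step-by-step principle is similar.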
What makes Jukebox different from many of the varieties of generative music that have come before is that it’s trained not on symbolic datasets – for example MIDI files, which encode musical instructions as digital code – but on actual audio. Not only that, but it has also been conditioned to recognise the shape of words, meaning it can – sort of – generate these sounds too.
This means that you can feed it an audio sample, give it a few parameters such as a genre or artist to emulate, specify the words, and then ask it to predict what should come next. It bases these choices on what it has learned about the 1.2 million real songs that formed its ‘training’ dataset.
The results, as one might expect, vary wildly in quality. On the aforementioned blog, Janelle Shane posts some creations which are exciting and not a little horrifying – for example, a pastiche Frank Sinatra Christmas song which should belong to an album entitled ‘Music from the Uncanny Valley’.
Most of the results that have so far been posted by researchers have the flavour of I’m Sorry I Haven’t A Clue’s ‘One Song to the Tune of Another’ (see here if you need a description of this very complicated game). Thus you can get the AI to do Queen in the style of Nirvana, for example.
Inevitably, a large majority of its training data is non-classical in nature, but I still thought it would be interesting to prompt it with some choral music, to see what it would come up with. The results are surprisingly impressive, though naturally very odd.
Jukebox was primed with about twelve seconds of a recording of the classic Thomas Tallis banger ‘If ye love me’, and given the full lyrics. Now, it has a limited dataset of genres and artists to use as a template, and the closest I could find were ‘Classical’ for the genre and, yes, ‘Mormon Tabernacle Choir’ for the artist. Already the mind boggles.
It had three goes at generating 40 more seconds of the piece, transforming the input through a process of ‘upsampling’ at three different levels. Let’s have a listen to what it came up with after some four hours of labour:
1. If ye love meh
The neural net takes over on the last syllable of ‘commandments’, and in each sample it has a different idea of what chord should follow. Here, it plays it safe and repeats the chord, which works. It’s cool that it makes the phrase lengths broadly ‘vocal’ in nature, and simulates breaths before them too, presumably learning to ape the opening of the prompt.
Some extraneous, non-vocal sounds start to appear in the middle, including at one point what sounds like a train passing, or perhaps a snare drum. I wonder if that’s due to it using the Mormon Tabernacle Choir, with their often quite elaborate arrangements, as a model. For all it knows, the piece starts a cappella and then goes on to become instrumental. It could also be misinterpreting the acoustic reverb as ‘new sounds’ in their own right, and trying to work out what they could mean.
It also mostly stays in key until the very end. That would normally be unremarkable, but I point it out because it is not a given in the other samples…
2. If ye…love…meeee….
This one’s ‘-ment’ chord is actually a cool choice – A minor rather than the original F major. Afterwards, however, it goes off the rails a little earlier than the previous one. I like the little cymbal ‘ting’ after the second phrase. The choir’s vocal production becomes very slurred, and the AI forgets the key, if it ever knew what that was in the first place. The end becomes rather worrying and distorted, and the harmony is bizarre.
Presumably, because it isn’t given any information about what harmony actually is, it doesn’t know the rules except by what it’s heard before. It must base its moment-to-moment choices about what audio to generate on what it has learned usually follows the audio that came before. However, I can’t imagine there are very many examples in the dataset of an audio progression of the sort that happens at the end of this excerpt. How did Jukebox come up with it?
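One possible part of the answer, and I offer it as a guess rather than a diagnosis, is that generative models of this kind typically pick each next step by sampling from a set of probabilities rather than always taking the single most likely option. That would fit with each of the three samples choosing a different chord after ‘commandments’. The toy Python sketch below uses chord names and probabilities that I have simply made up for illustration; Jukebox itself deals in audio, not labelled chords.

```python
import random

# Invented probabilities for what might follow a given chord.
# These numbers are purely illustrative, not taken from Jukebox.
next_chord_probs = {
    "F major": 0.55,
    "C major": 0.30,
    "A minor": 0.13,
    "something bizarre": 0.02,
}


def sample_next(probs, rng):
    """Pick one option at random, weighted by its probability."""
    options = list(probs)
    weights = [probs[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_next(next_chord_probs, rng) for _ in range(1000)]

# Show how often each option was chosen across 1000 draws.
for chord in next_chord_probs:
    print(chord, draws.count(chord))
```

The unlikely option still turns up now and then. In a piece built from thousands of such choices, one odd pick can push the music somewhere the model has rarely heard, and every choice after that is made from unfamiliar territory, which may be why things unravel towards the end.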
3. If ye love me, keep in the same key..?
Uh. Pretty out-there choice of a continuation chord on ‘commandments’, but it recovers pretty successfully and sticks the landing. The words also feel a little more present in this one, and it stays in a key and sort of in tune longer than the others, at least until a demonic final entry before the file mercifully ends. There’s some intriguing parallelism in the middle, during the extension of a word that I think might be ‘you’. And it remembers to be a cappella throughout, which the other two didn’t manage. Probably the most successful.
What’s impressive is that, in all three of its goes, the AI learns that the phrases are preceded by breaths, and apes the length of the first phrase for most of the following ones, varying them subtly but plausibly. But the overall effect of the continuations (if one can ignore the ghostly distortion of the voices) is of someone dreaming a conclusion to a piece of which they only remember the opening. Like dreams, they lose coherence and stop making sense at various points. Still, given that the vast majority of its training is on popular music and other styles, it does a pretty creditable if slightly meandering job.
For me, the results of this are roughly equal parts disturbing, exciting, and hilarious. Disturbing, because the distorted voices end up sounding like something from a horror film. Exciting, because the computer isn’t bound by our conception of harmony or structure – it dreams up new combinations that we might never have thought of. Insofar as it has worked out the rules, it’s done so by simply listening to a lot of music, like an alien tuning in from another planet and trying to understand how our music works.
As a tool for inspiring creativity, it has limitless potential, because it can always surprise us with its choices. It won’t be long before it gets better at understanding different genres and is able to produce highly competent pastiches – the musical equivalent of these non-existent people.
In the meantime it’s more likely to make me giggle than reflect on the mysteries of human existence. But it won’t be long. I, for one, welcome our new robotic musical overlords.