Creating noodling piano tunes and endless configurations of cat drawings with AI may not sound like an obvious project for Google, but it makes a lot of sense to Douglas Eck.
Eck has spent about 15 years studying AI and music, and these days he’s a research scientist on the Google Brain team, leading Magenta—Google’s open-source research project that’s aimed at making art and music with machine learning.
He spoke to MIT Technology Review about how Google is producing new sounds with deep neural networks, where Magenta is taking AI music, and why computers suck at telling jokes.
Below is an edited excerpt of the interview.
Using AI to make art isn’t new, so what’s unique about Google’s approach?
We’re exploring this very specific direction having to do with deep neural networks and recurrent neural networks and other kinds of machine learning. And we’re also trying really hard to engage both the artistic community and creative coders and open-source developers at the same time, so we’ve made it an open-source project.
A lot of Magenta is focused on music. Why is AI good for making and augmenting music?
To be honest, it’s just a bias of mine. My whole research career has been about music and audio. I think the scope of Magenta has always been about art in general, storytelling, music, narrative, imagery, and trying to understand how to use AI as a creative tool. But you have to start somewhere. And I think if you make serious progress on something as complicated as music, and as important to us as music, then my hope is that some of that will map over into other domains as well.
Can we listen to some music that’s been made with Magenta?
Listen and just pay attention to the texture and everything there. This is a kind of music composition but it’s also at the same time a music performance, because the model is not only generating quarter notes—it’s deciding how fast they’re going to be played, how loudly they’re going to be played, and in fact it’s reproducing what it was trained on, which was a bunch of piano performances done as part of a piano competition.
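The difference Eck draws between a composition and a performance can be made concrete with a small sketch. The interview does not describe Magenta's actual data format, so everything here is an assumption for illustration: the `NoteEvent` fields, the helper names, and the sample values are hypothetical. The only outside facts relied on are the standard MIDI conventions that pitch 60 is middle C and that note velocity runs up to 127.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    """One played note. Hypothetical structure, not Magenta's own format."""
    pitch: int           # MIDI note number (60 = middle C)
    start_sec: float     # when the note begins
    duration_sec: float  # how long it is held
    velocity: int        # MIDI velocity, i.e. how loudly it is struck


def quantized_score(pitches, tempo_bpm=120):
    """A 'composition only' view: even quarter notes at one fixed loudness."""
    beat = 60.0 / tempo_bpm
    return [NoteEvent(p, i * beat, beat, 64) for i, p in enumerate(pitches)]


def is_expressive(notes):
    """True if note lengths or loudness vary from note to note."""
    durations = {round(n.duration_sec, 6) for n in notes}
    velocities = {n.velocity for n in notes}
    return len(durations) > 1 or len(velocities) > 1


# The same four pitches, once as a flat score and once as a made-up performance.
score = quantized_score([60, 62, 64, 65])
performance = [
    NoteEvent(60, 0.00, 0.48, 70),
    NoteEvent(62, 0.52, 0.41, 58),
    NoteEvent(64, 0.97, 0.55, 83),
    NoteEvent(65, 1.60, 0.90, 45),
]
```

A model that only emitted the `score` list would be composing. A model that also chooses the timing and velocity values, as in `performance`, is doing what Eck calls performing at the same time.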
As that piece shows, music that’s been created thus far with Magenta is essentially improvisation. Can AI be used to create a coherent piece of music with structure?
We’re working on that. So one of the major future research directions for us and, frankly, for the whole field of generative models—by that I mean machine-learning models that can try to generate something new—is learning structure. And that shows up in music here. You hear that there’s no overarching model that’s kind of deciding where things should go.
If we wanted to give it chord changes, even the symbols of the chord change, and learn contextually how to take advantage of those chord changes, we could do that. We could even have a separate model that generates chord changes. Our goal is to come up with this end-to-end model that figures out all of these levels of structure on its own.
Tell me about Sketch-RNN, which is a recent Magenta experiment that lets you draw with a recurrent neural network—basically, you start drawing a pineapple and then Sketch-RNN takes over and completes it, over and over, in many different styles.
We were able to use a bunch of drawings done by people playing Pictionary against a machine-learning algorithm—this was [data from another Google AI drawing experiment made by Google Creative Lab,] Quick, Draw!
There are limits on the data. There’s only so much you’re going to get out of these tiny little 20-second drawings. But I think the work done by the main [Sketch-RNN] researcher, David Ha, was really beautiful. He basically trained a recurrent neural network to learn how to reproduce these drawings. He sort of forced the model to learn what’s important. The model wasn’t powerful enough to memorize the entire drawing. Because it can’t memorize all the strokes it’s seeing, and its job is just to reproduce lots of cats or whatever, it’s forced to learn what’s important about cats: what are the shared aspects of cat drawings across millions of cat drawings? And so when you play with this model you can ask it to generate new cats out of thin air. It generates really interesting-looking cats that look, I think, uncannily like how people would draw cats.
I read that you’re working with Magenta to teach computers to tell jokes. What kind of jokes do computers generate? (That was not itself the first line of a joke.)
The project was very preliminary, very exploratory, asking the question: can we understand that component of joke telling which is about surprise? Especially with punch-line-related jokes and puns, there’s clearly a point where everything’s running along as normal, I think I know what’s going on with this sentence, and then, boom! Right? And also I think, intuitively, there’s a geometry to the punch line. It’s surprising if the building collapses on your head; [a punch line is] not that kind of surprise. It’s, like, oh, right, I get it! You know? And that sense of “I get it” is, I think, a kind of backtracking you’re forced to do to get it. So we were looking at particular kinds of machine-learning models that can generate these things called thought vectors, which try to capture what’s happening semantically in a sentence, and then asking: can we actively manipulate those to get a different effect?
And the kind of joke we were hearing about was … “The magician was so angry she pulled her hare out.” And the pun of hare and hair, and rabbit—you get it, right?
Yeah. But you have to know a lot about words and language to understand it.
Yeah, you have to know a lot. Not only did this model not tell any jokes, funny or not, but we didn’t actually get the code to converge.
What are you in the middle of trying to figure out with Magenta right now?
Trying to understand more of the long-term structure with music and also trying to branch out into another interesting question, which is: can we learn from the feedback, not from an artist, but from an audience?
This is looking at the artistic process as kind of iterative. The Beatles had 12 albums and every one of them was different. And they were all showing that these musicians are learning from feedback they’re getting from peers and from crowds, but also other things that are happening with other artists. They’re really tied in with culture. Artists are not static.
And this very simple idea: can you have someone making something with a generative model, putting it out there, and then taking advantage of the feedback they get? “Oh, that was good, that was bad.” The artist can learn from that feedback in one way, but maybe the machine-learning model can learn from it as well, and say, “Oh, I see, here are all the people and here’s what they think of what I’m doing, and I have these parameters.” And we can set those parameters vis-à-vis the feedback, using reinforcement learning, and we’re working on that, too.
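A toy sketch can show what setting parameters from audience feedback might look like. This is not Magenta's method, which the interview does not detail. It swaps in a simple gradient-bandit update (a basic form of reinforcement learning) over a few discrete "styles," and the audience is simulated with made-up like rates. The names `train_from_feedback` and `HYPOTHETICAL_LIKE_RATES` are invented for the example.

```python
import math
import random


def softmax(logits):
    """Turn raw preference scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_from_feedback(audience_likes, n_styles, rounds=5000, lr=0.1, seed=0):
    """Adjust style preferences using simulated thumbs-up / thumbs-down feedback.

    audience_likes[k] is the assumed chance that a listener likes style k.
    Returns the learned probability of choosing each style.
    """
    rng = random.Random(seed)
    logits = [0.0] * n_styles   # the "parameters" being set from feedback
    baseline = 0.0              # running average of the reward seen so far
    for _ in range(rounds):
        probs = softmax(logits)
        # The model "releases" a piece in one style, sampled from its preferences.
        style = rng.choices(range(n_styles), weights=probs)[0]
        # The simulated audience responds: 1.0 for "good", 0.0 for "bad".
        reward = 1.0 if rng.random() < audience_likes[style] else 0.0
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient-bandit update: shift preference toward styles that beat the average.
        for k in range(n_styles):
            grad = (1.0 if k == style else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)


# Made-up like rates for three styles; the second is the crowd favorite.
HYPOTHETICAL_LIKE_RATES = [0.2, 0.8, 0.5]
final_probs = train_from_feedback(HYPOTHETICAL_LIKE_RATES, n_styles=3)
```

Over many rounds the preferences drift toward whichever style the simulated audience likes most. A real system would have to cope with far richer outputs than three styles and far noisier feedback than a coin flip.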
As I listen to music created with Magenta, I wonder: if you’re using data to train artificial intelligence, can the AI then create anything truly original, or will it just be derivative of what it’s been trained on, whether that’s Madonna songs or impressionist paintings, or both?
I think it depends on what we mean by original. It seems unlikely to me that a machine-learning algorithm is going to come along and generate some transformative new way of doing art. I think a person working with this technology might be able to do that. And I think we’re just so, so, so far from this AI having a sense of what the world is really like. Like it’s just so, so far away. At the same time, I think that a lot of art is original in another sense. Like, I do one more cool EDM song with the drop at the right place, that’s fun to dance to and is new, but maybe is not, like, creating a completely new genre. And I think that kind of creativity is really interesting anyway. By and large, most of what we do is sitting in a genre we kind of understand, and we’re trying new things, and I think the AI that we have now can play a huge role in that kind of creativity. It’s not reproducing the data set, right? It’s mixing things up.