To see what makes AI hard to use, ask it to write a pop song

If AI is to be helpful it needs to be easier to control. A contest in which humans and machines teamed up to write a song shows what needs fixing.

Will Douglas Heaven

October 29, 2020

“Welcome home welcome home oh oh oh the world is beautiful the world.” They’re not the most catchy lyrics. But after I’ve listened to “Beautiful the World” half a dozen times, the chorus is stuck in my head and my foot is tapping. Not bad for a melody generated by an AI trained on a data set of Eurovision songs and koala and kookaburra cries.

Back in May, “Beautiful the World” won the AI Song Contest, a competition run by Dutch broadcaster VPRO, in which 13 teams from around the world tried to produce a hit pop song with the help of artificial intelligence.

The winning entry was created by Uncanny Valley, a team of musicians and computer scientists from Australia that used both human songwriting and AI contributions. “Their music was exciting,” says Anna Huang, an AI researcher at Google Brain, who was one of the competition judges. “The hybrid effort really shined.”

Many believe that the near-term usefulness of AI will come via collaboration, with teams of humans and machines working together, each playing to their strengths. “AI can sometimes be an assistant, merely a tool,” says Carrie Cai, a colleague of Huang’s at Google Brain who studies human-computer interaction. “Or AI could be a collaborator, another composer in the room. AI could even level you up, give you superpowers. It could be like composing with Mozart.”

But for this to happen, AI tools will need to be easy to use and control. And the AI Song Contest proved a useful test of how to achieve that.

Huang, Cai, and their colleagues have looked at the various strategies different teams used to collaborate with the AIs. In many cases, the humans struggled to get the machines to do what they wanted and ended up inventing workarounds and hacks. The researchers identify several ways that AI tools could be improved to make collaboration easier.

A common problem was that large AI models are hard to interact with. They might produce a promising first draft for a song. But there was no way to give the model feedback for a second pass. The teams could not go in and tweak individual parts or instruct the AI to make the melody happier.

In the end most teams used smaller models that produced specific parts of a song, like the chords or melodies, and then stitched these together by hand. Uncanny Valley used an algorithm to match up lyrics and melodies that had been produced by different AIs, for example.

Another team, Dadabots x Portrait XO, did not want their chorus to sound identical each time it came around but couldn’t find a way to direct the AI to change the second version. In the end the team used seven models and cobbled together different results to get the variation they wanted.

It was like assembling a jigsaw puzzle, says Huang: “Some teams felt like the puzzle was unreasonably hard, but some found it exhilarating, because they had so many raw materials and colorful puzzle pieces to put together.”

Uncanny Valley used the AIs to provide the ingredients, including melodies produced by a model trained on koala, kookaburra, and Tasmanian devil noises. The people on the team then put these together.

“It’s like having a quirky human collaborator that isn’t that great at songwriting but very prolific,” says Sandra Uitdenbogerd, a computer scientist at RMIT University in Melbourne and a member of Uncanny Valley. “We choose the bits that we can work with.”

But this was more compromise than collaboration. “Honestly, I think humans could have done it equally well,” she says.

Generative AI models produce output at the level of single notes, or pixels in the case of image generation. They don’t perceive the bigger picture. Humans, on the other hand, typically compose in terms of verse and chorus and how a song builds. “There’s a mismatch between what AI produces and how we think,” says Cai.

Cai wants to change how AI models are designed to make them easier to work with. “I think that could really increase the sense of control for users,” she says.

It’s not just musicians and artists who will benefit. Making AIs easier to use, by giving people more ways to interact with their output, will make them more trustworthy wherever they’re used, from policing to health care.

“We’ve seen that giving doctors the tools to steer AI can really make a difference in their willingness to use AI at all,” says Cai.
