To see what makes AI hard to use, ask it to write a pop song
“Welcome home, welcome home, oh oh oh, the world is beautiful, the world.” They’re not the catchiest lyrics. But after I’ve listened to “Beautiful the World” half a dozen times, the chorus is stuck in my head and my foot is tapping. Not bad for a melody generated by an AI trained on a data set of Eurovision songs and koala and kookaburra cries.
Back in May, “Beautiful the World” won the AI Song Contest, a competition run by Dutch broadcaster VPRO, in which 13 teams from around the world tried to produce a hit pop song with the help of artificial intelligence.
The winning entry was created by Uncanny Valley, a team of musicians and computer scientists from Australia that used both human songwriting and AI contributions. “Their music was exciting,” says Anna Huang, an AI researcher at Google Brain, who was one of the competition judges. “The hybrid effort really shined.”
Many believe that the near-term usefulness of AI will come via collaboration, with teams of humans and machines working together, each playing to their strengths. “AI can sometimes be an assistant, merely a tool,” says Carrie Cai, a colleague of Huang’s at Google Brain who studies human-computer interaction. “Or AI could be a collaborator, another composer in the room. AI could even level you up, give you superpowers. It could be like composing with Mozart.”
But for this to happen, AI tools will need to be easy to use and control. And the AI Song Contest proved a useful test of how to achieve that.
Huang, Cai, and their colleagues have looked at the various strategies different teams used to collaborate with the AIs. In many cases, the humans struggled to get the machines to do what they wanted and ended up inventing workarounds and hacks. The researchers identify several ways that AI tools could be improved to make collaboration easier.
A common problem was that large AI models were hard to interact with. A model might produce a promising first draft of a song, but there was no way to give it feedback for a second pass. The teams could not go in and tweak individual parts or instruct the AI to make the melody happier.
In the end most teams used smaller models that produced specific parts of a song, like the chords or melodies, and then stitched these together by hand. Uncanny Valley used an algorithm to match up lyrics and melodies that had been produced by different AIs, for example.
Another team, Dadabots x Portrait XO, did not want their chorus to repeat identically but couldn’t find a way to direct the AI to vary the second version. In the end the team used seven models and cobbled together different results to get the variation they wanted.
It was like assembling a jigsaw puzzle, says Huang: “Some teams felt like the puzzle was unreasonably hard, but some found it exhilarating, because they had so many raw materials and colorful puzzle pieces to put together.”
Uncanny Valley used the AIs to provide the ingredients, including melodies produced by a model trained on koala, kookaburra, and Tasmanian devil noises. The people on the team then put these together.
“It’s like having a quirky human collaborator that isn't that great at songwriting but very prolific,” says Sandra Uitdenbogerd, a computer scientist at RMIT University in Melbourne and a member of Uncanny Valley. “We choose the bits that we can work with.”
But this was more compromise than collaboration. “Honestly, I think humans could have done it equally well,” she says.
Generative AI models produce output at the level of single notes—or pixels, in the case of image generation. They don’t perceive the bigger picture. Humans, on the other hand, typically compose in terms of verse and chorus and how a song builds. “There's a mismatch between what AI produces and how we think,” says Cai.
Cai wants to change how AI models are designed to make them easier to work with. “I think that could really increase the sense of control for users,” she says.
It’s not just musicians and artists who will benefit. Making AIs easier to use, by giving people more ways to interact with their output, will make them more trustworthy wherever they’re used, from policing to health care.
“We've seen that giving doctors the tools to steer AI can really make a difference in their willingness to use AI at all,” says Cai.