Facebook wants to make AI better by asking people to break it

The new kind of test pits machine-learning models against humans who do their best to fool them.
September 24, 2020

The explosive successes of AI in the last decade or so are typically chalked up to lots of data and lots of computing power. But benchmarks, the tests that researchers pit their AI against to see how advanced it is, also play a crucial role in driving progress. For example, ImageNet, a public data set of 14 million images, sets a target for image recognition. MNIST did the same for handwritten-digit recognition, and GLUE (General Language Understanding Evaluation) did the same for natural-language processing, leading to breakthrough language models like GPT-3.

A fixed target soon gets overtaken. ImageNet is being updated and GLUE has been replaced by SuperGLUE, a set of harder linguistic tasks. Still, sooner or later researchers will report that their AI has reached superhuman levels, outperforming people in this or that challenge. And that’s a problem if we want benchmarks to keep driving progress.

So Facebook is releasing a new kind of test that pits AIs against humans who do their best to trip them up. Called Dynabench, the test will be as hard as people choose to make it.

Benchmarks can be very misleading, says Douwe Kiela at Facebook AI Research, who led the team behind the tool. Focusing too much on benchmarks can mean losing sight of wider goals. The test can become the task.

“You end up with a system that is better at the test than humans are but not better at the overall task,” he says. “It’s very deceiving, because it makes it look like we’re much further than we actually are.”

Kiela thinks that’s a particular problem with NLP right now. A language model like GPT-3 appears intelligent because it is so good at mimicking language. But it is hard to say how much these systems actually understand.

Think about trying to measure human intelligence, he says. You can give people IQ tests, but that doesn’t tell you if they really grasp a subject. To do that you need to talk to them, ask questions.

Dynabench does something similar, using people to interrogate AIs. Released online today, it invites people to go to the website and quiz the models behind it. For example, you could give a language model a Wikipedia page and then ask it questions, scoring its answers.

In some ways, the idea is similar to the way people are playing with GPT-3 already, testing its limits, or the way chatbots are evaluated for the Loebner Prize, a contest where bots try to pass as human. But with Dynabench, failures that surface during testing will automatically be fed back into future models, making them better all the time.
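The feedback loop described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Dynabench's actual code or API: the names `AdversarialRound`, `submit`, `error_rate`, and `first_word_model` are all hypothetical, and the exact-match scoring is a simplifying assumption (in practice a person would judge the answer).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type: a model takes (context, question) and returns an answer string.
Model = Callable[[str, str], str]


@dataclass
class AdversarialRound:
    """One round of humans trying to fool a fixed model (illustrative only)."""
    model: Model
    fooling_examples: List[Tuple[str, str, str]] = field(default_factory=list)
    attempts: int = 0

    def submit(self, context: str, question: str, correct_answer: str) -> bool:
        """Record a human-written question; return True if the model was fooled."""
        self.attempts += 1
        predicted = self.model(context, question)
        # Assumption: exact string match stands in for a human judging the answer.
        fooled = predicted.strip().lower() != correct_answer.strip().lower()
        if fooled:
            # Failures are kept so they can become training data for a later model.
            self.fooling_examples.append((context, question, correct_answer))
        return fooled

    def error_rate(self) -> float:
        """Fraction of human attempts that fooled the model."""
        return len(self.fooling_examples) / self.attempts if self.attempts else 0.0


def first_word_model(context: str, question: str) -> str:
    """Toy stand-in for a language model: always answers with the first word."""
    return context.split()[0]
```

For instance, given the passage "Paris is the capital of France. Lyon is its third-largest city.", the toy model answers "Which city is the capital of France?" correctly but is fooled by "Which city is the third-largest?". Only the second question lands in `fooling_examples`, the pool that would be used to train the next, harder-to-fool model.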

For now Dynabench will focus on language models because they are one of the easiest kinds of AI for humans to interact with. “Everybody speaks a language,” says Kiela. “You don’t need any real knowledge of how to break these models.”

But the approach should work for other types of neural network too, such as speech or image recognition systems. You’d just need a way for people to upload their own images—or have them draw things—to test it, says Kiela: “The long-term vision for this is to open it up so that anyone can spin up their own model and start collecting their own data.”

“We want to convince the AI community that there’s a better way to measure progress,” he adds. “Hopefully, it will result in faster progress and a better understanding of why machine-learning models still fail.” 
