Artificial intelligence

AI assistants say dumb things, and we’re about to find out why

A new test could prove that when it comes to language, today’s best AI systems are fundamentally limited.
March 14, 2018
Brother UK | Flickr

Siri and Alexa are clearly far from perfect, but there is hope that steady progress in machine learning will turn them into articulate helpers before long. A new test, however, may help show that a fundamentally different approach is required for AI systems to actually master language.

Developed by researchers at the Allen Institute for AI (AI2), a nonprofit based in Seattle, the AI2 Reasoning Challenge (ARC) will pose elementary-school-level multiple-choice science questions. Each question will require some understanding of how the world works. The project is described in a related research paper (pdf).

Here’s one question: “Which item below is not made from a material grown in nature? (A) a cotton shirt (B) a wooden chair (C) a plastic spoon (D) a grass basket”

Such a question is easy for anyone who knows plastic is not something that grows. The answer taps into a common-sense picture of the world that even young children possess. 

It is this common sense that the AI behind voice assistants, chatbots, and translation software lacks. And it’s one reason they are so easily confused.  

Language systems that rely on machine learning can often provide convincing answers to questions if they have seen lots of similar examples before. A program trained on many thousands of IT support chats, for instance, might be able to pass itself off as a tech support helper in limited situations. But such a system would fail if asked something that required broader knowledge.
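To see why surface-level pattern matching falls short, consider a deliberately naive baseline (a sketch for illustration, not the ARC researchers' actual system) that scores each answer choice by how many words it shares with the question. On the plastic-spoon question above, every choice overlaps the question only in the word “a”, so the scores tie and the method has no way to find the correct answer:

```python
# Illustrative sketch: a shallow word-overlap "answerer" applied to the
# ARC example question from this article. All function names here are
# made up for the example; this is not code from the ARC project.

question = "Which item below is not made from a material grown in nature?"
choices = {
    "A": "a cotton shirt",
    "B": "a wooden chair",
    "C": "a plastic spoon",
    "D": "a grass basket",
}

def overlap_score(question: str, choice: str) -> int:
    """Count the words a choice shares with the question."""
    q_words = set(question.lower().rstrip("?").split())
    c_words = set(choice.lower().split())
    return len(q_words & c_words)

scores = {key: overlap_score(question, text) for key, text in choices.items()}
print(scores)
# Every choice scores 1 (they share only "a" with the question), so the
# baseline cannot prefer (C). The deciding fact -- that plastic does not
# grow -- appears nowhere in the surface text; it is common-sense
# knowledge the answerer would have to bring with it.
```

The tie is the point: answering correctly requires knowledge outside the text, which is exactly the gap the ARC benchmark is designed to probe.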

 “We need to use our common sense to fill in the gaps around the language we see to form a coherent picture of what is being stated,” says Peter Clark, the lead researcher on the ARC project. “Machines do not have this common sense, and thus only see what is explicitly written, and miss the many implications and assumptions that underlie a piece of text.”

The new test is part of an initiative at AI2 to imbue AI systems with such an understanding of the world. And it is important because determining how well a language system understands what it is saying can be tricky.

For instance, in January researchers at Microsoft and another group at Alibaba developed question-and-answer programs that outperformed humans on a reading-comprehension benchmark called the Stanford Question Answering Dataset (SQuAD). These advances were accompanied by headlines proclaiming that AI programs could now read better than humans. But the programs could not answer more complex questions or draw on other sources of knowledge.

Tech companies will continue to tout the capabilities of AI systems in this way. Microsoft announced today that it has developed software capable of translating English news stories into Chinese, and vice versa, with results that independent volunteers deem equal to the work of professional translators. The company’s researchers used advanced deep-learning techniques to reach a new level of accuracy. While this is potentially very useful, the system would struggle if asked to translate free-ranging conversation or text from an unfamiliar domain, such as medical notes.

Gary Marcus, a professor at NYU who has argued for the importance of common sense in AI, is encouraged by the AI2 challenge. “I think this is a great antidote to the kind of superficial benchmarks that have become so common in the field of machine learning,” he says. “It should really force AI researchers to up their game.”

