
AI Is Learning to See the World—But Not the Way Humans Do

AI systems are modeled after human biology, but their vision systems still work quite differently.

Computer vision has been having a moment. Image recognition algorithms make far fewer dumb mistakes than they once did when looking at the world: these days, one can accurately tell you that an image contains a cat. But the way it pulls off that party trick may not be as familiar to humans as we thought.

Most computer vision systems identify features in images using neural networks, which are inspired by our own biology and resemble it in architecture, except that the biological sensing and neurons are swapped out for mathematical functions. Now a study by researchers at Facebook and Virginia Tech says that despite those similarities, we should be careful about assuming that both work in the same way.

To see exactly what was happening as humans and AI analyzed an image, the researchers studied where each focused its attention. Both were given blurred images and asked questions about what was happening in the picture, such as “Where is the cat?” Parts of the image could be selectively sharpened, one at a time, and both humans and AI sharpened regions until they could answer the question. The team repeated the tests using several different algorithms.

Obviously they could both provide answers, but the interesting result is how they did so. On a scale from -1 to 1, where 1 is total agreement and -1 is total disagreement, two humans scored 0.63 on average for where they focused their attention across the image. For a human and an AI, the average dropped to 0.26.
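To make that kind of agreement score concrete, here is a minimal sketch of one way to compare two attention maps. The article does not name the metric the researchers used, so the Spearman rank correlation below is an assumption, chosen only because it also runs from -1 to 1. The function names and the toy 2x2 maps are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_correlation(map_a, map_b):
    """Spearman rank correlation between two equally sized 2D attention maps."""
    flat_a = [v for row in map_a for v in row]
    flat_b = [v for row in map_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("attention maps must have the same number of cells")
    ranks_a, ranks_b = average_ranks(flat_a), average_ranks(flat_b)
    mean_a = sum(ranks_a) / len(ranks_a)
    mean_b = sum(ranks_b) / len(ranks_b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ranks_a, ranks_b))
    var_a = sum((x - mean_a) ** 2 for x in ranks_a)
    var_b = sum((y - mean_b) ** 2 for y in ranks_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant map has no defined rank correlation")
    return cov / sqrt(var_a * var_b)


# Toy attention maps with made-up numbers (higher = more attention on that cell).
human = [[0.9, 0.1], [0.3, 0.2]]
same_order = [[0.8, 0.0], [0.5, 0.4]]      # ranks cells the same way as `human`
reversed_order = [[0.1, 0.9], [0.4, 0.6]]  # ranks cells in the opposite order

print(rank_correlation(human, same_order))      # 1.0
print(rank_correlation(human, reversed_order))  # -1.0
```

Two viewers who rank every region of the image in the same order of importance score 1, and two who rank them in exactly opposite order score -1. Scores such as 0.63 and 0.26 would fall between those extremes.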

In other words: the AI and human were both looking at the same image, both being asked the same question, both getting it right—but using different visual features to arrive at those same conclusions.

This is an explicit result about a phenomenon that researchers had already hinted at. In 2014, a team from Cornell University and the University of Wyoming showed that it was possible to create images that fool AI into seeing something, simply by creating a picture made up of the strong visual features that the software had come to associate with an object. Humans have a large pool of common-sense knowledge to draw on, which means they don’t get caught out by such tricks. That's something researchers are trying to incorporate into a new breed of intelligent software that understands the semantic visual world.

But just because computers don’t use the same approach doesn’t necessarily mean they’re inferior. In fact, they may be better off ignoring the human approach altogether.

The kinds of neural networks used in computer vision usually employ a technique known as supervised learning to work out what’s happening in an image. Ultimately, their ability to associate a complex combination of patterns, textures, and shapes with the name of an object is made possible by providing the AI with a training set of images whose contents have already been labeled by a human.
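As a rough illustration of what learning from a human-labeled training set means, the sketch below trains a toy classifier on labeled examples. It is deliberately not a neural network: it uses a much simpler nearest-centroid rule, and the two-number "feature vectors" and the labels are invented for the example.

```python
from collections import defaultdict


def train(labeled_examples):
    """Average the feature vectors seen for each human-supplied label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in labeled_examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the label whose average vector is closest to `features`."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical training set: (features, label) pairs, labels supplied by a person.
training_set = [
    ([0.9, 0.8], "cat"),
    ([0.8, 0.9], "cat"),
    ([0.1, 0.2], "car"),
    ([0.2, 0.1], "car"),
]

model = train(training_set)
print(predict(model, [0.7, 0.7]))  # cat
print(predict(model, [0.0, 0.3]))  # car
```

The point carries over to real vision systems only in outline: the labels come from people, and the software's job is to find patterns in the inputs that line up with those labels.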

But teams at Facebook and Google’s DeepMind have been experimenting with unsupervised learning systems that ingest content from video and images to learn what human faces and everyday objects look like, without any human intervention. Magic Pony, recently bought by Twitter, also shuns supervised learning, instead learning to recognize statistical patterns in images to teach itself what edges, textures, and other features should look like.

In these cases, it’s perhaps even less likely that the knowledge of the AI will be generated through a process aping that of a human. Once inspired by human brains, AI may beat us by simply being itself.

(Read more: New Scientist, “The Missing Link of Artificial Intelligence”, “‘Smart’ Software Can Be Tricked into Seeing What Isn’t There”)
