MIT Technology Review

Machine vision that sees things more the way we do is easier for us to understand

A red-bellied woodpecker

A new image recognition algorithm uses the way humans see things for inspiration.

The context: When humans look at a new image of something, we identify what it is based on a collection of recognizable features. We might identify the species of a bird, for example, by the contour of its beak, the colors of its plumage, and the shape of its feet. A neural network, however, simply looks for pixel patterns across the entire image without discriminating between the actual bird and its background. This makes the network more vulnerable to mistakes and harder for humans to diagnose when it errs.

How it works: Researchers from Duke University and MIT Lincoln Laboratory trained a neural network to recognize distinguishing features across bird species. They did so by showing it many examples of each species and having it find the parts of the images that looked similar within species but different across them. Through this process, the network might learn, for example, that a distinguishing feature of a cardinal is its black mask against its red feathers, while a distinguishing feature of a Florida jay is its blue wings and white body. Presented with a new image of a bird, the network then searches for those recognizable features and makes predictions about which species they belong to. It uses the cumulative evidence to make a final decision.
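The matching step described above can be sketched in miniature. This is a hedged toy illustration, not the researchers' actual model: the function names, the two-dimensional "embeddings," and the example prototypes are all invented for clarity. Each species owns a few prototype feature vectors, and an image is scored by how closely its patches match them.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(patch_embeddings, prototypes_by_class):
    """Score each class by summing, over its prototypes, the best
    similarity any image patch achieves against that prototype,
    then return the highest-scoring class."""
    scores = {}
    for cls, prototypes in prototypes_by_class.items():
        scores[cls] = sum(
            max(cosine(patch, p) for patch in patch_embeddings)
            for p in prototypes
        )
    return max(scores, key=scores.get), scores

# Invented 2-D "prototypes" standing in for learned feature vectors,
# e.g. "black mask on red feathers" vs. "blue wings and white body".
prototypes_by_class = {
    "cardinal": [np.array([1.0, 0.0])],
    "florida_jay": [np.array([0.0, 1.0])],
}

# Invented patch embeddings from a hypothetical cardinal photo.
patches = [np.array([0.9, 0.1]), np.array([0.8, 0.3])]

prediction, scores = classify(patches, prototypes_by_class)
```

Because every class score decomposes into per-prototype similarities, the model can point at the specific image patch and prototype behind each piece of its score, which is what makes the approach interpretable.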

An example: For a picture of a red-bellied woodpecker, the algorithm might find two recognizable features that it's been trained on: the black-and-white pattern of its feathers and the red coloring of its head. The first feature could match two possible bird species: the red-bellied or the red-cockaded woodpecker. But the second feature would match best with the red-bellied woodpecker.

From the two pieces of evidence, the algorithm then reasons that the picture is more likely a red-bellied woodpecker. It then displays the pictures of the features it found to show a human how it came to its decision.
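The evidence-combining step in this example can be written out as a tiny sketch. The similarity numbers here are made up purely for illustration; only the logic, summing per-feature evidence for each candidate species and picking the maximum, mirrors the description above.

```python
# Hypothetical per-feature match strengths for two candidate species.
evidence = {
    "black-and-white feather pattern": {"red-bellied": 0.9, "red-cockaded": 0.8},
    "red head": {"red-bellied": 0.95, "red-cockaded": 0.1},
}

# Accumulate evidence across features for each species.
totals = {}
for matches in evidence.values():
    for species, score in matches.items():
        totals[species] = totals.get(species, 0.0) + score

prediction = max(totals, key=totals.get)
```

The feather pattern alone is nearly a tie, but the strongly mismatched "red head" feature tips the total toward the red-bellied woodpecker, and both contributing features can be shown to a human as the explanation.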

Why it matters: In order for image recognition algorithms to be more useful in high-stakes environments such as hospitals, where they might help a doctor classify a tumor, they need to be able to explain how they arrived at their conclusion in a human-understandable way. Not only is it important for humans to trust them, but it also helps humans more easily identify when the logic is wrong.

Through testing, the researchers also demonstrated that incorporating this interpretability into their algorithm didn’t hurt its accuracy. On both the bird species identification task and a car model identification task, they found that their method neared—and in some cases exceeded—state-of-the-art results achieved by non-interpretable algorithms.

Correction: A previous version of the "How it works" section incorrectly described the training process of the neural network. It has now been updated. 
