
How Computers Can Tell What They’re Looking At

Images from inside an artificial neural network help explain why a technique called deep learning is enabling software to see.
April 11, 2016

Software has lately become much, much better at understanding images. Last year Microsoft and Google showed off systems more accurate than humans at recognizing objects in photos, as judged by the standard benchmark researchers use.

That became possible thanks to a technique called deep learning, which involves passing data through networks of roughly simulated neurons to train them to filter future data (see “Teaching Machines to Understand Us”). Deep learning is why you can search images stored in Google Photos using keywords, and why Facebook recognizes your friends in photos before you’ve tagged them. Using deep learning on images is also making robots and self-driving cars more practical, and it could revolutionize medicine.

That power and flexibility come from the way an artificial neural network can figure out which visual features to look for in images when provided with lots of labeled example photos. The neural networks used in deep learning are arranged into a hierarchy of layers that data passes through in sequence. During the training process, different layers in the network become specialized to identify different types of visual features. The type of neural network used on images, known as a convolutional net, was inspired by studies on the visual cortex of animals.
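To make the idea of layered filtering concrete, here is a minimal sketch of two convolutional layers applied in sequence to a tiny image. It is an illustration under stated assumptions, not the code behind any system mentioned in this article: the function names, the 4x4 image, and both filters are hypothetical, and the filter values are hand-picked, whereas a real network learns them from labeled photos during training.

```python
# Illustrative sketch only. Real convolutional nets learn their filter values
# during training; the filters below are hand-picked (an assumption for the demo).

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses and zero out the rest, as a simulated neuron might."""
    return [[max(0, v) for v in row] for row in feature_map]

# A 4x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Layer 1 (hypothetical filter): responds where dark pixels sit left of bright ones.
dark_to_bright = [[-1, 1], [-1, 1]]
layer1 = relu(convolve2d(image, dark_to_bright))

# Layer 2 (hypothetical filter): responds where layer-1 responses stack vertically,
# that is, where the edge continues from one row to the next.
stacked_response = [[1], [1]]
layer2 = relu(convolve2d(layer1, stacked_response))

for row in layer2:
    print(row)
```

Data passes through the layers in order: the first layer reacts only along the boundary between the dark and bright halves, and the second layer reads the first layer's output rather than the raw pixels.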

“These networks are a huge leap over traditional computer vision methods, since they learn directly from the data they are fed,” says Matthew Zeiler, CEO of Clarifai, which offers an image recognition service used by companies including BuzzFeed to organize and search photos and video. Previously, programmers had to hand-design the mathematical routines that software used to look for visual features, and the results weren’t good enough to build many useful products.

Zeiler developed a way to visualize the workings of neural networks as a grad student working with Rob Fergus at NYU. The images in the slideshow above take you inside a deep-learning network trained with 1.3 million photos for the standard image recognition test on which systems from Microsoft and others can now beat humans. That test asks software to spot 1,000 different objects as diverse as mosquito nets and mosques. Each image shows visual features that most strongly activate neurons in one layer of the network.
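The article does not describe how Zeiler's visualization works, so the sketch below is a much simpler stand-in for the general idea of asking what most strongly activates a neuron: it scores a few small image patches against one simulated neuron and reports the highest scorer. The patch names, pixel values, and filter are all hypothetical.

```python
# Illustrative stand-in, not the visualization method referred to in the article.
# All patches and the filter are made-up values for the demo.

def activation(patch, kernel):
    """Response of one simulated neuron: weighted sum of the patch, clipped at zero."""
    total = sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )
    return max(0, total)

def top_activating(patches, kernel, n=1):
    """Return the names of the n patches that activate the neuron most strongly."""
    ranked = sorted(
        patches,
        key=lambda name: activation(patches[name], kernel),
        reverse=True,
    )
    return ranked[:n]

# Three 2x2 candidate patches (0 = dark, 1 = bright).
patches = {
    "vertical_edge": [[0, 1], [0, 1]],
    "horizontal_edge": [[0, 0], [1, 1]],
    "flat": [[1, 1], [1, 1]],
}

# A hand-picked filter that favors dark-to-bright transitions from left to right.
vertical_edge_detector = [[-1, 1], [-1, 1]]

strongest = top_activating(patches, vertical_edge_detector, n=1)
print(strongest)
```

Running the same ranking over patches from many photos, for neurons in each layer, would give a rough picture of what each layer has specialized in.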

 
