
How Computers Can Tell What They’re Looking At

Images from inside an artificial neural network help explain why a technique called deep learning is enabling software to see.
April 11, 2016

Software has lately become much, much better at understanding images. Last year Microsoft and Google showed off systems more accurate than humans at recognizing objects in photos, as judged by the standard benchmark researchers use.

That became possible thanks to a technique called deep learning, which involves passing data through networks of roughly simulated neurons to train them to filter future data (see “Teaching Machines to Understand Us”). Deep learning is why you can search images stored in Google Photos using keywords, and why Facebook recognizes your friends in photos before you’ve tagged them. Using deep learning on images is also making robots and self-driving cars more practical, and it could revolutionize medicine.

That power and flexibility come from the way an artificial neural network can figure out which visual features to look for in images when provided with lots of labeled example photos. The neural networks used in deep learning are arranged into a hierarchy of layers that data passes through in sequence. During the training process, different layers in the network become specialized to identify different types of visual features. The type of neural network used on images, known as a convolutional net, was inspired by studies on the visual cortex of animals.
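To make the idea of a layered hierarchy concrete, here is a minimal sketch of a convolutional network in PyTorch. It is a hypothetical toy model, not the specific network described in this article; it simply shows how an image passes through a sequence of convolutional layers before a final layer scores the object categories.

```python
# A minimal, hypothetical convolutional network: data flows through the
# layers in sequence, with earlier layers tending to respond to simple
# features (edges, colors) and later layers to more object-like patterns.
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # layer 1: edges and colors
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # layer 2: textures, simple shapes
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), # layer 3: object parts
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)      # one score per object category

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)              # pass the image through the layer hierarchy
        return self.classifier(x.flatten(1))

# One forward pass on a batch of four 224x224 RGB images.
net = TinyConvNet()
scores = net(torch.randn(4, 3, 224, 224))  # shape: (4, 1000)
```

During training, the weights in each convolutional layer are adjusted from the labeled examples, which is how the different layers come to specialize in different kinds of visual features.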

“These networks are a huge leap over traditional computer vision methods, since they learn directly from the data they are fed,” says Matthew Zeiler, CEO of Clarifai, which offers an image recognition service used by companies including BuzzFeed to organize and search photos and video. Programmers previously had to invent the math their software needed to look for visual features, and the results weren’t good enough to build many useful products.

Zeiler developed a way to visualize the workings of neural networks as a grad student working with Rob Fergus at NYU. The images in the slideshow above take you inside a deep-learning network trained with 1.3 million photos for the standard image recognition test on which systems from Microsoft and others can now beat humans. It asks software to spot 1,000 different objects as diverse as mosquito nets and mosques. Each image shows visual features that most strongly activate neurons in one layer of the network.
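The full visualization technique Zeiler and Fergus developed projects activations back into image space, but a much simpler, hypothetical sketch of the underlying idea is to record a layer's activations and rank input images by how strongly they drive a chosen feature map. The example below reuses the TinyConvNet sketch above and PyTorch's forward hooks; the layer index and feature-map choice are assumptions for illustration.

```python
# Simplified illustration (not the Zeiler-Fergus deconvnet method):
# record a layer's activations with a forward hook, then find which
# images most strongly activate one of its feature maps.
import torch

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

net = TinyConvNet()   # the sketch network defined earlier
net.eval()
# Attach the hook to the third convolutional layer (index 6 in `features`).
net.features[6].register_forward_hook(save_activation("conv3"))

images = torch.randn(16, 3, 224, 224)      # stand-in for a batch of photos
with torch.no_grad():
    net(images)

# Mean activation of feature map 0 in that layer, per image.
strength = activations["conv3"][:, 0].mean(dim=(1, 2))
top_images = strength.argsort(descending=True)[:3]
print("Images that most strongly activate this feature:", top_images.tolist())
```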

 
