To make sense of the visual world, it’s not enough to know that you are looking at, say, a cat. You need to know where the cat stops and the background begins.
A computer vision algorithm developed by Facebook and made publicly available to other researchers today gives computers this ability. It can identify not only what’s in an image, but also the shapes that correspond to particular objects. That might seem like a simple trick, but it’s devilishly difficult to program a computer to do it correctly, and is beyond the capabilities of existing vision systems.
For now, Facebook’s algorithm is just a research tool. Ultimately, though, it could have a range of important applications: enabling an image-editing program to automatically change the background or brighten the people shown in a picture; providing ways of describing images in detail to blind computer users; even making augmented reality games like Pokémon Go far more realistic by recognizing objects for Pikachu to climb on.
Computer vision has advanced significantly in recent years, but the progress has mainly been in recognizing objects or types of scenes. Researchers have now begun to turn their attention toward deeper image understanding, which is important for making machines more intelligent overall (see “The Next Big Test for AI: Making Sense of the World”).
“One of the hardest things [for computers to do] is to understand reality—what’s actually out there,” says Larry Zitnick, a research manager at Facebook who was involved with the work. “Image segmentation is a critical part of scene reasoning.”
Zitnick says the algorithm might eventually be used to develop a system that automatically highlights the products in an image posted to Facebook, or to create more realistic augmented reality apps. “If you want to put a [virtual] puppy in a room,” he says, “you actually want to put it on a sofa, and on a particular part of that sofa.”
Much progress has been made in computer vision over the past few years using large simulated neural networks trained to categorize images using numerous examples. These “deep learning” systems typically recognize a range of features, such as color and texture, but do not necessarily recognize the outline of an object.
Facebook’s algorithm combines a series of neural networks to perform this sort of “image segmentation.” The first couple of networks are used to determine whether individual pixels are part of one object or another; a third network is then used to determine what those particular objects are.
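To make the "segment first, then label" idea concrete, here is a toy sketch in Python. It is not Facebook's code. Every function is a hypothetical stand-in for a trained neural network, and the tiny grid of numbers stands in for an image: a flood fill plays the role of the mask-proposing network, a size filter plays the role of the refining network, and a lookup table plays the role of the classifier.

```python
# Toy sketch of a "segment first, then label" pipeline.
# Every function here is a hypothetical stand-in for a trained neural network;
# none of this is taken from Facebook's released code.

def propose_masks(image):
    """Stage 1 stand-in: group touching same-valued, non-zero pixels into masks."""
    height, width = len(image), len(image[0])
    seen = set()
    masks = []
    for r in range(height):
        for c in range(width):
            if image[r][c] == 0 or (r, c) in seen:
                continue
            # Flood fill over 4-connected neighbours that share this pixel's value.
            value = image[r][c]
            stack = [(r, c)]
            seen.add((r, c))
            mask = set()
            while stack:
                y, x = stack.pop()
                mask.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and (ny, nx) not in seen
                            and image[ny][nx] == value):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            masks.append(mask)
    return masks


def refine_masks(masks, min_pixels=2):
    """Stage 2 stand-in: discard candidates too small to be an object."""
    return [m for m in masks if len(m) >= min_pixels]


def classify_mask(image, mask, labels):
    """Stage 3 stand-in: look up a label from the pixel value under the mask."""
    y, x = next(iter(mask))
    return labels.get(image[y][x], "unknown")


def segment_and_label(image, labels):
    """Run the three stages in order and pair each surviving mask with a label."""
    masks = refine_masks(propose_masks(image))
    return [(classify_mask(image, m, labels), m) for m in masks]


# 0 is background; other values mark pixels belonging to different things.
image = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [3, 0, 0, 0, 2],
]
labels = {1: "cat", 2: "lamp"}  # illustrative labels only

results = segment_and_label(image, labels)
for label, mask in results:
    print(label, sorted(mask))
```

The point of the sketch is the output shape: each object comes back as a label plus the exact set of pixels it covers, rather than a label for the whole picture. The lone pixel in the bottom-left corner is dropped at the second stage, which mimics how a real system would discard weak candidates.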
Stefano Soatto, a professor at UCLA who specializes in computer vision, says the work is “very significant” and could have many applications, because image segmentation is much harder than it looks. “Every two-year-old can point to objects and trace their outline in a picture,” Soatto says. “This, however, is deceptive. There are millions of years of evolution and half of the real estate of the brain that goes into accomplishing this feat.”