AI image recognition has made some stunning advances, but as new research shows, the systems can still be tripped up by examples that would never fool a person.
Labsix, a group of MIT students who recently tricked an image classifier developed by Google into thinking a 3-D-printed turtle was a rifle, released a paper on Wednesday that details a different technique that could fool systems even faster. This time they also managed to trick a “black box,” a system for which they had only partial information about how it was making decisions.
The team’s new algorithm starts with an image showing what it wants the target system to see (in the example from their paper, a dog) and then alters pixels to make the picture look more and more like a different image, in this case one of skiers. As it works, the “adversarial” algorithm challenges the image recognition system with versions of the picture that quickly move into territory any human would recognize as skiers (check out the gif, above). But all the while, the algorithm maintains just the right combination of sabotaged pixels to make the system think it’s looking at a dog.
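The general pattern can be illustrated with a toy sketch in Python. This is not Labsix's algorithm, and it has nothing to do with how Google's software works: `toy_classifier`, `blend_toward`, the 16-pixel "images," and the greedy keep-or-discard rule are all assumptions invented for illustration. The sketch only shows the loop described above: begin at the dog image, nudge pixels toward the skiers image, and keep a change only when the black box still answers "dog."

```python
import random


def toy_classifier(image):
    """Hypothetical stand-in for a black-box API: it returns only a label."""
    # Assumption for the toy: the label depends on the first 4 pixels only.
    # The attack below never looks inside this function.
    return "dog" if sum(image[:4]) > 2.0 else "skiers"


def blend_toward(start, goal, keep_label, classify, step=0.25, rounds=20, seed=0):
    """Greedily move `start` toward `goal` while `classify` keeps saying `keep_label`."""
    rng = random.Random(seed)
    image = list(start)
    if classify(image) != keep_label:
        raise ValueError("starting image must already carry the label to keep")
    for _ in range(rounds):
        order = list(range(len(image)))
        rng.shuffle(order)  # visit pixels in a random order each round
        for i in order:
            candidate = list(image)
            # Move one pixel a fraction of the way toward the goal image.
            candidate[i] += step * (goal[i] - image[i])
            # Query the black box; keep the change only if the label survives.
            if classify(candidate) == keep_label:
                image = candidate
    return image


dog = [1.0] * 16     # toy "dog" image: every pixel bright
skiers = [0.0] * 16  # toy "skiers" image: every pixel dark

adversarial = blend_toward(dog, skiers, "dog", toy_classifier)
distance = sum(abs(a - s) for a, s in zip(adversarial, skiers))

print("label:", toy_classifier(adversarial))
print("remaining distance from skiers image:", round(distance, 3))
```

In this toy setup, the 12 pixels the classifier ignores end up almost identical to the skiers image, while the four it relies on stay partly dog-like. Those four play the role of the sabotaged pixels.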
The researchers tested their method on Google’s Cloud Vision API—a good test case in part because Google has not published anything about how the computer vision software works, or even all the labels the system uses to classify images. The team says that they’ve only tried foiling Google’s system so far, but that their technique should work on other image recognition systems as well.
There are plenty of researchers working on countering adversarial examples like this, but for safety-critical uses, such as autonomous vehicles, artificial intelligence won’t be trusted until adversarial attacks are impossible, or at least much more difficult, to pull off.