A new experimental mobile app developed by Facebook’s artificial intelligence researchers can answer spoken queries about the content of photos.
Yann LeCun, director of Facebook’s artificial intelligence research group, showed off the app, which could one day help visually impaired people, in a talk at MIT Technology Review’s EmTech conference in Cambridge, Massachusetts. He showed the app answering a series of questions about several photos. In one, a cat was sniffing a large bunch of unripe bananas. The app correctly answered spoken queries asking whether there was a cat in the image, what it was doing, what the bananas were on, and about the color of the bananas (green) and the cat (black and white). In another example a dog held a toy in its mouth. When asked what game the dog was playing, the app correctly answered “Frisbee.”
“What you’re seeing is not fake; it’s a real system, and it’s able to answer basic questions about images,” said LeCun. “A system that actually describes an image could be a very useful thing for the visually impaired.”
The app is an example of how artificial intelligence researchers at Facebook and elsewhere are trying to build systems that combine an understanding of images with an ability to understand language (see “Google’s Brain-Inspired Software Describes What It Sees in Complex Images”). Historically those areas have been worked on separately, but combining them could create systems better able to understand our world and help us manage it.
LeCun’s research group is mostly focused on using a technique called deep learning, which involves crafting software that learns from data and is roughly modeled on the way brain cells connect and work together. The technique has produced significant jumps in the ability of machines to understand speech and recognize objects in images. LeCun believes that it will soon allow computers to grasp many nuances of language and be capable of basic conversation (see “Teaching Machines to Understand Us”).
The app LeCun showed off Monday is powered by a technique, developed by his group, called memory networks, which has previously been shown to be able to learn basic verbal reasoning by reading through simple stories.
Eric Horvitz, director of Microsoft’s Redmond, Washington, research lab, also spoke at the EmTech event Monday. He said that systems combining different skills and machine intelligence techniques are an important next step toward more powerful artificial intelligence.
“We can start weaving these things together now to build larger experiences,” he said. “I’m pretty sure that the next big leaps in AI will come from integrative solutions rather than the great work to date on specific wedges.”