A new experimental mobile app developed by Facebook’s artificial intelligence researchers can answer spoken queries about the content of photos.
Yann LeCun, director of Facebook’s artificial intelligence research group, showed off the app, which could one day help visually impaired people, in a talk at MIT Technology Review’s EmTech conference in Cambridge, Massachusetts. He showed the app answering a series of questions about several photos. In one, a cat was sniffing a large bunch of unripe bananas. The app correctly answered spoken queries asking whether there was a cat in the image, what it was doing, what the bananas were on, and about the color of the bananas (green) and the cat (black and white). In another example a dog held a toy in its mouth. When asked what game the dog was playing, the app correctly answered “Frisbee.”
“What you’re seeing is not fake; it’s a real system, and it’s able to answer basic questions about images,” said LeCun. “A system that actually describes an image could be a very useful thing for the visually impaired.”
The app is an example of how artificial intelligence researchers at Facebook and elsewhere are trying to build systems that combine an understanding of images with an ability to understand language (see “Google’s Brain-Inspired Software Describes What It Sees in Complex Images”). Historically those areas have been worked on separately, but combining them could create systems better able to understand our world and help us manage it.
LeCun’s research group is mostly focused on using a technique called deep learning, which involves crafting software that learns from data and is roughly modeled on the way brain cells connect and work together. The technique has produced significant jumps in the ability of machines to understand speech and recognize objects in images. LeCun believes that it will soon allow computers to grasp many nuances of language and be capable of basic conversation (see “Teaching Machines to Understand Us”).
The app LeCun showed off Monday is powered by a technique developed by his group called memory networks, which has previously been shown capable of learning basic verbal reasoning by reading through simple stories.
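The memory-networks idea can be sketched in miniature: story sentences are stored as "memories," and a question is matched against them with an attention step to pick the most relevant one. The toy below is an illustrative simplification, not Facebook's actual system; the hand-built vocabulary, bag-of-words embedding, and single-hop readout are all assumptions made for brevity (the real model learns its embeddings and reasoning from data).

```python
import numpy as np

# Toy vocabulary and bag-of-words embedding (an illustrative
# simplification; the real system learns embeddings from data).
vocab = ["mary", "john", "moved", "to", "the", "kitchen",
         "garden", "where", "is"]
idx = {w: i for i, w in enumerate(vocab)}

def embed(sentence):
    """Represent a sentence as a word-count vector over the toy vocabulary."""
    v = np.zeros(len(vocab))
    for w in sentence.lower().split():
        if w in idx:
            v[idx[w]] += 1.0
    return v

# Each sentence of the story is stored as a "memory".
story = ["Mary moved to the kitchen", "John moved to the garden"]
memories = np.array([embed(s) for s in story])

def answer(question):
    q = embed(question)
    # Attention: score each memory against the question, then
    # softmax-normalize the scores into weights.
    scores = memories @ q
    weights = np.exp(scores) / np.exp(scores).sum()
    # Simplified single-hop readout: take the best-supported memory
    # and return its final word (the location, in this toy setup).
    best = story[int(weights.argmax())]
    return best.split()[-1]

print(answer("Where is Mary"))  # kitchen
print(answer("Where is John"))  # garden
```

Here the question "Where is Mary" shares the word "Mary" only with the first memory, so attention selects it and the readout returns "kitchen"; the trained version does the same kind of lookup, but over learned representations rather than literal word overlap.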
Eric Horvitz, director of Microsoft’s Redmond, Washington, research lab, also spoke at the EmTech event Monday. He said that systems combining different skills and machine-intelligence techniques are an important next step for artificial intelligence, one that would make it more powerful.
“We can start weaving these things together now to build larger experiences,” he said. “I’m pretty sure that the next big leaps in AI will come from integrative solutions rather than the great work to date on specific wedges.”