A new experimental mobile app developed by Facebook’s artificial intelligence researchers can answer spoken queries about the content of photos.
Yann LeCun, director of Facebook’s artificial intelligence research group, showed off the app, which could one day help visually impaired people, in a talk at MIT Technology Review’s EmTech conference in Cambridge, Massachusetts. He showed the app answering a series of questions about several photos. In one, a cat was sniffing a large bunch of unripe bananas. The app correctly answered spoken queries asking whether there was a cat in the image, what it was doing, what the bananas were resting on, and what color the bananas (green) and the cat (black and white) were. In another example, a dog held a toy in its mouth. When asked what game the dog was playing, the app correctly answered “Frisbee.”
“What you’re seeing is not fake; it’s a real system, and it’s able to answer basic questions about images,” said LeCun. “A system that actually describes an image could be a very useful thing for the visually impaired.”
The app is an example of how artificial intelligence researchers at Facebook and elsewhere are trying to build systems that combine an understanding of images with an ability to understand language (see “Google’s Brain-Inspired Software Describes What It Sees in Complex Images”). Historically those areas have been worked on separately, but combining them could create systems better able to understand our world and help us manage it.
LeCun’s research group is mostly focused on using a technique called deep learning, which involves crafting software that learns from data and is roughly modeled on the way brain cells connect and work together. The technique has produced significant jumps in the ability of machines to understand speech and recognize objects in images. LeCun believes that it will soon allow computers to grasp many nuances of language and be capable of basic conversation (see “Teaching Machines to Understand Us”).
The app LeCun showed off Monday is powered by a technique developed by his group called memory networks, which has previously been shown to learn basic verbal reasoning by reading through simple stories.
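The article does not describe the model's internals, but the core idea behind memory networks is to store facts as vectors in a memory, score each stored fact against an embedded question, and read out the best-matching memory. The toy sketch below illustrates that attention-over-memory step using bag-of-words embeddings; the vocabulary, the `embed` and `answer` helpers, and the single-hop readout are all illustrative assumptions, not Facebook's actual system.

```python
import numpy as np

# Toy vocabulary and bag-of-words embedding (illustrative only; a real memory
# network learns its embeddings and answer layer from data).
vocab = ["cat", "dog", "bananas", "green", "frisbee", "toy",
         "sniffing", "holding", "is", "the", "what", "color"]
word_idx = {w: i for i, w in enumerate(vocab)}

def embed(sentence):
    """Bag-of-words vector: count occurrences of known words."""
    v = np.zeros(len(vocab))
    for w in sentence.lower().split():
        if w in word_idx:
            v[word_idx[w]] += 1.0
    return v

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def answer(facts, question):
    """One memory 'hop': attend over stored facts, read out the best match."""
    memory = np.stack([embed(f) for f in facts])  # each fact fills a memory slot
    q = embed(question)
    scores = memory @ q                           # match question to each slot
    weights = softmax(scores)                     # attention over memories
    return facts[int(weights.argmax())]           # read out most relevant fact

facts = [
    "the cat is sniffing the bananas",
    "the bananas are green",
    "the dog is holding a frisbee toy",
]
print(answer(facts, "what is the dog holding"))
# -> the dog is holding a frisbee toy
```

Real memory networks replace the fixed bag-of-words with learned embeddings, perform multiple attention hops, and produce answers through a trained output layer, but the store-score-read loop is the same.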
Eric Horvitz, director of Microsoft’s research lab in Redmond, Washington, also spoke at the EmTech event Monday. He said that systems combining different skills and machine intelligence techniques are an important next step that would make artificial intelligence more powerful.
“We can start weaving these things together now to build larger experiences,” he said. “I’m pretty sure that the next big leaps in AI will come from integrative solutions rather than the great work to date on specific wedges.”