Late last month a few hundred lucky users of Facebook’s mobile messaging app got an unusual new contact to talk with: M, a virtual assistant powered by a mixture of algorithms and human operators.
That cyborg design makes M capable of handling much more complex requests than the mobile app assistants that Apple, Microsoft, and Google offer in their smartphone software. Siri, Cortana, and Google’s search app can interpret simple commands or factual queries, such as “What’s the weather forecast for London?” But they can’t field more complex questions such as “Where can I get a good burger in Chicago?” They can’t enter into back-and-forth conversation or book a hotel.
M can do those things because the software hands off requests it can’t handle to human operators known as “trainers.” Sometimes a trainer has to do all the work, but M can also condense queries that it recognizes but can’t fulfill into easy-to-process summaries that make a trainer’s work more efficient.
Right now this model is not efficient enough for M to be more than just an experiment, because it requires too many human workers. But Alex Lebrun, who leads the team working on Facebook’s assistant, says that it can become a real product because the work of the human trainers is gradually teaching the software how to do a greater share of the work. Lebrun and his team joined Facebook when the social network acquired the startup he cofounded, Wit.ai (see “Making the Internet of Things Understand Your Voice”). He recently met with MIT Technology Review. What follows is an edited transcript of the conversation.
Automated virtual assistants such as Siri have been offered for a while. Why make an assistant where humans do some of the work?
Virtual assistants on the market, like Siri and Cortana, are like search – you can ask a question and get an answer, but it’s limited. People have always been frustrated. On average, people who use Siri every day or week only use it for three or four questions. It’s because they’ve been burned. Users stop using it, or use the things they know work.
We wanted to focus on tasks that no AI in the world can do. To do that you need to understand what people want but also make a plan to fulfill it. Nobody has the data to train machine learning to do that. We decided to have AI and humans working together. The AI helps the humans and in turn they train the AI.
Can you give me an example of what people are using it for?
Some people say “Send me an alert every morning at 7 if it’s going to rain.” That’s still out of Siri’s scope.
I use M a lot to plan weekends. I choose a city to spend a weekend in and ask M to book a hotel and make suggestions for things I can do with my three- and one-year-old kids. M will use search or Facebook pages to generate a list and show it to a trainer. If the AI got something wrong, the trainer will eliminate some suggestions. M comes back and says, “Here’s your hotel, and I recommend you do this in the morning, and go to this museum in the afternoon.” You can do this with Google but it takes a lot of browsing. Instead of sending you a list of 50 suggestions, M will prune and prioritize the list, sending the top three or five things. That’s the kind of thing a good human executive assistant does. Once you start using M, it’s very addictive.
Can this become a real product that everyone can use?
I’m sure we can scale it to a generally available product. We have a few dozen trainers. That’s a high ratio [to the number of users] but we are learning a lot of new things. There are a lot of requirements that come up again and again, and we can learn those. We’ll still need trainers for the long tail.
Facebook’s Artificial Intelligence Research group is trying to create software capable of carrying out dialogues like M’s without human input (see “Teaching Machines to Understand Us”). Do you work with them?
We have a close relationship with FAIR. We are working with them on some modules for M. It is a very good opportunity for them to surface what they do. The modules are all built on machine learning. From experience we know that if you start with hard-coded rules, you hit a wall; we want to avoid that.
Can the data from M help FAIR’s effort to produce software able to carry out dialogue by itself?
They are very hungry for any data. For question answering there’s not a lot of data, and it’s simple factual questions that you can use Wikipedia to answer. Because of their limitations, assistants like Siri don’t provide very good data. Here we have full dialogue with a goal and execution. The only way to build this data set is to have real users asking real questions.