Alexa needs a robot body to escape the confines of today’s AI

The man behind Amazon’s voice assistant says AI programs need to see and explore the world if they’re ever going to attain real understanding.
March 26, 2019

“Alexa, why aren’t you smarter?”

It’s a question that Rohit Prasad, head scientist of the Alexa artificial-intelligence group at Amazon, keeps asking himself. It’s also a conundrum that tells us how much progress we’ve really made in AI—and how much farther there is to go.

Prasad outlined the technology behind Alexa, as well as the intellectual limits of all AI assistants, Tuesday at EmTech Digital, MIT Technology Review’s AI conference.

Amazon’s peppy virtual helper has hardly been a flop. The company introduced Alexa in 2014 as the ever-patient, relentlessly cheerful female interface for its Echo smart speaker, a tabletop device that zeroes in on your voice from across a room and responds to spoken queries and commands.

Over 100 million Echo products have been sold since 2014, and the success of the product line prompted Google and Apple to rush out competitors. Virtual assistants are now available through hundreds of different devices, including TVs, cars, headphones, baby monitors, and even toilets.

Such popularity is a testament to how good software has become at responding to simple requests. Users have little patience for overly dumb virtual helpers. But spend much time with them and the technology’s shortcomings quickly reveal themselves. Alexa is easily confused by follow-on questions or a misplaced “umm,” and it cannot hold a proper conversation because it’s baffled by the ambiguity of language.

The reason Alexa gets tripped up, Prasad said, is that the words we use contain more power and meaning than we often realize. Every time you say something to another person, that person must use preexisting understanding of the world to construct the meaning of what you are saying. “Language is complicated and ambiguous by definition,” he said in an interview before the conference. “Reasoning and context have to come in.”

Alexa has some advantages over an analog human brain—like access to a vast encyclopedia of useful facts. By querying this knowledge base, Alexa can determine if you’re talking about a person, a place, or a product. This is more of a hack than a route to real intelligence, though. There are many situations where the meaning of a statement will still be ambiguous.
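To make the idea concrete, here is a toy sketch (not Amazon’s implementation; the knowledge base and entity names are invented for illustration) of how querying a knowledge base can tell an assistant what kinds of thing a word might refer to—and why that alone doesn’t resolve ambiguity:

```python
# Toy "knowledge base" mapping a mention to its possible coarse types.
# Purely illustrative -- real systems use vastly larger knowledge graphs.
KNOWLEDGE_BASE = {
    "lincoln": {"person", "place", "product"},  # president, city, or car
    "amazon": {"place", "company"},             # river or retailer
    "echo": {"product"},
}

def entity_types(mention: str) -> set:
    """Return the candidate types for a mention (empty set if unknown)."""
    return KNOWLEDGE_BASE.get(mention.lower(), set())

print(sorted(entity_types("Lincoln")))  # ['person', 'place', 'product']
```

The lookup narrows the possibilities but, as the article notes, several types can remain in play—which is why Prasad calls this a hack rather than real intelligence.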

Even a simple-looking question like “What’s the temperature?” requires Alexa to do some reasoning. You could be asking what the weather is like outside, or maybe you want a reading from an internet-connected thermostat or oven.

Prasad explains that Alexa has ways to try to iron out such wrinkles—it knows your location and the time of day, and it can access every question you’ve ever asked, as well as queries from other people in the same city. If you ask it to play a particular song, for example, Alexa might guess that you’re after a cover version rather than the original, if enough people nearby are listening to that song.
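The kind of context-based guessing Prasad describes can be sketched as a simple scoring problem. This is a hypothetical illustration only—the signal names and weights are invented, not Amazon’s—but it shows the shape of the idea: score each candidate interpretation against contextual signals and pick the best one.

```python
# Hypothetical sketch of contextual disambiguation. Each candidate
# interpretation carries signals such as how popular it is among
# nearby listeners; the highest-scoring candidate wins.

def pick_interpretation(candidates):
    """candidates: list of (name, signals) pairs, where signals is a
    dict like {"local_popularity": 0.9}. A real system would also fold
    in location, time of day, and the user's query history."""
    def score(signals):
        return signals.get("local_popularity", 0.0)
    return max(candidates, key=lambda c: score(c[1]))[0]

# "Play that song": the cover version is trending nearby, so it wins.
candidates = [
    ("original recording", {"local_popularity": 0.2}),
    ("cover version", {"local_popularity": 0.9}),
]
print(pick_interpretation(candidates))  # cover version
```

The limitation the article goes on to describe follows directly: when no contextual signal distinguishes the candidates, scoring ties and the system has nothing left to fall back on but common sense it doesn’t have.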

But this kind of contextual information takes Alexa only so far. To be decoded, some statements require a much deeper understanding of the world—what we refer to as “common sense.”

Some researchers are now working on ways to let computers build and maintain their own sources of common sense. A growing number of practitioners also believe that machines will not master language unless they experience the world.

This could mean that Alexa will one day live inside something resembling a robot with eyes, limbs, and a way of moving around. “The only way to make smart assistants really smart is to give it eyes and let it explore the world,” Prasad said. Amazon has already created versions of Alexa with a camera. Other companies are developing personal robots capable of responding to spoken queries. Amazon is rumored to be working on some kind of home robot as well.

Although Prasad wouldn’t comment specifically on that rumor, his remarks show how deeply Amazon is thinking about the AI behind its voice helper. Indeed, if AI assistants do assume a physical presence, it could create a virtuous feedback cycle: bringing together different capabilities—speech, vision, and physical manipulation—should produce AI programs with much better language skills, and it might also make robots that are a lot smarter and more helpful.

The question, then, may be: “Alexa, how smart are you going to get?”

