“Hey, Alexa” is the phrase millions of people call out at home just before telling Amazon what they want. All those requests to order kitchen supplies, turn on the lights, or play music give Amazon a valuable stockpile of data that it can use to fend off competitors and make breakthroughs in what voice-operated assistants can do.
“There are millions of these in households, and they’re not collecting dust,” Nikko Strom, a speech-recognition expert and founding member of the team at Amazon that built Alexa and Echo, said at the AI Frontiers conference in Santa Clara, California, last week. “We get an insane amount of data coming in that we can work on.”
Strom said that data had already helped the company make progress on a longstanding challenge in speech recognition known as the cocktail party problem: picking out a single voice from a hubbub of many people talking.
Initially, Alexa could easily tell that someone had called out its name, but, like other voice-recognition systems, it struggled to work out which of the words spoken around it made up the actual request. Then Strom’s team developed a system that notes the characteristics of the voice that calls out “Alexa” and uses them to home in on the words of the person asking for help.
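Amazon hasn’t published the details of how this works, but the idea can be sketched: derive a voice “signature” from the wake-word audio, then score later speech by how closely it matches that signature. The sketch below is an illustrative assumption, not Amazon’s method; it stands in a crude average-spectrum signature for the learned speaker embeddings real systems use, and the synthetic tones are toy stand-ins for voices.

```python
import numpy as np

def spectral_signature(audio, frame=512):
    """Average magnitude spectrum of a clip -- a crude stand-in for a
    speaker embedding (illustrative only; real systems use learned models)."""
    n = len(audio) // frame
    frames = audio[: n * frame].reshape(n, frame)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))
    return spectra.mean(axis=0)

def similarity(sig_a, sig_b):
    """Cosine similarity between two signatures (1.0 = identical profile)."""
    return float(np.dot(sig_a, sig_b) /
                 (np.linalg.norm(sig_a) * np.linalg.norm(sig_b) + 1e-9))

# Toy demo: two synthetic "voices" at different pitches.
sr = 16000
t = np.arange(sr) / sr
wake_word  = np.sin(2 * np.pi * 220 * t)        # the voice that said "Alexa"
same_voice = np.sin(2 * np.pi * 220 * t + 0.5)  # later speech, same speaker
other_voice = np.sin(2 * np.pi * 520 * t)       # a different speaker

anchor = spectral_signature(wake_word)
score_same  = similarity(anchor, spectral_signature(same_voice))
score_other = similarity(anchor, spectral_signature(other_voice))
# The wake-word speaker scores higher, so their words can be kept
# and the other voice filtered out.
```

In a real pipeline the same comparison would run frame by frame, keeping only audio that matches the wake-word speaker.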
The data Amazon is amassing to take on problems like that could be unique. Standard datasets available for training and testing speech-recognition systems don’t usually include audio captured in home environments, or audio recorded using microphone arrays like the one the Echo uses to focus on speech from a particular direction, says Abeer Alwan, a professor at the University of California, Los Angeles, who works on speech recognition.
“People have been toying with microphone arrays for a long time but I don’t think there has been a deployment at the scale Amazon is talking about,” says Alwan. More data on a particular scenario or type of speech usually translates into better performance, she says.
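The classic way a microphone array focuses on one direction is delay-and-sum beamforming: delay each microphone’s signal so that sound arriving from the chosen direction lines up across channels, then average them. The sketch below illustrates that generic textbook technique with two simulated microphones and integer-sample delays; it is not Amazon’s implementation, and the signals are synthetic.

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Align each microphone's signal by its steering delay and average,
    reinforcing sound from the steered direction."""
    aligned = [np.roll(sig, -d) for sig, d in zip(mic_signals, delays_samples)]
    return np.mean(aligned, axis=0)

# Toy demo: a "voice" reaches mic 2 three samples after mic 1.
rng = np.random.default_rng(0)
voice = rng.standard_normal(1000)
mic1 = voice.copy()
mic2 = np.roll(voice, 3)  # 3-sample propagation delay

# Steering toward the voice (delays [0, 3]) lines the copies up;
# steering with the wrong delays leaves them misaligned.
steered   = delay_and_sum([mic1, mic2], [0, 3])
unsteered = delay_and_sum([mic1, mic2], [0, 0])

corr_steered   = np.corrcoef(steered, voice)[0, 1]
corr_unsteered = np.corrcoef(unsteered, voice)[0, 1]
# The steered output reproduces the voice far more faithfully.
```

With more microphones and many candidate steering directions, the same idea lets a device like the Echo pick the direction the speech is coming from.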
Strom said he also hopes that his team’s data trove could eventually help upgrade Alexa so it can follow two people speaking simultaneously. “It’s hard, but there’s been some progress,” he said. “It’s super interesting for us if we could solve that problem.”
Strom didn’t say what Alexa might be able to do once that problem is solved. But it might make it more natural for multiple people to interact with an Echo or other device at once, whether that’s kids peppering Alexa with questions or their parents rattling off a shopping list.
The data piling up from Alexa could also help Amazon fend off Google’s competitor to the Echo, Google Home, which launched late last year. Google can draw on years of work in Web search and voice search, and on sizeable investments in artificial intelligence. But its previous products and businesses don’t naturally collect speech like that of a person calling out to a device in the home, or the kinds of requests people make of home assistants.
Amazon is probably hoping that this contest turns out like the Web search market. Research has suggested that one reason Google’s dominance couldn’t be shaken by startups or well-funded competitors such as Microsoft was that Google had far more data on what people search for and click on.
Early reviews of Google Home have generally said that it and Amazon’s products are broadly similar, each with their own strong points. And Google is presumably working hard to learn all it can from the data coming in from its new product. But it will take some time for that flow of information to rival what Amazon is getting.
Analysts estimated last November that more than five million Echo devices had been sold since the product’s launch two years earlier, and Amazon said last month that Echo devices were its top sellers over the holiday season. Alexa is also set to start appearing in other companies’ products, such as speakers, cars, and fridges.