It can be incredibly frustrating when a virtual assistant repeatedly misunderstands what you’re saying. Soon, though, some of them might at least be able to hear the irritation in your voice, and offer an apology.
Amazon is working on significant updates to Alexa, the virtual helper that lives inside the company’s voice-controlled home appliance, called Amazon Echo. These will include better language skills and perhaps the ability to recognize the emotional tenor of your voice.
A source familiar with the Echo project says Amazon’s researchers are looking at ways to stay ahead of the competition, primarily through a better understanding of a user’s intent. Researchers are exploring new natural-language processing techniques, but also ways to sense the emotion in a person’s voice. “How human affect is recognized and then reflected by [Alexa’s] voice will be a key area of [Amazon’s] R&D,” the source says.
Amazon launched the Echo, with relatively little fanfare, in November 2014. The device has proven a surprise hit, and competitors have clearly taken notice.
The device seems to realize the promise of voice as a more natural and frictionless way to interact with technology.
Such improvements might help Amazon maintain an edge as Google and Apple ramp up their own voice-controlled home devices. Google recently announced a new virtual assistant and an Echo-like home device, called Google Home (see “Google Finally Launches Siri Killer in Pivot Away from Conventional Search”). And Apple is rumored to be working on opening Siri up to app developers, and also to be developing its own answer to the Echo.
Although other voice-controlled software assistants, such as Apple’s Siri and Microsoft’s Cortana, predate Alexa, these are only optional interfaces. Indeed, studies suggest that Siri is used mainly for a few tasks: calling people, sending texts, and setting alarms. The Echo is the first computer for which the main interface is your voice. The only physical controls are an on-off switch, a button to mute the microphone, and a knob for the volume, although it can also be controlled using an app.
Overall improvements to Alexa’s natural-language understanding are likely to help the device interpret ambiguous requests more accurately, by applying probabilistic techniques, the source says. For example, a person who is located in Seattle may be judged more likely to be referring to the Seahawks when he or she asks, “How are the Hawks doing?”
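To make the Seahawks example concrete, here is a minimal sketch of how a location-based prior could tip an ambiguous request one way or the other. It is not Amazon’s method, which the source does not describe in detail; the function names, team list, and every probability below are invented for illustration. The idea is simply to multiply how well each candidate matches the spoken words by how likely that candidate is given where the user lives, then pick the highest score.

```python
# Hypothetical sketch: all names and numbers are made up for illustration
# and do not reflect how Alexa actually works.

def rank_interpretations(likelihood, prior):
    """Rank candidates by posterior: P(c | words, context) is proportional
    to P(words | c) * P(c | context)."""
    scores = {c: likelihood[c] * prior.get(c, 0.0) for c in likelihood}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no candidate has non-zero probability")
    # Normalize so the posterior probabilities sum to 1.
    posterior = {c: s / total for c, s in scores.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)

# How well the words "the Hawks" match each team name on their own (assumed).
HAWKS_LIKELIHOOD = {
    "Seattle Seahawks": 0.3,
    "Atlanta Hawks": 0.5,
    "Chicago Blackhawks": 0.2,
}

# How likely a user in each city is to be asking about each team (assumed).
LOCATION_PRIORS = {
    "Seattle": {"Seattle Seahawks": 0.8, "Atlanta Hawks": 0.1, "Chicago Blackhawks": 0.1},
    "Atlanta": {"Seattle Seahawks": 0.1, "Atlanta Hawks": 0.8, "Chicago Blackhawks": 0.1},
}

def best_guess(city):
    """Return the most probable team for 'How are the Hawks doing?' in a city."""
    return rank_interpretations(HAWKS_LIKELIHOOD, LOCATION_PRIORS[city])[0][0]

if __name__ == "__main__":
    for city in ("Seattle", "Atlanta"):
        print(city, "->", best_guess(city))
```

With these invented numbers, the same words resolve to the Seahawks for a user in Seattle and to the Atlanta Hawks for a user in Atlanta, even though the words alone slightly favor the basketball team.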
Already, Amazon uses data about a user’s interests to prime the voice recognition system. Alexa is more likely to recognize requests to hear jazz artists from users who have previously added jazz to their digital music library, for instance.
Further improvements will see Alexa better able to hold a conversation—remembering what a person has said previously, and applying that knowledge to subsequent interactions. “That’s one of the active areas,” the source familiar with Amazon’s research says. “It is super-vital for the conversation to be magical.”
Researchers have long predicted that emotional cues could make machine interfaces much smarter, but so far such technology has not been incorporated into any consumer product.
Rosalind Picard, a professor at MIT’s Media Lab, says adding emotion sensing to personal electronics could improve them: “Yes, definitely, this is spot on.” In a 1997 book, Affective Computing, Picard first mentioned the idea of changing the voice of a virtual helper in response to a user’s emotional state. She notes that research has shown how matching a computer’s voice to that of a person can make communication more efficient and effective. “There are lots of ways it could help,” she says.
The software needed to detect the emotional state in a person’s voice exists already. For some time, telephone support companies have used such technology to detect when a customer is becoming irritated while dealing with an automated system. In recent years, new machine-learning techniques have improved the state of the art, making it possible to detect more emotional states with greater accuracy, although the approach is far from perfect.
Even so, the relevance of emotion has evidently come to the attention of some big tech companies. In January, Apple bought Emotient, a company specializing in emotion detection, primarily through facial expressions.
Rob May, CEO of Talla, a company that is developing software agents for businesses, says better language parsing and detecting emotional states could improve virtual assistants, but letting users train them to do new tasks themselves would be even better. “If I was in Apple’s shoes, I would find a way to give people the ability to train Siri,” he says.