Technology That Knows When to Hand You a Hankie

Happy? Sad? A startup called Beyond Verbal has developed technology that can understand how you’re feeling just by listening to your voice.

Rachel Metz

June 3, 2013

Yuval Mor might make it possible for your stereo to set the mood automatically, simply by listening to the sound of your voice.

Mor is the CEO of Beyond Verbal, a Tel Aviv–based startup that says it has built technology that can analyze the tones of the human voice to determine what people are feeling, regardless of the language they speak.

This is extremely difficult to do at all, let alone to do well. But it could have a big payoff. Machines that can detect and recognize emotions could help in a wide variety of applications, such as driver-assistance services in cars or programs that help police with interrogations. Imagine a version of Windows that understood when it was confusing users and then presented them with a simpler interface. Beyond Verbal’s software is already being tested in call centers, Mor says.

The company says it can determine aspects of our vocal intonations that machines have thus far been unable to track. Such signals, which babies seem to understand even though they can’t yet understand language, offer indications of a speaker’s mood, attitude, and personality.

The company is one of several on a quest to use tech to understand our emotions—a field called affective computing. For instance, the startup Affectiva analyzes facial expressions to discover how people feel about ads. Simple Emotion detects emotion in voice to help autistic people understand how other people are feeling.

Beyond Verbal works by analyzing voice modulation, Mor says, and by seeking specific patterns in the way people talk. This information, which is taken from 10- to 15-second voice snippets, is run through the company’s software. The effectiveness is hard to judge from a canned demo, but a short video showing off Beyond Verbal’s technology gives a taste: as President Obama speaks during a recording of a 2012 presidential debate, Beyond Verbal’s analysis determines that he is revealing “practicality,” “anger,” “cynicism,” and “extrovert egocentricity.”

The company claims to detect emotions with 80 percent accuracy. It might be possible to improve that by combining the technology with others, such as systems that can understand words and context.

James Lester, a professor at North Carolina State University whose research includes affective computing, says that it isn’t far-fetched to think that Beyond Verbal can properly identify emotions, but moving beyond simply defining emotions as positive or negative gets much harder. “For example, if you go with a system that has something like eight or 10 or 12 or even more categories of affected states,” he says, “then it becomes considerably more difficult because the ability to correctly classify is not going up linearly, it’s going up fairly steeply.”
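One rough way to picture Lester's point, as an illustration only and not a description of his research or of Beyond Verbal's system: with k emotion categories, a random guess is right only 1 time in k, while the number of category pairs a classifier could confuse grows as k(k-1)/2, which is faster than linear. The function names below are hypothetical, and the sketch assumes all categories are equally likely.

```python
def chance_accuracy(k: int) -> float:
    """Accuracy of uniform random guessing among k equally likely categories."""
    return 1.0 / k


def confusable_pairs(k: int) -> int:
    """Number of distinct category pairs a classifier could mix up."""
    return k * (k - 1) // 2


# Compare a simple positive/negative split with the larger category
# counts Lester mentions (8, 10, or 12 affective states).
for k in (2, 8, 10, 12):
    print(f"{k:>2} categories: chance accuracy = {chance_accuracy(k):.1%}, "
          f"confusable pairs = {confusable_pairs(k)}")
```

Moving from 2 categories to 12 multiplies the category count by 6 but raises the number of confusable pairs from 1 to 66, which is one simple sense in which the task gets steeply harder.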

And Clifford Nass, a Stanford University professor who studies human-computer interaction, says that while it is possible to use a machine to detect some emotion in the human voice, no technology is nearly as good at it as the human brain is. It’s also very hard to account for the differences that arise in tonal languages like Chinese. (Beyond Verbal acknowledges that its system does need to be calibrated to properly detect pitch for some Asian languages in particular.)

“Knowing emotion is awesome. It’s a great problem to try,” Nass says. “But it’s just so hard.”
