Why Talking Computers Are Tough to Listen To

The subtle social and emotional cues in our voices are vital to how we communicate—and making a computer reproduce them is really hard.
February 16, 2016

Maybe you watched IBM’s Watson compete live on Jeopardy! in 2011. Or perhaps you caught the tech firm’s latest ad campaign on TV, which features goofy dialogues between Watson and Serena Williams, Richard Thaler, or Bob Dylan.

Even if not, chances are you’ve interacted with a talking computer at some point. But creating a convincing one is genuinely difficult. In an interesting story in the New York Times on Monday, tech writer John Markoff described the effort that went into creating the voice for IBM’s Watson, and used it as a window into the broader push to build more natural, acceptable computer voices.

This is one of the fascinating challenges of human-computer interaction: social and emotional cues are vital to vocal communication. It’s not only jarring when the voice of an assistant such as Apple’s Siri or Amazon’s Alexa sounds unnatural. It can also be vexing when such a system fails to recognize your tone and modulate its own voice accordingly. After you ask the same question with increasing frustration, for instance, it feels like an affront for an artificial voice to keep producing the same deadpan response.

A little while after Siri came out, I wrote about how important capturing humor is to building something that entertains users without annoying them. Indeed, the need to fit artificial intelligence into an existing social framework may explain why we find it necessary to assign characteristics such as gender even to fictional robots. Perhaps it also explains why Apple recently acquired Emotient, a company that focuses on reading and responding to human emotions.

Joaquin Phoenix falls in love with a computer in the movie "Her."

It’s also interesting to consider the potential of truly engaging, emotionally powerful computer interfaces, of the kind portrayed so well in the Spike Jonze movie Her. But it’s still very difficult to decode and mimic all of the subtleties of human communication. As Michael Picheny, a senior manager at the Watson Multimodal Lab for IBM Research, says in the NYT piece: “A good computer-machine interface is a piece of art, and should be treated as such.”

(Source: New York Times)
