Skip to Content

Why Talking Computers Are Tough to Listen To

The subtle social and emotional cues in our voices are vital to how we communicate—and making a computer reproduce them is really hard.
February 16, 2016

Maybe you watched the unveiling of IBM’s Watson live on Jeopardy! in 2009. Or perhaps you caught the tech firm’s latest ad campaign on TV, which features goofy dialogues between Watson and Serena Williams, Richard Thaler, or Bob Dylan.

Even if not, chances are you’ve interacted with a talking computer at some point. But creating a convincing talking computer is actually really hard. In an interesting story in the New York Times on Monday, tech writer John Markoff discussed the effort that went into creating the voice for IBM’s Watson and used that as a way into a discussion of the efforts under way to create more natural and acceptable computer voices.

This is one of the fascinating challenges of human-computer interaction: social and emotional cues are vitally important when it comes to vocal communications. It’s not only jarring if the voice of an assistant such as Apple’s Siri or Amazon’s Alexa sounds unnatural. It can also be vexing when such a system fails to recognize your tone and modulate its own voice accordingly. After you ask the same question with increasing frustration, for instance, it feels like an affront for an artificial voice to continually produce the same deadpan response.

A little while after Siri came out, I wrote about the importance of trying to capture humor for creating something capable of entertaining users while avoiding annoying them. Indeed, the need to fit artificial intelligence into an existing social framework may explain why we find it necessary to assign characteristics such as gender to even fictional robots. Perhaps this even explains why Apple recently acquired Emotient, a company that focuses on reading and responding to human emotions.

Joaquin Phoenix falls in love with a computer in the movie "Her."

It’s also interesting to consider the potential of truly engaging, emotionally powerful computer interfaces, of the kind portrayed so well in the Spike Jonze movie Her. But it’s still very difficult to decode and mimic all of the subtleties of human communication. As Michael Picheny, a senior manager at the Watson Multimodal Lab for IBM Research, says in the NYT piece: “A good computer-machine interface is a piece of art, and should be treated as such.”

(Source: New York Times)

Keep Reading

Most Popular

A Roomba recorded a woman on the toilet. How did screenshots end up on Facebook?

Robot vacuum companies say your images are safe, but a sprawling global supply chain for data from our devices creates risk.

A startup says it’s begun releasing particles into the atmosphere, in an effort to tweak the climate

Make Sunsets is already attempting to earn revenue for geoengineering, a move likely to provoke widespread criticism.

10 Breakthrough Technologies 2023

Every year, we pick the 10 technologies that matter the most right now. We look for advances that will have a big impact on our lives and break down why they matter.

These exclusive satellite images show that Saudi Arabia’s sci-fi megacity is well underway

Weirdly, any recent work on The Line doesn’t show up on Google Maps. But we got the images anyway.

Stay connected

Illustration by Rose Wong

Get the latest updates from
MIT Technology Review

Discover special offers, top stories, upcoming events, and more.

Thank you for submitting your email!

Explore more newsletters

It looks like something went wrong.

We’re having trouble saving your preferences. Try refreshing this page and updating them one more time. If you continue to get this message, reach out to us at with a list of newsletters you’d like to receive.