
Intelligent Machines

Something Lost in Skype Translation

Skype’s real-time translation software highlights remarkable progress in machine learning—but it still struggles with the subtleties of human communication.

Breaking down language barriers could have major cultural and economic implications.

It sometimes seems as if the highest praise an innovative new technology can earn is a credulous comparison to Star Trek. The Oculus Rift is like the Holodeck; 3-D printers are like matter replicators; Qualcomm is even sponsoring an X-Prize contest to build a working tricorder.

And now Skype Translator, a real-time voice and text language translation app currently available to Windows 8.1 users as a public beta, is being widely compared to the “universal translator” that Captains Kirk and Picard used to effortlessly communicate with alien interlocutors. Skype Translator is less capable than that pat sci-fi analogy implies, but its limitations are as fascinating as its formidable technical achievements.

Skype Translator performs instant translation of text chats in over 40 languages, but its marquee feature is real-time, spoken translation between English and Spanish speakers. (Microsoft, which owns Skype, would not comment on what other languages it is planning to incorporate into the software or when we might expect them.)

Unlike Star Trek’s fictional translator, Skype Translator is designed to emulate a human interpreter who acts as an intermediary between the two primary speakers. This virtual interpreter is customizable: I could select a male or female voice and even set its tolerance for translating profanity (I didn’t put that feature to the test). Then, much as a human translator would, it “listened” to my speech, waited for a pause, and spoke my words in Spanish to the Microsoft consultant on the other end of the call. The spoken translation was audible to both of us. And it was often surprisingly accurate.

In theory, Skype Translator could be transformative. It’s like a version of the discreet live translation that world leaders enjoy when visiting the United Nations. In practice, though, it can be more like having Apple’s Siri (or Microsoft’s Cortana) constantly interrupting your conversation and talking over you.

Even such crude automated translation is fairly remarkable. It is notoriously difficult for machines to recognize words and phrases quickly and accurately, and Skype Translator achieves a high level of accuracy using a technique known as deep learning. Software running on Microsoft’s servers was trained to recognize words using methods of information processing loosely modeled on the way a biological brain functions (see “10 Breakthrough Technologies 2013: Deep Learning”).

Deep learning lets Microsoft’s computers reliably transform a stream of audio speech into chunks of text, which can then be analyzed using standard translation methods. As more people use the software, this system should become more effective at recognizing idiosyncrasies of accent and cadence, potentially making Skype Translator—and Skype itself—more useful.

Microsoft’s software tries to filter out “disfluencies” (such as “um,” “ah,” and repetitions) on the word and sentence level. Some of these disfluencies made it through during my conversation, but the translation still occurred with impressive speed and accuracy.
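The word-level filtering described above can be sketched in a few lines. This is a hypothetical illustration, not Microsoft's actual implementation: it simply drops common filler words and collapses immediate repetitions before the text reaches the translation stage.

```python
# Hypothetical sketch of word-level disfluency filtering (illustrative only,
# not Microsoft's implementation): drop fillers, collapse repeated words.
FILLERS = {"um", "uh", "ah", "er", "hmm"}

def clean_transcript(words):
    cleaned = []
    for word in words:
        token = word.lower().strip(",.")
        if token in FILLERS:
            continue  # skip filler words like "um" and "ah"
        if cleaned and token == cleaned[-1].lower().strip(",."):
            continue  # collapse repetitions: "the the" -> "the"
        cleaned.append(word)
    return " ".join(cleaned)

print(clean_transcript("um I I think think the the answer is ah yes".split()))
# -> "I think the answer is yes"
```

A real system works on probabilistic recognition output rather than clean word lists, which is part of why some disfluencies still slip through.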

The limitations of Skype’s translation software are also revealing, since they show how difficult it is for even the smartest machine to mimic the subtleties of effective human conversation. Determining which meaning of a word is appropriate in different contexts can be vexing. “If software is translating between American and British English, and it recognizes the word ‘football,’ it also needs to know when to change it to ‘soccer’ and when to keep it as ‘football’ or ‘gridiron,’” says Christopher Manning, a professor of linguistics and computer science in Stanford University’s Natural Language Processing Group.
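Manning's "football" example can be made concrete with a toy sketch. Everything here is hypothetical (the cue words, the function name, the rule itself): the point is only that the right output word depends on other words in the sentence, which is what makes lexical choice hard for machines.

```python
# Toy illustration of context-dependent word choice (hypothetical rule,
# not a real translation system): when converting British to American
# English, "football" becomes "soccer" unless gridiron context is present.
AMERICAN_CUES = {"nfl", "touchdown", "quarterback", "gridiron"}

def localize_football(sentence):
    tokens = [w.lower().strip(",.") for w in sentence.split()]
    american_context = any(t in AMERICAN_CUES for t in tokens)
    out = []
    for w in sentence.split():
        if w.lower().strip(",.") == "football" and not american_context:
            out.append("soccer")  # no gridiron cues: assume the British sense
        else:
            out.append(w)
    return " ".join(out)

print(localize_football("Manchester United is a famous football club"))
# -> "Manchester United is a famous soccer club"
print(localize_football("The quarterback is the star of American football"))
# -> unchanged, because "quarterback" signals the American sense
```

Real systems replace the hand-written cue list with statistical context models, but the underlying ambiguity is the same.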

Skype Translator is also deaf to the rhythms of normal spoken conversation, so you can’t be quite sure when its disembodied robot voice is going to break in and start blurting out its translated version. This is something we humans sometimes find challenging, too. “Even with human translators, you need to learn when to pause to let the interpreter absorb what you just said and repeat it,” says Vikram Dendi, strategy director at Microsoft Research.

With practice I could probably learn Skype Translator’s “rhythm” in the same way, which could make the audio experience less distracting. Introducing an on-screen avatar for the “bot” might also help reinforce the metaphor of a third person on the call, perhaps making it easier for the two human speakers to modulate their conversation in a way that makes room for the software speaking on their behalf.

But Skype Translator actually has a fairly elegant solution built in already: on-screen translated text of the spoken conversation, generated in real time. This interface is less overtly futuristic than spoken translation, but it feels more natural. And obvious mistakes are easy to correct, since either party can type into the chat window where the translations appear.

Dendi admits that Skype and Microsoft don’t yet know what an ideal user experience for the software looks like. “When we watch these things in action on TV [as on Star Trek], it seems so obvious: you just speak and it comes out translated,” he says. “But when you start digging into the actual implementation and put it in people’s hands to use, there are so many little details that can make or break the experience.”

Other efforts to harness deep learning could help. Researchers at Google and the University of Montreal are applying such methods to speech translation itself (as opposed to just speech recognition) “with stunning success,” according to Stanford’s Manning. Further advances could someday make real-time machine translation virtually perfect. Or progress could hit a wall. “The jury is still out,” Manning says. “I think it’s still unclear where the limits of deep learning are for solving higher-level cognitive processing problems.”

Skype Translator certainly hasn’t solved the problem just yet. But for now, it’s a promising start at breaking down some language barriers.
