MIT Technology Review

Researchers at Microsoft have made software that can learn the sound of your voice, and then use it to speak a language that you don’t. The system could be used to make language tutoring software more personal, or to make tools for travelers.

In a demonstration at Microsoft’s Redmond, Washington, campus on Tuesday, Microsoft research scientist Frank Soong showed how his software could read out text in Spanish using the voice of his boss, Rick Rashid, who leads Microsoft’s research efforts. In a second demonstration, Soong used his software to grant Craig Mundie, Microsoft’s chief research and strategy officer, the ability to speak Mandarin.

[Audio samples: Rick Rashid’s voice in his native language, English, and then translated into Spanish, Italian, and Mandarin.]

In English, a synthetic version of Mundie’s voice welcomed the audience to an open day held by Microsoft Research, concluding, “With the help of this system, now I can speak Mandarin.” The phrase was repeated in Mandarin Chinese, in what was still recognizably Mundie’s voice.

“We will be able to do quite a few scenario applications,” said Soong, who created the system with colleagues at Microsoft Research Asia, the company’s second-largest research lab, in Beijing, China.

“For a monolingual speaker traveling in a foreign country, we’ll do speech recognition followed by translation, followed by the final text to speech output [in] a different language, but still in his own voice,” said Soong.
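The three stages Soong lists can be sketched as a simple chain. The Python below is only an illustration of that ordering, not Microsoft's software: the class name, the stage functions, and the tiny phrasebook are all hypothetical, and plain text stands in for audio.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechToSpeechPipeline:
    """Chains recognition, translation, and synthesis (all stages are plug-ins)."""
    recognize: Callable[[bytes], str]    # source-language audio -> text
    translate: Callable[[str], str]      # source text -> target-language text
    synthesize: Callable[[str], bytes]   # target text -> audio in the speaker's voice

    def run(self, audio: bytes) -> bytes:
        text = self.recognize(audio)
        translated = self.translate(text)
        return self.synthesize(translated)


# Toy stand-ins (assumptions for illustration): "audio" is just UTF-8 text.
PHRASEBOOK = {"hello": "hola", "thank you": "gracias"}


def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8").strip().lower()


def toy_translate(text: str) -> str:
    # Unknown phrases pass through unchanged.
    return PHRASEBOOK.get(text, text)


def toy_synthesize(text: str) -> bytes:
    # A real system would render this with the speaker's own voice model.
    return f"[speaker voice] {text}".encode("utf-8")


pipeline = SpeechToSpeechPipeline(toy_recognize, toy_translate, toy_synthesize)
print(pipeline.run(b"Hello"))
```

Keeping each stage as a separate callable reflects the order Soong describes; in a real system, only the last stage would need the speaker-specific voice model.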

The new technique could also be used to help students learn a language, said Soong. Hearing sample foreign phrases in their own voice could be encouraging for learners, or make the phrases easier to imitate. Soong also showed how his new system could improve a phone navigation app, allowing a stock synthetic English voice to seamlessly read out text written on Chinese road signs as it relayed instructions for a route in Beijing.

The system needs around an hour of training to develop a model able to read out any text in a person’s own voice. That model is converted into one able to read out text in another language by comparing it with a stock text-to-speech model for the target language. The individual sounds that the first model uses to build up words in the person’s own language are then carefully tweaked so that the new text-to-speech model can sound out any phrase in the second language.
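The article does not say how the two models are compared. As a rough illustration of the idea only, the Python sketch below matches each sound in a stock target-language model to the closest sound in the speaker's model. The nearest-neighbor matching, the two-number feature vectors, and all the sound names are assumptions made for this example, not details of Microsoft's system.

```python
import math


def nearest_unit(target, speaker_units):
    """Return the name of the speaker's sound closest to a target feature vector."""
    return min(speaker_units, key=lambda name: math.dist(speaker_units[name], target))


def map_sounds(stock_units, speaker_units):
    """Map each target-language sound to the closest sound in the speaker's model."""
    return {sound: nearest_unit(vec, speaker_units)
            for sound, vec in stock_units.items()}


# Hypothetical acoustic features for sounds in the speaker's own language.
speaker_units = {"a": (1.0, 0.0), "i": (0.0, 1.0), "u": (-1.0, 0.0)}

# Hypothetical features for sounds in a stock target-language voice.
stock_units = {"a_es": (0.9, 0.1), "i_es": (0.1, 0.8)}

mapping = map_sounds(stock_units, speaker_units)
print(mapping)
```

A real system would then adjust the matched sounds rather than reuse them unchanged, which is the "tweaking" step described above.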


Tagged: Computing, Microsoft, translation
