MIT Technology Review


Masters of Multimedia

Eric Chang is a sultan of speech. He talks fast, asks lots of questions, and seems to know what you’re going to say before you say it. It’s a bit unnerving at first, but given his graduate training in speech recognition at MIT, it makes sense. And since computer keyboards have trouble accommodating Asian languages, which use thousands of characters rather than a few dozen letters, part of the motivation for Chang’s speech group in Beijing is to develop better interfaces for Asian users. Speech-based systems are part of Microsoft’s plan to enable legions of Chinese users, for starters, to access information and communicate more effectively.

Chang walks into the office of a young researcher, Min Chu, and asks her to fire up the text-to-speech demo. Chu types in a sentence in Chinese sprinkled with English words, as is common in technical passages and discussions. After a few seconds, the computer generates a natural-sounding female voice that sounds perfectly bilingual as it reads the typed sentence aloud over the desktop speakers.

The trick is to get the inflections, timing, and transitions from word to word to sound just right, rather than like a robotic monotone. Unlike other speech synthesizers, Chang and Chu’s software breaks text into chunks of different sizes (phonemes, syllables, or whole words) and uses a database of more than 10,000 spoken sentences to select and piece together the right sounds. This bilingual synthesizer is “really head and shoulders above anything I’ve heard,” says MIT’s Zue, an expert on spoken-language systems.
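The article doesn’t say how the software chooses among the recorded chunks. A common technique in concatenative synthesis is unit selection: for each position in the sentence, pick one recorded candidate so that the total of a “target cost” (how well the unit matches the desired prosody) and a “join cost” (how smoothly it connects to its neighbor) is minimized, which dynamic programming solves efficiently. The sketch below is illustrative only. It assumes a single hypothetical acoustic feature (pitch in Hz), and the names `Unit`, `target_cost`, `join_cost`, and `select_units` are invented for this example, not taken from Microsoft’s system.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    label: str    # e.g. a phoneme, syllable, or word
    pitch: float  # hypothetical single acoustic feature, in Hz

def target_cost(unit: Unit, wanted_pitch: float) -> float:
    # How far this recorded unit is from the prosody we want at this position.
    return abs(unit.pitch - wanted_pitch)

def join_cost(left: Unit, right: Unit) -> float:
    # Penalize pitch jumps at the seam between two concatenated units.
    return abs(left.pitch - right.pitch)

def select_units(candidates: list[list[Unit]], targets: list[float]) -> list[Unit]:
    """Viterbi search: choose one candidate per position, minimizing total cost."""
    # best[i][j] = (cheapest cost of a path ending at candidates[i][j], backpointer)
    best = [[(target_cost(u, targets[0]), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            tc = target_cost(u, targets[i])
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u) + tc, k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost, back))
        best.append(row)
    # Trace back from the cheapest unit at the final position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]
```

For example, with two recorded versions each of “ni” (200 Hz, 240 Hz) and “hao” (210 Hz, 260 Hz) and target pitches of 220 Hz and 230 Hz, the search picks the 200 Hz “ni” and the 210 Hz “hao”: both are 20 Hz off target, but the 10 Hz seam between them is cheaper than any other pairing. A real system would compare many spectral and duration features rather than one pitch value.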

It’s an example of how the lab’s cultural perspective has been instrumental in solving problems. The first goal of the project was to create a Mandarin speech synthesizer for the Chinese market. “In 2001, we had our first ‘Bill G.’ review,” says Chang. “He said, ‘That’s good, but I don’t understand Chinese.’” That reaction from Microsoft’s chairman motivated Chang’s group to apply the same mathematical models to English. Because pitch matters so much in Mandarin (a subtle change of tone is all that distinguishes the word for “mother” from the word for “horse”), the system was better able to capture the inflections of English and other languages as well. Expect to see this voice synthesis software on the market in the next few years, says Chang, who recently became assistant managing director of the lab’s Advanced Technology Center.

The Beijing lab is also helping Microsoft understand the Asian marketplace in more immediate consumer areas, such as multimedia communications over mobile devices. Already, there are more than 240 million cell-phone users in China alone. They tend to update their services more often than U.S. users and are more interested in gadgets generally, says Shipeng Li, head of the lab’s Internet media group and another former Sarnoff researcher. “Here it’s like fashion,” he says.

The stylishly casual Li wears jeans and comes across as more laid-back than other researchers. His group is all about smooth: smooth video, that is. In the next room, one of Li’s 20 students has set up a demo of one of the world’s first videoconferencing systems to run on a handheld computer. The student picks up the handheld, which houses a video camera, microphone, wireless link, and data communication software, and speaks into it. His face shows up on the screen of a nearby, similarly equipped desktop computer. The video is encoded at 10 frames per second, enough to look fairly smooth, with an audio delay of about half a second as the researchers talk back and forth. Although the quality is lower than that of normal video, says Li, it’s still far higher than that of existing handheld technologies.

The key advance: software running on each user’s computer monitors data channel conditions, takes into account what kinds of devices are being used, and efficiently compresses the video stream so that fewer bits need to be sent. Some 50,000 users have downloaded the latest prototype version of the software from Microsoft’s website. If transmission delays can be reduced, Li says, handheld videophones should take off in the Asian market within three years.
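The article gives no details of how the software turns channel measurements and device type into a compression setting. One simple way to do it, shown here purely as an assumption, is to smooth recent throughput measurements with an exponentially weighted moving average and then pick the highest bitrate from a per-device list that fits under a safety margin. The ladder values, the `alpha` and `margin` parameters, and the function names are all hypothetical.

```python
def estimate_bandwidth(samples_kbps: list[float], alpha: float = 0.3) -> float:
    """Exponentially weighted moving average of measured throughput (kbps)."""
    if not samples_kbps:
        raise ValueError("need at least one throughput sample")
    estimate = samples_kbps[0]
    for s in samples_kbps[1:]:
        # Recent samples count more, but one spike doesn't swing the estimate.
        estimate = alpha * s + (1 - alpha) * estimate
    return estimate

# Hypothetical encoding bitrates (kbps) available for each device class.
BITRATE_LADDER = {
    "handheld": [32, 64, 96, 128],
    "desktop": [64, 128, 256, 512],
}

def choose_bitrate(device: str, samples_kbps: list[float], margin: float = 0.8) -> int:
    """Highest bitrate for this device that fits within margin * estimated bandwidth."""
    budget = margin * estimate_bandwidth(samples_kbps)
    ladder = BITRATE_LADDER[device]
    fitting = [rate for rate in ladder if rate <= budget]
    # If even the lowest rate doesn't fit, fall back to it rather than stall.
    return max(fitting) if fitting else ladder[0]
```

With steady 100 kbps measurements, a handheld would get 64 kbps (the largest rate under an 80 kbps budget), while a desktop measuring 400 kbps would get 256 kbps. The margin leaves headroom so that a brief dip in throughput doesn’t immediately cause stalls.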

But there are nearer-term applications, too. Take Web downloads of multimedia files. Researchers in Li’s group are developing ways to code video so it can be sent to your desktop without the pauses, skips, and hang-ups that are all too common with today’s Internet links. Li’s system does this by adapting to the conditions of the data connection.

Li employs a simple analogy to explain Microsoft’s advance. Imagine media content as “freight to be transported,” he says. Instead of today’s strategy of sending it in one big truck, which can get stuck in a traffic jam, Li’s team sends it in pieces in smaller vehicles, giving higher priority to those bits identified ahead of time as being especially important. Even if some pieces get stuck or lost, on average the most important ones, those that describe the basic picture structure and how it’s changing, get through.
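Li’s freight analogy describes prioritized delivery. As a rough illustration under stated assumptions, suppose each piece of a video segment carries a hypothetical priority tag (0 for base information such as picture structure and motion, higher numbers for refinements) and the link can carry a fixed number of bytes per interval. A sender can then fill that budget with the most important pieces first, so that when capacity shrinks, refinements are what get left behind. The `Piece` type and `schedule` function are invented for this sketch and do not describe Microsoft’s actual scheme.

```python
from dataclasses import dataclass

@dataclass
class Piece:
    name: str
    priority: int  # hypothetical tag: 0 = most important (e.g. base structure)
    size: int      # bytes

def schedule(pieces: list[Piece], budget_bytes: int) -> tuple[list[Piece], list[Piece]]:
    """Send pieces in priority order until the byte budget runs out.

    Returns (sent, dropped). A piece that doesn't fit is dropped, but smaller,
    lower-priority pieces may still use whatever space is left.
    """
    sent, dropped = [], []
    remaining = budget_bytes
    # sorted() is stable, so equal-priority pieces keep their original order.
    for p in sorted(pieces, key=lambda p: p.priority):
        if p.size <= remaining:
            sent.append(p)
            remaining -= p.size
        else:
            dropped.append(p)
    return sent, dropped
```

With a 1,000-byte budget and pieces `base` (priority 0, 400 bytes), `motion` (0, 200), `enh1` (1, 300), and `enh2` (2, 500), the base and motion pieces and the first refinement go out, and only `enh2` is held back. Cut the budget to 700 bytes and both refinements are dropped while the picture structure still arrives.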

The end result is smoother, more reliable video downloads. Using the technology, Li plays a video of singer Christina Aguilera; right next to it, he plays the same video on Microsoft’s current media player. The new version is less jerky and doesn’t skip. Indeed, says Li, the next release of Microsoft’s media player will incorporate this smooth scheme, courtesy of the Beijing lab.
