Technology Review - Published By MIT
June 2004

The World's Hottest Computer Lab

By Gregory T. Huang

Masters of Multimedia

Eric Chang is a sultan of speech. He talks fast, asks lots of questions, and seems to know what you're going to say before you say it. It's a bit unnerving at first, but given his graduate training in speech recognition at MIT, it makes sense. And since computer keyboards have trouble accommodating Asian languages (thousands of characters, in contrast to a few dozen letters), part of the motivation for Chang's speech group in Beijing is to develop better interfaces for Asian users. Speech-based systems are part of Microsoft's plan to enable legions of Chinese, for starters, to access information and communicate more effectively.

Chang walks into the office of a young researcher, Min Chu, and asks her to fire up the text-to-speech demo. Chu types in a sentence, in Chinese but sprinkled with English words, as is common in technical passages and discussions. After a few seconds, the computer generates a natural-sounding female voice, which sounds perfectly bilingual as it repeats the typed sentence over speakers on the desktop.

The trick is to get the inflections, timing, and transitions from word to word to sound just right, and not like a robotic monotone. Unlike other speech synthesizers, Chang and Chu's software breaks text into different-size chunks (phonemes, syllables, or whole words) and uses a database of more than 10,000 spoken sentences to select and piece together the right sounds. This bilingual synthesizer is "really head and shoulders above anything I've heard," says MIT's Zue, an expert on spoken-language systems.
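To make the chunking idea concrete, here is a minimal sketch, not the lab's actual software: it assumes a toy inventory of recorded units and simply prefers the largest recorded chunk for each word, falling back from whole words to syllables to phonemes. The inventories, the breakdown tables, and the function name `select_units` are all hypothetical.

```python
# Toy inventory of recorded units (hypothetical; the article says the real
# system draws on a database of more than 10,000 spoken sentences).
RECORDED_WORDS = {"hello", "world"}
RECORDED_SYLLABLES = {"com", "pu", "ter"}

# Hypothetical syllable and phoneme breakdowns for the words in this demo.
SYLLABLES = {"computer": ["com", "pu", "ter"], "lab": ["lab"]}
PHONEMES = {"lab": ["l", "ae", "b"]}


def select_units(words):
    """Pick the largest recorded chunk available for each word."""
    units = []
    for word in words:
        # Best case: the whole word was recorded.
        if word in RECORDED_WORDS:
            units.append(("word", word))
            continue
        # Next best: every syllable of the word was recorded.
        syllables = SYLLABLES.get(word, [])
        if syllables and all(s in RECORDED_SYLLABLES for s in syllables):
            units.extend(("syllable", s) for s in syllables)
            continue
        # Fall back to phonemes, the smallest unit (letters if unknown).
        units.extend(("phoneme", p) for p in PHONEMES.get(word, list(word)))
    return units


if __name__ == "__main__":
    for kind, unit in select_units(["hello", "computer", "lab"]):
        print(kind, unit)
```

A real synthesizer would also have to score how well neighboring chunks join, which this sketch leaves out.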

It's an example of how the lab's cultural perspective has been instrumental in solving problems. The first goal of the project was to create a Mandarin speech synthesizer for the Chinese market. "In 2001, we had our first 'Bill G.' review," says Chang. "He said, 'That's good, but I don't understand Chinese.'" That reaction from Microsoft's chairman motivated Chang's group to apply the same mathematical models to English. Because pitch matters so much in Mandarin (a subtle change of tone is all that distinguishes the word for "mother" from the word for "horse"), the system was better able to capture the inflections of English and other languages as well. Expect to see this voice synthesis software on the market in the next few years, says Chang, who recently became assistant managing director of the lab's Advanced Technology Center.

The Beijing lab is also helping Microsoft understand the Asian marketplace in more immediate consumer areas, such as multimedia communications over mobile devices. Already, there are more than 240 million cell-phone users in China alone. They tend to update their services more often than U.S. users and are more interested in gadgets generally, says Shipeng Li, head of the lab's Internet media group and another former Sarnoff researcher. "Here it's like fashion," he says.

The stylishly casual Li wears jeans and comes across as more laid-back than other researchers. His group is all about smooth: smooth video, that is. In the next room, one of Li's 20 students has set up a demo of one of the world's first videoconferencing systems that runs on a handheld computer. The student picks up the handheld, which houses a video camera, microphone, wireless link, and data communication software, and speaks into it. His face shows up on the screen of a nearby desktop computer, which is similarly equipped. The video is encoded at 10 frames per second, enough to look fairly smooth, with an audio delay of about half a second as the researchers talk back and forth. Although the quality is lower than that of normal video, says Li, it's still far higher than that of existing handheld technologies.

The key advance: software running on each user's computer monitors data channel conditions, takes into account what kinds of devices are being used, and efficiently compresses the video stream so that fewer bits need to be sent. Some 50,000 users have downloaded the latest prototype version of the software from Microsoft's website. If transmission delays can be reduced, Li says, handheld videophones should take off in the Asian market within three years.
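As a rough illustration of that kind of adaptation, and not Microsoft's actual algorithm, the sketch below picks a video bit rate from a measured channel rate and the type of device on the other end. The device caps, the 80 percent headroom, and the function name `pick_bitrate_kbps` are assumptions made up for this example.

```python
# Hypothetical per-device ceilings in kilobits per second.
DEVICE_MAX_KBPS = {"handheld": 128, "desktop": 1024}


def pick_bitrate_kbps(measured_kbps, device, headroom_percent=80):
    """Choose a target bit rate that fits both the channel and the device.

    Only part of the measured channel rate is used (the headroom), so
    short dips in bandwidth are less likely to stall the stream.
    """
    budget = measured_kbps * headroom_percent // 100
    return min(budget, DEVICE_MAX_KBPS[device])


if __name__ == "__main__":
    print(pick_bitrate_kbps(100, "handheld"))
    print(pick_bitrate_kbps(2000, "desktop"))
```

The encoder would then compress the video to the returned rate, so a slow wireless link or a small screen leads to fewer bits being sent.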

But there are nearer-term applications, too. Take Web downloads of multimedia files. Researchers in Li's group are developing ways to code video so it can be sent to your desktop without the pauses, skips, and hang-ups that are all too common with today's Internet links. Li's system does this by adapting to the conditions of the data connection.

Li employs a simple analogy to explain Microsoft's advance. Imagine media content as "freight to be transported," he says. Instead of today's strategy of sending it in one big truck, which can get stuck in a traffic jam, Li's team sends it in pieces in smaller vehicles, giving higher priority to those bits identified ahead of time as being especially important. Even if some pieces get stuck or lost, on average the most important ones (those that describe the basic picture structure and how it's changing) get through.
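The freight analogy can be sketched in a few lines. This is an illustration under stated assumptions, not the lab's coding scheme: the `Packet` class, the priority numbers, and the fixed byte capacity standing in for a congested link are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    name: str
    size: int      # bytes
    priority: int  # lower number = more important


def send_by_priority(packets, capacity):
    """Send the most important packets first until the link is full.

    Packets that do not fit in the remaining capacity are dropped,
    which mimics pieces getting stuck or lost in transit.
    """
    sent = []
    remaining = capacity
    for packet in sorted(packets, key=lambda p: p.priority):
        if packet.size <= remaining:
            sent.append(packet.name)
            remaining -= packet.size
    return sent


if __name__ == "__main__":
    frame = [
        Packet("fine detail", 500, 2),
        Packet("picture structure", 300, 0),
        Packet("motion", 200, 1),
    ]
    # With room for only 600 bytes, the fine detail is what gets left out.
    print(send_by_priority(frame, 600))
```

When the link has less room than the frame needs, the picture loses detail but keeps its structure and motion, which is the trade-off the analogy describes.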

The end result is smoother, more reliable video downloads. Using the technology, Li plays a video of singer Christina Aguilera; right next to it, he plays the same video on Microsoft's current media player. The new version is less jerky and doesn't skip. Indeed, says Li, the next release of Microsoft's media player will incorporate this smooth scheme, courtesy of the Beijing lab.

This article is from the June 2004 issue of Technology Review.
