The World's Hottest Computer Lab
Microsoft’s six-year-old Beijing lab has already paid dividends in speech recognition, wireless multimedia and graphics.
Half a world away from the calm beauty of Seattle and Puget Sound, there’s a lab where software dreams come true. At Microsoft Research Asia, the drive to succeed is as intense as the traffic that roars by the front door in unbridled, chaotic fury. If Microsoft’s other facilities around the globe seem idyllic, this one, in Beijing, China, is pure street. Nearby high-rises compete with smokestacks for skyline supremacy. Run-down buildings sit next to bustling consumer electronics markets and the Beijing Satellite Manufacturing Factory, where China conducts its spaceflight research. Microsoft’s mantra: work hard to get in the door; work harder to survive; then work even harder because the real work, that of an information technology world leader, is just beginning.
If you find it hard to root for Microsoft, you’ve never met Harry Shum. The Beijing lab’s managing director is hearty, engaging, and surprisingly young: he is in his 30s. “This is a new kind of manufacturing in China,” he says, waiting outside his office with a smile. “Not just shoes, socks, baby strollers. Now, we manufacture MIT students, papers, and software.” Shum’s longtime colleague Hongjiang Zhang is walking by but stops to concur: “It’s another level of ‘Made in China,’” he says. Zhang, who’s a little older than Shum and more reserved, heads the lab’s Advanced Technology Center, a division launched late last year to accelerate new technologies into Microsoft’s product pipeline.
Together, Shum and Zhang lead an organization that looks like a typical corporate lab but feels like a startup. For all its cubicles and computers, the lab brims with enthusiasm; its energy comes from, of all things, students. Come in at any hour and you’ll find scores of them (the lab supports about 200 interns at any time, most from local universities) tooling away on projects jointly supervised by Microsoft managers. Add the buzz of Mandarin conversations, the window views of Beijing’s sprawl, and the ever-present hint of cigarette smoke, and you are constantly reminded: you’re not in corporate USA anymore.
Although they run the lab, Shum and Zhang are at heart still researchers. Roaming up and down vast aisles of workstations, they show off their latest demos like proud parents. Shum stops at the desk of a young woman he calls “the number-one student” in computer science at Tsinghua University, one of China’s top engineering schools. On her screen are still photos of a waterfall, rain on a lake, and blades of grass.
With a click of the mouse, the scenes come alive. Water tumbles and splashes over the falls, raindrops plunk on the surface, and grass undulates in the breeze. The computer is generating the animation on the spot: software has scoured videos for statistical clues about how water and grass move and applied the lessons to the static images.
It’s all part of the lab’s ambition to lead the world in making computers interactive, entertaining, and ultimately more useful. Other demos include compression algorithms that store rich pictures using relatively few digital bits; computer vision software that tracks and recognizes human faces; a natural-sounding speech synthesizer; and user interfaces that capture handwriting digitally (see “Microsoft’s Magic Pen,” TR May 2004). “They’re doing really first-class research,” says Victor Zue, codirector of MIT’s Computer Science and Artificial Intelligence Laboratory and a member of the Beijing lab’s technical advisory board. And Raj Reddy, a renowned expert in human-computer interaction at Carnegie Mellon University, calls the lab’s leadership and talent pool “outstanding.”
Indeed, with 150 full-time researchers and more than $80 million from its parent company since opening in 1998, Microsoft Research Asia has become a powerhouse of infotech R&D. Far faster than even Microsoft’s top brass expected, the Beijing research outpost is influencing the company’s global business. More than 70 technologies it developed are already used in Microsoft products, including software for Windows operating systems and graphics packages for Xbox video games. More of the lab’s latest software is slated for the next version of Windows (code-named Longhorn), due out in 2006.
The Beijing lab is a key part of Microsoft’s effort to ensure its global future through research. “It’s interesting how much of the research directed at the Asian marketplace turns out to be generally applicable,” says Rick Rashid, senior vice president of Microsoft Research, which besides its main facility in Redmond, WA, also runs labs in San Francisco, Mountain View, CA, and Cambridge, England. “They’ll often attack a problem differently from what would happen in Europe or the U.S., because they come from a different perspective. They often find solutions that are different, and in some cases different turns out to be better.”
So has Bill Gates figured out China? Microsoft’s chairman doesn’t go that far, and his company isn’t the only infotech giant to open a research lab in China (see “Other U.S. Corporate Infotech Labs in China,” below). But he lights up when talk turns to his Beijing bonanza. “When you start a lab, you’re supposed to say, ‘Okay, in five years we want you to contribute,’” Gates told Technology Review. “These guys, nine months after they got started, had these video compression results.” Those kinds of results are already setting the Microsoft lab apart from its competitors, making it a case study in global innovation. “People should pay attention to China,” says Gates. “It is a phenomenon in every respect.”
OTHER U.S. CORPORATE INFOTECH LABS IN CHINA
|Lab|Opened|Location|Research focus|
|IBM China Research Laboratory|1995|Beijing|Speech interfaces for telephones, machine translation, mobile devices, e-commerce|
|Intel China Research Center|1998|Beijing|Speech recognition with visual cues, machine translation, machine learning, advanced software compilers|
|Bell Labs Research China|2000|Beijing|Data networking, communications, optics|
|Motorola China Research Center|2000|Shanghai|Speech and handwriting recognition, natural-language processing, Internet data processing|
Beast from the East
Harry Shum is hungry. His entire lab is hungry. Over a catered lunch of noodles and fish in his Beijing office, Shum explains what drives his staff. “We started from nothing. The whole lab grew from this room. So I don’t rearrange anything,” he jokes, as if feng shui would matter to the world’s largest software company. But just ten years ago, the area around the lab was farmland. Today, Microsoft Research Asia occupies one and a half large floors in a six-story office building with a futuristic glass-front lobby. The lab has come to symbolize a city in the midst of a high-tech revolution.
Shum himself is a vibrant mix of East and West. His English is accented but very clear. Born and raised near Shanghai, he did his graduate work at Carnegie Mellon University (he says he’s “still a die-hard Pittsburgh Steelers fan”) and joined Microsoft Research in Redmond in 1996. There, he became one of the company’s rising stars, creating realistic 3-D graphics and virtual environments using principles borrowed from computer vision.
Two years later came the opportunity: Microsoft was starting a lab in China. The goal was to tap into the country’s immense talent pool of students and scientists, including many who had emigrated to other countries but could be enticed back to their native land. And being in position to explore a marketplace of a billion people in a rapidly industrializing economy couldn’t hurt, either. To lead the charge, Microsoft brought in Kai-Fu Lee, a well-known speech and multimedia expert from Apple Computer and Silicon Graphics. Shum remembers the day well. “Kai-Fu came into my office and said, ‘I’m moving to Beijing, and I’m not leaving without you,’” he says.
Baining Guo wants less talk and more action. Guo, a former Intel researcher and now Microsoft Research Asia’s graphics research manager, doesn’t sit for interviews. He doesn’t do chit-chat. Whether the end product is a video game, a screen saver, or a personalized cartoon rendered from a photograph, he says, graphics is a bottom-line business: either it looks good or it doesn’t. His group consists of 12 staff researchers and, currently, 18 students; to examine their latest results, he walks down the hall to the open area where they all work.
A pressing problem in graphics, one of the lab’s standout areas, is getting computers to animate photorealistic human faces. In today’s video games, “characters’ expressions look fake,” says Guo. “Their faces don’t move believably or naturally.” It’s a tough problem, for instance, to get the wrinkles around the eyes and forehead to look right using conventional techniques that simply morph and stretch the features of an image.
Guo’s team demonstrates a cutting-edge solution. First they take about ten still pictures of a man’s face, each capturing a different expression: eyebrows raised, nose scrunched, laughing, grimacing, and so on. Then, by dividing the face into 14 regions and more than 100 “feature points” (eyelids, tips of eyebrows, corners of lips), their software blends different combinations of the photos to create more natural simulations of new expressions. The software also modulates the image from one expression to another over a few seconds. The result: the man’s face goes from looking surprised to looking disgusted in a realistic way, wrinkles and all.
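The blending idea can be sketched in a few lines. This is only an illustration under loud assumptions, not the lab's method: each example photo is reduced to a hypothetical dictionary mapping a face region to a list of numbers (say, feature-point coordinates), new expressions are weighted averages computed region by region, and the move from one expression to another is plain linear interpolation. The names `blend_face` and `transition` are invented for this sketch.

```python
from typing import Dict, List

# Hypothetical data model: one example expression maps each face region
# to a flat list of numbers (for instance, feature-point coordinates).
Expression = Dict[str, List[float]]


def blend_region(vectors: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted average of one region's vectors across the example photos."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    length = len(vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, vectors)) / total
        for i in range(length)
    ]


def blend_face(examples: List[Expression],
               region_weights: Dict[str, List[float]]) -> Expression:
    """Build a new expression, using a separate set of weights per region."""
    return {
        region: blend_region([ex[region] for ex in examples], weights)
        for region, weights in region_weights.items()
    }


def transition(start: Expression, end: Expression, t: float) -> Expression:
    """Linearly interpolate between two expressions; t runs from 0.0 to 1.0."""
    return {
        region: [(1 - t) * a + t * b for a, b in zip(start[region], end[region])]
        for region in start
    }


if __name__ == "__main__":
    neutral = {"brow": [0.0, 0.0], "mouth": [0.0]}
    surprised = {"brow": [2.0, 4.0], "mouth": [8.0]}
    # Raise the brows fully but open the mouth only halfway.
    mixed = blend_face([neutral, surprised],
                       {"brow": [0.0, 1.0], "mouth": [1.0, 1.0]})
    print(mixed)
    print(transition(neutral, mixed, 0.5))
```

Weighting each region separately is what lets such a scheme combine, for example, the brows from one photo with the mouth from another; stepping `t` from 0 to 1 over a few seconds would give the gradual change described above.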
Unlike the techniques used in computer-animated movies such as Toy Story, the Beijing researchers’ approach requires no manual drawing of frames. That means it could be used in a video game to generate realistic-looking faces on the fly. With some additional configuration, it could also map expressions from a user’s face to a virtual character’s to create a personalized avatar for a role-playing game. What’s more, photos of celebrities could be animated, or reanimated. “We could make Albert Einstein say, ‘I love Windows,’” Guo deadpans. His team, though, is chasing a loftier goal that could ultimately transform moviemaking: software that generates photorealistic virtual actors in real time.
That kind of commitment to more-fundamental computer science research has earned the lab the respect of the academic community. “Microsoft Research is by far the biggest contributor to graphics in the corporate world. It’s a powerhouse,” says Paul Debevec, a graphics expert at the University of Southern California’s Institute for Creative Technologies. The Beijing lab, in particular, has achieved “some amazing results,” he adds. “It’s not just, ‘How can we make a better Xbox?’”
But in fact, a better Xbox is ultimately part of the lab’s mission. Reminders that this is a business, not a researcher’s playground, are never far away. In an adjoining hallway, a large corner room has its windows plastered over with opaque sheets of paper. The sign on the locked door reads, “Xbox: Confidential.” Guo isn’t allowed to talk about what’s going on inside. “Some of our best people work in there,” is all he’ll say.
Masters of Multimedia
Eric Chang is a sultan of speech. He talks fast, asks lots of questions, and seems to know what you’re going to say before you say it. It’s a bit unnerving at first, but given his graduate training in speech recognition at MIT, it makes sense. And since computer keyboards have trouble accommodating Asian languages (thousands of characters, in contrast to a few dozen letters), part of the motivation for Chang’s speech group in Beijing is to develop better interfaces for Asian users. Speech-based systems are part of Microsoft’s plan to enable legions of Chinese, for starters, to access information and communicate more effectively.
Chang walks into the office of a young researcher, Min Chu, and asks her to fire up the text-to-speech demo. Chu types in a sentence in Chinese, sprinkled with English words as is common in technical passages and discussions. After a few seconds, the computer generates a natural-sounding female voice, which sounds perfectly bilingual as it repeats the typed sentence over speakers on the desktop.
The trick is to get the inflections, timing, and transitions from word to word to sound just right, not like a robotic monotone. Unlike other speech synthesizers, Chang and Chu’s software breaks text into different-size chunks (phonemes, syllables, or whole words) and uses a database of more than 10,000 spoken sentences to select and piece together the right sounds. This bilingual synthesizer is “really head and shoulders above anything I’ve heard,” says MIT’s Zue, an expert on spoken-language systems.
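A toy version of picking variable-size chunks might look like the sketch below. It is a deliberate simplification, not Chang and Chu's algorithm: it swaps in a greedy longest-match rule, preferring the longest recorded unit available at each position and falling back to shorter ones, whereas a real synthesizer would presumably also weigh pitch and timing when choosing among candidates. The function name, the clip identifiers, and the tiny database are all hypothetical.

```python
from typing import Dict, List, Tuple

Unit = Tuple[str, ...]  # a run of one or more syllables recorded together


def select_units(syllables: List[str],
                 database: Dict[Unit, str],
                 max_len: int = 4) -> List[Tuple[Unit, str]]:
    """Cover the input with recorded units, longest match first."""
    chosen: List[Tuple[Unit, str]] = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(max_len, len(syllables) - i), 0, -1):
            key = tuple(syllables[i:i + size])
            if key in database:
                chosen.append((key, database[key]))
                i += size
                break
        else:
            raise KeyError(f"no recording covers {syllables[i]!r}")
    return chosen


if __name__ == "__main__":
    # Hypothetical recordings: keys are syllable runs, values are clip names.
    recordings = {
        ("ni", "hao"): "clip_017",
        ("ni",): "clip_001",
        ("hao",): "clip_002",
        ("win", "dows"): "clip_300",
        ("ma",): "clip_003",
    }
    for unit, clip in select_units(["ni", "hao", "win", "dows", "ma"], recordings):
        print(unit, clip)
```

Longer units carry their own natural transitions, so preferring them leaves fewer seams to smooth over; that is the intuition behind mixing chunk sizes rather than stitching everything from the smallest pieces.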
It’s an example of how the lab’s cultural perspective has been instrumental in solving problems. The first goal of the project was to create a Mandarin speech synthesizer for the Chinese market. “In 2001, we had our first ‘Bill G.’ review,” says Chang. “He said, ‘That’s good, but I don’t understand Chinese.’” That reaction from Microsoft’s chairman motivated Chang’s group to apply the same mathematical models to English. Because pitch matters so much in Mandarin (a subtle change of tone is all that distinguishes the word for “mother” from the word for “horse”), the system was better able to capture the inflections of English and other languages as well. Expect to see this voice synthesis software on the market in the next few years, says Chang, who recently became assistant managing director of the lab’s Advanced Technology Center.
The Beijing lab is also helping Microsoft understand the Asian marketplace in more immediate consumer areas, such as multimedia communications over mobile devices. Already, there are more than 240 million cell-phone users in China alone. They tend to update their services more often than U.S. users and are more interested in gadgets generally, says Shipeng Li, head of the lab’s Internet media group and a former Sarnoff researcher. “Here it’s like fashion,” he says.
The stylishly casual Li wears jeans and comes across as more laid-back than other researchers. His group is all about smooth: smooth video, that is. In the next room, one of Li’s 20 students has set up a demo of one of the world’s first videoconferencing systems that runs on a handheld computer. The student picks up the handheld (which houses a video camera, microphone, wireless link, and data communication software) and speaks into it. His face shows up on the screen of a nearby desktop computer, which is similarly equipped. The video is encoded at 10 frames per second, enough to look fairly smooth, with an audio delay of about half a second as the researchers talk back and forth. Although the quality is lower than that of normal video, says Li, it’s still far higher than that of existing handheld technologies.
The key advance: software running on each user’s computer monitors data channel conditions, takes into account what kinds of devices are being used, and efficiently compresses the video stream so that fewer bits need to be sent. Some 50,000 users have downloaded the latest prototype version of the software from Microsoft’s website. If transmission delays can be reduced, Li says, handheld videophones should take off in the Asian market within three years.
But there are nearer-term applications, too. Take Web downloads of multimedia files. Researchers in Li’s group are developing ways to code video so it can be sent to your desktop without the pauses, skips, and hang-ups that are all too common with today’s Internet links. Li’s system does this by adapting to the conditions of the data connection.
Li employs a simple analogy to explain Microsoft’s advance. Imagine media content as “freight to be transported,” he says. Instead of today’s strategy of sending it in one big truck, which can get stuck in a traffic jam, Li’s team sends it in pieces in smaller vehicles, giving higher priority to those bits identified ahead of time as being especially important. Even if some pieces get stuck or lost, on average the most important ones, those that describe the basic picture structure and how it’s changing, get through.
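The freight analogy maps onto a simple priority scheme, sketched here under stated assumptions rather than as a description of Microsoft's actual protocol. The `Packet` fields, the priority numbers, and the per-interval byte budget are all hypothetical; the point is only that when the link cannot carry everything, the pieces marked most important go out first and the least important are the ones left behind.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Packet:
    name: str
    priority: int  # 0 = most important (e.g. basic picture structure)
    size: int      # bytes


def schedule(packets: List[Packet], budget: int) -> Tuple[List[Packet], List[Packet]]:
    """Send packets in priority order until the byte budget runs out.

    Returns (sent, dropped). The budget stands in for whatever the
    connection can carry during one time interval.
    """
    sent: List[Packet] = []
    dropped: List[Packet] = []
    remaining = budget
    # sorted() is stable, so equal-priority packets keep their original order.
    for packet in sorted(packets, key=lambda p: p.priority):
        if packet.size <= remaining:
            sent.append(packet)
            remaining -= packet.size
        else:
            dropped.append(packet)
    return sent, dropped


if __name__ == "__main__":
    frame = [
        Packet("fine detail", priority=2, size=600),
        Packet("picture structure", priority=0, size=400),
        Packet("motion", priority=1, size=300),
    ]
    sent, dropped = schedule(frame, budget=800)
    print("sent:", [p.name for p in sent])
    print("dropped:", [p.name for p in dropped])
```

With a congested link the viewer loses some detail but keeps the basic picture and its motion, which is why the result looks smoother than a stream that stalls whenever one big shipment is delayed.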
The end result is smoother, more reliable video downloads. Using the technology, Li plays a video of singer Christina Aguilera; right next to it, he plays the same video on Microsoft’s current media player. The new version is less jerky and doesn’t skip. Indeed, says Li, the next release of Microsoft’s media player will incorporate this smooth scheme, courtesy of the Beijing lab.
The Gates Dynasty?
On the other side of the lab from Li’s demo, a sea of résumés threatens to swallow up the desk of Hongjiang Zhang. Indeed, 10,000 of them have arrived in six months, he says, in application for staff openings in the new Advanced Technology Center he has been tapped to run. To help screen the onslaught of applicants, Zhang’s team has resorted to administering written exams in 11 cities around China. “The biggest challenge is people,” says Zhang. “We have to get the right blend of partnership, comradeship, and leadership.”
The Advanced Technology Center, marked by a sign in bold letters, is expanding rapidly, with a staff that grew from 20 this winter to 70 by springtime. It represents the next step for the lab, one in which Beijing’s research results will be more directly transferred to products. The goal: to speed up the process of feeding new technologies back to the mother ship.
The center is Zhang’s baby. As a researcher, Zhang created software that looked at pictures and could identify which were visually interesting and which were not, a capability useful for automatic video editing. Now, leaving research behind, he is looking at the bigger picture of the lab and trying to identify those technologies that are most promising for Microsoft’s product groups. “What is the return from investing heavily in long-term research?” he asks. “The mission of the center is to answer that question.”
Zhang reveals a hint of nostalgia as he discusses the center, which was launched in November 2003 at the five-year anniversary of Microsoft Research Asia’s opening. At the ceremony, he says, the company’s research head, Rick Rashid, recounted the lab’s accomplishments and gave his heartfelt congratulations to its leaders in front of Microsoft’s higher-ups. “Looking around the room, we had tears in our eyes,” says Zhang. “We thought, ‘This is a dream come true. We made history.’”
But now, Zhang says, it’s time to start making the company’s future, by developing new products that will be used by a wider swath of society. Instead of sending research managers across the Pacific to meet with product people (a process that Zhang says “will not scale up”), the Advanced Technology Center’s staff will do initial product development in Beijing. Their proximity to the research teams will make it easier to determine which technologies are ready for products. At the same time, they will visit Redmond regularly, staying close enough to product teams that they can advise researchers about real-world issues. That’s a way for “research to create value for the company,” says Henry Chesbrough, an expert on technology strategy and management at the University of California, Berkeley.
The question for Microsoft is whether the Beijing lab can keep its close-knit researchers focused on long-term issues, while at the same time accelerating near-term product development plans. Nobody thinks this balancing act will be easy. “Part of the price you pay is, people begin to ask you for low-hanging fruit,” says MIT’s Zue. “Your success can easily turn into a curse if everybody’s asking you for something they need six months from now.”
If this were the United States, that might be the most daunting challenge the lab faced. But this is China. To remain productive, Microsoft Research Asia will also need to nurture its relationship with government officials and academics, so that it benefits not only Microsoft but also its host country. Therein lies a source of tension. Local graduate students say it is their dream to work for Microsoft. But go higher up the ranks of Chinese academia, and there is talk of a dark side. “It’s a shame the government and university authorities allow such a waste of talent,” says Hongfei Wang, a professor at the Chinese Academy of Sciences’ Institute of Chemistry. “These poor graduate students actually don’t have better choices. But by doing work on company projects, their opportunity for intellectual growth is greatly diminished.”
Indeed, Microsoft’s legacy in China may ultimately depend on whether the company Bill built is able to augment opportunities for Chinese citizens in general. Strengthening the educational system, providing technical training for young people, fostering local software companies, and promoting economic growth are a good start, and smart business, for what might one day be called the Gates dynasty.
At the end of another long workday, Harry Shum gets into a company car that will take him home to a subdivision on the outskirts of Beijing. The lab’s managing director checks his e-mail on a wireless handheld and then uses it to call home. He’s meeting his family for dinner; this will be the first night in a month that he hasn’t worked late. Beijing is peaceful at night, quiet. But things are changing fast. “This highway wasn’t even here five years ago,” Shum says. As he looks down this new road, he is already thinking about tomorrow, fighting the traffic in his mind, figuring out how to take his lab to the next level.