Microsoft’s 3-D Strategy

Microsoft’s Craig Mundie describes how the company’s vision of 3-D gaming could extend to all computer interactions.

Erica Naone

October 13, 2010

Microsoft has joined the wave of companies betting that 3-D is the next big thing for computing. At a recent talk at MIT, chief research and strategy officer Craig Mundie said he sees the technology as an innovation that “will get people out of treating a computer as a tool” and into treating the device as a natural extension of how they interact with the world around them. Microsoft plans to introduce consumers to the change through its gaming products, but Mundie outlined a vision that would eventually have people shopping and searching in 3-D as well.

**The future of 3-D:** During a talk at MIT last week, Craig Mundie, Microsoft’s chief research and strategy officer, showed how a natural 3-D interface could let users manipulate and examine products–like the disassembled motorcycle in the background.

The combination of better chips, better displays, and better sensors, Mundie said, is finally making it possible to move computing from today’s graphical user interfaces to the “natural user interface,” by allowing people to interact with 3-D content through the gestures they normally use. Today’s interfaces require users to learn about menu bars and double-clicks, but Mundie believes natural user interfaces, which work through gesture and voice, will be faster and easier to learn, and will prove more flexible in the long run.

Mundie also argued that natural user interfaces would reduce the mental effort required for people to operate software. Even people who are good at using controllers, keyboards, and mice might find that a natural interface frees up attention and concentration so that they can focus better on the task at hand, he said. He believes that natural interfaces will make it easier to introduce software to people unfamiliar with computers, as well as make software generally easier to use, and therefore more attractive to consumers.

He also noted that today many programs come with what is essentially “an application-specific prosthetic”–for example, some driving games come with a steering-wheel device. Natural user interfaces may require some peripherals, such as depth-sensing cameras that can detect users’ movements, but Mundie sees these as ultimately having broader purpose than most of today’s devices.

The first step in this strategy, Mundie said, is Microsoft’s release next month of the Kinect sensor for the Xbox 360 gaming console; Kinect incorporates a depth-sensing camera and voice recognition and will cost about $150. It will allow users to play games by gesturing, without the need for a controller or additional equipment. This opens the way to 3-D interaction with games that Mundie hopes will lead to broader use of 3-D displays.

Mundie demonstrated how Kinect would allow a user to interact with 3-D game content through hand gestures, virtually picking up clues to examine them or show them to friends. “We’re trying to create a genre of games where you don’t have to think about how what you would do naturally would map to the controls,” Mundie said.

He also showed a concept video for a real-time 3-D multiplayer game called “The Spy from the 2080s” that included a TV show and a game that players could interact with using multiple devices. For example, they might watch an episode in 3-D on TV, then log in through a gaming console to work with friends to solve clues from the show. Mobile devices might provide additional updates. In the video, the outcome of gameplay even influenced the course of the TV show.

But while the company may plan to start with gaming, Mundie envisions 3-D eventually becoming a key part of many computer interfaces and online content. In one example, he demonstrated shopping using a 3-D natural interface; his hand gestures spun a 3-D image of a product, displaying it from a variety of angles, and opened it up to look at the parts inside.

He acknowledged, however, that there are challenges that need to be solved before 3-D can become ubiquitous. “We need a lot more computer than we currently have,” Mundie said, noting that processing high-definition, 3-D video in real time would strain the capabilities of most home computers today. He also admitted that companies still need to refine how users would interact with computers through gesture and voice–for example, distinguishing between when a gamer is issuing commands to the computer and when the same user is conversing with another player.

Microsoft is wise to focus on games initially, says Norbert Hildebrand, business development manager for Insight Media, a marketing research firm that covers emerging display technologies. With 3-D technologies, providing enough content is a huge issue today, he explains. Games are already created in 3-D and then rendered to work on a 2-D screen, which makes it easy to convert them for 3-D displays and other types of interfaces.

For other types of 3-D interaction, such as shopping or advertising, Hildebrand says content creators will need to be persuaded to invest the necessary money and resources. As for Mundie’s vision of 3-D shopping, Hildebrand says, “at this point, it’s marketing talk only.” He points out that today’s 3-D displays don’t render text well, so marketers would have to come up with a hybrid approach to present both product images and information.

For now, Hildebrand believes that the average person views 3-D technology as something used on special occasions, not as a day-to-day technology. Some people have interpreted sales figures for 3-D-enabled televisions as a sign that consumers are adopting the technology, he adds, but these can be misleading, since most high-end televisions today have 3-D capabilities. It’s much harder to determine whether people are actually using 3-D and how often, he says.

3-D is on the way, Hildebrand says, but before Mundie’s vision of day-to-day 3-D becomes viable, “a lot of things have to come together.” This includes more 3-D content, better bandwidth for delivering it to users, faster processors to render it, and particularly, he believes, the next generation of display technology–one that doesn’t require special glasses.
