Gesturing at Your TV Isn’t Ready for Prime Time

The Microsoft Kinect is fun, but it shows that so-called natural user interfaces still have a long way to go.

Erica Naone

November 8, 2010

The Microsoft Kinect, a sensor that works with the Xbox 360 game console, offers the first experience most people will have with a “natural” user interface. A player controls the $150 device with voice and gestures; there’s no need to hold any sort of controller or wear any special gloves or clothing. In a recent talk at MIT, Microsoft’s chief research and strategy officer, Craig Mundie, described the Kinect as a preview of what’s to come for user interfaces, suggesting that what works in gaming now will soon be used for shopping, design, and many other common computing tasks. Instead of thinking about controllers, keyboards, and other “application-specific prosthetics,” Mundie said, people could focus on the task at hand, making software much more appealing and much easier to use.

**Don’t touch:** The Microsoft Kinect sensor lets people use gestures and voice commands to control video games. It reveals the problems of a gestural interface as well as its benefits.

But while using the Kinect for gaming is a fun and interesting experience, the device also illustrates that natural user interfaces have a long way to go before they are suited to most everyday applications.

The Kinect uses both software and hardware to pick up a person’s position, gestures, and voice. To measure position, it emits an infrared beam and measures how long that light takes to bounce back from objects it encounters. Four microphones can receive voice commands, and software filters out background noise and even conversation from other people in the room.

Since all these systems need to be calibrated, setting up the Kinect takes some time. After you connect the sensor to an Xbox 360 and position it near the center line of a television, the Kinect’s motors automatically adjust its angle so that it can get a complete picture of the user.

The Kinect also needs a lot of space. It needs to be able to see the floor as a reference point for objects in the room, and the user has to stand at least six feet from the device (eight feet if two people plan to use it).

It also tests the sound levels in the room and adjusts for noise coming from the television’s speakers. If anything changes in the room—if furniture is moved, for instance, or the sound environment changes significantly—the device is thrown off and needs to be recalibrated.

All this means that as an everyday interface, the Kinect would make little practical sense. Its space requirements strain the capacity of a typical urban apartment. If Microsoft wants to make natural user interfaces accessible to everyone, it will have to consider the needs of the dorm room and the cubicle. The calibration process is also too finicky to make the Kinect useful for any critical application. Users would never tolerate needing to recalibrate in order to check e-mail.

Another problem is that the system the Kinect uses to map the positions of a user’s hands, shoulders, legs, and so forth—although surprisingly accurate—does not achieve especially fine resolution. When it interprets gestures, it sees hands, not fingers. This means that to control the device, a user must wave and make large arm motions.

So long as this is true, an interface such as the Kinect couldn’t be used at a desk. It would be tiring over an extended period of time, and it would also be slow. Compare, for example, the time it takes to scroll down three Web pages with a mouse, which requires only tiny flicks of the wrist, with the time it would take to raise and lower an arm three times.

The device is unacceptably slow in other ways as well. To click a button on the screen, for example, the user has to hold a hand still for about three seconds to confirm that a click is intended. Imagine how annoying online shopping would be if there were a three-second delay every time you wanted to take a closer look at something.
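The hold-to-click behavior described here works like a dwell timer: the click fires only after the tracked hand has stayed within a small area for the full waiting period. The sketch below is a minimal illustration under stated assumptions, not Microsoft’s implementation. The names `DwellClicker` and `update`, the 0.05 stillness radius, and the idea that hand positions arrive as normalized (x, y) screen coordinates with timestamps in seconds are all hypothetical; only the roughly three-second hold comes from the article.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Tuple

DWELL_SECONDS = 3.0   # from the article: hold a hand still for about three seconds
STILL_RADIUS = 0.05   # hypothetical tolerance for "still", in normalized screen units


@dataclass
class DwellClicker:
    """Hypothetical dwell-to-click detector fed one hand-position sample at a time."""
    dwell_seconds: float = DWELL_SECONDS
    still_radius: float = STILL_RADIUS
    _anchor: Optional[Tuple[float, float]] = field(default=None, init=False)
    _anchor_time: float = field(default=0.0, init=False)

    def update(self, x: float, y: float, t: float) -> bool:
        """Record a hand position (x, y) at time t; return True when a click fires."""
        if self._anchor is None or math.dist(self._anchor, (x, y)) > self.still_radius:
            # The hand moved too far (or this is the first sample): restart the timer here.
            self._anchor = (x, y)
            self._anchor_time = t
            return False
        if t - self._anchor_time >= self.dwell_seconds:
            # Held still long enough: fire once, then clear the anchor so the
            # next sample starts a fresh dwell instead of clicking repeatedly.
            self._anchor = None
            return True
        return False


if __name__ == "__main__":
    clicker = DwellClicker()
    # Simulate 30 samples per second with the hand held at the same spot.
    for i in range(106):
        if clicker.update(0.5, 0.5, i / 30):
            print(f"click after {i / 30:.1f} seconds")
```

Fed 30 samples per second from a motionless hand, this sketch fires its first click on the 91st sample, three full seconds in, and a hand that drifts past the tolerance keeps resetting the timer. Every deliberate click costs that full wait, which is the delay the online-shopping example objects to.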

These sorts of issues are probably deal killers for gestural interfaces, says Ben Bederson, an associate professor of computer science at the University of Maryland who studies human-computer interaction. “Microsoft deserves a huge amount of credit for the Kinect,” he says. But he adds that the device’s impressive technology is probably suited only to niche applications such as gaming. Even if the technology’s quality were as good as it could possibly be, he says, there are fundamental issues that he doesn’t believe can be solved: users need speed, accuracy, simplicity, and a comfortable interface that doesn’t interfere with the task or tire them out.

“When Microsoft can address all of those issues with gestures,” he says, “I’m all ears, but I’m not optimistic.”
