Gestural Interfaces

Controlling computers with our bodies

By Julian Dibbell

April 19, 2011

Determining depth: PrimeSense’s sensor determines depth by combining a number of techniques, including structured light, in which an infrared pattern (red lines) is projected onto objects. The way the pattern is distorted reveals how far away each surface is. The example illustrated here is an interactive airport information display (gray box); the depth sensor (blue box) sits below it.

How do you issue complex commands to a computer without touching it? It’s a crucial issue now that televisions are connected to social networks and cars are fitted with computerized systems for communication, navigation, and entertainment. So Alexander Shpunt has designed a 3-D vision system that lets anyone control a computer just by gesturing in the air.

Shpunt spent five years developing the system at Tel Aviv-based PrimeSense, and Microsoft adopted the technology to power its popular Kinect controller for the Xbox 360 game console. Players can use it to direct characters with their bodies alone—no need for the wands, rings, gloves, or colored tags that previous gestural interfaces relied on to detect the user’s movements.

The key to dispensing with those props was getting the computer to see the world in three dimensions, rather than the two captured by normal cameras. Sensing depth makes it relatively easy to distinguish, say, an arm from a table in the background, and then track the arm’s movement.
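To make that concrete, here is a minimal sketch of why depth simplifies the problem. It is not PrimeSense's or Microsoft's code. It assumes a depth frame arrives as a grid of distances in millimeters, that a reading of 0 means "no measurement," and that anything nearer than a chosen cutoff belongs to the user. The function names, the cutoff, and the sample frame are all hypothetical.

```python
def segment_foreground(depth_mm, max_depth_mm):
    """Return a mask that is True where a pixel is nearer than max_depth_mm.

    A depth of 0 is treated as "no reading" (an assumption of this sketch).
    """
    return [[0 < d < max_depth_mm for d in row] for row in depth_mm]


def centroid(mask):
    """Return the (row, column) center of all True pixels, or None if empty."""
    points = [(r, c)
              for r, row in enumerate(mask)
              for c, is_near in enumerate(row)
              if is_near]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


# Hypothetical 3x4 depth frame: a wall about 2.5 m away, with an "arm"
# about 0.9 m from the sensor and one pixel that returned no reading.
frame = [
    [2500, 2500, 2500, 2500],
    [2500,  900,  950, 2500],
    [2500,  920,    0, 2500],
]

mask = segment_foreground(frame, max_depth_mm=1500)
arm_position = centroid(mask)
```

Computing the centroid frame after frame gives a crude track of the arm's movement. With an ordinary two-dimensional image, the same separation would have to rely on color or texture, which is far less reliable.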

Shpunt recalls that when he started developing his system there were a few ways to sense depth—primarily “time of flight” (determining distance from a sensor by measuring how long it takes light or sound to bounce off an object) and “structured light” (projecting patterns of light onto objects and analyzing how the patterns are distorted by the object’s surface). Although there was a lot of academic activity and a few companies built prototypes, there was “nothing really mature” that could be mass-produced, he says. Instead, he built his own system, cobbling together an approach that borrowed from those two techniques as well as stereoscopy—comparing images of the same scene from two different viewpoints.
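The arithmetic behind these techniques is short, and a rough sketch shows it. This is textbook geometry, not Shpunt's actual method. Time of flight halves the round-trip travel distance of a light pulse. Stereoscopy and structured light both rest on triangulation: a feature seen from two positions a known distance apart shifts by an amount (the disparity) that shrinks as the feature gets farther away. The function names and the sample numbers below are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def time_of_flight_distance(round_trip_s):
    """Distance in meters to an object, given a light pulse's round-trip time."""
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse travels out and back, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def triangulated_depth(focal_length_px, baseline_m, disparity_px):
    """Depth in meters from the pixel shift between two viewpoints.

    The two viewpoints can be two cameras (stereoscopy) or a projector and
    a camera (structured light); the geometry is the same in this
    idealized pinhole model.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values, chosen only for illustration.
tof_m = time_of_flight_distance(20e-9)  # a 20-nanosecond round trip, roughly 3 m
stereo_m = triangulated_depth(focal_length_px=580.0,
                              baseline_m=0.075,
                              disparity_px=20.0)
```

The numbers hint at the engineering trade-off: time of flight needs timing precise to fractions of a nanosecond, while triangulation needs a feature to be matched reliably between two views.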

The Kinect is only the beginning of what Shpunt believes will be a gestural-interface revolution. A small army of hackers, encouraged by PrimeSense, is already retooling the controller to other ends. Researchers at Louisiana State University have rigged a helmetless, gloveless virtual-reality system out of a Kinect unit and an off-the-shelf 3-D TV set. In Australia, a logistics software firm quickly put together a gesture-controlled system for monitoring air traffic. Further real-world applications are easy to imagine, says Shpunt: gaze-tracking heads-up controls for automobiles, touchless interactive displays for shopping malls and airports.

For now, Shpunt is working with computer maker Asus to build gestural controls for today’s increasingly complex and network-connected televisions—essentially turning a TV into a giant iPad that can be operated from the couch without a remote control.
