Hackers Take the Kinect to New Levels

But the Holy Grail—controlling a computer without touching it—proves hard to achieve.

Tim Carmody

December 2, 2010

Soon after Microsoft released the Kinect gaming device, hackers found a way to pull raw data out of the system, radically expanding its potential uses. Enthusiasts have used the hardware to draw 3-D doodles in the air with hand movements, to play with virtual onscreen characters, and to allow a robot to recognize gestures and map its surroundings.

**Hand waving:** A software plug-in called DepthJS makes it possible to control a Web browser using the Kinect.

But one of the biggest goals of Kinect hackers—controlling a computer with gestures—is proving difficult to achieve.

Researchers at MIT’s Media Lab have created a new Chrome Web browser extension that lets users interact with any Web page via the Kinect if the device is plugged into a computer. Their project is one test case for the promise and limitations of hacking Microsoft’s gaming peripheral for nongaming uses.

The extension, called DepthJS, uses JavaScript to translate a small number of hand gestures into commands that can be executed by the browser. For example, a rapid arm movement to the left switches between open browser windows. Opening and closing a hand quickly acts as a mouse click.
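The idea of translating gestures into browser commands can be sketched in a few lines of JavaScript. This is an illustration only, not DepthJS's actual code or API: the names `gestureCommands`, `classifyGesture`, and `commandFor`, the shape of the hand samples, and the thresholds are all assumptions made for the example.

```javascript
// Hypothetical sketch: none of these names come from DepthJS itself.
// Lookup table from a recognized gesture to a browser-style command.
const gestureCommands = {
  swipeLeft: "switchWindow",
  handOpenClose: "click",
};

// Classify a short run of hand samples as a gesture name, or null.
// Each sample is assumed to look like { x, open, t }:
//   x    horizontal hand position, 0 (left) to 1 (right)
//   open whether the hand is open
//   t    timestamp in milliseconds
function classifyGesture(samples, opts = {}) {
  const { minSwipeDistance = 0.3, maxDurationMs = 500 } = opts;
  if (samples.length < 2) return null;

  const first = samples[0];
  const last = samples[samples.length - 1];

  // Only rapid movements count as gestures.
  if (last.t - first.t > maxDurationMs) return null;

  // A rapid movement to the left: x drops by at least the threshold.
  if (first.x - last.x >= minSwipeDistance) return "swipeLeft";

  // A quick open-and-close: the hand state flips at least twice.
  let toggles = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i].open !== samples[i - 1].open) toggles++;
  }
  if (toggles >= 2) return "handOpenClose";

  return null;
}

// Translate samples into a command name, or null if nothing matched.
function commandFor(samples, opts) {
  const gesture = classifyGesture(samples, opts);
  return gesture ? gestureCommands[gesture] : null;
}
```

Keeping the gesture recognizer separate from the lookup table means a Web application could, under these assumptions, remap gestures to its own commands without touching the recognition logic.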

The goal isn’t really to use the Kinect as a practical means of browsing the Web. Instead, DepthJS is meant to act as the interface between a variety of Web applications and the gestures captured by Kinect.

“Getting Kinect’s events into the Web browser is all about lowering the cost of entry to exploring and creating applications using depth information,” says Doug Fritz of the Fluid Interfaces group at MIT, who worked on the project. Computer users spend most of their time in the Web browser, Fritz notes. And most computer programmers (especially Web developers) know how to use JavaScript. This makes it an easy point of entry for Kinect programming.

One problem is that, unlike mouse, keyboard, or touch-screen input, gestural computing has no widely recognized (or naturally intuitive) vocabulary. Microsoft has developed a small number of gestures to let Kinect users navigate menus and browse media on the Xbox.

“Most of us hadn’t even used a Kinect with the Xbox before we started working, so we weren’t really burdened by the gesture language Microsoft has developed,” says Fritz. The team was inspired by the iPhone’s multitouch gestures and work by 3-D computing pioneer John Underkoffler. Surprisingly, some of the gestures created for DepthJS are similar to those Microsoft came up with. “Right now we are in that state of rapid change where people are remixing familiar interaction techniques with what feels natural,” Fritz says.

Limor Fried and Phillip Torrone from Adafruit Industries, a company that supplies equipment to hardware hackers, helped kick off the race to hack the Kinect by putting out a bounty of $3,000 for software that could connect the device to a regular computer.

Both are excited about the future of the Kinect as an off-the-shelf sensor for everything from high-end robotics to art projects. Developers have created a steady stream of videos of different applications using the Kinect. “These videos are really just proof-of-concepts that show some of the possibilities for further development,” says Fried.

One of the most popular videos is of a 3-D interactive puppet. “It’s fun, it’s intuitive, and it’s something that would be really hard to do without this inexpensive, off-the-shelf component. As you bring down the barriers, people have room to get creative.”

MIT’s Fritz is quick to note that three-dimensional, natural user interface computing using gestural recognition and depth sensors has been in play in the research community for years. The Kinect is a breakthrough device in terms of packaging and implementing these technologies for consumers. The more familiar users become with it, the more likely they are to translate it to spheres beyond gaming.

“The keyboard and the mouse aren’t going anywhere, but there is a lot of space for something more, and I think people are ready for that,” Fritz says.

But any effort to translate gestures to the screen inevitably bumps into the fact that we’re still three-dimensional beings trying to interact with a two-dimensional world. Most Kinect games solve this problem by matching us with an onscreen avatar who imitates our movements. Whether we’re dancing, playing volleyball, or whitewater rafting, the characters on the screen perform a stylized version of our movements offscreen.

One solution could be to use light projectors to create virtual objects in real space that we can interact with. Microsoft Research has already taken steps in this direction with Mobile Surface, a projector-based multitouch environment.
