Kinect Project Merges Real and Virtual Worlds

New software turns the Kinect into a cheap 3-D scanner—opening up applications ranging from crime fighting to interior design.

Nic Fleming

September 30, 2011

Microsoft’s Kinect Xbox controller, which lets gamers control on-screen action with their body movements, has been adapted in hundreds of interesting, useful, and occasionally bizarre ways since its release in November 2010. It’s been used for robotic vision and automated home lighting. It’s helped wheelchair users with their shopping. Yet these uses could look like child’s play compared to the new 3-D modeling capabilities Microsoft has developed for the Kinect.

KinectFusion, a research project that lets users generate high-quality 3-D models in real time using a standard $100 Kinect, was the star of the show at Microsoft Research’s 20th anniversary event held this week at its European headquarters in Cambridge, U.K. KinectFusion also includes a physics engine that allows scanned objects to be manipulated in realistic ways.

The technology allows objects, people, and entire rooms to be scanned in 3-D at a fraction of the normal cost. Imagine true-to-life avatars and objects being imported into virtual environments. Or a crime scene that can be re-created within seconds. Visualizing a new sofa in your living room and other virtual interior design tricks could become remarkably simple.

“KinectFusion is a platform that allows us to rethink the ways that computers see the world,” says project leader Shahram Izadi. “We have outlined some ways it could be used, but I expect there are a whole host of future applications waiting to be discovered.”

3-D scanners already exist, but none approaches KinectFusion in ease of use and speed, and even desktop versions cost around $3,000.

“In the same way that products like Microsoft Office democratized the creation of 2-D documents, with KinectFusion anyone can create 3-D content just by picking up a Kinect and scanning something in,” says team member Steve Hodges.

The first public unveiling of KinectFusion at the SIGGRAPH conference in Vancouver in August triggered huge excitement. Details of how it works will be revealed in papers presented next month at the UIST Symposium in Santa Barbara, California, and ISMAR in Basel, Switzerland.

The Kinect projects a laser dot pattern into a scene and looks for distortions using an infrared camera, a technique called structured light depth sensing. This generates a “point cloud” of distances to the camera that the Kinect uses to perceive and identify objects and gestures in real time.
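To make the "point cloud" idea concrete, here is a minimal sketch of how a grid of depth readings could be turned into 3-D points using a simple pinhole camera model. The function name and the intrinsic parameters (fx, fy, cx, cy) are illustrative assumptions, not the Kinect's actual calibration or Microsoft's code.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3]:
    """Back-project a depth image into 3-D points (pinhole model).

    depth[v][u] is the distance along the camera's z axis for pixel (u, v).
    fx, fy are focal lengths in pixels and cx, cy the principal point;
    all four are hypothetical values supplied by the caller.
    """
    points: List[Point3] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                # Treat zero or negative readings as "no depth measured".
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Each valid pixel becomes one (x, y, z) point in the camera's coordinate frame, so a single frame yields a cloud of at most width times height points.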

A KinectFusion user waves a Kinect around a scene or object. An algorithm called iterative closest point (ICP) is used to merge data from the snapshots being taken at 30 frames per second into an ever-more-detailed 3-D representation. ICP is also used to track the position and orientation of the camera by comparing new frame data with previous frames and the composite merged representation. The team describes the use of a standard computer graphics processing unit for both camera tracking and image generation as a major innovation.
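The core loop of iterative closest point can be sketched in a few lines. The version below is a textbook point-to-point ICP reduced to 2-D so the rigid alignment has a short closed form; it is an illustration of the general technique, not the KinectFusion team's GPU implementation, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Point2 = Tuple[float, float]


def nearest(p: Point2, targets: List[Point2]) -> Point2:
    """Return the target point closest to p (brute-force search)."""
    return min(targets, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_transform(
    src: List[Point2], dst: List[Point2]
) -> Tuple[float, float, float]:
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_dot = 0.0
    s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy  # source point relative to its centroid
        bx, by = qx - dx, qy - dy  # matched point relative to its centroid
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty


def apply_transform(
    points: List[Point2], theta: float, tx: float, ty: float
) -> List[Point2]:
    """Rotate each point by theta about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp(
    source: List[Point2], target: List[Point2], iterations: int = 20
) -> List[Point2]:
    """Repeatedly match nearest points and re-align; return the moved source."""
    current = list(source)
    for _ in range(iterations):
        matches = [nearest(p, target) for p in current]
        theta, tx, ty = best_rigid_transform(current, matches)
        current = apply_transform(current, theta, tx, ty)
    return current
```

In a scanning setting, the transform recovered for each new frame doubles as an estimate of how the camera moved, which is why the same algorithm can serve both merging and tracking. ICP of this kind generally needs the two point sets to start reasonably close together, which small frame-to-frame motion at 30 frames per second helps ensure.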

While KinectFusion is generating a buzz, it is still an ongoing research project. Microsoft has not disclosed plans to release any products using the technology, or versions of the software that powers the system.

“It’s just stunning,” says Christian Holz, of the Hasso Plattner Institute at the University of Potsdam, in Germany, who previously worked on a project that used Kinect at Microsoft Research in Redmond, Washington. “It’s going to make 3-D creation available to a much wider range of people. The fact that it can not only model the real-world environment in mind-blowing fidelity, but also use the model to simulate realistic physics on top of that, opens up the possibility of a vast number of applications.”
