Today, Microsoft researchers will demonstrate software that can, in real time, superimpose computer-generated information on top of a digitized view of the real world.
Overlaying visual data on a video display is a technique known as augmented reality. Michael Cohen, principal researcher at Microsoft Research, in Redmond, WA, says that the approach could add another dimension to future smart phones. “You could be out on the street, hold the device up, and it could recognize a restaurant and deliver ratings and the menu,” he says. A smart phone featuring an augmented-reality display could also overlay a bus route and an estimate of when the next bus is due on top of a particular street. “It essentially becomes your portal to information,” Cohen says.
Cohen and his colleagues will demo the augmented-reality technology at TechFest, an annual showcase of Microsoft’s research projects, in Redmond. Their software, which runs on a small portable computer, analyzes scenes captured by a camera, matches them against scenes stored in a database, and overlays supplementary information on the display. The researchers note that a smart phone with augmented reality could, for example, allow engineers to “see” the pipes or electrical cables below a street. In the demonstration given at TechFest, the software will be used to lead people on a treasure hunt to a hidden prize of a (virtual) pot of gold.
Augmented reality has been an active area of research for more than a decade, although it has often required a head-mounted display and a backpack’s worth of computing equipment. In recent years, cell phones and portable computers with cameras and other sensors have become powerful enough to handle the computational workload needed to run an augmented-reality system. Researchers at Nokia and Columbia University, for instance, are also developing augmented-reality systems, and a Japanese startup called Tonchidot hopes to turn the concept into a product.
Most augmented-reality systems must be able to orient themselves accurately in order to function reliably. Some locate their position using GPS or by triangulating several Wi-Fi signals, and determine which way they are pointing using an accelerometer and a digital compass. Microsoft’s augmented-reality device focuses on being able to recognize objects within a scene, using sophisticated computer-vision algorithms. Since the demonstration is carried out in the controlled environment of the conference hall, the researchers are not using location-detecting sensors; instead, they’re relying solely on computer vision.
Recognizing elements of a scene regardless of the angle or lighting is a significant challenge. Cohen and his colleague Simon Winder, a senior research engineer at Microsoft, have developed algorithms that perform this task frame by frame on a video feed, in real time. The algorithms instantly match each frame to previously analyzed images stored in a database. In developing the algorithms, the researchers determined which parameters, or characteristics, best help the system match each scene. Cohen explains that they used machine learning to quickly test different parameters and identify the ones that provide the best matches.
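The article does not disclose the specifics of Microsoft’s matching algorithm. As a minimal illustrative sketch of the general idea, the toy code below matches a frame’s feature descriptors against scenes analyzed ahead of time, picking the scene whose stored descriptors lie nearest on average. The scene names, three-element descriptor vectors, and `match_scene` function are all invented for illustration; a real system would use high-dimensional descriptors extracted from image patches.

```python
import numpy as np

# Hypothetical database: each known scene is summarized by a set of
# feature descriptors (rows) computed ahead of time from photographs.
scene_db = {
    "hallway": np.array([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]),
    "atrium":  np.array([[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]),
}

def match_scene(frame_descriptors, db):
    """Return the scene whose stored descriptors are, on average,
    nearest to the descriptors extracted from the current frame."""
    best_scene, best_dist = None, float("inf")
    for name, stored in db.items():
        # Pairwise distances: each frame descriptor vs. each stored descriptor.
        dists = np.linalg.norm(
            frame_descriptors[:, None, :] - stored[None, :, :], axis=2
        )
        # Score the scene by each frame descriptor's nearest stored neighbor.
        avg = dists.min(axis=1).mean()
        if avg < best_dist:
            best_scene, best_dist = name, avg
    return best_scene

# A query frame whose descriptors resemble the "atrium" scene.
query = np.array([[0.05, 0.85, 0.25]])
print(match_scene(query, scene_db))  # → atrium
```

In practice this lookup would run once per video frame, which is consistent with the sub-tenth-of-a-second recognition times Cohen describes, though production systems would use an indexed search structure rather than a linear scan over the database.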
For today’s demo, Cohen’s team took pictures of the conference hall in which TechFest is being held. The photos were analyzed using the computer-vision software, and the key features were stored in a database on a laptop computer, which uses its built-in video camera to capture the live scene.
“In about a tenth or a fifteenth of a second, the software is able to recognize a scene and look it up in a database,” says Cohen. For the treasure-hunt game demoed during TechFest, the software displays a trail of bubbles pointing in the direction the user should walk to find the prize.
Since the system is still a research project, Cohen stresses, there is plenty of room for improvement. For one thing, the parameters used to identify physical features of objects could be refined to make matching even more accurate, he says.
Another challenge to consider is how this kind of system would work in a less controlled environment, says Kari Pulli, a research fellow at Nokia. “The most common augmented-reality application is to use it as a museum guide,” he says. “That’s easy to do because the environment is fixed.” The challenge is to make sure that such systems can work in an unfamiliar context, like a city street. But Pulli believes that this could become possible thanks to databases owned by Microsoft, Google, and Navteq that contain images of street views.
Cohen says he’s optimistic that the computer-vision algorithms developed by his team could have myriad uses, from augmented-reality systems to gaming and robotics, but he doesn’t foresee them being used in a specific Microsoft product anytime soon.