Recognizing elements of a scene regardless of the angle or lighting is a significant challenge. Cohen and his colleague Simon Winder, a senior research engineer at Microsoft, have developed algorithms that perform this task frame by frame, for a video feed in real time. The algorithm instantly matches frames to previously analyzed images stored in a database. In developing the algorithm, the researchers determined the best parameters or characteristics to help the system match each scene. Cohen explains that they used machine learning to quickly test different parameters and determine the ones that will provide the best matches.
For today’s demo, Cohen’s team took pictures of the conference hall in which TechFest is being held. The photos were analyzed using the computer-vision software, and the key features were stored in a database on a laptop computer that employs a built-in video camera to capture a scene.
“In about a tenth or a fifteenth of a second, the software is able to recognize a scene and look it up in a database,” says Cohen. For the treasure-hunt game demoed during TechFest, the software displays a trail of bubbles that point to the direction in which the user should walk to find the prize.
Since it is just a research project, Cohen stresses that there is still plenty of room for improvement. For one thing, the parameters used to identify physical features of objects could be refined to make matching even more accurate, he says.
Another challenge to consider is how this kind of system would work in a less controlled environment, says Kari Pulli, a research fellow at Nokia. “The most common augmented-reality application is to use it as a museum guide,” he says. “That’s easy to do because the environment is fixed.” The challenge is to make sure that such systems can work in an unfamiliar context, like a city street. But Pulli believes that this could become possible thanks to databases owned by Microsoft, Google, and Navteq that contain images of street views.
Cohen says he’s optimistic that the computer-vision algorithms developed by his team could have myriad uses–from augmented-reality systems to gaming and robotics–but he doesn’t foresee them being used in a specific Microsoft product anytime soon.