Mobile Web Searches Using Pictures

A new Microsoft application lets people search the Internet on their cell phones using a camera instead of a keypad.

Kate Greene

March 13, 2007

Searching for information on your cell phone by typing keywords can be cumbersome. But now researchers at Microsoft have developed a software prototype called Lincoln that they hope will make Web searches easier. According to Larry Zitnick, a Microsoft researcher who works on the project, phones equipped with the software could, for example, access online movie reviews by snapping pictures of movie posters or DVD covers and get product information from pictures of advertisements in magazines or on buses.

**Site seeing**: Using Microsoft’s new image-based Web-search software, a person can take a picture of a magazine, such as Technology Review, with a cell-phone camera and be directed to a website.

“The main thing we want to do is connect real-world objects with the Web using pictures,” says Zitnick. “[Lincoln] is a way of finding information on the Web using images instead of keywords.”

The software works by matching pictures taken on phones with pretagged pictures in a database. It provides the best results when the pictures are of two-dimensional objects, such as magazine ads or DVD covers, Zitnick says. (See the accompanying chart to find out how compatible certain pictures are with Lincoln.) Currently, the database contains pictures of DVD covers that link to movie reviews uploaded by Microsoft researchers. However, anyone can contribute his or her pictures and links to the database, and Zitnick hopes that people will fill it with pictures and links to anything from information about graffiti art to scavenger-hunt clues. Right now, Lincoln can only be downloaded for free using Internet Explorer 6 and 7, and it can only run on smart phones equipped with Windows Mobile 5.0 and PocketPCs.

Lincoln is part of a trend to link the physical world with information on the Web, often with the help of cell-phone cameras. Nokia researchers are developing software and hardware that automatically hyperlinks buildings, storefronts, and certain people via a cell-phone camera. (See “Hyperlinking Reality via Phones.”) And a handful of companies, including Mobot, based in Lexington, MA, are exploring the marketing capabilities of such technology by connecting pictures of real-world advertisements and company logos to the Web.

Multimedia

View a chart of available images for Microsoft's new software.

According to Zitnick, there are two elements that distinguish his technology from others. First is the fact that anyone can contribute images, links, and comments to the database. The second element is the type of image-recognition system that Microsoft researchers have developed, which Zitnick believes will be able to search through millions of images quickly.

At the heart of the image-recognition engine is an algorithm that analyzes a picture and creates a signature that describes the picture succinctly, using a small amount of data. This signature consists of information that describes the relative position of the pixels and the intensity of a certain feature in a picture, such as the Mona Lisa’s smile. In order to make this information easily searchable, data triplets are created from groups of three features. For instance, a triplet might contain information about a close-up of the Mona Lisa’s smile, cheek, and nose.
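To make the idea of a triplet concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Microsoft's actual algorithm: it assumes each feature has already been reduced to a normalized `(x, y, intensity)` tuple, and the names `quantize` and `triplet_keys`, along with the `grid` and `levels` parameters, are hypothetical. For simplicity it bins absolute positions rather than the relative positions the article describes.

```python
from itertools import combinations


def quantize(feature, grid=8, levels=4):
    """Reduce a feature (x, y, intensity), each in [0, 1), to coarse integer bins.

    The bin counts are illustrative assumptions, not values from the article.
    """
    x, y, intensity = feature
    return (int(x * grid), int(y * grid), int(intensity * levels))


def triplet_keys(features, grid=8, levels=4):
    """Return order-independent keys built from every group of three features."""
    # Sorting the deduplicated bins makes each key independent of input order.
    bins = sorted({quantize(f, grid, levels) for f in features})
    return set(combinations(bins, 3))


# Four hypothetical features, e.g. a smile, a cheek, a nose, and an eye.
features = [(0.1, 0.1, 0.9), (0.5, 0.5, 0.2), (0.9, 0.1, 0.5), (0.3, 0.8, 0.7)]
keys = triplet_keys(features)
```

Coarse binning is what keeps the signature small: two photos of the same poster taken under slightly different conditions should fall into the same bins and so produce the same keys.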

When a picture is taken, the algorithm quickly establishes these data sets and compares them with established sets for the pictures already in the database. Microsoft’s approach makes searching through large databases more efficient than other methods that compare a large number of individual features one by one, says Zitnick. Microsoft’s engine only has to search for these triplets of data, he says, because the odds of there being many images with the same three data sets are small. “It narrows it down pretty quickly,” he says.
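The lookup Zitnick describes resembles an inverted index keyed on triplets. The Python sketch below shows that general technique; it is an assumption about how such a search could be organized, not a description of Lincoln's implementation. The class `TripletIndex`, its methods, and the string feature labels are all hypothetical, and features are assumed to be already quantized into hashable values.

```python
from collections import Counter, defaultdict
from itertools import combinations


class TripletIndex:
    """Hypothetical inverted index: triplet of features -> images containing it."""

    def __init__(self):
        self._index = defaultdict(set)

    @staticmethod
    def _keys(features):
        # Sort so the same three features always yield the same key.
        return set(combinations(sorted(set(features)), 3))

    def add(self, image_id, features):
        """Register every triplet of an image's features under its ID."""
        for key in self._keys(features):
            self._index[key].add(image_id)

    def query(self, features):
        """Return (image_id, shared_triplet_count) pairs, best match first."""
        votes = Counter()
        for key in self._keys(features):
            for image_id in self._index.get(key, ()):
                votes[image_id] += 1
        return votes.most_common()


index = TripletIndex()
index.add("poster_a", ["f1", "f2", "f3", "f4"])
index.add("poster_b", ["f3", "f4", "f5", "f6"])

# A query photo that shares three features with poster_a and only one with poster_b.
matches = index.query(["f1", "f2", "f3", "f9"])
```

Because a key requires all three features to agree, an image that shares just one or two features with the query never appears as a candidate, which is why the candidate list shrinks quickly.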

Currently, the whole process, from uploading a picture to accessing a link, takes about 10 seconds, Zitnick says. Matching the picture against the database takes about a second, but uploading an image to the server and downloading the Web page take about four to five seconds, depending on the wireless connection, he says. So far, the search engine has been tested on databases containing 30,000 images, but by using this image-matching approach, Zitnick expects that the system could handle searching through millions of pictures without slowing down.

As Lincoln is downloaded by more people, and they add pictures and links, the application will become more useful. However, at this early stage, it’s unclear whether or not the user experience is good enough to attract people to the software in the first place. “The question is whether it’s above or below the threshold of what the user will take,” says Luis von Ahn, professor of computer science at Carnegie Mellon University, in Pittsburgh. “It’s always possible to do [image recognition] that’s better than nothing, but it’s hard to do something that’s perfect.”

Still, he believes that the “application is really cool,” and that, importantly, the research leverages user-generated information to make the Web links relevant to a given picture. In his own work, von Ahn has developed computer games that, as people play them, train computers to recognize objects in pictures. The technology is now the basis for the Google Image Labeler, a game that helps Google serve up more-accurate picture results for keyword searches. Microsoft’s approach of having a computer match images in a completely automated way is aiming for the gold standard of computer vision, says von Ahn, but he believes that there are decades of work to be done before the standard will be achieved.
