The researchers' system gained its expertise by being exposed to thousands of pictures that included objects such as mountains, flowers, people, water, and tigers, as well as the semantic tags that corresponded to the objects. Then the researchers tested how well the system performed by exposing it to new pictures that included objects that weren't yet labeled. When compared with a human's description of a scene, the system did well: a picture of a tiger in tall grass prompted the system to find "cat," "tiger," "plants," "leaf," and "grass." A human-made caption included "cat," "tiger," "forest," and "grass." And when the researchers compared their system's tags with more typical content-based approaches, they found that it did better by about 40 percent. In other words, it produced fewer words that were not applicable to the image.
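To make the tag comparison concrete, here is a minimal sketch that scores the tiger example using only the tags quoted above. The article does not say how the researchers measured their results, so the precision and recall measures here are assumptions, and the function names `tag_precision` and `tag_recall` are hypothetical. The sketch does not attempt to reproduce the 40 percent figure.

```python
# Tags quoted in the article for the picture of a tiger in tall grass.
system_tags = {"cat", "tiger", "plants", "leaf", "grass"}
human_tags = {"cat", "tiger", "forest", "grass"}


def tag_precision(predicted, reference):
    """Fraction of predicted tags that also appear in the reference set.

    Assumption: this stands in for "fewer words that were not applicable";
    it is not necessarily the measure the researchers used.
    """
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)


def tag_recall(predicted, reference):
    """Fraction of reference tags that the system also produced."""
    if not reference:
        return 0.0
    return len(predicted & reference) / len(reference)


shared = sorted(system_tags & human_tags)
extra = sorted(system_tags - human_tags)   # produced by the system only
missed = sorted(human_tags - system_tags)  # in the human caption only

print("shared:", shared)
print("system only:", extra)
print("human only:", missed)
print("precision:", tag_precision(system_tags, human_tags))
print("recall:", tag_recall(system_tags, human_tags))
```

Exact set matching is strict: a tag such as "plants" counts against the system here even though a person might accept it as a fair description of the scene.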
Larry Zitnick, an image-search researcher at Microsoft, says that the research is pushing the limits of content-based search to see how well it can work. "What they're doing is analyzing how far we can go based on [searching an image for objects], and that's really good as far as pushing the envelope." He also suspects that the approach could work well for large sets of images, such as those on the Internet.
Zitnick adds that the UCSD approach could work well for certain types of simple object searches in pictures. However, it would not work for other searches, such as distinguishing the U.S. Capitol building from the state capitol building in Lincoln, NE. "Visual problems are very difficult, and I don't think any one solution is going to solve everything," Zitnick says.
However, the researchers' approach could be useful if folded into existing search software, says Chuck Rosenberg, a Google software engineer who works on image search. If incorporated into desktop search, the approach could allow people to search for images based on the similarity of appearance. But it wouldn't necessarily help people find pictures based on more obscure concepts such as happiness. "For example," Rosenberg says, "I might want a picture of a happy family out for an evening walk to put on a card I'm making. For a computer to truly find that picture based on the content of the image alone ... is beyond current technology."
Vasconcelos of UCSD suspects that it will be more than five years before computers are able to identify more-difficult concepts, such as happiness, in pictures. But that doesn't mean current research won't be useful before then, he says. "The expectation has to be that [the technology] is more like an aid, not like an answer."