Researchers at the University of California, San Diego (UCSD), have developed a new image-search method that they claim outperforms existing approaches “by a significant margin” in accuracy and efficiency. The researchers’ approach modifies a typical machine-learning method used to train computers to recognize images, says Nuno Vasconcelos, professor of electrical and computer engineering at UCSD. The result is a search engine that automatically labels pictures with the names of the objects in them, such as “radish,” “umbrella,” or “swimmer.” And because the approach uses words to label and classify parts of pictures, it lends itself nicely to the typical keyword searches that people perform on the Web, says Vasconcelos.

Finding photos: A new algorithm developed at UCSD that adds word tags to images can increase image-search accuracy and efficiency. Above, features from a picture are assigned a likelihood that they belong in certain categories, such as “water” or “person.”

Currently, searching for images on the Internet using keywords can be hit-or-miss. This is because most image-based searches rely on metadata (text such as a file name, date, or other basic information associated with a picture), which can be incomplete, useless for keyword searches, or absent altogether. Computer scientists have been working on better ways to identify pictures and make them searchable for more than a decade, but getting machines to go beyond metadata and determine what objects are in a picture is a tough problem, and most efforts to date have been only moderately successful.

While the UCSD research doesn’t completely solve the problem, it improves performance and efficiency for a certain approach, says Vasconcelos, and it identifies some “limitations in the way people were addressing the problem.”

The approach that the researchers tackled is called “content-based,” and it involves describing objects in a picture by analyzing features such as color, texture, and lines. These objects can be represented by sets of features and then compared with the sets extracted from other pictures. Feature sets are described by their statistics, and the computer searches for statistically likely matches.
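To make the content-based idea concrete, here is a minimal sketch (not the UCSD system; the file names and the Pillow/NumPy dependencies are assumptions for illustration) that describes each image by a color-histogram statistic and ranks candidates by how closely their histograms match a query image.

```python
# Minimal sketch of content-based comparison: summarize each image by a
# color-histogram "feature statistic" and rank candidates by similarity.
# This is an illustration, not the researchers' code.
import numpy as np
from PIL import Image

def color_histogram(path, bins=8):
    """Describe an image's color content as a normalized 3-D histogram."""
    rgb = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(rgb, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 means identical color statistics."""
    return float(np.minimum(h1, h2).sum())

# Rank a few candidate images against a query image (hypothetical file names).
query = color_histogram("query.jpg")
candidates = ["dog_on_lawn.jpg", "tiger_in_grass.jpg", "umbrella.jpg"]
scores = {name: histogram_similarity(query, color_histogram(name)) for name in candidates}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {name}")
```

Texture and line features would be folded into the statistic in the same spirit; color alone is used here only to keep the example short.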

The new research is based on this approach, but it adds an intermediate step, says Pedro Moreno, a Google research engineer who worked on the project. Moreno explains that this new step provides a “semantic label,” or a word tag that describes objects in pictures instead of relying solely on sets of numbers.

For instance, consider submitting an image of a dog on a lawn. The objects in the picture are analyzed and compared with those from known categories of objects, such as dogs, cats, or fish. Then the computer produces a statistical analysis that gives the likelihood that the picture matches each of those categories. The system might score the picture with a 60 percent probability that the main object is a dog and a 20 percent probability that it is a cat or a fish. Thus, the computer deems that, in all likelihood, the picture contains an image of a dog. “The key idea is to represent images in this semantic space,” Moreno says. “This seems to improve performance significantly.”
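As a rough illustration of what such a “semantic space” representation might look like, the snippet below turns per-class likelihoods into a probability vector and picks the likeliest label. The numbers echo the dog example above and are purely hypothetical; this is not the UCSD model.

```python
# Illustrative sketch of a "semantic space": an image is represented by a
# vector of class probabilities rather than by raw feature statistics.
# The likelihood numbers are invented to mirror the dog-on-a-lawn example.
import numpy as np

CLASSES = ["dog", "cat", "fish"]

def semantic_vector(class_likelihoods, priors=None):
    """Normalize per-class likelihoods into a probability vector."""
    likelihoods = np.asarray(class_likelihoods, dtype=float)
    priors = np.ones_like(likelihoods) if priors is None else np.asarray(priors, dtype=float)
    weighted = likelihoods * priors
    return weighted / weighted.sum()

posteriors = semantic_vector([0.60, 0.20, 0.20])
for name, p in zip(CLASSES, posteriors):
    print(f"P({name} | image) = {p:.2f}")
print("Most likely label:", CLASSES[int(np.argmax(posteriors))])
```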

The researchers’ system gained its expertise by being exposed to thousands of pictures that included objects such as mountains, flowers, people, water, and tigers, as well as the semantic tags that corresponded to the objects. Then the researchers tested how well the system performed by exposing it to new pictures that included objects that weren’t yet labeled. When compared with a human’s description of a scene, the system did well: a picture of a tiger in tall grass prompted the system to find “cat,” “tiger,” “plants,” “leaf,” and “grass.” A human-made caption included “cat,” “tiger,” “forest,” and “grass.” And when the researchers compared their system’s tags with more typical content-based approaches, they found that it did better by about 40 percent. In other words, it produced fewer words that were not applicable to the image.
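The comparison with human captions amounts to a precision-style check on the generated tags: what fraction of the system’s words a human annotator would also have used. A toy version of that check, using the tiger example above (the helper function is our own, not the researchers’ evaluation code), looks like this:

```python
# Toy tag-precision check: fraction of predicted tags also used by a human.
def tag_precision(predicted, human_caption):
    predicted, human_caption = set(predicted), set(human_caption)
    return len(predicted & human_caption) / len(predicted)

system_tags = ["cat", "tiger", "plants", "leaf", "grass"]
human_tags = ["cat", "tiger", "forest", "grass"]
print(f"Tag precision: {tag_precision(system_tags, human_tags):.2f}")  # 0.60
```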

Larry Zitnick, an image-search researcher at Microsoft, says that the research is pushing the limits of content-based search to see how well it can work. “What they’re doing is analyzing how far we can go based on [searching an image for objects], and that’s really good as far as pushing the envelope.” He also suspects that the approach could work well for large sets of images, such as those on the Internet.

Zitnick adds that the UCSD results could be great for certain types of simple object searches in pictures. However, it would not work for other searches, such as distinguishing the U.S. Capitol building from the state capitol building in Lincoln, Nebraska. “Visual problems are very difficult, and I don’t think any one solution is going to solve everything,” Zitnick says.

However, the researchers’ approach could be useful if folded into existing search software, says Chuck Rosenberg, a Google software engineer who works on image search. If incorporated into desktop search, the approach could allow people to search for images based on the similarity of appearance. But it wouldn’t necessarily help people find pictures based on more obscure concepts such as happiness. “For example,” Rosenberg says, “I might want a picture of a happy family out for an evening walk to put on a card I’m making. For a computer to truly find that picture based on the content of the image alone … is beyond current technology.”

Vasconcelos of UCSD suspects that it will be more than five years before computers are able to identify more-difficult concepts, such as happiness, in pictures. But that doesn’t mean current research won’t be useful before then, he says. “The expectation has to be that [the technology] is more like an aid, not like an answer.”
