A Face-Finding Search Engine

A new approach to face recognition is better at handling low-resolution video.

Kate Greene

September 17, 2008

Today there are more low-quality video cameras (surveillance and traffic cameras, cell-phone cameras, and webcams) than ever before. But modern search engines can’t identify objects very reliably in clear, static pictures, much less in grainy YouTube clips. A new software approach from researchers at Carnegie Mellon University could make it easier to identify a person’s face in a low-resolution video. The researchers say that the software could be used to identify criminals or missing persons, or it could be integrated into next-generation video search engines.

**Fuzzy faces**: A new face-recognition system from researchers at Carnegie Mellon works even on low-resolution images.

Today’s face-recognition systems actually work quite well, says Pablo Hennings-Yeomans, a researcher at Carnegie Mellon who developed the system, but only when researchers can control the lighting, the angle of the face, and the type of camera used. “The new science of face recognition is dealing with unconstrained environments,” he says. “Our work, in particular, focuses on the problem of resolution.”

In order for a face-recognition system to identify a person, explains Hennings-Yeomans, it must first be trained on a database of faces. For each face, the system uses a so-called feature-extraction algorithm to discern patterns in the arrangement of image pixels; as it’s trained, it learns to associate some of those patterns with physical traits: eyes that slant down, for instance, or a prominent chin.

The problem, says Hennings-Yeomans, is that existing face-recognition systems can identify faces only in pictures with the same resolution as those with which the systems were trained. This gives researchers two choices if they want to identify faces in low-resolution pictures: they can either train their systems using low-resolution images, which yields poor results in the long run, or they can add pixels, or resolution, to the images to be identified.

The latter approach, which is achieved by using so-called super-resolution algorithms, is common, but its results are mixed, says Hennings-Yeomans. A super-resolution algorithm makes assumptions about the shape of objects in an image and uses them to sharpen object boundaries. While the results may look impressive to the human eye, they don’t accord well with the types of patterns that face-recognition systems are trained to look for. “Super-resolution will give you an interpolated image that looks better,” says Hennings-Yeomans, “but it will have distortions like noise or artificial [features].”
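Why adding pixels cannot restore what the camera never captured can be shown with a toy example. The sketch below is an illustration only: it treats a single row of pixels as the "image," uses block averaging as an assumed camera model, and lets plain linear interpolation stand in for a real super-resolution algorithm. The function names are hypothetical and do not come from the researchers' software.

```python
def downsample(row, factor):
    """Average each block of `factor` pixels (an assumed, crude camera model)."""
    return [sum(row[i:i + factor]) / factor for i in range(0, len(row), factor)]


def upsample_linear(row, factor):
    """Add pixels by interpolating linearly between neighbouring low-res pixels."""
    out = []
    for i, value in enumerate(row):
        nxt = row[min(i + 1, len(row) - 1)]  # repeat the last pixel at the edge
        for k in range(factor):
            t = k / factor
            out.append((1 - t) * value + t * nxt)
    return out


def mean_abs_error(a, b):
    """Average absolute difference between two rows of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# A row with fine detail (alternating dark and bright pixels) and a smooth ramp.
detailed = [0, 255, 0, 255, 0, 255, 0, 255]
smooth = [0, 32, 64, 96, 128, 160, 192, 224]

restored_detailed = upsample_linear(downsample(detailed, 2), 2)
restored_smooth = upsample_linear(downsample(smooth, 2), 2)

error_detailed = mean_abs_error(restored_detailed, detailed)
error_smooth = mean_abs_error(restored_smooth, smooth)
print(error_detailed, error_smooth)
```

The interpolated rows have the original number of pixels, but the fine detail in the first row is gone for good, while the smooth ramp survives almost intact. A recognizer trained on sharp images would be looking for exactly the kind of detail that the first row lost.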

**Make me a match**: The “probe images” along the top row are used to query a database of stored “gallery images,” much like keywords entered into a Web search engine. When faces match, as they do along the diagonal, the resulting composite image has smooth features. Blurred features indicate a mismatch.

Together with B. Vijaya Kumar, a professor of electrical and computer engineering at Carnegie Mellon, and Simon Baker of Microsoft Research, Hennings-Yeomans has tested an approach that improves upon face-recognition systems that use standard super-resolution. Instead of applying super-resolution algorithms to an image and running the results through a face-recognition system, the researchers designed software that combines aspects of a super-resolution algorithm and the feature-extraction algorithm of a face-recognition system. To find a match for an image, the system first feeds it through this intermediary algorithm, which doesn’t reconstruct an image that looks better to the human eye, as super-resolution algorithms do. Instead, it extracts features that are specifically readable by the face-recognition system. In this way, it avoids the distortions characteristic of super-resolution algorithms used alone.
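The general idea of matching without first reconstructing a sharper picture can be sketched in a few lines. The code below is a deliberately simplified stand-in, not the researchers' algorithm: it assumes the camera behaves like block averaging, treats one row of pixels as a face, and scores each stored gallery image by how well it would explain the low-resolution probe once passed through that assumed camera model. The names `mismatch` and `best_match`, and the gallery data, are hypothetical.

```python
def downsample(row, factor):
    """Average each block of `factor` pixels (an assumed camera model)."""
    return [sum(row[i:i + factor]) / factor for i in range(0, len(row), factor)]


def mismatch(probe_low, gallery_high, factor):
    """Squared error between the probe and the gallery image as the camera would see it."""
    simulated = downsample(gallery_high, factor)
    return sum((p - s) ** 2 for p, s in zip(probe_low, simulated))


def best_match(probe_low, gallery, factor):
    """Return the gallery identity whose simulated low-res view is closest to the probe."""
    return min(gallery, key=lambda name: mismatch(probe_low, gallery[name], factor))


# Hypothetical high-resolution gallery "faces" (one row of pixels each).
gallery = {
    "alice": [10, 30, 200, 220, 90, 110, 40, 60],
    "bob": [200, 220, 10, 30, 40, 60, 90, 110],
}

# A slightly noisy low-resolution probe of the first face.
probe = [22, 208, 101, 49]

print(best_match(probe, gallery, 2))
```

Because the comparison happens at the probe's own resolution, no interpolated pixels ever enter the score. The system described in the article goes further by folding the recognizer's feature extraction into the same step, which this sketch does not attempt.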

In prior work, the researchers showed that the intermediary algorithm improved face-matching results when finding matches for a single picture. In a paper being presented at the IEEE International Conference on Biometrics: Theory, Systems, and Applications later this month, the researchers show that the system works even better, in some cases, when multiple images or frames, even from different cameras, are used.
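One plausible reason that several frames help is that noise in any single frame can point to the wrong person, while evidence pooled across frames tends to cancel it out. The sketch below illustrates that effect under stated assumptions: the article does not say how the researchers combine frames, so summing a per-frame mismatch score is a choice made here for illustration, and the block-averaging camera model, function names, and data are all hypothetical.

```python
def downsample(row, factor):
    """Average each block of `factor` pixels (an assumed camera model)."""
    return [sum(row[i:i + factor]) / factor for i in range(0, len(row), factor)]


def total_mismatch(frames, gallery_high, factor):
    """Sum of squared errors between every frame and the simulated low-res gallery image."""
    simulated = downsample(gallery_high, factor)
    return sum((p - s) ** 2 for frame in frames for p, s in zip(frame, simulated))


def best_match_multi(frames, gallery, factor):
    """Return the identity with the smallest pooled mismatch over all frames."""
    return min(gallery, key=lambda name: total_mismatch(frames, gallery[name], factor))


# Two hypothetical faces that look similar at low resolution.
gallery = {
    "alice": [90, 110, 110, 130, 90, 110, 110, 130],
    "bob": [110, 130, 90, 110, 110, 130, 90, 110],
}

# Three low-res frames of the first face; the first frame is badly corrupted.
frame_a = [112, 108, 112, 108]
frame_b = [98, 124, 97, 121]
frame_c = [103, 118, 99, 123]

single = best_match_multi([frame_a], gallery, 2)
pooled = best_match_multi([frame_a, frame_b, frame_c], gallery, 2)
print(single, pooled)
```

In this toy case the corrupted frame on its own selects the wrong identity, while the three frames together select the right one.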

The approach shows promise, says Pawan Sinha, a professor of brain and cognitive sciences at MIT. The problem of low-resolution images and video “is undoubtedly important and has not been adequately tackled by any of the commercial face-recognition systems that I know of,” he says. “Overall, I like the work.”

Ultimately, says Hennings-Yeomans, super-resolution algorithms still need to be improved, but he doesn’t think it would take too much work to apply his group’s approach to, say, a Web tool that searches YouTube videos. “You’re going to see face-recognition systems for image retrieval,” he says. “You’ll Google not by using text queries, but by giving an image.”
