Deep Neural Nets Can Now Recognize Your Face in Thermal Images

Matching an infrared image of a face to its visible light counterpart is a difficult task, but one that deep neural networks are now coming to grips with.

Emerging Technology from the arXiv

July 24, 2015

One problem with infrared surveillance videos or infrared CCTV images is that it is hard to recognize the people in them. Faces look different in the infrared and matching these images to their normal appearance is a significant unsolved challenge.

The problem is that the link between the way people look in infrared and visible light is highly nonlinear. This is particularly tricky for footage taken in the mid- and far-infrared, which tends to use passive sensors that detect emitted light rather than the reflected variety.

Today, Saquib Sarfraz and Rainer Stiefelhagen at the Karlsruhe Institute of Technology in Germany say they’ve worked out how to connect a mid- or far-infrared image of a face with its visible light counterpart for the first time. Their trick is to teach a deep neural network to do all the work.

The way a face emits infrared light is entirely different from the way it reflects visible light. These emissions vary according to the temperature of the air and the temperature of the skin, which in turn depends on the person’s activity levels, whether he or she has a fever and so on.

There’s another problem that makes comparisons difficult. Visible light images tend to have a high resolution while far infrared pictures tend to have a much lower resolution because of the nature of the cameras that take them. Together, these factors make it hard to match an infrared face with its visible light counterpart.

But the recent improvements in deep neural networks in tackling all kinds of complex problems gave Sarfraz and Stiefelhagen an idea. Why not train a network to recognize visible light faces by looking at infrared versions?

Two important factors have combined in recent years to make neural networks much more powerful. The first is a better understanding of how to build and tweak the networks to perform their task, an advance that has led to the creation of so-called deep neural nets. That’s something Sarfraz and Stiefelhagen could learn from other work.

The second is the availability of huge annotated datasets that can be used to train these networks. For example, accurate automated face recognition has only become possible because of the creation of vast banks of images in which people’s faces have been isolated and identified by human observers thanks to crowdsourcing services such as Amazon’s Mechanical Turk.

These data sets are much harder to come by for infrared/visible light comparisons. However, Sarfraz and Stiefelhagen found one they thought could do the trick. This was created at the University of Notre Dame and consists of 4,585 images of 82 people taken either in visible light at a resolution of 1600 x 1200 pixels or in the far infrared at 312 x 239 pixels.

The data set contains images of people smiling, laughing and with a neutral expression taken in different sessions to capture the way people’s appearance changes from day to day, and in two different lighting conditions.

The researchers then divided each image into a set of overlapping patches, 20 x 20 pixels in size, to dramatically increase the number of samples in the database.
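To get a feel for how overlapping patches multiply the amount of training data, here is a minimal sketch in plain Python. The 20 x 20 patch size comes from the article; the stride of 10 pixels, the toy image size and the function name `extract_patches` are assumptions made purely for illustration, since the article does not say how much the patches overlap.

```python
def extract_patches(image, patch_size=20, stride=10):
    """Cut a 2D image (list of rows) into overlapping square patches.

    patch_size=20 follows the article; stride=10 is an assumed overlap.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            # Slice patch_size rows, then patch_size columns from each row.
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# Toy 40 x 60 "image" whose pixel values are just their flat index.
image = [[r * 60 + c for c in range(60)] for r in range(40)]
patches = extract_patches(image)
print(len(patches))  # 3 rows of patches x 5 columns of patches = 15
```

Even this tiny image yields 15 training samples instead of one, which is the effect the patch-based approach relies on.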

Finally, Sarfraz and Stiefelhagen used the images of the first 41 people to train their neural net and the images of the other 41 people to test it.
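The key point of this split is that nobody appears in both the training and the test set, so the network is tested on faces it has never seen. A minimal sketch of such an identity-disjoint split follows; the `(person_id, image_name)` sample format, the file names and the helper `split_by_identity` are hypothetical, and sorting by ID is only an assumed stand-in for "the first 41 people".

```python
def split_by_identity(samples, n_train_ids=41):
    """Split (person_id, image_name) samples so no person is in both sets."""
    ids = sorted({person_id for person_id, _ in samples})
    train_ids = set(ids[:n_train_ids])  # assumed meaning of "first 41 people"
    train = [s for s in samples if s[0] in train_ids]
    test = [s for s in samples if s[0] not in train_ids]
    return train, test


# Toy dataset: 82 people, each with one visible and one thermal image.
samples = []
for person_id in range(82):
    samples.append((person_id, f"person{person_id:02d}_visible.png"))
    samples.append((person_id, f"person{person_id:02d}_thermal.png"))

train, test = split_by_identity(samples)
train_people = {pid for pid, _ in train}
test_people = {pid for pid, _ in test}
print(len(train_people), len(test_people))  # 41 41
```

Splitting by person rather than by image avoids the network simply memorizing the individuals it will later be tested on.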

The results make for interesting reading. “The presented approach improves the state-of-the-art by more than 10 percent,” say Sarfraz and Stiefelhagen.

What’s more, the net can match a thermal image to its visible counterpart in just 35 milliseconds. “This is therefore, very fast and capable of running in real-time at ∼ 28 fps,” they say.
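The frame rate follows directly from the matching time, assuming one match per video frame and no other overhead:

```python
match_time_ms = 35  # time per match reported in the article
frames_per_second = 1000 / match_time_ms  # milliseconds in a second / ms per match
print(f"{frames_per_second:.1f} fps")  # 28.6 fps
```

That is consistent with the roughly 28 fps the researchers quote.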

It is by no means perfect, however. At best, its accuracy is just over 80 percent when it has a wide range of visible light images to compare the thermal image against. In a one-to-one comparison, the accuracy is just 55 percent.
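Matching a thermal image against a set of visible light images can be pictured as a nearest-neighbor search over feature vectors. The sketch below is an illustration only: the article does not describe how features are compared, so the use of cosine similarity, the three-number feature vectors and the names in the gallery are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(probe, gallery):
    """Return the identity of the gallery entry most similar to the probe."""
    best_identity, _ = max(
        gallery, key=lambda entry: cosine_similarity(probe, entry[1])
    )
    return best_identity


# Hypothetical visible light gallery; one person may have several images.
gallery = [
    ("alice", [1.0, 0.0, 0.2]),
    ("alice", [0.8, 0.3, 0.1]),
    ("bob", [0.0, 1.0, 0.1]),
    ("carol", [0.5, 0.5, 1.0]),
]

# Hypothetical features computed from a thermal image by the network.
probe = [0.9, 0.1, 0.3]
best_identity = identify(probe, gallery)
print(best_identity)  # alice
```

With several gallery images per person there are more chances that one of them lies close to the probe, which is one plausible reason the accuracy is higher in that setting.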

Better accuracy is clearly possible with bigger datasets and a more powerful network. Of the two, creating a dataset that is bigger by orders of magnitude will be by far the harder task.

But it’s not difficult to imagine such a database being created relatively quickly, given that interested customers are likely to be the military, law enforcement agencies and governments who generally have deeper pockets when it comes to security-related technology.

Ref: arxiv.org/abs/1507.02879 : Deep Perceptual Mapping for Thermal to Visible Face Recognition
