Artificial intelligence

This algorithm automatically spots “face swaps” in videos

But the same system can be used to make better fake videos that are harder to detect.

The ability to take one person’s face or expression and superimpose it onto a video of another person has recently become possible. In particular, pornographic videos called “deepfakes” have emerged on websites such as Reddit and 4chan showing famous individuals’ faces superimposed onto the bodies of actors.

This phenomenon has significant implications. At the very least, it has the potential to undermine the reputation of people who are victims of this kind of forgery. It poses problems for biometric ID systems. And it threatens to undermine public trust in videos of any kind.

So a quick and accurate way to spot these videos is desperately needed.

[Image: pairs of face images, one real and one manipulated. Which of these pairs of images are forgeries? Answer below.]

Enter Andreas Rossler at the Technical University of Munich in Germany and colleagues, who have developed a deep-learning system that can automatically spot face-swap videos. The new technique could help identify forged videos as they are posted to the web.

But the work also has a sting in the tail. The same deep-learning technique that can spot face-swap videos can also be used to improve the quality of face swaps in the first place—and that could make them harder to detect.

The new technique relies on a deep-learning algorithm that Rossler and co have trained to spot face swaps. Algorithms like this can learn only from huge annotated data sets of good examples, which simply did not exist until now.

So the team began by creating a large data set of face-swap videos and their originals. They use two types of face swaps that can be easily made using software called Face2Face. (This software was created by some members of this team.)

The first type of face swap superimposes one person’s face on another’s body so that it takes on their expressions. The second takes the expressions from one face and modifies a second face to show them.

The team has done this with over 1,000 videos, creating a database of about half a million images in which the faces have been manipulated with state-of-the-art face-editing software. They call this the FaceForensics database.

The size of this database is a significant improvement over what had been previously available. “We introduce a novel data set of manipulated videos that exceeds all existing publicly available forensic data sets by orders of magnitude,” say Rossler and co.
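For readers curious how a corpus like this is assembled, the generic recipe is simple: step through each video, find the face in every sampled frame, and save the crop. Here is a minimal Python sketch of that recipe; the extract_face_crops helper, the Haar-cascade face detector, and the sampling interval are illustrative assumptions, not the team’s actual pipeline, which relies on Face2Face and dense face tracking.

```python
# Minimal sketch: turn a video into face crops (assumptions noted above).
import pathlib

import cv2


def extract_face_crops(video_path, out_dir, every_n_frames=10):
    """Save cropped faces from sampled frames as JPEGs. Hypothetical helper."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(str(video_path))
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
                cv2.imwrite(str(out / f"face_{saved:06d}.jpg"),
                            frame[y:y + h, x:x + w])
                saved += 1
        idx += 1
    cap.release()
    return saved
```

Run once over the pristine videos and once over their manipulated counterparts, and the result is a labeled set of real and forged face images.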

Next, the team uses the database to train a deep-learning algorithm to recognize the difference between face swaps and their unadulterated originals. They call the resulting algorithm XceptionNet.
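The Xception architecture ships with standard deep-learning toolkits, so the shape of such a classifier is easy to illustrate. Below is a minimal TensorFlow/Keras sketch of a binary real-versus-forged classifier built on an Xception backbone; the optimizer, learning rate, and labeling convention are assumptions made for illustration, not the paper’s exact training recipe.

```python
# Minimal sketch: Xception backbone with a binary real/forged head.
import tensorflow as tf

base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output: P(forged)
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

# train_ds / val_ds would be tf.data pipelines over the face crops, with
# label 0 = pristine and 1 = manipulated (the labeling is an assumption):
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```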

Finally, they compare the new approach to other forgery detection techniques.

The results are impressive. XceptionNet clearly outperforms other techniques in spotting videos that have been manipulated, even when the videos have been compressed, which makes the task significantly harder. “We set a strong baseline of results for detecting a facial manipulation with modern deep-learning architectures,” say Rossler and co.
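Compression is hard on detectors because it smooths away the low-level statistical traces they key on. A rough way to probe this is to re-encode face crops at lower quality before scoring them. The JPEG stand-in below is an assumption made for illustration (the paper compresses whole videos with a standard video codec), but it conveys the idea.

```python
# Sketch: simulate compression artifacts on a face crop via JPEG re-encoding.
import cv2


def recompress(image_bgr, quality=50):
    """Re-encode an image at a given JPEG quality (lower = more artifacts)."""
    ok, buf = cv2.imencode(
        ".jpg", image_bgr, [int(cv2.IMWRITE_JPEG_QUALITY), quality])
    assert ok
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)
```

Scoring the same crops at several quality settings shows how quickly a detector’s accuracy degrades as artifacts pile up.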

That should make it easier to spot forged videos as they are uploaded to the web. But the team is well aware of the cat-and-mouse nature of forgery detection: as soon as a new detection technique emerges, the race begins to find a way to fool it.

Rossler and co have a natural head start since they developed XceptionNet. So they use it to spot the telltale signs that a video has been manipulated and then use this information to refine the forgery, making it even harder to detect.
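The team’s refiner is itself a learned network trained against the detector, but the underlying idea can be shown with a simpler, generic stand-in: nudge a forged image along the detector’s gradient so it scores as more “real” while staying close to the original. The refine function and its loss weights below are illustrative assumptions, not the method from the paper.

```python
# Sketch: detector-guided refinement of a single forged image (a generic
# gradient-based stand-in for the paper's learned refiner network).
import tensorflow as tf


def refine(forged, detector, steps=100, lr=0.01, weight=0.1):
    """forged: float32 tensor in [0, 1], shape (1, 299, 299, 3).
    detector: model mapping images to P(forged)."""
    x = tf.Variable(forged)
    opt = tf.keras.optimizers.Adam(lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            p_forged = detector(x, training=False)
            # Push the detector toward "real" while penalizing visible
            # drift from the original forgery.
            loss = (tf.reduce_mean(p_forged)
                    + weight * tf.reduce_mean(tf.square(x - forged)))
        grads = tape.gradient(loss, [x])
        opt.apply_gradients(zip(grads, [x]))
        x.assign(tf.clip_by_value(x, 0.0, 1.0))
    return x.numpy()
```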

It turns out that this process improves the visual quality of the forgery but does not have much effect on XceptionNet’s ability to detect it. “Our refiner mainly improves visual quality, but it only slightly encumbers forgery detection for a deep-learning method trained exactly on the forged output data,” they say.

That’s interesting work since it introduces an entirely new way of improving the process of image manipulation. “We believe that this interplay between tampering and detection is an extremely exciting avenue for follow-up work,” they say.

Ref: arxiv.org/abs/1803.09179: FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human Faces

Answer: The upper image in each pair is real.

 
