In 2019, two multimedia artists, Francesca Panetta and Halsey Burgund, set out to pursue a provocative idea. Deepfake video and audio had been advancing in parallel but had yet to be integrated into a complete experience. Could they do it in a way that demonstrated the technology’s full potential while educating people about how it could be abused?
To bring the experiment to life, they chose an equally provocative subject: they would create an alternative history of the 1969 Apollo moon landing. Before the launch, US president Richard Nixon’s speechwriters had prepared two versions of his national address—one designated “In Event of Moon Disaster,” in case things didn’t go as planned. The real Nixon, fortunately, never had to deliver it. But a deepfake Nixon could.
So Panetta, the creative director at MIT’s Center for Advanced Virtuality, and Burgund, a fellow at the MIT Open Documentary Lab, partnered up with two AI companies. Canny AI would handle the deepfake video, and Respeecher would prepare the deepfake audio. With all the technical components in place, they just needed one last thing: an actor who would supply the performance.
“We needed to find somebody who was willing to do this, because it’s a little bit of a weird ask,” Burgund says. “Somebody who was more flexible in their thinking about what an actor is and does.”
While deepfakes have now been around for a number of years, deepfake casting and acting are relatively new. Early deepfake technologies weren’t very good and were used primarily in dark corners of the internet to swap celebrities into porn videos without their consent. But as deepfakes have grown increasingly realistic, more and more artists and filmmakers have begun using them in broadcast-quality productions and TV ads. This means hiring real actors for one aspect of the performance or another. Some jobs require an actor to provide “base” footage; others need a voice.
For actors, it opens up exciting creative and professional possibilities. But it also raises a host of ethical questions. “This is so new that there’s no real process or anything like that,” Burgund says. “I mean, we were just sort of making things up and flailing about.”
“Want to become Nixon?”
The first thing Panetta and Burgund did was ask both companies what kind of actor they needed to make the deepfakes work. “It was interesting not only what were the important criteria but also what weren’t,” Burgund says.
For the visuals, Canny AI specializes in video dialogue replacement, which uses an actor’s mouth movements to manipulate someone else’s mouth in existing footage. The actor, in other words, serves as a puppeteer, never to be seen in the final product. The person’s appearance, gender, age, and ethnicity don’t really matter.
But for the audio, Respeecher, which transmutes one voice into another, said it’d be easier to work with an actor who had a similar register and accent to Nixon’s. Armed with that knowledge, Panetta and Burgund began posting on various acting forums and emailing local acting groups. Their pitch: “Want to become Nixon?”
This is how Lewis D. Wheeler, a Boston-based white male actor, found himself holed up in a studio for days listening to and repeating snippets of Nixon’s audio. There were hundreds of snippets, each only a few seconds long, “some of which weren’t even complete words,” he says.
The snippets had been taken from various Nixon speeches, many of them from his resignation address. Given the grave nature of the moon disaster speech, Respeecher needed training materials that captured the same somber tone.
Wheeler’s job was to re-record each snippet in his own voice, matching the exact rhythm and intonation. These little bits were then fed into Respeecher’s algorithm to map his voice to Nixon’s. “It was pretty exhausting and pretty painstaking,” he says, “but really interesting, too, building it brick by brick.”
The visual part of the deepfake was much more straightforward. In the archival footage that would be manipulated, Nixon had delivered the real moon landing address squarely facing the camera. Wheeler needed only to deliver its alternate, start to finish, in the same way, for the production crew to capture his mouth movements at the right angle.
This is where, as an actor, he started to find things more familiar. Ultimately his performance would be the one part of him that would make it into the final deepfake. “That was the most challenging and most rewarding,” he says. “For that, I had to really get into the mindset of, okay, what is this speech about? How do you tell the American people that this tragedy has happened?”
“How do we feel?”
On the face of it, Zach Math, a film producer and director, was working on a similar project. He’d been hired by Mischief USA, a creative agency, to direct a pair of ads for a voting rights campaign. The ads would feature deepfaked versions of North Korean leader Kim Jong-un and Russian president Vladimir Putin. But he ended up in the middle of something very different from Panetta and Burgund’s experiment.
In consultation with a deepfake artist, John Lee, the team had chosen to go the face-swapping route with the open-source software DeepFaceLab. It meant the final ad would include the actors’ bodies, so they needed to cast believable body doubles.
The ad would also include the actors’ real voices, which added another casting consideration. The team wanted the deepfake leaders to speak in English, though with authentic North Korean and Russian accents. So the casting director went hunting for male actors who resembled each leader in build and facial structure, matched their ethnicity, and could do convincing voice impersonations.
For Putin, the casting process was relatively easy. There’s an abundance of available footage of Putin delivering various speeches, providing the algorithm with plenty of training data to deepfake his face making a range of expressions. Consequently, there was more flexibility in what the actor could look like, because the deepfake could do most of the work.
But for Kim, most of the videos available showed him wearing glasses, which obscured his face and caused the algorithm to break down. Narrowing the training footage to only the videos without glasses left far fewer training samples to learn from. The resulting deepfake still looked like Kim, but his face movements looked less natural. Face-swapped onto an actor, it muted the actor’s expressions.
To counteract that, the team began running all of the actors’ casting tapes through DeepFaceLab to see which one came out looking the most convincing. To their surprise, the winner looked least like Kim physically but had the most expressive performance.
To address the aspects of Kim’s appearance that the deepfake couldn’t replicate, the team relied on makeup, costumes, and post-production work. The actor was slimmer than Kim, for example, so they had him wear a fat suit.
When it came down to judging the quality of the deepfake, Math says, it was less about the visual details and more about the experience. “It was never ‘Does that ear look weird?’ I mean, there were those discussions,” he says. “But it was always like, ‘Sit back—how do we feel?’”
“They were effectively acting as a human shield”
In some ways, there’s little difference between deepfake acting and CGI acting, or perhaps voice acting for a cartoon. Your likeness doesn’t make it into the final production, but the result still has your signature and interpretation. But deepfake casting can also go the other direction, with a person’s face swapped into someone else’s performance.
Making this type of fake persuasive was the task of Ryan Laney, a visual effects artist who worked on the 2020 HBO documentary Welcome to Chechnya. The film follows activists who risk their lives to fight the persecution of LGBTQ individuals in the Russian republic. Many of them live in secrecy for fear of torture and execution.
In order to tell their stories, director David France promised to protect their identities, but he wanted to do so without losing their humanity. After testing out numerous solutions, his team finally landed on deepfakes. He partnered with Laney, who developed an algorithm that overlaid one face onto another while retaining the latter’s expressions.
The casting process was thus a search not for performers but for 23 people who would be willing to lend their faces. France ultimately asked LGBTQ activists to volunteer as “covers.” “He came at it from not who is the best actor, but who are the people interested in the cause,” Laney says, “because they were effectively acting as a human shield.”
The team scouted the activists through events and Instagram posts, based on their appearance. Each cover face needed to look sufficiently different from the person being masked while also aligning in certain characteristics. Facial hair, jawlines, and nose length needed to roughly match, for example, and each pair had to be approximately the same age for the cover person’s face to look natural on the original subject’s body.
The team didn’t always match ethnicity or gender, however. The lead character, Maxim Lapunov, who is white, was shielded by a Latino activist, and a female character was shielded by an activist who is gender nonconforming.
Throughout the process, France and Laney made sure to get fully informed consent from all parties. “The subjects of the film actually got to look at the work before David released it,” Laney says. “Everybody got to sign off on their own cover to make sure they felt comfortable.”
“It just gets people thinking”
While professionalized deepfakes have pushed the boundaries of art and creativity, their existence also raises tricky ethical questions. There are currently no real guidelines on how to label deepfakes, for example, or where the line falls between satire and misinformation.
For now, artists and filmmakers rely on a personal sense of right and wrong. France and Laney, for example, added a disclaimer to the start of the documentary stating that some characters had been “digitally disguised” for their protection. They also added soft edges to the masked individuals to differentiate them. “We didn’t want to hide somebody without telling the audience,” Laney says.
Stephanie Lepp, an artist and producer who creates deepfakes for political commentary, similarly marks her videos upfront to make clear they are fake. In her series Deep Reckonings, which imagines powerful figures like Mark Zuckerberg apologizing for their actions, she also used voice actors rather than deepfake audio to further distinguish the project as satirical and not deceptive.
Other projects have been more coy, such as those of Barnaby Francis, an artist-activist who works under the pseudonym Bill Posters. Over the years, Francis has deepfaked politicians like Boris Johnson and celebrities like Kim Kardashian, all in the name of education and satire. Some of the videos, however, are only labeled externally—for example, in the caption when Francis posts them on Instagram. Pulled out of that context, they risk blurring art and reality, which has sometimes led him into dicey territory.
There are also few rules around whose images and speech can be manipulated—and few protections for actors behind the scenes. Thus far, most professionalized deepfakes have been based on famous people and made with clear, constructive goals, so they are legally protected in the US under satire laws. In the case of Mischief’s Putin and Kim deepfakes, however, the actors have remained anonymous for “personal security reasons,” the team said, because of the controversial nature of manipulating the images of dictators.
Knowing how amateur deepfakes have been used to abuse, manipulate, and harass women, some creators are also worried about the direction things could go. “There’s a lot of people getting onto the bandwagon who are not really ethically or morally bothered about who their clients are, where this may appear, and in what form,” Francis says.
Despite these tough questions, however, many artists and filmmakers firmly believe deepfakes should be here to stay. Used ethically, the technology expands the possibilities of art and critique, provocation and persuasion. “It just gets people thinking,” Francis says. “It’s the perfect art form for these kinds of absurdist, almost surrealist times that we’re experiencing.”