EmTech Stage: Facebook’s CTO on misinformation

Our editor-in-chief chatted with Facebook's CTO about combatting misinformation

November 18, 2020

Misinformation and social media have become inseparable from one another; as platforms like Twitter and Facebook have grown to globe-spanning size, so too has the threat posed by the spread of false content. In the midst of a volatile election season in the US and a raging global pandemic, the power of information to alter opinions and save lives (or endanger them) is on full display. In the first of two exclusive interviews with two of the tech world’s most powerful people, Technology Review’s Editor-in-Chief Gideon Lichfield sits down with Facebook CTO Mike Schroepfer to talk about the challenges of combating false and harmful content on an online platform used by billions around the world. This conversation is from the EmTech MIT virtual conference and has been edited for length and clarity.

For more coverage on this topic, check out this week's episode of Deep Tech and our tech policy coverage.

Credits:

This episode from EmTech was produced by Jennifer Strong and Emma Cillekens, with special thanks to Brian Bryson and Benji Rosen. We’re edited by Michael Reilly and Gideon Lichfield.

Transcript:

Strong: Hey everybody, it’s Jennifer Strong. Last week I promised to pick out something to play for you from EmTech, our newsroom’s big annual conference. So here it is. With the US election just days away, we're going to dive straight into one of the most contentious topics in the world of tech and beyond - misinformation.

Now a lot of this starts on conspiracy websites, but it's on social media that it gets amplified and spread. These companies are taking increasingly bold measures to ban certain kinds of fake news and extremist groups, and they’re using technology to filter out misinformation before humans can see it. They claim to be getting better and better at that, and one day they say they’ll be able to make the internet safe again for everyone. But, can they really do that?

In the next two episodes we’re going to meet the chief technology officers of Facebook and Twitter. They’ve both taken VERY different approaches when it comes to misinformation, in part because a lot of what happens on Facebook is in private groups, which makes it a harder problem to tackle. Whereas on Twitter, most everything happens in public. So, first up - Facebook. Here’s Gideon Lichfield, the editor-in-chief of Tech Review. He’s on the virtual mainstage of EmTech for a session that asks, ‘Can AI clean up the internet?’ This conversation’s been edited for length and clarity.

Lichfield: I am going to turn to our first speaker, who is Mike Schroepfer, known generally to all his colleagues as Schrep. He is the CTO of Facebook. He's worked at Facebook since 2008, when it was a lot smaller, and he became CTO in 2013. Last year The New York Times wrote a big profile of him, which is a very interesting read. It was titled ‘Facebook's AI whiz is now facing the task of cleaning it up. Sometimes that leads him to tears.’ Schrep, welcome. Thank you for joining us at EmTech.

Schroepfer: Hey Gideon, thanks. Happy to be here.

Lichfield: Facebook has made some pretty aggressive moves, particularly in just the last few months. You’ve taken action against QAnon, you've banned Holocaust denial, and anti-vaccination ads. But people have been warning about QAnon for years, people have been warning about anti-vaccination misinformation for years. So, why did it take you so long? What changed in your thinking to make you take this action?

Schroepfer: Yeah, I mean, the world is changing all the time. There's a lot of recent data you know, on the rise of antisemitic beliefs or lack of understanding about the Holocaust. QAnon you know has moved into more of a threat of violence in recent years. And the idea that there would be threats of violence around a US election is a new thing. And so, particularly around places where society and things that are critical events, like an election, we're doing everything we can to, to make sure that people feel safe and secure and informed to make the decision they get to make to elect who is in government. And so we're taking more aggressive measures.

Lichfield: You said something just now, you said there was a lot of data. And that resonates with something that I heard Alex Stamos, the former chief security officer of Facebook, say in a podcast recently: that at Facebook, decisions are really taken on the basis of data. So is it that you needed to have overwhelming data evidence that, you know, Holocaust denial is causing harm or QAnon is causing harm before you take action against it?

Schroepfer: What I’d say is this: we operate a service that's used by billions of people around the world, and so a mistake I don't wanna make is assume that I understand what other people need, what other people want, or what's happening. And so, a way to avoid that is to rely on expertise where we have it. So, you know, for example, for dangerous organizations, we have many people with backgrounds in counterterrorism, who went to West Point; we have many people with law enforcement backgrounds. Where you talk about voting interference, we have experts with backgrounds in voting and rights.

And so you listen to experts, and you look at data, and you try to understand that topic rather than, you know, you don't want me making these decisions. You want sort of the experts and you want the data to do it. And because it's not just, you know, this issue here, it's issues of privacy, it's issues in locales, and so I would say that we try to be rigorous in using sort of expertise and data where we can, so we're not making assumptions about what's happening in the world or what we think people need.

Lichfield: Well, let's talk a bit more about QAnon specifically, because the approach that you take, obviously, to handling misinformation is you try to train your AIs to recognize stuff that is harmful. And the difficulty with this approach is the nature of misinformation keeps changing; it's context-specific, right? Misinformation about Muslims in Myanmar, which sparked riots there: you don't know that that is misinformation until it starts appearing. The issue, it seems to me, with QAnon is it's not like ISIS or something. Its beliefs keep changing, the accounts keep changing. So, how do you tackle something that is so ill-defined as a threat like that?

Schroepfer: Well, you know, I will talk about this through what I think is, from a technical perspective, one of the hardest challenges, one that I've been very focused on in the last few years because of similar problems in terms of subtlety, coded language and adversarial behavior, which is hate speech. There's overt hate speech, which is very obvious and you can use sort of phrases you've banked or keywords. But people adapt and they use coded language and they do it, you know, on a daily, weekly basis. And you can even do this with memes where you have a picture and then you overlay some words on top of it, and it completely changes the meaning. ‘You smell great today’ over a picture of a skunk is a very different thing than, you know, over a flower, and you have to put it all together.

And so, similarly, as you say, with QAnon there can be subtlety and things like that. This is why I've been so focused on, you know, a couple of key AI technologies. One is we've dramatically increased the power of these classifiers to understand and deal with nuanced information. You know, five or ten years ago, sort of keywords were probably the best we could do. Now we're at the point where our classifiers are catching errors in the labeling data or catching errors that human reviewers sometimes make. Because they are powerful enough to catch subtlety in topics like, is this a post that's inciting violence against a voter? Or are they just expressing displeasure with voting or this population? Those are two very… unfortunately it's a fine line when you look at how careful people try to be about coding the language to sort of get around it.

And so you see similar things with QAnon and others. And so we've got classifiers now that, you know, are state-of-the-art, work in multiple languages, and are really impressive in what they've done through techniques that we can go into, like self-supervision, to look at, you know, billions of pieces of data to train. And then the other thing we've got is we sort of use a similar technique that allows us to do, you know, the best way to describe it is as sort of fuzzy matching. Which is, a human reviewer spends the time and says, you know what, I think that these are pieces of misinformation, or this is a QAnon group, even though it's coded in different languages. What we can then do is sort of fan out and find things that are semantically similar, not the exact words, not keywords, not regexes, but things that are very close in an embedding space, that are semantically similar. And then we can take action on them.

And this allows what I call quick reaction. So, even if I had no idea what this thing was yesterday, today, if a bunch of human reviewers find it, we can then go amplify their work sort of across the network and implement that proactively anytime new pieces of information appear. Just to put this in context, you know, in Q2, we took down 7 million pieces of COVID misinformation. Obviously in Q4 of last year, there was no such thing as COVID misinformation. So we had to sort of build new classifier techniques to do this. And the thing I've challenged the team on is getting our classifier build time down from what used to be many, many months to, you know, sometimes weeks, to days, to minutes. First time I see an example, or first time I read a new policy, I want to be able to build a classifier that's functional at, you know, billion-user scale. And, you know, we're not there yet, but we're making rapid progress.
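The "fan out" step described above, taking a few reviewer-flagged items and finding other content that sits close to them in an embedding space, can be sketched roughly as follows. This is only an illustration under stated assumptions, not Facebook's system: the three-dimensional vectors are made up, the names `cosine_similarity` and `fan_out` and the 0.9 threshold are hypothetical, and a real pipeline would get its embeddings from a trained model and search them with an approximate nearest-neighbor index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fan_out(flagged, candidates, threshold=0.9):
    """Return ids of candidates whose embedding is close to any reviewer-flagged embedding.

    flagged:    list of embedding vectors for items a human reviewer marked (hypothetical input)
    candidates: dict mapping post id -> embedding vector
    threshold:  minimum cosine similarity to count as "semantically similar" (assumed value)
    """
    if not flagged:
        return []
    matches = []
    for post_id, vec in candidates.items():
        best = max(cosine_similarity(vec, seed) for seed in flagged)
        if best >= threshold:
            matches.append(post_id)
    return sorted(matches)


# Toy, hand-written embeddings; real ones would come from a learned model.
flagged = [[0.9, 0.1, 0.0]]
candidates = {
    "post_a": [0.8, 0.2, 0.1],    # points in nearly the same direction as the flagged item
    "post_b": [0.0, 0.1, 0.9],    # unrelated direction
    "post_c": [0.85, 0.05, 0.0],  # also close to the flagged item
}
near = fan_out(flagged, candidates)
print(near)
```

In this toy data, `post_a` and `post_c` clear the threshold and `post_b` does not, even though none of the three shares exact "keywords" with the flagged item; the match depends only on direction in the vector space.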

Lichfield: Well, so I think this is what the question is: how rapid is the progress, right? That 7 million pieces of misinformation statistic, I saw that quoted by a Facebook spokesperson in response to a study that came out from Avaaz in August. It had looked at COVID misinformation and found that the top 10 websites that were spreading misinformation had four times as many estimated views on Facebook as equivalent content from the websites of 10 leading health institutions, like the WHO. They found that only 16% of all health misinformation they analyzed had a warning label from Facebook. So in other words, you're obviously doing a lot, you're doing a lot more than you were, and you're still, by that count, way behind the curve. And this is a crisis that is killing people. So how long is it going to take you to get there, do you think?

Schroepfer: Yeah, I mean, I think that, you know, this is where I'd like us to be publishing more data on this. Because really what you need to compare apples to apples is overall reach of this information, and sort of what is the information exposure diet of the average Facebook user. And I think there's a couple of pieces that people don't get. The first is most people's news feed is filled with content from their friends. News links are sort of a minority of the views, all in, in people's news feed on Facebook. I mean, the point of Facebook is to connect with your friends and you've probably experienced this yourself. It's, you know, posts and pictures and things like that.

Secondly, on things like COVID misinformation, what you've really got to compare that with is, for example, views of our COVID information center, which we literally shoved to the very top of the news feed so that everyone could get information on that. We're doing similar things for voting. We've helped to register almost two and a half million voters in the US. Similar information, you know, for issues of racial justice given all the horrible events that have happened this year. So what I don't have is the comprehensive study of, you know, how many times did someone view the COVID information hub versus these other things? But my guess is it would be that they're getting a lot more of that good information from us.

But look, you know, anytime any of this stuff escapes, I'm not done yet. This is why I'm still here doing my job: we want to get this better. And yes, I wish it was 0%. I wish our classifiers were 99.999% accurate. They're not. You know, my job is to get them there as fast as humanly possible. And when we get off this call, that's what I'm going to go work on. What I can do is just look at like recent history and project progress forward. Because I can't fix the past, but I can fix today and tomorrow. When I look at things like, you know, hate speech where, you know, in 2017, only about a quarter of the pieces of hate speech were found by our systems first. Almost three quarters of it was found by someone on Facebook first. Which is awful, which means they were exposed to it and had to report it to us. And now the number's up to 94.5%. Even in the last, you know, between Q2 of this year and same time last year, we 5Xed the amount of content we're taking down for hate speech. And I can trace all of that. Now, that number should be 99.99 and we shouldn't even be having this conversation because you should say, I've never seen any of this stuff, and I never hear about it, ‘cause it's gone.

That is my goal, but I can't get there yet. But if you just look at the last, you know, anytime I see something 5X in a year, or go from 24% to 94% in two years, and I say I'm not out of ideas, we're still deploying state-of-the-art stuff like this week, next week, last week, then that's why I'm optimistic overall that we're going to move this problem into a place where it's not the first thing you want to talk to me about. But I'm not there yet.

Lichfield: It's a tech problem. It's also obviously a workforce problem. You're obviously going to be familiar with the memo that Sophie Zhang, who was a former Facebook data scientist, wrote when she departed. And she wrote about how she was working on one of the teams, you have multiple teams that work on trying to identify harmful information around the world. And her main complaint, it seems, was that she felt like those teams were understaffed and she was having to prioritize decisions about whether to treat, you know, misinformation around an election in a country, for instance, as dangerous. And when those decisions weren't prioritized, sometimes it could take months for a problem to be dealt with, and that could have real consequences. You have, I think, what, 15,000 human moderators right now. Do you think you have enough people?

Schroepfer: I never think we have enough people on anything. So I, you know, I've yet to be on a project where we were looking for things to work on and I mean that real seriously. And we're, you know, at 35,000 people working on this from, you know, the review and content and safety and security side. The other thing that I think we don't talk a lot about is, if you go talk to the heads of my AI team and ask them what has Schrep been asking us to do for the last three years, it's integrity, it's content moderation. It's not cool, whizzy new things. It's like, how do we fight this problem? And it's been years we've been working on it.

So I've taken sort of the best and the brightest we have in the company and said, you know, and it's not like I have to order them to do it because they want to work on it. I say, we've got this huge problem, we can help, let's go get this done. Are we done yet? No. Am I impatient? Absolutely. Do I wish we had more people working on it? All the time. You know, we have to make our trade-offs on these things, but my job, you know, and what we can do with technology, is sort of remove some of those trade-offs. You know, every time we deploy a new, more powerful classifier, that removes a ton of work from our human moderators, who can then go work on higher-level problems. You know, instead of really easy decisions, they move on to misinformation and really vague things and evaluating dangerous groups, and that sort of moving people up the difficulty curve is also improving things. And that's what we're trying to do.

Strong: We’re going to take a short break - but first, I want to suggest another show I think you'll like. Brave New Planet weighs the pros and cons of a wide range of powerful innovations in science and tech. Dr. Eric Lander, who directs the Broad Institute of MIT and Harvard, explores hard questions like:

Lander: Should we alter the Earth’s atmosphere to prevent climate change? And can truth and democracy survive the impact of deepfakes?

Strong: Brave New Planet is from Pushkin Industries. You can find it wherever you get your podcasts. We’ll be back right after this.

[Advertisement]

Strong: Welcome back to a special episode of In Machines We Trust. This is a conversation between Facebook’s Mike Schroepfer and Tech Review’s Editor-In-Chief Gideon Lichfield. It happened live on the virtual stage of our EmTech Conference, and it’s been edited for length and clarity. If you want more on this topic, including our analysis, please check out the show notes or visit us at Technology Review dot com.

Lichfield: A couple of questions that I'm going to throw in from the audience: how does misinformation affect Facebook's revenue stream? And another is about how does it affect trust in Facebook? Well, there seems to be an underlying lack of trust in Facebook, and how do you measure trust? And the gloss that we want to put on these questions is, clearly you care about misinformation, clearly a lot of the people that work at Facebook care about it or are worried by it, but there is, I think, an underlying question that people have: does Facebook as a company care about it, is it impacted by it negatively enough for it to really tackle the problem seriously?

Schroepfer: Yeah. I mean, look, I'm a person in society too. I care a lot about democracy and the future and advancing people's lives in a positive way. And I challenge you to find, you know, someone who feels differently inside our offices. And so, yes, we work at Facebook, but we're people in the world and I care a lot about the future for my children. And, well, you're asking, do we care? And the answer is yes. You know, do we have the incentives? Like what did we spend a lot of our time talking about today? We talked about misinformation and other things. You know, honestly, what would I rather talk about? I'd rather talk about VR and positive uses of AR and all the awesome new technology we're building, because, you know, that's normally what a CTO would be talking about.

So it is obviously something that is challenging trust in the company, trust in our products, that is a huge problem for us from a self-interest standpoint. So even if you think I'm full of it, just from a practical self-interested standpoint, as a brand, as a consumer product that people voluntarily use every single day, when I try to sell a new product like Portal, which is a camera for your home, like, do the people trust the company that's behind this product and think we have, you know, their best intentions at heart? If they don't, it's going to be a huge challenge for absolutely everything I do. So, I think the interests here are pretty aligned. I don't think there's a lot of good examples of consumer products that are free that survive if people don't like them, don't like the companies or think they're bad. So this is, from a self-interested standpoint, a critical issue for us.

[Credits]

Strong: This conversation with Facebook’s CTO is the first of two episodes on misinformation and social media. In the next part we chat with the CTO of Twitter. If you’d like to hear our newsroom’s analysis of this topic and the election, I’ve dropped a link in our show notes. I hope you’ll check it out. This episode from EmTech was produced by me and by Emma Cillekens, with special thanks to Brian Bryson and Benji Rosen. We’re edited by Michael Reilly and Gideon Lichfield. As always, thanks for listening. I’m Jennifer Strong.

[TR ID]
