MIT Technology Review

How Facebook uses machine learning to detect fake accounts

Fraudsters use fake accounts to spread spam, phishing links, or malware. Now Facebook is revealing details on how it uses AI to fight back.

In 2019, Facebook took down close to 2 billion fake accounts per quarter, on average. Fraudsters use these fake accounts to spread spam, phishing links, and malware. It’s a lucrative business, and one that can be devastating for any innocent user it snares.

Facebook is now releasing details about the machine-learning system it uses to tackle this challenge. The tech giant distinguishes between two types of fake accounts. First there are “user-misclassified accounts”: personal profiles set up for businesses or pets, which are meant to be Pages. These are relatively straightforward to deal with—they just get converted to Pages. “Violating accounts,” on the other hand, are more serious. These are personal profiles that engage in scamming and spamming or otherwise violate the platform’s terms of service. Violating accounts need to be removed as quickly as possible, without casting too wide a net and snagging real accounts as well.


To do this, Facebook uses hand-coded rules and machine learning to block a fake account either before it is created or before it becomes active; in other words, before it can harm real users. The final stage is after a fake account has gone live. This is when detection gets a lot trickier, and it is where the new machine-learning system, known as Deep Entity Classification (DEC), comes in.


Going deep

DEC learns to differentiate fake and real users by their connection patterns across the network. The system calls these “deep features,” and they include things like the average age or gender distribution of the user’s friends. Facebook uses more than 20,000 deep features to characterize each account. Because they capture how a profile behaves in aggregate rather than what it claims about itself, they are difficult for attackers to game by changing tactics.
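To make the idea concrete, here is a minimal sketch of what a “deep feature” computation could look like: aggregate statistics over an account’s connections rather than properties of the account itself. The field names and the particular features are illustrative assumptions, not Facebook’s actual schema.

```python
from statistics import mean

def deep_features(friends):
    """Summarize an account by aggregate properties of its connections.

    Each friend is a dict with hypothetical fields: age, gender, and
    account_age_days. Real deep features span many more edge types
    (groups, devices, shared content), per the article's description.
    """
    if not friends:
        return {"num_friends": 0, "mean_friend_age": 0.0,
                "frac_female_friends": 0.0, "mean_friend_account_age_days": 0.0}
    return {
        "num_friends": len(friends),
        "mean_friend_age": mean(f["age"] for f in friends),
        "frac_female_friends": sum(
            1 for f in friends if f["gender"] == "female") / len(friends),
        "mean_friend_account_age_days": mean(
            f["account_age_days"] for f in friends),
    }
```

An attacker can fabricate an account’s own profile fields cheaply, but shifting statistics like these requires building an entire plausible neighborhood of connections, which is the point of using them.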

The system starts with a large number of low-precision, machine-generated labels, produced by a mix of rules and other machine-learning models that estimate whether users are real or fake. That data is used to train a neural network, which is then fine-tuned with a small batch of high-precision, hand-labeled data generated by people around the world who understand local cultural norms.
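The two-stage recipe described above can be sketched in a few lines. This toy version uses a logistic regression trained by gradient descent in place of Facebook’s neural network, and synthetic data in place of account features; only the structure (pretrain on many noisy machine labels, then fine-tune on a few accurate hand labels) follows the article.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(weights, data, lr, epochs):
    """In-place gradient-descent updates for logistic regression."""
    for _ in range(epochs):
        for features, label in data:
            pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
            err = pred - label
            for i, x in enumerate(features):
                weights[i] -= lr * err * x
    return weights

random.seed(0)
# Stage-1 data: plentiful machine-generated labels, ~10% of them wrong.
xs = [random.random() for _ in range(500)]
weak = [([1.0, x], float(x > 0.5) if random.random() > 0.1 else float(x <= 0.5))
        for x in xs]
# Stage-2 data: scarce but accurate hand labels.
hand = [([1.0, x], float(x > 0.5)) for x in (0.1, 0.3, 0.6, 0.9)]

w = train([0.0, 0.0], weak, lr=0.5, epochs=5)   # pretrain on weak labels
w = train(w, hand, lr=0.1, epochs=20)           # fine-tune on hand labels
```

The weak labels let the model learn the broad shape of the decision boundary cheaply; the small high-precision set then corrects the bias the label noise introduced.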

The final classification system can identify four types of fake profiles: illegitimate accounts that don’t represent a real person, compromised accounts of real users that have been taken over by attackers, spammers who repeatedly send revenue-generating messages, and scammers who manipulate users into divulging personal information. Since implementing DEC, Facebook says, it has kept the volume of fake accounts on the platform at around 5% of monthly active users.

The details of Facebook’s cleanup efforts come amid concerns about manipulation in the upcoming US presidential election, especially around deepfakes. In December, the New York Times reported a coordinated disinformation campaign using deepfakes to spin up fake accounts en masse with convincing profile pictures.

Safeguarding the election

The Facebook team said the timing of its release was a coincidence. “This is about just spotting violations in general; it’s not specifically targeted at any kind of election topics,” says Daniel Bernhardt, the engineering manager of Facebook’s Community Integrity team. But DEC would complement the platform’s other efforts to clamp down on election tampering. Because the system relies on deep features rather than profile content to categorize each account, it should be resilient to being fooled by deepfake profile images, for example.

Aviv Ovadya, who founded the nonprofit Thoughtful Technology Project and studies platform design and governance, says that Facebook’s effort to be more transparent with its cleanup procedures is commendable. “It can be really useful and powerful to carefully talk about architectural decisions—and the ways in which security systems work—that can be emulated by other companies,” he says. “Because companies like Facebook have significantly more resources to invest than smaller companies, it’s useful to have this knowledge sharing.”

But the cleanup efforts also have a long way to go. With 2.5 billion monthly active users, 5% is still 125 million fake accounts. Machine learning will also go only so far: no matter how much data a model is trained on, it won’t ever catch every bad account with perfect precision. The platform will likely need to turn to other combinations of humans and machines to improve.


Update: An earlier version of this article referenced outdated numbers about the impact of Facebook’s DEC system. They have been updated to reflect the most recent information.
