In 2019, Facebook took down close to 2 billion fake accounts per quarter on average. Fraudsters use these fake accounts to spread spam, phishing links, or malware. It’s a lucrative business that can be devastating for the innocent users it snares.
Facebook is now releasing details about the machine-learning system it uses to tackle this challenge. The tech giant distinguishes between two types of fake accounts. First there are “user-misclassified accounts,” personal profiles for businesses or pets that are meant to be Pages. These are relatively straightforward to deal with—they just get converted to Pages. “Violating accounts,” on the other hand, are more serious. These are personal profiles that engage in scamming and spamming or otherwise violate the platform’s terms of service. Violating accounts need to be removed as quickly as possible without casting too wide a net and snagging real accounts as well.
To do this, Facebook uses hand-coded rules and machine learning to block a fake account either before it is created or before it becomes active; in other words, before it can harm real users. The final line of defense comes after a fake account has gone live. Detection is much trickier at this stage, and this is where the new machine-learning system, known as Deep Entity Classification (DEC), comes in.
DEC learns to tell fake users from real ones by their connection patterns across the network. Facebook calls these patterns “deep features,” and they include things like the average age or gender distribution of a user’s friends. Facebook uses more than 20,000 deep features to characterize each account. Together they provide a snapshot of how each profile behaves, which makes it difficult for attackers to game the system simply by changing tactics.
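To make the idea of a “deep feature” concrete, here is a minimal Python sketch, assuming a toy social graph in which each hypothetical `Account` record stores an age, a gender label, and a list of friend IDs. The function `deep_features` is illustrative, not Facebook’s code: it computes only a handful of one-hop aggregates (friend count, mean friend age, and gender shares), whereas DEC reportedly uses more than 20,000 features per account.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Account:
    # Hypothetical per-account attributes; DEC's real schema is not described here.
    age: int
    gender: str
    friends: list[str] = field(default_factory=list)


def deep_features(account_id: str, accounts: dict[str, Account]) -> dict[str, float]:
    """Summarize an account by aggregating attributes of its friends (one hop)."""
    friends = [accounts[f] for f in accounts[account_id].friends if f in accounts]
    n = len(friends)
    features: dict[str, float] = {"friend_count": float(n)}
    # Average age of the account's friends (0.0 if it has no known friends).
    features["friend_age_mean"] = float(mean(f.age for f in friends)) if n else 0.0
    # Share of friends in each gender category.
    counts = Counter(f.gender for f in friends)
    for gender in ("female", "male", "other"):
        features[f"friend_gender_{gender}"] = counts[gender] / n if n else 0.0
    return features
```

On a three-account graph where account "a" has a 30-year-old male friend and a 20-year-old female friend, `deep_features("a", accounts)` reports a friend count of 2, a mean friend age of 25, and gender shares of 0.5 female and 0.5 male. A fake account would have to reshape its whole friend network, not just its own profile, to shift numbers like these.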
The system starts with a large number of low-precision, machine-generated labels. These labels come from a mix of rules and other machine-learning models that estimate whether users are real or fake. After a neural network is trained on that data, it is fine-tuned with a small batch of high-precision, hand-labeled data produced by people around the world who understand local cultural norms.
The final classification system can identify one of four types of fake profiles: illegitimate accounts not representative of the person, compromised accounts of real users that have been taken over by attackers, spammers who repeatedly send revenue-generating messages, and scammers who manipulate users into divulging personal information. Since implementing DEC, Facebook says, it has kept the volume of fake accounts on the platform at around 5% of monthly active users.
The details of Facebook’s cleanup efforts come amid concerns about manipulation in the upcoming US presidential election, especially around deepfakes. In December, the New York Times reported a coordinated disinformation campaign using deepfakes to spin up fake accounts en masse with convincing profile pictures.
The Facebook team said the timing of its release was only a coincidence. “This is about just spotting violations in general; it’s not specifically targeted at any kind of election topics,” says Daniel Bernhardt, the engineering manager of Facebook’s Community Integrity team. But DEC would complement the platform’s other efforts to clamp down on election tampering. Because the system relies on deep features to categorize each profile, it should be resilient to deepfake profile images, for example.
Aviv Ovadya, who founded the nonprofit Thoughtful Technology Project and studies platform design and governance, says that Facebook’s effort to be more transparent with its cleanup procedures is commendable. “It can be really useful and powerful to carefully talk about architectural decisions—and the ways in which security systems work—that can be emulated by other companies,” he says. “Because companies like Facebook have significantly more resources to invest than smaller companies, it’s useful to have this knowledge sharing.”
But the cleanup efforts also have a long way to go. With 2.5 billion monthly active users, 5% is still 125 million fake accounts. Machine learning will also go only so far: no matter how much data a model is trained on, it will never be perfect, so some bad accounts will slip through and some real ones will be flagged by mistake. The platform will likely need to turn to other combinations of humans and machines to improve.
Update: An earlier version of this article referenced outdated numbers about the impact of Facebook's DEC system. They have been updated to reflect the most recent information.