MIT Technology Review

Explainer: What do political databases know about you?

If you live in the US, you’re almost certainly being tracked by political organizations. They know a lot about you—but some data is just guesswork.

I Voted stickers
Getty

American citizens are inundated with political messages—on social networks, in their news feeds, through email, text messages, and phone calls. It’s not an accident that people get bombarded: political groups prefer a “multimodal” voter contact strategy, where they use many platforms and multiple attempts to persuade a citizen to engage with their cause or candidate. An ad is followed by an email, which is followed by a text message—all designed to reinforce the message.

These strategies are employed by political campaigns, political action committees, advocacy groups, and nonprofits alike. These different groups are subject to very different rules and regulations, but they all rely on capturing and devouring data about millions of people in America. 

Who is in these data sets?

Almost everyone. Most campaigns get their voter information from a handful of data vendors, some nonpartisan and some partisan. These companies try to provide data on all US adults, whether or not they are registered to vote. No single vendor is likely to have comprehensive files on every eligible US voter, but the Pew Research Center, which released a report on commercial voter files in 2018, found that over 90% of the US adults in its own sample could be found in at least one of the commercial files it examined.

What data is collected and where does it come from?

The main source of voter data is public voting records, which include a voter’s name, address, and, in states that register voters by party, party affiliation. But these records are patchy and decentralized: each state keeps its own database, and the states often record different attributes. So vendors supplement them with other sources, like phone books and credit data.

It’s hard to get a full picture of everything that is fed into the vendors’ databases: the recipe each one uses is usually considered a trade secret. Pew’s study explained that the registries are “an amalgamation of administrative data from states about registration and voting, modeled data about partisanship, political engagement and political support provided by vendors; and demographic, financial and lifestyle data culled from a wide range of sources.” 

Data vendors attempt to match up and reconcile these different data sets to create one comprehensive record for each person in the US based on key identifiers like name, address, gender, and date of birth.
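
To make the matching step concrete, here is a minimal Python sketch of deterministic record linkage, assuming each source is a list of dictionaries with name, address, gender, and date-of-birth fields. The field names, the normalization rules, and the sample records are all hypothetical; real vendors likely use fuzzier, proprietary matching, which this sketch does not attempt.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def match_key(record: dict) -> tuple:
    """Build the identifier tuple used to decide that two records are one person."""
    return (
        normalize(record["name"]),
        normalize(record["address"]),
        record["gender"].upper(),
        record["dob"],  # assumed to already be an ISO date string, e.g. "1980-04-02"
    )


def merge_sources(*sources: list) -> dict:
    """Fold every source into one profile per match key; earlier sources win conflicts."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            profile = profiles[match_key(record)]
            for field, value in record.items():
                profile.setdefault(field, value)
    return dict(profiles)


# Hypothetical sample records from three kinds of sources.
voter_file = [{"name": "Jane Q. Doe", "address": "12 Elm St.", "gender": "f",
               "dob": "1980-04-02", "party": "DEM"}]
credit_data = [{"name": "jane q doe", "address": "12 Elm St", "gender": "F",
                "dob": "1980-04-02", "est_income": "50-75k"}]
phone_book = [{"name": "John Roe", "address": "9 Oak Ave", "gender": "M",
               "dob": "1975-01-01", "phone": "555-0100"}]

profiles = merge_sources(voter_file, credit_data, phone_book)
jane = profiles[("jane q doe", "12 elm st", "F", "1980-04-02")]
print(len(profiles), jane["party"], jane["est_income"])  # 2 DEM 50-75k
```

Because the key must match exactly after normalization, a typo or a move to a new address yields a second profile instead of a merge, which is one plausible way errors creep in for people who move often.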

L2 is one of the largest companies trading in this information, and it claims to have more than 600 data attributes pulled from census data, emails from commercial sources, donor data sets, and more. Experts say that most vendors provide hundreds of data points about each voter. 

How accurate are these voter databases? 

It’s up for debate. Some data points are very accurate, but others are really just predictions or guesses. Race and party, for example, are often inferred from someone’s name and location: somebody with the last name Ryan is assumed to be white, while somebody who lives in a heavily Republican district is assumed to be a Republican voter.
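
The sketch below shows, in simplified Python, what this kind of guesswork can look like: a single lookup table turns a surname or a district into a confident-looking label. The tables, their numbers, and the 50% cutoff are invented placeholders, not real census or election figures, and real vendors’ models are proprietary and likely more elaborate.

```python
# Invented placeholder tables; the shares are NOT real census or election data.
SURNAME_RACE_SHARES = {
    "ryan": {"white": 0.90, "black": 0.04, "hispanic": 0.03, "asian": 0.03},
}
DISTRICT_REPUBLICAN_SHARE = {"district_a": 0.72, "district_b": 0.35}


def guess_race(surname: str):
    """Return the most common race for a surname, or None if the name is unknown."""
    shares = SURNAME_RACE_SHARES.get(surname.lower())
    if shares is None:
        return None
    return max(shares, key=shares.get)


def guess_party(district: str, cutoff: float = 0.5):
    """Label a voter by their district's lean; ignores the voter's own views entirely."""
    share = DISTRICT_REPUBLICAN_SHARE.get(district)
    if share is None:
        return None
    return "likely Republican" if share > cutoff else "likely Democrat"


print(guess_race("Ryan"), guess_party("district_a"), guess_party("district_b"))
# white likely Republican likely Democrat
```

Even with this toy table, one in ten people named Ryan would be mislabeled, and every Democrat living in district_a is tagged as a likely Republican.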

The accuracy of specific attributes varies a lot: Pew found that race was accurate 79% of the time, education 51%, and religion 52%. Household income, meanwhile, was accurate just 37% of the time. There was also measurable bias, with higher error rates for younger, highly mobile, unregistered, and Hispanic voters. 
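
Accuracy figures like these come from comparing a file’s modeled values with what people report about themselves. The following Python sketch shows that comparison, plus an error rate broken out by age group, on four invented respondents. The field names and records are hypothetical, and the code illustrates the arithmetic rather than Pew’s exact procedure.

```python
from collections import defaultdict


def accuracy(rows, attr):
    """Share of rows where the modeled value equals the self-reported value."""
    pairs = [(r["modeled"][attr], r["reported"][attr]) for r in rows]
    return sum(m == s for m, s in pairs) / len(pairs)


def error_rate_by_group(rows, attr, group_field):
    """Share of mismatches on `attr`, computed separately for each group."""
    errors, totals = defaultdict(int), defaultdict(int)
    for r in rows:
        group = r[group_field]
        totals[group] += 1
        errors[group] += r["modeled"][attr] != r["reported"][attr]
    return {g: errors[g] / totals[g] for g in totals}


# Four invented respondents: what the file guessed vs. what they told a survey.
rows = [
    {"age": "18-29", "modeled": {"race": "white", "income": "50-75k"},
     "reported": {"race": "white", "income": "25-50k"}},
    {"age": "18-29", "modeled": {"race": "hispanic", "income": "25-50k"},
     "reported": {"race": "white", "income": "25-50k"}},
    {"age": "65+", "modeled": {"race": "black", "income": "75-100k"},
     "reported": {"race": "black", "income": "50-75k"}},
    {"age": "65+", "modeled": {"race": "white", "income": "100k+"},
     "reported": {"race": "white", "income": "75-100k"}},
]

print(accuracy(rows, "race"), accuracy(rows, "income"))  # 0.75 0.25
print(error_rate_by_group(rows, "race", "age"))  # {'18-29': 0.5, '65+': 0.0}
```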

Eitan Hersh, a professor at Tufts who testified to Congress in 2018 after the Cambridge Analytica scandal broke, believes the data, particularly the modeled attributes, is so inaccurate that it limits its usefulness to campaigns. In his testimony, he noted that models he’d studied guessed a person’s race incorrectly 25% of the time. And race is much easier to predict than which issue might sway a person’s vote.

How do political groups use this data?

Campaigns and other political groups buy data from vendors, but they often combine it with additional data sets of their own. Campaigns also build data sets from social-media testing and advertising data, though it’s not clear how common this practice is.

They often use all this to try to identify adults who will respond to a specific issue. For example, a campaign might develop a model to find voters who support climate change legislation. The model might use these data sets to spit out a list of voters ranked on a scale from 1 to 100, with 100 being those most likely to strongly support the cause. The campaign could then choose to send a message to voters with a score higher than 70 in an effort to encourage mobilization. 
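
Here is a minimal Python sketch of that workflow, assuming a logistic model whose features and weights are hypothetical (a real campaign would fit them to survey or past-response data). Each voter’s predicted probability of support is mapped onto the 1 to 100 scale, and only voters scoring above 70 are kept.

```python
import math

# Hypothetical features and weights, made up for illustration.
WEIGHTS = {"age_under_35": 0.8, "donated_to_env_group": 1.5, "rural": -0.6}
BIAS = -0.5


def support_score(voter: dict) -> int:
    """Map a logistic-model probability of support onto a 1-100 score."""
    z = BIAS + sum(w * voter.get(feature, 0) for feature, w in WEIGHTS.items())
    probability = 1 / (1 + math.exp(-z))
    return max(1, min(100, round(probability * 100)))


def select_targets(voters: list, threshold: int = 70) -> list:
    """Return IDs of voters scoring above the threshold, highest score first."""
    scored = sorted(((support_score(v), v["id"]) for v in voters), reverse=True)
    return [voter_id for score, voter_id in scored if score > threshold]


voters = [
    {"id": "v1", "age_under_35": 1, "donated_to_env_group": 1},
    {"id": "v2", "age_under_35": 1},
    {"id": "v3", "rural": 1},
    {"id": "v4", "age_under_35": 1, "donated_to_env_group": 1, "rural": 1},
]
print([support_score(v) for v in voters])  # [86, 57, 25, 77]
print(select_targets(voters))  # ['v1', 'v4']
```

Moving `threshold` trades reach for precision: a lower cutoff messages more people, including more who don’t actually support the cause.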

Although it’s arguable whether targeted advertising shifts the way people vote, it has proved extremely useful in harvesting other contact information, like email addresses, and in raising money. 

What role does social media play?

Your social-media information, such as the public Facebook posts you’ve liked or the Twitter hashtags you’ve used, can be combined with other data at many different stages. Some vendors integrate social-media data into their main data set, especially when a person’s profile can be matched to their real name. That information can help build better predictive models.

Infamously, Cambridge Analytica gamed Facebook by acquiring data that a third-party app had collected from 270,000 users, along with data from those users’ friend networks, until it had a data set covering as many as 87 million people, most of whom had not consented and were unaware this was happening. It claimed to run models on that data to build personalized, predictive political profiles of users.
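
The arithmetic behind that jump is a one-hop expansion: every app user brings along their friends, and overlapping friends are counted once. At the reported scale, that works out to roughly 322 people per app user (87,000,000 / 270,000). The Python sketch below uses a tiny invented friend graph to show the mechanics; it does not model how Facebook’s API actually exposed friend data.

```python
def one_hop_reach(seed_users, friends_of):
    """Return the seed users plus all of their direct friends, without duplicates."""
    reached = set(seed_users)
    for user in seed_users:
        reached.update(friends_of.get(user, ()))
    return reached


# Invented graph: two app users ("a", "b") who share one friend ("d").
friends_of = {"a": ["c", "d", "e"], "b": ["d", "f"]}
reached = one_hop_reach(["a", "b"], friends_of)
print(len(reached), sorted(reached))  # 6 ['a', 'b', 'c', 'd', 'e', 'f']
print(round(87_000_000 / 270_000))  # 322
```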

But the effectiveness of such techniques is up for debate. 

Psychologist Michal Kosinski’s research, on which Cambridge Analytica based many of its methods, argued in a 2013 study that Facebook likes can reveal “sensitive personal attributes,” and in a 2015 follow-up that an algorithm given 150 of a person’s likes could judge their personality better than a family member could. But Cambridge Analytica was not able to produce any evidence that it succeeded in creating such algorithms, or that any of its targeting persuaded anybody. It’s incredibly hard to attribute any vote to a particular ad, article, or tweet.

One of the most important uses of social-media information is to refine and target messaging. A/B testing has gotten so precise that campaigns can keep tweaking a given ad until it becomes hyper-specific to the user. 
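
One common way to decide whether a tweaked ad actually beats the original, rather than just getting lucky, is a two-proportion z-test on click-through rates. The Python sketch below assumes that approach and a 1.96 cutoff (roughly 95% confidence); the click counts are invented, and campaigns and ad platforms may rely on other statistical methods.

```python
import math


def two_proportion_z(clicks_a, shown_a, clicks_b, shown_b):
    """z statistic for the difference in click-through rate between ads A and B."""
    rate_a, rate_b = clicks_a / shown_a, clicks_b / shown_b
    pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    return (rate_b - rate_a) / std_err


def pick_winner(clicks_a, shown_a, clicks_b, shown_b, z_crit=1.96):
    """Return "A" or "B" if one variant wins clearly, otherwise keep testing."""
    z = two_proportion_z(clicks_a, shown_a, clicks_b, shown_b)
    if z > z_crit:
        return "B"
    if z < -z_crit:
        return "A"
    return "no clear winner"


# Invented results: each variant shown to 10,000 people.
print(pick_winner(200, 10_000, 260, 10_000))  # B  (2.0% vs 2.6%, z is about 2.83)
print(pick_winner(200, 10_000, 210, 10_000))  # no clear winner
```

Repeating this loop, keeping the winner and testing it against a fresh variant, is one way an ad could be tuned step by step toward a narrow audience.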

What are the different kinds of targeted ads?

Targeted ads are messages directed to people on the basis of their confirmed or suspected political identities. Many focus on issue persuasion, voter mobilization, or fundraising, and some groups use much more sophisticated approaches than others.

Targeting methods include email, telephone, and text message, but much of the advertising takes place online, on Facebook, Google, and Instagram. Twitter banned political ads in November 2019, though 501(c)(3) nonprofit groups are still able to use targeting on the platform. To target voters, groups apply specific filters to reach exactly the audience they want: women on college campuses in Michigan, for example. On Facebook, and possibly on other social platforms as well, campaigns can even target individuals directly by uploading a list of contact details, such as email addresses or phone numbers, that the platform matches to user accounts. The list might cover just a tiny number of people if the advertiser wants to do extremely specific personalized messaging.
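
The two targeting styles described here can be sketched in Python: filtering a user base by attributes, and matching an uploaded contact list to accounts. The user records, field names, and account IDs are hypothetical. Platforms such as Facebook describe hashing uploaded contact details before matching them; the trim, lowercase, then SHA-256 scheme used here is an assumption for illustration, not any platform’s specification.

```python
import hashlib


def filter_audience(users, **criteria):
    """Keep users whose fields equal every requested value."""
    return [u for u in users if all(u.get(k) == v for k, v in criteria.items())]


def hash_email(email: str) -> str:
    """Normalize an email address, then hash it so raw addresses are not compared."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def match_uploaded_list(uploaded_emails, accounts_by_hash):
    """Return account IDs whose hashed email appears in the uploaded list."""
    wanted = {hash_email(e) for e in uploaded_emails}
    return sorted(acct for h, acct in accounts_by_hash.items() if h in wanted)


# Hypothetical platform users for attribute filtering.
users = [
    {"id": 1, "gender": "female", "state": "MI", "on_campus": True},
    {"id": 2, "gender": "male", "state": "MI", "on_campus": True},
    {"id": 3, "gender": "female", "state": "OH", "on_campus": True},
    {"id": 4, "gender": "female", "state": "MI", "on_campus": False},
]
audience = filter_audience(users, gender="female", state="MI", on_campus=True)
print([u["id"] for u in audience])  # [1]

# Hypothetical platform index from hashed email to account ID.
accounts_by_hash = {hash_email(e): acct for e, acct in [
    ("ana@example.com", "acct_17"), ("ben@example.com", "acct_42"),
    ("cy@example.com", "acct_99")]}
print(match_uploaded_list(["  Ana@Example.com", "ben@example.com",
                           "nobody@example.com"], accounts_by_hash))
# ['acct_17', 'acct_42']
```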

What rules are there about the way data gets used?

Different groups are subject to different rules. 501(c)(3) groups like Turning Point USA or the Tides Foundation can’t advance any electoral or candidate messages. They also don’t have to publicly disclose their donors. Political campaigns, on the other hand, are subject to campaign finance laws and oversight by the Federal Election Commission.

But although campaign-sponsored advertisements must be identified as such, on the internet it is often unclear who exactly is trying to grab your attention and support. Misinformation and manipulation get confused with official campaign messaging, while campaigns can skirt accountability by distancing themselves from more controversial groups with parallel messages. 

Why does this matter for the 2020 election?

Polling data suggests this election will likely be decided in the suburbs. In 2016, suburban counties gave Trump the electoral edge even though he trailed in the popular vote. And suburban voters use Facebook … a lot. Campaigns and advocacy groups can use the growing power of data crunching to speak directly to those voters. So far, Donald Trump has spent twice as much on Facebook ads as Joe Biden.