How Obama Used Big Data to Rally Voters, Part 1

Part 1: How President Obama’s campaign used big data to rally individual voters.

Sasha Issenbergarchive page

December 16, 2012

Two years after Barack Obama’s election as president, Democrats suffered their worst defeat in decades. The congressional majorities that had given Obama his legislative successes, reforming the health-insurance and financial markets, were swept away in the midterm elections; control of the House flipped and the Democrats’ lead in the Senate shrank to an ungovernably slim margin. Pundits struggled to explain the rise of the Tea Party. Voters’ disappointment with the Obama agenda was evident as independents broke right and Democrats stayed home. In 2010, the Democratic National Committee failed its first test of the Obama era: it had not kept the Obama coalition together.

But for Democrats, there was bleak consolation in all this: Dan Wagner had seen it coming. When Wagner was hired as the DNC’s targeting director, in January of 2009, he became responsible for collecting voter information and analyzing it to help the committee approach individual voters by direct mail and phone. But he appreciated that the raw material he was feeding into his statistical models amounted to a series of surveys on voters’ attitudes and preferences. He asked the DNC’s technology department to develop software that could turn that information into tables, and he called the result Survey Manager.

That fall, when a special election was held to fill an open congressional seat in upstate New York, Wagner successfully predicted the final margin within 150 votes—well before Election Day. Months later, pollsters projected that Martha Coakley was certain to win another special election, to fill the Massachusetts Senate seat left empty by the death of Ted Kennedy. But Wagner’s Survey Manager correctly predicted that the Republican Scott Brown was likely to prevail in the strongly Democratic state. “It’s one thing to be right when you’re going to win,” says Jeremy Bird, who served as national deputy director of Organizing for America, the Obama campaign in abeyance, housed at the DNC. “It’s another thing to be right when you’re going to lose.”

It is yet another thing to be right five months before you’re going to lose. As the 2010 midterms approached, Wagner built statistical models for selected Senate races and 74 congressional districts. Starting in June, he began predicting the elections’ outcomes, forecasting the margins of victory with what turned out to be improbable accuracy. But he hadn’t gotten there with traditional polls. He had counted votes one by one. His first clue that the party was in trouble came from thousands of individual survey calls matched to rich statistical profiles in the DNC’s databases. Core Democratic voters were telling the DNC’s callers that they were much less likely to vote than statistical probability suggested. Wagner could also calculate how much the Democrats’ mobilization programs would do to increase turnout among supporters, and in most races he knew it wouldn’t be enough to cover the gap revealing itself in Survey Manager’s tables.

His congressional predictions were off by an average of only 2.5 percent. “That was a proof point for a lot of people who don’t understand the math behind it but understand the value of what that math produces,” says Mitch Stewart, Organizing for America’s director. “Once that first special [election] happened, his word was the gold standard at the DNC.”

The significance of Wagner’s achievement went far beyond his ability to declare winners months before Election Day. His approach amounted to a decisive break with 20th-century tools for tracking public opinion, which revolved around quarantining small samples that could be treated as representative of the whole. Wagner had emerged from a cadre of analysts who thought of voters as individuals and worked to aggregate projections about their opinions and behavior until they revealed a composite picture of everyone. His techniques marked the fulfillment of a new way of thinking, a decade in the making, in which voters were no longer trapped in old political geographies or tethered to traditional demographic categories, such as age or gender, depending on which attributes pollsters asked about or how consumer marketers classified them for commercial purposes. Instead, the electorate could be seen as a collection of individual citizens who could each be measured and assessed on their own terms. Now it was up to a candidate who wanted to lead those people to build a campaign that would interact with them the same way.

Dan Wagner, the chief analytics officer for Obama 2012, led the campaign’s “Cave” of data scientists.

After the voters returned Obama to office for a second term, his campaign became celebrated for its use of technology—much of it developed by an unusual team of coders and engineers—that redefined how individuals could use the Web, social media, and smartphones to participate in the political process. A mobile app allowed a canvasser to download and return walk sheets without ever entering a campaign office; a Web platform called Dashboard gamified volunteer activity by ranking the most active supporters; and “targeted sharing” protocols mined an Obama backer’s Facebook network in search of friends the campaign wanted to register, mobilize, or persuade.

But underneath all that were scores describing particular voters: a new political currency that predicted the behavior of individual humans. The campaign didn’t just know who you were; it knew exactly how it could turn you into the type of person it wanted you to be.

The Scores

Four years earlier, Dan Wagner had  been working at a Chicago economic consultancy, using forecasting skills developed studying econometrics at the University of Chicago, when he fell for Barack Obama and decided he wanted to work on his home-state senator’s 2008 presidential campaign. Wagner, then 24, was soon in Des Moines, handling data entry for the state voter file that guided Obama to his crucial victory in the Iowa caucuses. He bounced from state to state through the long primary calendar, growing familiar with voter data and the ways of using statistical models to intelligently sort the electorate. For the general election, he was named lead targeter for the Great Lakes/Ohio River Valley region, the most intense battleground in the country.

After Obama’s victory, many of his top advisors decamped to Washington to make preparations for governing. Wagner was told to stay behind and serve on a post-election task force that would review a campaign that had looked, to the outside world, technically flawless.

In the 2008 presidential election, Obama’s targeters had assigned every voter in the country a pair of scores based on the probability that the individual would perform two distinct actions that mattered to the campaign: casting a ballot and supporting Obama. These scores were derived from an unprecedented volume of ongoing survey work. For each battleground state every week, the campaign’s call centers conducted 5,000 to 10,000 so-called short-form interviews that quickly gauged a voter’s preferences, and 1,000 interviews in a long-form version that was more like a traditional poll. To derive individual-level predictions, algorithms trawled for patterns between these opinions and the data points the campaign had assembled for every voter—as many as one thousand variables each, drawn from voter registration records, consumer data warehouses, and past campaign contacts.

This innovation was most valued in the field. There, an almost perfect cycle of microtargeting models directed volunteers to scripted conversations with specific voters at the door or over the phone. Each of those interactions produced data that streamed back into Obama’s servers to refine the models pointing volunteers toward the next door worth a knock. The efficiency and scale of that process put the Democrats well ahead when it came to profiling voters. John McCain’s campaign had, in most states, run its statistical model just once, assigning each voter to one of its microtargeting segments in the summer. McCain’s advisors were unable to recalculate the probability that those voters would support their candidate as the dynamics of the race changed. Obama’s scores, on the other hand, adjusted weekly, responding to new events like Sarah Palin’s vice-presidential nomination or the collapse of Lehman Brothers.

Within the campaign, however, the Obama data operations were understood to have shortcomings. As was typical in political information infrastructure, knowledge about people was stored separately from data about the campaign’s interactions with them, mostly because the databases built for those purposes had been developed by different consultants who had no interest in making their systems work together.

But the task force knew the next campaign wasn’t stuck with that situation. Obama would run his final race not as an insurgent against a party establishment, but as the establishment itself. For four years, the task force members knew, their team would control the Democratic Party’s apparatus. Their demands, not the offerings of consultants and vendors, would shape the marketplace. Their report recommended developing a “constituent relationship management system” that would allow staff across the campaign to look up individuals not just as voters or volunteers or donors or website users but as citizens in full. “We realized there was a problem with how our data and infrastructure interacted with the rest of the campaign, and we ought to be able to offer it to all parts of the campaign,” says Chris Wegrzyn, a database applications developer who served on the task force.

Wegrzyn became the DNC’s lead targeting developer and oversaw a series of costly acquisitions, all intended to free the party from the traditional dependence on outside vendors. The committee installed a Siemens Enterprise System phone-dialing unit that could put out 1.2 million calls a day to survey voters’ opinions. Later, party leaders signed off on a $280,000 license to use Vertica software from Hewlett-Packard that allowed their servers to access not only the party’s 180-million-person voter file but all the data about volunteers, donors, and those who had interacted with Obama online.

Many of those who went to Washington after the 2008 election in order to further the president’s political agenda returned to Chicago in the spring of 2011 to work on his reëlection. The chastening losses they had experienced in Washington separated them from those who had known only the ecstasies of 2008. “People who did ’08, but didn’t do ’10, and came back in ’11 or ’12—they had the hardest culture clash,” says Jeremy Bird, who became national field director on the reëlection campaign. But those who went to Washington and returned to Chicago developed a particular appreciation for Wagner’s methods of working with the electorate at an atomic level. It was a way of thinking that perfectly aligned with their simple theory of what it would take to win the president reëlection: get everyone who had voted for him in 2008 to do it again. At the same time, they knew they would need to succeed at registering and mobilizing new voters, especially in some of the fastest-growing demographic categories, to make up for any 2008 voters who did defect.

Obama’s campaign began the election year confident it knew the name of every one of the 69,456,897 Americans whose votes had put him in the White House. They may have cast those votes by secret ballot, but Obama’s analysts could look at the Democrats’ vote totals in each precinct and identify the people most likely to have backed him. Pundits talked in the abstract about reassembling Obama’s 2008 coalition. But within the campaign, the goal was literal. They would reassemble the coalition, one by one, through personal contacts.

Tomorrow: Part 2—The Experiments

Keep Reading

Large language models can do jaw-dropping things. But nobody knows exactly why.

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.