Prediction Models Gone Wild: Why Election Forecasts and Polls Were So Wrong

The polls had Clinton ahead, and the real-time data said she’d walk it. Here’s what they missed.

Michael Reilly

November 9, 2016

If you tuned in to Vice News Tuesday afternoon or checked out election coverage on Slate, you’d have thought Hillary Clinton was all but assured of becoming America’s first female president.

In a first, both outlets were carrying real-time data streaming in from a startup called VoteCastr, which bills itself as giving America an unprecedented glimpse of “the game as it unfolds.” VoteCastr’s last predictions, made around 9 p.m. Tuesday, had Clinton up in Pennsylvania, Ohio, Florida, Wisconsin, and Iowa, all of which Donald Trump ended up winning on his way to becoming president-elect.

To be sure, VoteCastr wasn’t alone. Many election forecasts based on polling, demographics, and historical data were similarly wide of the mark. The New York Times’ Upshot model, for example, gave Clinton around an 85 percent chance of winning, while the vaunted data-driven forecasting site FiveThirtyEight gave her a 72 percent chance.

This is at least partly because the polls the forecasts relied on were either way off or were at the far edge of their margins of error (though if Clinton ends up winning the popular vote, as it looks like she might, you could argue that the polls got it right; they just picked the wrong winner). An underappreciation of how powerful white, working-class voters are in this country may also have contributed, as could people simply not giving honest answers when asked whom they supported. The Clinton campaign made the same mistake: it figured that its Midwest “firewall” of states was safe. Even the data team working with the Trump campaign gave him only a one-in-five chance of winning.

But VoteCastr’s failure (to say nothing of its rickety technical performance throughout Election Day) stands out because of the claims it made. The company, whose data gurus previously worked on the George W. Bush and Obama campaigns, gave the impression that by paying ultra-close attention to early-voting results, voters’ identities, and exit polling, it could somehow derive a real-time look at the election that had never been available before. It was meant to be the new cutting edge in data-driven election products.

Instead, it showed itself to be a flawed, incomplete means of election-tracking—a lot like the other products out there, as it turns out.

(Read more: Politico, FiveThirtyEight, The New York Times, Bloomberg)
