Mistaken Analysis

It’s too easy to be led astray by the lure of big data.
April 23, 2014

Google Flu Trends has long been the go-to example for anyone asserting the revolutionary potential of big data. Since 2008 the company has claimed it could use counts of flu-related Web searches to forecast flu outbreaks weeks ahead of data from the Centers for Disease Control and Prevention.

David Lazer

Unfortunately, this turned out to be what I call big-data hubris. Colleagues and I recently showed that Google’s tool has drifted further and further from accurately predicting CDC data over time. Among the underlying problems was that Google assumed a constant relationship between flu-related searches and flu prevalence, even as the search technology changed and people began using it in different ways.

That failure is the big-data era’s equivalent of the Chicago Tribune’s “Dewey Defeats Truman” headline in 1948. After public-opinion surveys erroneously predicted Dewey’s victory, the New York Times declared polling “unable to compute statistically the unpredictable and unfathomable nuances of human character.” Yet 64 years later, polling is used widely and successfully. In aggregate it predicted the overall margin of the latest presidential election within tenths of a percentage point, as well as the outcome in all 50 states. Surveys remain the bread and butter of social-science research.

That turnaround happened in part thanks to soul-searching by humbled survey companies that led to the development of rigorous, reliable sampling and polling methods. Similar soul-searching is necessary for big data.

One lesson we should draw is that methods and data should be more open. If Google Flu Trends had been more transparent, researchers would have competed to extract a cleaner signal from the raw data. Instead, the tool was not recalibrated for years. A corollary is that we need ways for scholars to build on and use proprietary data while respecting the rights of the data’s owners and the privacy of people represented.

We also need to build multidisciplinary teams around big-data tools. Many problems with Google Flu Trends are of a type well known to generations of social scientists. Unfortunately, big-data analysis is rare in leading social-science journals, and basic social-science research concepts are missing from most big-data research.

Big data is surely being hyped (see “The Limits of Social Engineering”). Yet the essential promise of Google Flu Trends is fundamentally correct. We now have access to detailed data about individual movements, behaviors, and communication. Used correctly, this information could be the starting point for a new “societal science” that can illuminate and do good for the world.

David Lazer is a joint professor in political science and computer science at Northeastern University.
