Mistaken Analysis

It’s too easy to be led astray by the lure of big data.
April 23, 2014

Google Flu Trends has long been the go-to example for anyone asserting the revolutionary potential of big data. Since 2008 the company has claimed it could use counts of flu-related Web searches to forecast flu outbreaks weeks ahead of data from the Centers for Disease Control and Prevention.

David Lazer

Unfortunately, this turned out to be what I call big-data hubris. Colleagues and I recently showed that Google’s tool has drifted steadily further from accurately predicting CDC data. Among the underlying problems was that Google assumed a constant relationship between flu-related searches and flu prevalence, even as the search technology changed and people began using it in different ways.

That failure is the big-data era’s equivalent of the Chicago Tribune’s “Dewey Defeats Truman” headline in 1948. After public-opinion surveys erroneously predicted Dewey’s victory, the New York Times declared polling “unable to compute statistically the unpredictable and unfathomable nuances of human character.” Yet 64 years later, polling is used widely and successfully. In aggregate it predicted the overall margin of the latest presidential election within tenths of a percentage point, as well as the outcome in all 50 states. Surveys remain the bread and butter of social-science research.

That turnaround happened in part thanks to soul-searching by humbled survey companies that led to the development of rigorous, reliable sampling and polling methods. Similar soul-searching is necessary for big data.

One lesson we should draw is that methods and data should be more open. If Google Flu Trends had been more transparent, researchers would have competed to extract a cleaner signal from the raw data. Instead, the tool was not recalibrated for years. A corollary is that we need ways for scholars to build on and use proprietary data while respecting the rights of the data’s owners and the privacy of people represented.

We also need to build multidisciplinary teams around big-data tools. Many problems with Google Flu Trends are of a type well known to generations of social scientists. Unfortunately, big-data analysis is rare in leading social-science journals, and basic social-science research concepts are missing from most big-data research.

Big data is surely being hyped (see “The Limits of Social Engineering”). Yet the essential promise of Google Flu Trends is fundamentally correct. We now have access to detailed data about individual movements, behaviors, and communication. Used correctly, this information could be the starting point for a new “societal science” that can illuminate and do good for the world.

David Lazer is a joint professor in political science and computer science at Northeastern University.
