A View from David Lazer

Mistaken Analysis

It’s too easy to be led astray by the lure of big data.

  • April 23, 2014

Google Flu Trends has long been the go-to example for anyone asserting the revolutionary potential of big data. Since 2008 the company has claimed it could use counts of flu-related Web searches to forecast flu outbreaks weeks ahead of data from the Centers for Disease Control and Prevention.


Unfortunately, this turned out to be what I call big-data hubris. Colleagues and I recently showed that Google’s tool has drifted further and further from accurately predicting CDC data over time. Among the underlying problems was that Google assumed a constant relationship between flu-related searches and flu prevalence, even as the search technology changed and people began using it in different ways.

This story is part of our May/June 2014 Issue

That failure is the big-data era’s equivalent of the Chicago Tribune’s “Dewey Defeats Truman” headline in 1948. After public-opinion surveys erroneously predicted Dewey’s victory, the New York Times declared polling “unable to compute statistically the unpredictable and unfathomable nuances of human character.” Yet 64 years later, polling is used widely and successfully. In aggregate it predicted the overall margin of the latest presidential election within tenths of a percentage point, as well as the outcome in all 50 states. Surveys remain the bread and butter of social-science research.

That turnaround happened in part thanks to soul-searching by humbled survey companies that led to the development of rigorous, reliable sampling and polling methods. Similar soul-searching is necessary for big data.

One lesson we should draw is that methods and data should be more open. If Google Flu Trends had been more transparent, researchers would have competed to extract a cleaner signal from the raw data. Instead, the tool was not recalibrated for years. A corollary is that we need ways for scholars to build on and use proprietary data while respecting the rights of the data’s owners and the privacy of the people represented in it.

We also need to build multidisciplinary teams around big-data tools. Many problems with Google Flu Trends are of a type well known to generations of social scientists. Unfortunately, big-data analysis is rare in leading social-science journals, and basic social-science research concepts are missing from most big-data research.

Big data is surely being hyped (see “The Limits of Social Engineering”). Yet the essential promise of Google Flu Trends is fundamentally correct. We now have access to detailed data about individual movements, behaviors, and communication. Used correctly, this information could be the starting point for a new “societal science” that can illuminate and do good for the world.

David Lazer is a joint professor in political science and computer science at Northeastern University.
