A View from David Lazer

Mistaken Analysis

It’s too easy to be led astray by the lure of big data.

April 23, 2014

Google Flu Trends has long been the go-to example for anyone asserting the revolutionary potential of big data. Since 2008 the company has claimed it could use counts of flu-related Web searches to forecast flu outbreaks weeks ahead of data from the Centers for Disease Control and Prevention.

Unfortunately, this turned out to be what I call big-data hubris. Colleagues and I recently showed that Google’s tool has drifted further and further from accurately predicting CDC data over time. Among the underlying problems was that Google assumed a constant relationship between flu-related searches and flu prevalence, even as the search technology changed and people began using it in different ways.

This story is part of our May/June 2014 Issue

That failure is the big-data era’s equivalent of the Chicago Tribune’s “Dewey Defeats Truman” headline in 1948. After public-opinion surveys erroneously predicted Dewey’s victory, the New York Times declared polling “unable to compute statistically the unpredictable and unfathomable nuances of human character.” Yet 64 years later, polling is used widely and successfully. In aggregate it predicted the overall margin of the latest presidential election within tenths of a percentage point, as well as the outcome in all 50 states. Surveys remain the bread and butter of social-science research.

That turnaround happened in part thanks to soul-searching by humbled survey companies that led to the development of rigorous, reliable sampling and polling methods. Similar soul-searching is necessary for big data.

One lesson we should draw is that methods and data should be more open. If Google Flu Trends had been more transparent, researchers would have competed to extract a cleaner signal from the raw data. Instead, the tool was not recalibrated for years. A corollary is that we need ways for scholars to build on and use proprietary data while respecting the rights of the data’s owners and the privacy of people represented.

We also need to build multidisciplinary teams around big-data tools. Many problems with Google Flu Trends are of a type well known to generations of social scientists. Unfortunately, big-data analysis is rare in leading social-science journals, and basic social-science research concepts are missing from most big-data research.

Big data is surely being hyped (see “The Limits of Social Engineering”). Yet the essential promise of Google Flu Trends is fundamentally correct. We now have access to detailed data about individual movements, behaviors, and communication. Used correctly, this information could be the starting point for a new “societal science” that can illuminate and do good for the world.

David Lazer is a joint professor in political science and computer science at Northeastern University.
