MIT Technology Review

 


David Lazer


Google Flu Trends has long been the go-to example for anyone asserting the revolutionary potential of big data. Since 2008 the company has claimed it could use counts of flu-related Web searches to forecast flu outbreaks weeks ahead of data from the Centers for Disease Control and Prevention.

Unfortunately, this turned out to be what I call big-data hubris. Colleagues and I recently showed that Google’s tool has drifted further and further from accurately predicting CDC data over time. Among the underlying problems was that Google assumed a constant relationship between flu-related searches and flu prevalence, even as the search technology changed and people began using it in different ways.

That failure is the big-data era’s equivalent of the Chicago Tribune’s “Dewey Defeats Truman” headline in 1948. After public-opinion surveys erroneously predicted Dewey’s victory, the New York Times declared polling “unable to compute statistically the unpredictable and unfathomable nuances of human character.” Yet 64 years later, polling is used widely and successfully. In aggregate it predicted the overall margin of the latest presidential election within tenths of a percentage point, as well as the outcome in all 50 states. Surveys remain the bread and butter of social-science research.

That turnaround happened in part thanks to soul-searching by humbled survey companies that led to the development of rigorous, reliable sampling and polling methods. Similar soul-searching is necessary for big data.

One lesson we should draw is that methods and data should be more open. If Google Flu Trends had been more transparent, researchers would have competed to extract a cleaner signal from the raw data. Instead, the tool was not recalibrated for years. A corollary is that we need ways for scholars to build on and use proprietary data while respecting the rights of the data’s owners and the privacy of people represented.

We also need to build multidisciplinary teams around big-data tools. Many problems with Google Flu Trends are of a type well known to generations of social scientists. Unfortunately, big-data analysis is rare in leading social-science journals, and basic social-science research concepts are missing from most big-data research.

Big data is surely being hyped (see “The Limits of Social Engineering”). Yet the essential promise of Google Flu Trends is fundamentally correct. We now have access to detailed data about individual movements, behaviors, and communication. Used correctly, this information could be the starting point for a new “societal science” that can illuminate and do good for the world.

David Lazer is a joint professor in political science and computer science at Northeastern University.


Credit: Illustration by Sam Kerr

Tagged: Computing
