Walter Frick

A View from Walter Frick

Mechanical Turk and the Limits of Big Data

The Internet is transforming how researchers perform experiments across the social sciences.

  • November 1, 2012

It’s telling that the most interesting presentation during MIT Technology Review’s EmTech session on big data last week was not really about big data at all. It was about Amazon’s Mechanical Turk, and the experiments it makes possible.

Like many other researchers, sociologist and Microsoft researcher Duncan Watts performs experiments using Mechanical Turk, an online marketplace that allows users to pay others to complete tasks. The platform is used largely to fill gaps in applications where human intelligence is required, but social scientists are increasingly turning to it to test their hypotheses.

The point Watts made at EmTech was that, from his perspective, the data revolution has less to do with the amount of data available and more to do with the newly lowered cost of running online experiments.

Compare that to Facebook data scientists Eytan Bakshy and Andrew Fiore, who presented right before Watts. Facebook, of course, generates a massive amount of data, and the two spoke of the experiments they perform to inform the design of its products.

But what might have looked like two competing visions for the future of data and hypothesis testing are really two sides of the big data coin. That’s because data on its own isn’t enough. Even the kind of experiment Bakshy and Fiore discussed—essentially an elaborate A/B test—has its limits.

This is a point political forecaster and author Nate Silver makes in his recent book The Signal and the Noise. After describing economic forecasters who simply gather as much data as possible and then make inferences without respect for theory, he writes:

“This kind of statement is becoming more common in the age of Big Data. Who needs theory when you have so much information? But this is categorically the wrong attitude to take toward forecasting, especially in a field like economics, where the data is so noisy. Statistical inferences are much stronger when backed up by theory or at least some deeper thinking about their root causes.”

Bakshy and Fiore no doubt understand this, as they cited plenty of theory in their presentation. But Silver’s point is an important one. Data on its own won’t spit out answers; theory needs to progress as well. That’s where Watts’s work comes in. 

The Internet is transforming how researchers think of the “lab” and enabling new kinds of experiments across the social sciences. Those experiments will be critical in helping us collectively make sense of the huge amounts of data we’re now generating. And those huge data sets will help inform the direction of Watts’s and others’ experiments.

The value of big data lies not simply in the answers it provides but in the questions it suggests we ask.
