Skip to Content

Mechanical Turk and the Limits of Big Data

The Internet is transforming how researchers perform experiments across the social sciences.
November 1, 2012

It’s telling that the most interesting presenter during MIT Technology Review’s EmTech session on big data last week was not really about big data at all. It was about Amazon’s Mechanical Turk, and the experiments it makes possible.

Like many other researchers, sociologist and Microsoft researcher Duncan Watts performs experiments using Mechanical Turk, an online marketplace that allows users to pay others to complete tasks. Used largely to fill in gaps in applications where human intelligence is required, social scientists are increasingly turning to the platform to test their hypotheses.

The point Watts made at EmTech was that, from his perspective, the data revolution has less to do with the amount of data available and more to do with the newly lowered cost of running online experiments.

Compare that to Facebook data scientists Eytan Bakshy and Andrew Fiore, who presented right before Watts. Facebook, of course, generates a massive amount of data, and the two spoke of the experiments they perform to inform the design of its products.

But what might have looked like two competing visions for the future of data and hypothesis testing are really two sides of the big data coin. That’s because data on its own isn’t enough. Even the kind of experiment Bakshy and Fiore discussed—essentially an elaborate A/B test—has its limits.

This is a point political forecaster and author Nate Silver discusses in his recent book The Signal and the Noise. After discussing economic forecasters who simply gather as much data as possible and then make inferences without respect for theory, he writes:

This kind of statement is becoming more common in the age of Big Data. Who needs theory when you have so much information? But this is categorically the wrong attitude to take toward forecasting, especially in a field like economics, where the data is so noisy. Statistical inferences are much stronger when backed up by theory or at least some deeper thinking about their root causes.

Bakshy and Fiore no doubt understand this, as they cited plenty of theory in their presentation. But Silver’s point is an important one. Data on its own won’t spit out answers; theory needs to progress as well. That’s where Watts’s work comes in. 

The Internet is transforming how researchers think of the “lab” and enabling new kinds of experiments across the social sciences. Those experiments will be critical in helping us collectively make sense of the huge amounts of data we’re now generating. And those huge data sets will help inform the direction of Watts’s and others’ experiments.

The value of big data isn’t simply in the answers it provides, but rather in the questions it suggests that we ask.

Keep Reading

Most Popular

Large language models can do jaw-dropping things. But nobody knows exactly why.

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.

OpenAI teases an amazing new generative video model called Sora

The firm is sharing Sora with a small group of safety testers but the rest of us will have to wait to learn more.

Google’s Gemini is now in everything. Here’s how you can try it out.

Gmail, Docs, and more will now come with Gemini baked in. But Europeans will have to wait before they can download the app.

This baby with a head camera helped teach an AI how kids learn language

A neural network trained on the experiences of a single young child managed to learn one of the core components of language: how to match words to the objects they represent.

Stay connected

Illustration by Rose Wong

Get the latest updates from
MIT Technology Review

Discover special offers, top stories, upcoming events, and more.

Thank you for submitting your email!

Explore more newsletters

It looks like something went wrong.

We’re having trouble saving your preferences. Try refreshing this page and updating them one more time. If you continue to get this message, reach out to us at customer-service@technologyreview.com with a list of newsletters you’d like to receive.