It’s telling that the most interesting presenter during MIT Technology Review’s EmTech session on big data last week was not really about big data at all. It was about Amazon’s Mechanical Turk, and the experiments it makes possible.
Like many other researchers, sociologist and Microsoft researcher Duncan Watts performs experiments using Mechanical Turk, an online marketplace that allows users to pay others to complete tasks. Used largely to fill in gaps in applications where human intelligence is required, social scientists are increasingly turning to the platform to test their hypotheses.
The point Watts made at EmTech was that, from his perspective, the data revolution has less to do with the amount of data available and more to do with the newly lowered cost of running online experiments.
Compare that to Facebook data scientists Eytan Bakshy and Andrew Fiore, who presented right before Watts. Facebook, of course, generates a massive amount of data, and the two spoke of the experiments they perform to inform the design of its products.
But what might have looked like two competing visions for the future of data and hypothesis testing are really two sides of the big data coin. That’s because data on its own isn’t enough. Even the kind of experiment Bakshy and Fiore discussed—essentially an elaborate A/B test—has its limits.
This is a point political forecaster and author Nate Silver discusses in his recent book The Signal and the Noise. After discussing economic forecasters who simply gather as much data as possible and then make inferences without respect for theory, he writes:
This kind of statement is becoming more common in the age of Big Data. Who needs theory when you have so much information? But this is categorically the wrong attitude to take toward forecasting, especially in a field like economics, where the data is so noisy. Statistical inferences are much stronger when backed up by theory or at least some deeper thinking about their root causes.
Bakshy and Fiore no doubt understand this, as they cited plenty of theory in their presentation. But Silver’s point is an important one. Data on its own won’t spit out answers; theory needs to progress as well. That’s where Watts’s work comes in.
The Internet is transforming how researchers think of the “lab” and enabling new kinds of experiments across the social sciences. Those experiments will be critical in helping us collectively make sense of the huge amounts of data we’re now generating. And those huge data sets will help inform the direction of Watts’s and others’ experiments.
The value of big data isn’t simply in the answers it provides, but rather in the questions it suggests that we ask.
Why China is still obsessed with disinfecting everything
Most public health bodies dealing with covid have long since moved on from the idea of surface transmission. China’s didn’t—and that helps it control the narrative about the disease’s origins and danger.
Anti-aging drugs are being tested as a way to treat covid
Drugs that rejuvenate our immune systems and make us biologically younger could help protect us from the disease’s worst effects.
These materials were meant to revolutionize the solar industry. Why hasn’t it happened?
Perovskites are promising, but real-world conditions have held them back.
A quick guide to the most important AI law you’ve never heard of
The European Union is planning new legislation aimed at curbing the worst harms associated with artificial intelligence.
Get the latest updates from
MIT Technology Review
Discover special offers, top stories, upcoming events, and more.