
MIT Technology Review

 


The reams of data that many modern businesses collect, dubbed “big data,” can provide powerful insights. Such data is the key to Netflix’s recommendation engines, Facebook’s social ads, and even Amazon’s methods for speeding up Silk, the new Web browser that comes with its new Fire tablet.

But big data is like any powerful tool. Using it carelessly can have dangerous results.

A new paper presented at a recent Symposium on the Dynamics of the Internet and Society spells out the reasons that businesses and academics should proceed with caution. While privacy invasions—both deliberate and accidental—are obvious issues, the paper also warns that data can easily be incomplete and distorted.

“With big data comes big responsibilities,” says Kate Crawford, an associate professor at the University of New South Wales, who was involved with the work. “There’s been the emergence of a philosophy that big data is all you need,” she adds. “We would suggest that, actually, numbers don’t speak for themselves.”

Crawford’s paper, written with Microsoft senior researcher Danah Boyd, illustrates the ways that big data sets can fall down, particularly when used to make claims about people’s behavior. “Big data sets are never complete,” Crawford says. For example, researchers often study Facebook to analyze people’s social relationships, using connections made through the social network as a stand-in for real-world ties. But it’s common for Facebook to show a distorted picture of people’s closest social relationships, such as with parents, live-in romantic partners, or friends seen daily. “Facebook is not the world,” Crawford says.

Google is a poster child for the power of data. The company has transformed a massive amount of information, gathered through its search engine, into a commanding ad network and a powerful role as the gatekeeper of much of the world’s information.

At a conference on Knowledge Discovery and Data Mining in August, I watched Google’s director of research, Peter Norvig, demonstrate the true power of a large data set, using the example of machine translation. Norvig showed that training algorithms on very large data sets, like those Google has collected from the many Web pages it crawls that are available in multiple languages, can produce dramatic results. With enough data, Norvig said, even the worst algorithm performs far better than what can be achieved with a smaller data set.



