
With Big Data Comes Big Responsibilities

  • Wednesday, October 5, 2011
  • By Erica Naone

But Crawford and Boyd's work shows that studying big data still requires finesse. Twitter, which is commonly scrutinized for insights about people's moods, attitudes toward politics, and other aspects of daily life, presents a number of problems, the researchers say. About 40 percent of Twitter's active users sign in to listen, not to post, which, Crawford and Boyd say, suggests that posts could come from a certain type of person rather than from a random sample of users. They also note that few researchers have access to all Twitter posts; most use smaller samples provided by the company. Without better information about how those samples were collected, studies could arrive at skewed results, they argue.

Crawford notes that many big data sets—particularly social data—come from companies that have no obligation to support scientific inquiry. Getting access to the data might mean paying for it, or keeping the company happy by not performing certain types of studies.

The researchers add that big data can also raise serious ethical concerns.

Often, Crawford notes, combining data from different sources can have unexpected consequences for the people involved. For example, other researchers have shown that they can identify individuals by using social media data in combination with supposedly anonymized behavioral data provided by companies.

Jennifer Chayes, managing director of Microsoft Research New England, says her lab has had firsthand experience with such problems. The lab wanted to run a contest for researchers to analyze a set of search data, she says, and was going over the data carefully to avoid the sorts of deanonymizing scandals that have followed search data releases in the past. The researchers discovered that people often entered search terms that were personally identifying and embarrassing, such as "Is my wife Jane Doe cheating on me?" The lab nixed the contest. Chayes says, "We began to realize how much we didn't understand about human behavior around search engines."

Handling big data sets requires an almost impossible degree of care, agrees Alessandro Acquisti, an associate professor at Carnegie Mellon who has studied the unintended information that data sets can reveal. Even public data sets raise questions, such as what to do with information that people post and later want to delete, he says.

Given the quantity of information now available on the Internet, Crawford argues, researchers need to slow down and think about the methods they use. "[The effect of the availability of big data] did shock a lot of people," she says. "And it should."
