Select your localized edition:

Close ×

More Ways to Connect

Discover one of our 28 local entrepreneurial communities »

Be the first to know as we launch in new countries and markets around the globe.

Interested in bringing MIT Technology Review to your local market?

MIT Technology ReviewMIT Technology Review - logo

 

Unsupported browser: Your browser does not meet modern web standards. See how it scores »

Data analytics are being used to implement a subtle form of discrimination, while anonymous data sets can be mined to reveal health data and other private information, a Microsoft researcher warned this morning at MIT Technology Review’s EmTech conference.

Kate Crawford, principal researcher at Microsoft Research, argued that these problems could be addressed with new legal approaches to the use of personal data.

In a new paper, she and a colleague propose a system of “due process” that would give people more legal rights to understand how data analytics are used in determinations made against them, such as denial of health insurance or a job. “It’s the very start of a conversation about how to do this better,” Crawford, who is also a visiting professor at the MIT Center for Civic Media, said in an interview before the event. “People think ‘big data’ avoids the problem of discrimination, because you are dealing with big data sets, but in fact big data is being used for more and more precise forms of discrimination—a form of data redlining.”

During her talk this morning, Crawford added that with big data, “you will never know what those discriminations are, and I think that’s where the concern begins.”

Health data is particularly vulnerable, the researcher says. Search terms for disease symptoms, online purchases of medical supplies, and even the RFID tags on drug packaging can provide websites and retailers with information about a person’s health.

As Crawford and Jason Schultz, a professor at New York University Law School, wrote in their paper: “When these data sets are cross-referenced with traditional health information, as big data is designed to do, it is possible to generate a detailed picture about a person’s health, including information a person may never have disclosed to a health provider.”

And a recent Cambridge University study, which Crawford alluded to during her talk, found that “highly sensitive personal attributes”— including sexual orientation, personality traits, use of addictive substances, and even parental separation—are highly predictable by analyzing what people click on to indicate they “like” on Facebook. The study analyzed the “likes” of 58,000 Facebook users.

Similarly, purchasing histories, tweets, and demographic, location, and other information gathered about individual Web users, when combined with data from other sources, can result in new kinds of profiles that an employer or landlord might use to deny someone a job or an apartment.

In response to such risks, the paper’s authors propose a legal framework they call “big data due process.” Under this concept, a person who has been subject to some determination—whether denial of health insurance, rejection of a job or housing application, or an arrest—would have the right to learn how big data analytics were used.

This would entail the sorts of disclosure and cross-examination rights that are already enshrined in the legal systems of the United States and many other nations. “Before there can be greater social acceptance of big data’s role in decision-making, especially within government, it must also appear fair, and have an acceptable degree of predictability, transparency, and rationality,” the authors write.

Data analytics can also get things deeply wrong, Crawford notes. Even the formerly successful use of Google search terms to identify flu outbreaks failed last year, when actual cases fell far short of predictions. Increased flu-related media coverage and chatter about the flu in social media were mistaken for signs of people complaining they were sick, leading to the overestimates.  “This is where social media data can get complicated,” Crawford said.

And there can be more basic flaws in what data tells us. For example, after Hurricane Sandy, there were few tweets from hard-hit areas away from Manhattan. “If we start to use social media data sets to take the pulse of a nation or understand a crisis—or actually use it to deploy resources—we are getting a skewed picture of what is happening,” Crawford warned in her talk.

15 comments. Share your thoughts »

Credit: Photograph by Technology Review

Tagged: Computing, Communications, Web, Facebook, Microsoft, Twitter, EmTech2013, Microsoft Research, Kate Crawford

Reprints and Permissions | Send feedback to the editor

From the Archives

Close

Introducing MIT Technology Review Insider.

Already a Magazine subscriber?

You're automatically an Insider. It's easy to activate or upgrade your account.

Activate Your Account

Become an Insider

It's the new way to subscribe. Get even more of the tech news, research, and discoveries you crave.

Sign Up

Learn More

Find out why MIT Technology Review Insider is for you and explore your options.

Show Me
×

A Place of Inspiration

Understand the technologies that are changing business and driving the new global economy.

September 23-25, 2014
Register »