
Data Discrimination Means the Poor May Experience a Different Internet

A Microsoft researcher proposes “big data due process” so citizens can learn how data analytics were used against them.
October 9, 2013

Data analytics are being used to implement a subtle form of discrimination, while anonymous data sets can be mined to reveal health data and other private information, a Microsoft researcher warned this morning at MIT Technology Review’s EmTech conference.

Data divide: Kate Crawford speaking today at the EmTech conference at MIT.

Kate Crawford, principal researcher at Microsoft Research, argued that these problems could be addressed with new legal approaches to the use of personal data.

In a new paper, she and a colleague propose a system of “due process” that would give people more legal rights to understand how data analytics are used in determinations made against them, such as denial of health insurance or a job. “It’s the very start of a conversation about how to do this better,” Crawford, who is also a visiting professor at the MIT Center for Civic Media, said in an interview before the event. “People think ‘big data’ avoids the problem of discrimination, because you are dealing with big data sets, but in fact big data is being used for more and more precise forms of discrimination—a form of data redlining.”

During her talk this morning, Crawford added that with big data, “you will never know what those discriminations are, and I think that’s where the concern begins.”

Health data is particularly vulnerable, the researcher says. Search terms for disease symptoms, online purchases of medical supplies, and even the RFID tags on drug packaging can provide websites and retailers with information about a person’s health.

As Crawford and Jason Schultz, a professor at New York University Law School, wrote in their paper: “When these data sets are cross-referenced with traditional health information, as big data is designed to do, it is possible to generate a detailed picture about a person’s health, including information a person may never have disclosed to a health provider.”

And a recent Cambridge University study, which Crawford alluded to during her talk, found that “highly sensitive personal attributes,” including sexual orientation, personality traits, use of addictive substances, and even parental separation, can be predicted with high accuracy by analyzing which items people have clicked to “like” on Facebook. The study analyzed the “likes” of 58,000 Facebook users.

Similarly, purchasing histories, tweets, and demographic, location, and other information gathered about individual Web users, when combined with data from other sources, can result in new kinds of profiles that an employer or landlord might use to deny someone a job or an apartment.

In response to such risks, the paper’s authors propose a legal framework they call “big data due process.” Under this concept, a person who has been subject to some determination—whether denial of health insurance, rejection of a job or housing application, or an arrest—would have the right to learn how big data analytics were used.

This would entail the sorts of disclosure and cross-examination rights that are already enshrined in the legal systems of the United States and many other nations. “Before there can be greater social acceptance of big data’s role in decision-making, especially within government, it must also appear fair, and have an acceptable degree of predictability, transparency, and rationality,” the authors write.

Data analytics can also get things deeply wrong, Crawford notes. Even Google’s previously successful use of search terms to track flu outbreaks failed last year, when actual cases fell far short of its predictions. Heavier media coverage of the flu, along with chatter about it on social media, was mistaken for evidence that people were actually sick, which led to the overestimates. “This is where social media data can get complicated,” Crawford said.

And there can be more basic flaws in what data tells us. For example, after Hurricane Sandy, relatively few tweets came from hard-hit areas outside Manhattan, so the Twitter record understated conditions where the damage was worst. “If we start to use social media data sets to take the pulse of a nation or understand a crisis—or actually use it to deploy resources—we are getting a skewed picture of what is happening,” Crawford warned in her talk.
