Hello,

We noticed you're browsing in private or incognito mode.

To continue reading this article, please exit incognito mode or log in.

Not an Insider? Subscribe now for unlimited access to online articles.

Data Sets Not So Anonymous

New research shows that it’s surprisingly easy to identify individuals from credit card metadata.

MIT researchers have found that the dates and locations of four purchases are enough to identify 90 percent of the people in a data set recording three months’ worth of credit card transactions by 1.1 million users.

Yves-Alexandre de Montjoye

When the researchers also considered coarse-grained information about the prices of purchases, just three data points were enough to identify an even larger percentage of people in the data set. That means that someone with copies of just three of your recent receipts—or one receipt, one Instagram photo of you having coffee with friends, and one tweet about the phone you just bought—would have a 94 percent chance of extracting your credit card records from those of a million other people. This is true, the researchers say, even in cases where no one in the data set is identified by name, address, credit card number, or anything else that we typically think of as personal information.

This story is part of the May/June 2015 Issue of the MIT News Magazine
See the rest of the issue
Subscribe

The data set the researchers analyzed included the names and locations of the shops at which purchases took place, the dates on which they took place, and the purchase amounts. Purchases made with the same credit card were all tagged with the same random identification number.

For each identification number—each customer in the data set—the researchers selected purchases at random, then determined how many other customers’ purchase histories contained the same data points. They varied the number of data points per customer over a range from two to five. Without price information, two data points were still sufficient to identify more than 40 percent of the people in the data set. At the other extreme, five points with price information were enough to identify almost everyone.

Preserving anonymity in large data sets is a pressing concern because public and private entities alike see aggregated digital data as a source of novel insights. Retailers studying anonymized credit card histories could certainly learn something about the tastes of their customers, but economists might also learn something about the relationship of, say, inflation or consumer spending to other economic factors.

Lead researcher Yves-Alexandre de Montjoye is a grad student in Sandy Pentland’s Human Dynamics Laboratory at the Media Lab. “Sandy and I do really believe that this data has great potential and should be used,” he says. “We, however, need to be aware and account for the risks of reidentification.”

Get stories like this before anyone else with First Look.

Subscribe today
Already a Premium subscriber? Log in.

Uh oh–you've read all of your free articles for this month.

Insider Premium
$179.95/yr US PRICE

Want more award-winning journalism? Subscribe and become an Insider.
  • Insider Plus {! insider.prices.plus !}* Best Value

    {! insider.display.menuOptionsLabel !}

    Everything included in Insider Basic, plus the digital magazine, extensive archive, ad-free web experience, and discounts to partner offerings and MIT Technology Review events.

    See details+

    What's Included

    Unlimited 24/7 access to MIT Technology Review’s website

    The Download: our daily newsletter of what's important in technology and innovation

    Bimonthly print magazine (6 issues per year)

    Bimonthly digital/PDF edition

    Access to the magazine PDF archive—thousands of articles going back to 1899 at your fingertips

    Special interest publications

    Discount to MIT Technology Review events

    Special discounts to select partner offerings

    Ad-free web experience

  • Insider Basic {! insider.prices.basic !}*

    {! insider.display.menuOptionsLabel !}

    Six issues of our award winning print magazine, unlimited online access plus The Download with the top tech stories delivered daily to your inbox.

    See details+

    What's Included

    Unlimited 24/7 access to MIT Technology Review’s website

    The Download: our daily newsletter of what's important in technology and innovation

    Bimonthly print magazine (6 issues per year)

  • Insider Online Only {! insider.prices.online !}*

    {! insider.display.menuOptionsLabel !}

    Unlimited online access including articles and video, plus The Download with the top tech stories delivered daily to your inbox.

    See details+

    What's Included

    Unlimited 24/7 access to MIT Technology Review’s website

    The Download: our daily newsletter of what's important in technology and innovation

/
You've read all of your free articles this month. This is your last free article this month. You've read of free articles this month. or  for unlimited online access.