Hello,

We noticed you're browsing in private or incognito mode.

To continue reading this article, please exit incognito mode or log in.

Not an Insider? Subscribe now for unlimited access to online articles.

Data Sets Not So Anonymous

New research shows that it’s surprisingly easy to identify individuals from credit card metadata.

MIT researchers have found that the dates and locations of four purchases are enough to identify 90 percent of the people in a data set recording three months’ worth of credit card transactions by 1.1 million users.

Yves-Alexandre de Montjoye

When the researchers also considered coarse-grained information about the prices of purchases, just three data points were enough to identify an even larger percentage of people in the data set. That means that someone with copies of just three of your recent receipts—or one receipt, one Instagram photo of you having coffee with friends, and one tweet about the phone you just bought—would have a 94 percent chance of extracting your credit card records from those of a million other people. This is true, the researchers say, even in cases where no one in the data set is identified by name, address, credit card number, or anything else that we typically think of as personal information.

This story is part of the May/June 2015 Issue of the MIT News Magazine
See the rest of the issue
Subscribe

The data set the researchers analyzed included the names and locations of the shops at which purchases took place, the dates on which they took place, and the purchase amounts. Purchases made with the same credit card were all tagged with the same random identification number.

For each identification number—each customer in the data set—the researchers selected purchases at random, then determined how many other customers’ purchase histories contained the same data points. They varied the number of data points per customer over a range from two to five. Without price information, two data points were still sufficient to identify more than 40 percent of the people in the data set. At the other extreme, five points with price information were enough to identify almost everyone.

Preserving anonymity in large data sets is a pressing concern because public and private entities alike see aggregated digital data as a source of novel insights. Retailers studying anonymized credit card histories could certainly learn something about the tastes of their customers, but economists might also learn something about the relationship of, say, inflation or consumer spending to other economic factors.

Lead researcher Yves-Alexandre de Montjoye is a grad student in Sandy Pentland’s Human Dynamics Laboratory at the Media Lab. “Sandy and I do really believe that this data has great potential and should be used,” he says. “We, however, need to be aware and account for the risks of reidentification.”

Couldn't make it to EmTech Next to meet experts in AI, Robotics and the Economy?

Go behind the scenes and check out our video
Next in MIT News
Want more award-winning journalism? Subscribe to Insider Plus.
  • Insider Plus {! insider.prices.plus !}*

    {! insider.display.menuOptionsLabel !}

    Everything included in Insider Basic, plus the digital magazine, extensive archive, ad-free web experience, and discounts to partner offerings and MIT Technology Review events.

    See details+

    Print + Digital Magazine (6 bi-monthly issues)

    Unlimited online access including all articles, multimedia, and more

    The Download newsletter with top tech stories delivered daily to your inbox

    Technology Review PDF magazine archive, including articles, images, and covers dating back to 1899

    10% Discount to MIT Technology Review events and MIT Press

    Ad-free website experience

/3
You've read of three free articles this month. for unlimited online access. You've read of three free articles this month. for unlimited online access. This is your last free article this month. for unlimited online access. You've read all your free articles this month. for unlimited online access. You've read of three free articles this month. for more, or for unlimited online access. for two more free articles, or for unlimited online access.