MIT researchers have found that the dates and locations of four purchases are enough to identify 90 percent of the people in a data set recording three months’ worth of credit card transactions by 1.1 million users.
When the researchers also considered coarse-grained information about the prices of purchases, just three data points were enough to identify an even larger percentage of people in the data set. That means that someone with copies of just three of your recent receipts—or one receipt, one Instagram photo of you having coffee with friends, and one tweet about the phone you just bought—would have a 94 percent chance of extracting your credit card records from those of a million other people. This is true, the researchers say, even in cases where no one in the data set is identified by name, address, credit card number, or anything else that we typically think of as personal information.
The data set the researchers analyzed included the names and locations of the shops at which purchases took place, the dates on which they took place, and the purchase amounts. Purchases made with the same credit card were all tagged with the same random identification number.
For each identification number—each customer in the data set—the researchers selected purchases at random, then determined how many other customers’ purchase histories contained the same data points. They varied the number of data points per customer over a range from two to five. Without price information, two data points were still sufficient to identify more than 40 percent of the people in the data set. At the other extreme, five points with price information were enough to identify almost everyone.
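The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' actual code: the function name `share_uniquely_identified`, the representation of each purchase as a `(shop, date)` tuple, and the toy data are all hypothetical. Coarse price information could be modeled by adding a price-range label as a third element of each tuple.

```python
import random
from collections import defaultdict


def share_uniquely_identified(histories, k, seed=0):
    """Estimate the share of customers singled out by k random purchases.

    histories: dict mapping an anonymous customer id to a set of data
               points, e.g. (shop, date) tuples (hypothetical layout).
    k:         number of data points drawn per customer.
    """
    rng = random.Random(seed)

    # Inverted index: data point -> ids of all customers who have it.
    index = defaultdict(set)
    for customer, points in histories.items():
        for point in points:
            index[point].add(customer)

    eligible = 0
    unique = 0
    for customer, points in histories.items():
        if len(points) < k:
            continue  # not enough purchases to draw k data points
        eligible += 1
        # Draw k of this customer's data points at random.
        sample = rng.sample(sorted(points), k)
        # Customers whose histories contain every sampled point.
        matches = set.intersection(*(index[p] for p in sample))
        if matches == {customer}:
            unique += 1

    return unique / eligible if eligible else 0.0


# Toy data: customers "a" and "c" share both purchases, "b" does not.
toy = {
    "a": {("cafe", "2014-01-01"), ("gym", "2014-01-02")},
    "b": {("cafe", "2014-01-01"), ("bakery", "2014-01-03")},
    "c": {("cafe", "2014-01-01"), ("gym", "2014-01-02")},
}
share = share_uniquely_identified(toy, k=2)
```

In the toy data only customer "b" is singled out by two data points, since "a" and "c" have identical histories. On a real data set, one would run the function for each value of `k` from two to five and average over several random draws.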
Preserving anonymity in large data sets is a pressing concern because public and private entities alike see aggregated digital data as a source of novel insights. Retailers studying anonymized credit card histories could certainly learn something about the tastes of their customers, but economists might also learn something about the relationship of, say, inflation or consumer spending to other economic factors.
Lead researcher Yves-Alexandre de Montjoye is a grad student in Sandy Pentland’s Human Dynamics Laboratory at the Media Lab. “Sandy and I do really believe that this data has great potential and should be used,” he says. “We, however, need to be aware and account for the risks of reidentification.”