The data trail we leave behind us grows all the time. Most of it isn’t that interesting—the takeout meal you ordered, that shower head you bought online—but some of it is deeply personal: your medical diagnoses, your sexual orientation, or your tax records.
The most common way public agencies protect our identities is anonymization. This involves stripping out obviously identifiable things such as names, phone numbers, email addresses, and so on. Data sets are also altered to be less precise, columns in spreadsheets are removed, and “noise” is introduced to the data. Privacy policies reassure us that this means there’s no risk we could be tracked down in the database.
However, a new study in Nature Communications suggests this is far from the case.
Researchers from Imperial College London and the Université catholique de Louvain have created a machine-learning model that estimates how easy it is to reidentify individuals from an anonymized data set. You can check your own score here, by entering your zip code, gender, and date of birth.
On average, in the US, using those three attributes, you could be correctly located in an “anonymized” database 81% of the time. Given 15 demographic attributes of someone living in Massachusetts, there’s a 99.98% chance you could find that person in any anonymized database.
“As the information piles up, the chances it isn’t you decrease very quickly,” says Yves-Alexandre de Montjoye, a researcher at Imperial College London and one of the study’s authors.
The tool was created by assembling a database of 210 data sets from five sources, including the US Census. The researchers fed this data into a machine-learning model, which learned which combinations of attributes are close to unique and which are not, and uses that to assign the probability of a correct identification.
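The intuition behind the model—each extra attribute shrinks the pool of people who match you—can be shown with a toy simulation. The attribute names and value ranges below are made up for illustration; this is not the researchers’ model:

```python
import random

random.seed(0)

# Hypothetical synthetic population of 100,000 people, each with a few
# coarse demographic attributes (illustrative, not from the study).
N = 100_000
attributes = {
    "zip3":        lambda: random.randrange(900),        # 3-digit ZIP prefix
    "gender":      lambda: random.randrange(2),
    "birth_year":  lambda: random.randrange(1920, 2020),
    "birth_month": lambda: random.randrange(12),
    "birth_day":   lambda: random.randrange(28),
}
people = [tuple(gen() for gen in attributes.values()) for _ in range(N)]

def unique_fraction(k):
    """Fraction of people whose first k attributes are unique in the population."""
    counts = {}
    for person in people:
        key = person[:k]
        counts[key] = counts.get(key, 0) + 1
    return sum(1 for p in people if counts[p[:k]] == 1) / N

for k in range(1, len(attributes) + 1):
    print(f"{k} attributes: {unique_fraction(k):.1%} unique")
```

With one coarse attribute almost nobody is unique; with all five, nearly everyone is—which is the effect de Montjoye describes.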
This isn’t the first study to show how easy it is to track down individuals in anonymized databases. A 2007 paper showed that just a few movie ratings on Netflix can identify a person as reliably as a Social Security number, for example. But the new study shows just how far current anonymization practices have fallen behind our ability to break them. Even the fact that a data set is incomplete does not protect people’s privacy, says de Montjoye.
It isn’t all bad news. These same reidentification techniques were used by journalists at the New York Times earlier this year to expose Donald Trump’s tax returns from 1985 to 1994. However, the same method could also be used by someone looking to commit identity fraud or gather material for blackmail.
“The issue is that we think when data has been anonymized it’s safe. Organizations and companies tell us it’s safe, and this proves it is not,” says de Montjoye.
For peace of mind, companies should be using differential privacy, a mathematical framework that lets organizations share aggregate data about user habits while protecting individual identities, argues Charlie Cabot, research lead at the privacy engineering firm Privitar.
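The basic building block of differential privacy is the Laplace mechanism: add noise, calibrated to how much one person can change a statistic, before releasing it. Here is a minimal sketch; the query and the epsilon value are illustrative choices, not drawn from any real deployment:

```python
import random

def laplace_noise(scale):
    # The difference of two independent exponential draws is
    # Laplace-distributed with the given scale.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon=0.5):
    # A counting query has sensitivity 1: adding or removing one person
    # changes the true count by at most 1, so the noise scale is 1/epsilon.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Example: release a noisy count of people under 40 in a toy data set.
ages = [23, 35, 47, 52, 38, 29, 61, 44]
print(private_count(ages, lambda age: age < 40))
```

The released count is close to the truth on average, but any single individual’s presence or absence is masked by the noise—which is what lets aggregate statistics be shared safely.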
The technique will get its first major test next year: it’s being used to secure the US Census database.