Skip to Content
Computing

You’re very easy to track down, even when your data has been anonymized

A new study shows you can be easily re-identified from almost any database, even when your personal details have been stripped out.
Faces blurred out
Faces blurred out
Faces blurred outGetty

The data trail we leave behind us grows all the time. Most of it isn’t that interesting—the takeout meal you ordered, that shower head you bought online—but some of it is deeply personal: your medical diagnoses, your sexual orientation, or your tax records.

The most common way public agencies protect our identities is anonymization. This involves stripping out obviously identifiable things such as names, phone numbers, email addresses, and so on. Data sets are also altered to be less precise, columns in spreadsheets are removed, and “noise” is introduced to the data. Privacy policies reassure us that this means there’s no risk we could be tracked down in the database.

However, a new study in Nature Communications suggests this is far from the case.

Researchers from Imperial College London and the University of Louvain have created a machine-learning model that estimates exactly how easy individuals are to reidentify from an anonymized data set. You can check your own score here, by entering your zip code, gender, and date of birth.

On average, in the US, using those three records, you could be correctly located in an “anonymized” database 81% of the time. Given 15 demographic attributes of someone living in Massachusetts, there’s a 99.98% chance you could find that person in any anonymized database.

“As the information piles up, the chances it isn’t you decrease very quickly,” says Yves-Alexandre de Montjoye, a researcher at Imperial College London and one of the study’s authors.

The tool was created by assembling a database of 210 different data sets from five sources, including the US Census. The researchers fed this data into a machine-learning model, which learned which combinations are more nearly unique and which are less so, and then assigns the probability of correct identification.

This isn’t the first study to show how easy it is to track down individuals from anonymized databases. A paper back in 2007 showed that just a few movie ratings on Netflix can identify a person as easily as a Social Security number, for example. However, it shows just how far current anonymization practices have fallen behind our ability to break them. The fact that the data set is incomplete does not protect people’s privacy, says de Montjoye.

It isn’t all bad news. These same reidentification techniques were used by journalists working at the New York Times earlier this year to expose Donald Trump’s tax returns from 1985 to 1994. However, the same method could be used by someone looking to commit ID fraud or obtain information for blackmail purposes.

“The issue is that we think when data has been anonymized it’s safe. Organizations and companies tell us it’s safe, and this proves it is not,” says de Montjoye.

For peace of mind, companies should be using differential privacy, a complex mathematical model that lets organizations share aggregate data about user habits while protecting an individual’s identity, argues Charlie Cabot, research lead at the privacy engineering firm Privitar.

The technique will get its first major test next year: it’s being used to secure the US Census database.

Deep Dive

Computing

Russia is risking the creation of a “splinternet”—and it could be irreversible

If Russia disconnects from—or is booted from— the internet’s governing bodies, the internet may never be the same again for any of us.

Conceptual illustration of quantum computing circuity, in multiple colors
Conceptual illustration of quantum computing circuity, in multiple colors

Quantum computing has a hype problem

Quantum computing startups are all the rage, but it’s unclear if they’ll be able to produce anything of use in the near future.

winning team for Pwn2own 2022
winning team for Pwn2own 2022

These hackers showed just how easy it is to target critical infrastructure

Two Dutch researchers have won a major hacking championship by hitting the software that runs the world’s power grids, gas pipelines, and more. It was their easiest challenge yet.

child outside a destroyed residential building in Kiev
child outside a destroyed residential building in Kiev

Russia hacked an American satellite company one hour before the Ukraine invasion

The attack on Viasat showcases cyber’s emerging role in modern warfare.

Stay connected

Illustration by Rose WongIllustration by Rose Wong

Get the latest updates from
MIT Technology Review

Discover special offers, top stories, upcoming events, and more.

Thank you for submitting your email!

Explore more newsletters

It looks like something went wrong.

We’re having trouble saving your preferences. Try refreshing this page and updating them one more time. If you continue to get this message, reach out to us at customer-service@technologyreview.com with a list of newsletters you’d like to receive.