Differential privacy

By Angela Chen

April 2, 2020

A technique to measure the privacy of a crucial data set.

Why it matters
It is increasingly difficult for the US Census Bureau to keep the data it collects private. A technique called differential privacy could solve that problem, build trust, and also become a model for other countries.
Key players
US Census Bureau, Apple, Facebook
Availability
Its use in the 2020 US Census will be the biggest-scale application yet.

In 2020, the US government has a big task: collect data on the country’s 330 million residents while keeping their identities private. The data is released in statistical tables that policymakers and academics analyze when writing legislation or conducting research. By law, the Census Bureau must make sure that the released data can’t be traced back to any individual.

But there are tricks to “de-anonymize” individuals, especially if the census data is combined with other public statistics.

So the Census Bureau injects inaccuracies, or “noise,” into the data. It might make some people younger and others older, or label some white people as black and vice versa, while keeping the totals of each age or ethnic group the same. The more noise you inject, the harder de-anonymization becomes.
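As a rough illustration of noise that leaves group totals unchanged, the sketch below shuffles one attribute among a random subset of records. The function name `swap_attribute`, the record layout, and the `fraction` parameter are hypothetical, chosen only for this example; it is not a description of the Census Bureau's actual procedure.

```python
import random


def swap_attribute(records, key, fraction, seed=0):
    """Shuffle one attribute among a random subset of records.

    Values are only moved between records, never changed, so the count of
    each value (for example, how many people are 34) stays the same.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    k = int(len(records) * fraction)
    # Pick which records take part in the swap.
    chosen = rng.sample(range(len(records)), k)
    # Collect their values and shuffle them among the chosen records.
    values = [records[i][key] for i in chosen]
    rng.shuffle(values)
    # Work on copies so the original records are left untouched.
    noisy = [dict(r) for r in records]
    for i, value in zip(chosen, values):
        noisy[i][key] = value
    return noisy
```

Raising `fraction` moves more values around, which makes any single record less trustworthy as a description of a real person while the published totals stay exact.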

Differential privacy is a mathematical technique that makes this process rigorous by measuring how much privacy increases when noise is added. The method is already used by Apple and Facebook to collect aggregate data without identifying particular users.
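To make "measuring" concrete, here is a minimal sketch of the textbook Laplace mechanism, one standard way to achieve differential privacy for a simple count. It is an assumption-laden illustration, not the system used by the Census Bureau, Apple, or Facebook. The privacy parameter `epsilon`, the `sensitivity` of the count, and all function names are introduced here for the example: a smaller `epsilon` means a stronger privacy guarantee and more noise.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at zero."""
    # The difference of two independent exponential draws with the same
    # rate is Laplace-distributed with scale 1 / rate.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count plus Laplace noise of scale sensitivity / epsilon.

    sensitivity is how much one person can change the count (1 for a
    simple head count). Names and defaults are assumptions for this sketch.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=2000, seed=0):
    """Average distance between noisy and true counts over many trials."""
    rng = random.Random(seed)
    errors = [
        abs(private_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    ]
    return sum(errors) / trials
```

Comparing `mean_abs_error(1000, 0.1)` with `mean_abs_error(1000, 1.0)` shows the trade-off: the smaller `epsilon` gives stronger privacy but a count that is, on average, further from the truth.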

But too much noise can render the data useless. One analysis showed that a differentially private version of the 2010 Census included households that supposedly had 90 people.

If all goes well, the method will likely be used by other federal agencies. Countries like Canada and the UK are watching too.
