It’s time for a Bill of Data Rights

As the US Senate debates a new bill, a data-governance expert presents a plan to protect liberty and freedom in the digital age.

Martin Tisne

December 14, 2018

It is the summer of 2023, and Rachel is broke. Sitting in a bar one evening, browsing job ads on her phone, she gets a text message. Researchers doing a study on liver function have gotten her name from the bar’s loyalty program—she’d signed up to get a happy-hour discount on nachos. They’re offering $50 a week for access to her phone’s health data stream and her bar tab for the next three months.

At first, Rachel is annoyed at the intrusion. But she needs the money. So she nods at her phone—a subtle but distinct gesture of assent that is as legally binding as a signature—and goes back to her nachos and her job search.

But as the summer wears on, Rachel can’t help noticing that she’s getting rejection after rejection from employers, while her friends, one by one, line up jobs. Unbeknownst to her—because she didn’t read the fine print—some data from the research study, along with her liquor purchase history, has made it to one of the two employment agencies that have come to dominate the market. Every employer who screens her application with the agency now sees that she’s been profiled as a “depressed unreliable.” No wonder she can’t get work. But even if she could discover that she’s been profiled in this way, what recourse does she have?

A day in the life

If you’re reading this, chances are that, like Rachel, you created an enormous amount of data today—by reading or shopping online, tracking your workout, or just going somewhere with your phone in your pocket. Some of this data you created on purpose, but a great deal of it was created by your actions without your knowledge, let alone consent.

The proliferation of data in recent decades has led some reformers to a rallying cry: “You own your data!” Eric Posner of the University of Chicago, Eric Weyl of Microsoft Research, and virtual-reality guru Jaron Lanier, among others, argue that data should be treated as a possession. Mark Zuckerberg, the founder and head of Facebook, says so as well. Facebook now says that you “own all of the content and information you post on Facebook” and “can control how it is shared.” The Financial Times argues that “a key part of the answer lies in giving consumers ownership of their own personal data.” In a recent speech, Tim Cook, Apple’s CEO, agreed, saying, “Companies should recognize that data belongs to users.”

This essay argues that “data ownership” is a flawed, counterproductive way of thinking about data. It not only fails to fix existing problems; it creates new ones. Instead, we need a framework that gives people rights to stipulate how their data is used without requiring them to take ownership of it themselves. The Data Care Act, a bill introduced on December 12 by US senator Brian Schatz, a Democrat from Hawaii, is a good initial step in this direction (depending on how the fine print evolves). As Doug Jones, a Democratic senator from Alabama who is one of the bill’s cosponsors, said, “The right to online privacy and security should be a fundamental one.”

The notion of “ownership” is appealing because it suggests giving you power and control over your data. But owning and “renting” out data is a bad analogy. Control over how particular bits of data are used is only one problem among many. The real questions are questions about how data shapes society and individuals. Rachel’s story will show us why data rights are important and how they might work to protect not just Rachel as an individual, but society as a whole.

Tomorrow never knows

To see why data ownership is a flawed concept, first think about this article you’re reading. The very act of opening it on an electronic device created data—an entry in your browser’s history, cookies the website sent to your browser, an entry in the website’s server log to record a visit from your IP address. It’s virtually impossible to do anything online—reading, shopping, or even just going somewhere with an internet-connected phone in your pocket—without leaving a “digital shadow” behind. These shadows cannot be owned—the way you own, say, a bicycle—any more than can the ephemeral patches of shade that follow you around on sunny days.

Your data on its own is not very useful to a marketer or an insurer. Analyzed in conjunction with similar data from thousands of other people, however, it feeds algorithms and bucketizes you (e.g., “heavy smoker with a drink habit” or “healthy runner, always on time”). If an algorithm is unfair—if, for example, it wrongly classifies you as a health risk because it was trained on a skewed data set or simply because you’re an outlier—then letting you “own” your data won’t make it fair. The only way to avoid being affected by the algorithm would be to never, ever give anyone access to your data. But even if you tried to hoard data that pertains to you, corporations and governments with access to large amounts of data about other people could use that data to make inferences about you. Data is not a neutral impression of reality. The creation and consumption of data reflects how power is distributed in society.

You could, of course, choose to keep all your data private to avoid its being used against you. But if you follow that strategy, you may end up missing out on the benefits of sometimes making your data available. For example, when you’re driving, navigating by smartphone app, you share real-time, anonymized information that then translates into precise traffic conditions (e.g., it will take you 26 minutes to drive to work this morning if you leave at 8:16 a.m.). That data is individually private—strangers can’t see where you are—but cumulatively, it’s a collective good.

This example shows how data in the aggregate can be fundamentally different in character from the individual bits and bytes that make it up. Even well-intentioned arguments about data ownership assume that if you regulate personal data well, you’ll get good societal outcomes. And that’s just not true.

That’s why many of the problems about unfair uses of data can’t be solved by controlling who has access to it. For example, in certain US jurisdictions, judges use an algorithmically generated “risk score” in making bail and sentencing decisions. These software programs predict the likelihood that a person will commit future crimes. Imagine that such an algorithm says you have a 99% chance of committing another crime or missing a future bail appointment because people demographically similar to you are often criminals or bail jumpers. That may be unfair in your case, but you can’t “own” your demographic profile or your criminal record and refuse to let the legal system see it. Even if you deny consent to “your” data being used, an organization can use data about other people to make statistical extrapolations that affect you. This example underscores the point that data is about power—people accused of or convicted of crimes generally have less power than those making bail and sentencing decisions.
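The statistical extrapolation described above can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption: the records are made up, the group labels are arbitrary, and the function names are hypothetical. It does not reproduce how any real risk-scoring product works. It only shows that a score can be assigned to you without a single piece of data you supplied yourself.

```python
from collections import defaultdict


def group_rates(records):
    """Return the observed reoffense rate for each group in the records."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, reoffended in records:
        totals[group] += 1
        positives[group] += int(reoffended)
    return {group: positives[group] / totals[group] for group in totals}


def risk_score(person_group, rates, default=0.5):
    """Score a person using only their group's rate, not their own history."""
    return rates.get(person_group, default)


# Hypothetical records about OTHER people: (group label, reoffended?)
others = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", False), ("B", False), ("B", True), ("B", False),
]

rates = group_rates(others)

# You withheld all of your own data, yet you still receive a score,
# because the system only needs to know which group you resemble.
your_score = risk_score("A", rates)
print(your_score)
```

In this toy setup, refusing consent changes nothing: the score depends entirely on what other people's records say about the group you are placed in.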

Similarly, existing solutions to unfair uses of data often involve controlling not who has access to data, but how data is used. Under the US Affordable Care Act, for instance, health insurance companies can’t deny or charge more for coverage just because someone has a preexisting condition. The government doesn’t tell the companies they can’t hold that data on patients; it just says they must ignore it. A person doesn’t “own” the fact that she has diabetes—but she can have the right not to be discriminated against because of it.

“Consent” is often mentioned as a basic principle that should be respected with regard to the use of data. But absent government regulation to prevent health insurance companies from using data about preexisting conditions, individual consumers lack the ability to withhold consent. The reason they lack that ability is that insurance companies have more power than they do. Consent, to put it bluntly, does not work.

Data rights should protect privacy, and should account for the fact that privacy is not a reactive right to shield oneself from society. It is about freedom to develop the self away from commerce and away from governmental control. But data rights are not only about privacy. Like other rights—to freedom of speech, for example—data rights are fundamentally about securing a space for individual freedom and agency while participating in modern society. The details should follow from basic principles, as with America’s existing Bill of Rights. Too often, attempts to enunciate such principles get bogged down in the weeds of things like “opt-in consent models,” which may fast become outdated.

Clear, broad principles are needed around the world, in ways that fit into the legal systems of individual countries. In the US, existing constitutional provisions—like equal protection under the law and prohibitions against “unreasonable searches and seizures”—are insufficient. It is, for instance, difficult to argue that continuous, persistent tracking of a person’s movements in public is a search. And yet such surveillance is comparable in its intrusive effects to an “unreasonable search.” It’s not enough to hope that courts will come up with favorable interpretations of 18th-century language applied to 21st-century technologies.

A Bill of Data Rights should include rights like these:

The right of the people to be secure against unreasonable surveillance shall not be violated.
No person shall have his or her behavior surreptitiously manipulated.
No person shall be unfairly discriminated against on the basis of data.

These are by no means all the provisions a durable and effective bill would need. They are meant to be a beginning, and examples of the sort of clarity and generality such a document needs.

To make a difference for people like Rachel, a Bill of Data Rights will need a new set of institutions and legal instruments to safeguard the rights it lays out. The state must protect and delimit those rights, which is what the European General Data Protection Regulation (GDPR) of 2018 has started to do. The new data-rights infrastructure should go further and include boards, data cooperatives (which would enable collective action and advocate on behalf of users), ethical data-certification schemes, specialized data-rights litigators and auditors, and data representatives who act as fiduciaries for members of the general public, able to parse the complex impacts that data can have on life.

With a little help from my friends

What does the future look like without data-rights protection? Let’s return to Rachel’s fruitless job search. Her characterization as a “depressed unreliable” may or may not be correct. Perhaps the algorithm just made a mistake: Rachel is perfectly healthy and fit for work. But as algorithms get better and draw on larger data sets, it becomes less and less likely that they will be inaccurate. Still, would that make them any more fair?

What if Rachel was a little bit depressed? A good job might have helped her overcome a bout of depression. But instead, her profile fast becomes a self-fulfilling prophecy. Unable to get a job, she indeed grows depressed and unreliable.

Now consider Rachel’s dilemma in a world with stronger data-rights protections. She agrees to the liver-function study, but as she scans its terms and conditions, an algorithmic data representative flags the issue, somewhat the way algorithmic gatekeepers protect against computer viruses and spam. After the issue is flagged, it is referred to a team of auditors who report to the local data-rights board (in this hypothetical future). The team examines the algorithm used by the study and discovers the link to the employment profiling. The board determines that Rachel has been profiled and that, thanks to a newly established interpretation of the Employment Equalities Act and the Data Protection Bill (passed in 2022), such profiling is clearly illegal. Rachel doesn’t have to take action herself—the board sanctions the researchers for abusive data practices.

Come together

As I’ve argued, “data ownership” is a category error with pernicious consequences: you can’t really own most of your data, and even if you could, it often wouldn’t protect you from unfair practices. Why, then, is the idea of data ownership such a popular solution?

The answer is that policy experts and technologists too often tacitly accept the concept of “data capitalism.” They see data either as a source of capital (e.g., Facebook uses data about me to target ads) or as a product of labor (e.g., I should be paid for the data that is produced about me). It is neither of these things. Thinking of data as we think of a bicycle, oil, or money fails to capture how deeply relationships between citizens, the state, and the private sector have changed in the data era. A new paradigm for understanding what data is—and what rights pertain to it—is urgently needed if we are to forge an equitable 21st-century polity.

This paradigm might usefully draw on environmental analogies—thinking of data as akin to greenhouse gases or other externalities, where small bits of pollution, individually innocuous, have calamitous collective consequences. Most people value their own privacy, just as they value the ability to breathe clean air. An incremental erosion of privacy is tough to notice and does little harm to anyone—just as trace amounts of carbon dioxide are scarcely detectable and do no environmental harm to speak of. But in the aggregate, just as large amounts of greenhouse gases cause fundamental damage to the environment, a massive shift in the nature of privacy causes fundamental damage to the social fabric.

To understand this damage, we need a new paradigm. This paradigm must capture the ways in which an ambient blanket of data changes our relationships with one another—as family, as friends, as coworkers, as consumers, and as citizens. To do so, this paradigm must be grounded in a foundational understanding that people have data rights and that governments must safeguard those rights.

There will be challenges along the way. Neither the technical nor the legal infrastructures around data rights are straightforward. It will be difficult to come to a consensus about what rights exist. It will be even tougher to implement new legislation and regulations to protect those rights. As in the current debate in the US Congress, interest groups and industry lobbyists will fight over important details. The balances struck in different countries will be different. But without a strong and vigorous data-rights infrastructure, open democratic society cannot survive.
