How a Box Could Solve the Personal Data Conundrum

Software known as a Databox could one day both safeguard your personal data and sell it, say computer scientists.

Emerging Technology from the arXiv

January 26, 2015

One of the trickiest issues for anyone with an online presence is how to manage personal information. Almost any form of surfing leaves a data trail that advertisers, social networks and so on can use to their advantage.

This data gold rush is largely driven by the dominant online business model in which advertising is the primary source of revenue. The gathered data can sometimes be processed in a way that individuals find useful. But this information can also be abused, sometimes with severe consequences, as anyone who has suffered identity theft will testify.

What’s more, information can fall into the hands of companies almost by default, regardless of the wishes of the owner. For example, Google scans the contents of all e-mails on its Gmail service.

Of course, people can choose to use a different service if they object to this. But they will find it much harder to avoid other people with Gmail accounts. Send them an e-mail and Google will scan the contents anyway.

The options for avoiding these scenarios are not good. The ultimate possibility is opting out of the online world but that is simply not viable for most people. So what to do?

Today, Hamed Haddadi from Queen Mary University of London and a few pals from the University of Cambridge put forward their own manifesto for solving this problem. These guys say the solution is a piece of software that collects personal data and then manages how the information is made available to third parties.

Haddadi and co call this software a Databox and suggest that it could kickstart a new generation of business models in which both individuals and companies profit from the personal data revolution.

The basic idea behind the Databox is that it is a networked service that collates personal information from all of your devices and can also make that data available to organizations that the owner allows. This piece of software must have a number of important attributes.

First, it must be trusted by the individual who uses it. That’s a big ask. The Databox will gather information about browsing habits, buying behavior, financial details such as bank statements, e-mail and social media contacts as well as calendar entries and so on. Allowing all this to be stored in a single online repository will require a remarkable act of faith for most people. Ensuring the security of a Databox is therefore a crucial requirement.

But the owner of the data is not the only one who needs to share this trust. Any company or organization that accesses the data must also have faith that it is reliable, something that will require third-party auditors who can verify that the system is operating as expected.

As well as gathering personal information, the Databox must allow controlled access to it. So third parties must be able to selectively query any information that the user allows them access to. At the same time, the user must be able to control how this data is accessed and be able to change the settings when necessary.
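To make the idea of controlled access concrete, here is a minimal sketch in Python. It is an illustration only, not the design Haddadi and co propose: the class name, the method names and the sample data are all hypothetical, and a real Databox would also need authentication, encryption and auditing. The sketch shows just the core behavior described above: the owner grants or revokes access per category, and a third party can only query what it has been granted.

```python
from dataclasses import dataclass, field


@dataclass
class Databox:
    """Toy personal data store with per-party access rules (hypothetical)."""
    records: dict = field(default_factory=dict)  # category -> list of items
    grants: dict = field(default_factory=dict)   # party -> set of categories

    def collect(self, category, item):
        # Gather a piece of personal data under a category.
        self.records.setdefault(category, []).append(item)

    def grant(self, party, category):
        # The owner allows a third party to read one category.
        self.grants.setdefault(party, set()).add(category)

    def revoke(self, party, category):
        # The owner changes the settings and withdraws access.
        self.grants.get(party, set()).discard(category)

    def query(self, party, category):
        # A third party may read only the categories it has been granted.
        if category not in self.grants.get(party, set()):
            raise PermissionError(f"{party} may not read {category}")
        return list(self.records.get(category, []))


box = Databox()
box.collect("calendar", "dentist, 3pm")
box.collect("health", "blood pressure 120/80")
box.grant("advertiser", "calendar")
allowed = box.query("advertiser", "calendar")
```

In this toy version, a query for the "health" category by the same advertiser raises a PermissionError, and calling revoke on "calendar" closes off that category too.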

Finally, there must be incentives for all those involved to use the Databox. For example, ordinary people may be more likely to use the service if it contains a mechanism that allows third parties to pay for using the data.

It may also provide an incentive for third parties by reducing their exposure to sensitive data, such as health records. For example, an organization may need access to health data but not want the cost and responsibility of storing it securely. “An analogy might be the way online stores use third-party payment services such as PayPal or Google Wallet to avoid the overhead of Payment Card Infrastructure compliance for processing credit card fees,” say Haddadi and co.

That’s an interesting idea but one that faces numerous hurdles before it can come into being. Not least of these is whether there will be sufficient demand for a service like this and whether it can pay for itself. Then there are the challenges of dealing with widely differing data sources and the problem of getting access to proprietary devices such as iPhones.

It may be that governments will have a role to play in creating a regulatory landscape in which this kind of service can flourish. But for the moment, the future is far from certain.

That’s not stopping these guys from dreaming. Many of the authors of this manifesto are involved in a highly ambitious project called Nymote, which is building a software infrastructure that allows people to take control of their digital lives—a Databox in all but name.

It’s an area that is certainly worth watching. After the revelations in recent years about government-sponsored snooping, it’s not worth betting against the possibility of a Databox-like service becoming ubiquitous.

Ref: arxiv.org/abs/1501.04737 : Personal Data: Thinking Inside the Box
