Who Owns Big Data?

Aggregate data and decision making are being hoarded by a few technology companies with powerful data infrastructure. Does it have to be this way? Or could we create a future in which this data infrastructure is available for use by anyone in the world?

Michael Nielsen

January 5, 2015

Provided by BBVA

Since the early days of modern computing, science-fiction authors and other visionaries have been fantasizing about a database that could contain all of the world’s knowledge. This idea is now moving out of the realm of fantasy. A small number of technology companies are engaged in serious efforts to build databases that really will contain much of human knowledge. Facebook, for example, has mapped out the social connections among more than a billion people, and Google aspires to digitize all the books in the world.

It has become profitable to build a database containing the entire world’s knowledge. The few for-profit companies that own the data and the tools to mine it – the data infrastructure – possess great power to understand and predict the world. But could we create a similarly powerful public data infrastructure, a Big Data for the masses, that anyone in the world could access?

We are all better off when dominant technology platforms are operated in the public interest. That is what happened with both the Internet and the Web, and that is why those platforms have been such a powerful spur to innovation. For example, when the computer industry moved from a proprietary platform (Windows) to an open platform (the Web) not owned by anyone in particular, the result was a resurgence of software innovation.

View the full article provided by BBVA OpenMind:

• Who Owns Big Data

A public data infrastructure would require a large cluster of computers, and some entity would have to pay for this. That entity would be the owner. If the owner were a for-profit company, that company would always be tempted to use its ownership to co-opt innovation. But the owner wouldn’t necessarily have to be a for-profit company. The owner could also be a not-for-profit company, a government, or a network of individual contributors like those who create open-source software.

Initiatives such as data.gov, the U.S. government portal, will make a very important contribution to a public data infrastructure, but they will not be the core of a powerful, broad-ranging public data infrastructure. And, although networks of contributors have developed public data infrastructure – Wikipedia and OpenStreetMap are two examples – Big Data requires larger organizations and budgets than a network of individuals can provide. Open-source software projects often start as hobby projects or as byproducts of the work of for-profit companies, and all they cost is the hobbyist’s time. Building effective data infrastructure, by contrast, requires money as well as time, along with a long-term commitment to providing reliable service, effective documentation, and support. That’s a much bigger barrier to entry.

A network of not-for-profit organizations could develop a public data infrastructure. Unlike for-profit companies, these entities would not be tempted to co-opt innovation. They could commit to encouraging innovation and helping it to flourish. But the traditional mechanisms for funding not-for-profits – foundations, grant agencies, and similar philanthropic sources – would present significant obstacles.

One obstacle is that a not-for-profit isn’t free to “pivot” – to change course along the way if it discovers its original business plan is flawed. True innovators don’t start out knowing what will work; they discover along the way what will work, and their initial plans usually need to change radically. Many technology investors have accepted this tendency to pivot. But the foundations that fund not-for-profits don’t take kindly to it.

Another obstacle is the risk-averse nature of not-for-profit funding. In the for-profit world, it’s understood that technology startups are extremely risky. It’s common for them to experience catastrophic failures on the road to success. But the not-for-profit world views such failure rates as disastrous. And if failure means that funding will be harder to obtain in the future, a not-for-profit will tend to pursue projects that carry little risk of failure.

Not-for-profit funding of a public data infrastructure might still be possible, though. Projects such as Wikipedia and OpenStreetMap have found ways to be successful, and their success could inspire funders to adopt a more experimental and high-risk approach to funding technological innovation, an approach that could speed up the development of a powerful public data infrastructure.

