A View from Emerging Technology from the arXiv

The Dizzying Data Rate Conundrum

The suffocating rate at which data is being produced in many experiments raises the question of how to store it for future generations.

June 8, 2009

When it is switched on later this year, the Large Hadron Collider (LHC) at CERN will smash particles together at a rate of 40 million collisions per second. The process will generate several petabytes of data per year, a flood that the LHC's computing infrastructure has been set up specifically to handle. The data will be kicked, prodded, and crunched before being analyzed, and the results will eventually reach the community as scientific papers.

That’s its short-term fate. The question is what to do with the data in the long term. Should it be archived somewhere and kept for eternity, and if so, how (and why)?

The LHC is emblematic of a broader problem in science, say André Holzner and his buddies at CERN. That problem is a rapidly growing body of data from increasingly sophisticated experiments. Holzner and co say an increasingly pressing task is to understand how this data is currently kept in disparate facilities around the world, so that better repositories can be designed in future.

With that in mind, and with generous funding from the European Union, they've surveyed more than 1,000 high-energy physicists linked to CERN on these issues and published the results on the arXiv.

There seems to be general agreement that data preservation is hugely important. But strangely, there is less agreement over what sort of data should be stored: for example, whether to preserve the raw data itself or some higher-level analysis of it. Stranger still is the broad range of opinion over why the data should be kept at all. Only 60 percent of respondents think the data should be kept so that conclusions can be checked in future.

Clearly, the broad concern over the issue is matched only by the widespread befuddlement over what to do about it.

That spells bad news for CERN and other data producers. CERN is about to switch on one of the greatest data fire hoses the world has ever seen. If there is to be any multilateral agreement over what to do in the long term with the data it and other projects produce, the matter needs to be settled sooner rather than later.

Ref: arxiv.org/abs/0906.0485: First results from the PARSE.Insight project: HEP survey on data preservation, re-use and (open) access
