The Battle to Preserve An Old Accelerator’s Data

The 100 TB of unique experimental data from the Large Electron Positron collider can never be reproduced. So how can it be preserved for future generations?

Emerging Technology from the arXiv

December 11, 2009

The Large Electron Positron collider (LEP), a particle accelerator on the Franco-Swiss border near Geneva, was one of the great wonders of the engineering world when it was completed in 1989.

The machine was built in a circular tunnel 27 kilometres in circumference and about 100 metres beneath the ground. It accelerated counter-rotating beams of electrons and positrons to energies of 45 GeV.

Initially, it smashed these beams together to create Z bosons, particles with a mass of 91 GeV. Later, the accelerator was upgraded to search, unsuccessfully as it turned out, for the elusive Higgs boson at energies of up to 209 GeV. But in 2000, it was shut down and dismantled.

Today, LEP is a distant memory for most physicists, who are now focused on its successor. In LEP’s huge underground tunnel, CERN has built the Large Hadron Collider, currently the world’s most powerful accelerator, which switched on last month.

But physicists have a problem with LEP’s legacy. During its lifetime, the four experiments taking data from the accelerator generated 100 TB of information. Physicists associated with these experiments are still working on about 20 papers. After these are published, the data will be “frozen”.

These experiments cannot be repeated, and so the data they produced are unique. There’s no question that these data must be preserved. The question is how.

Today, Andre Holzner at CERN and a few buddies outline the efforts to preserve LEP’s data. It is currently stored on a magnetic tape system at CERN called CASTOR. The idea is that whenever there is a media upgrade, the LEP data will be transferred to the new media.

That’s not a big deal given the money and person-hours to do the job: although 100 TB was a lot of data in 1989, it is a mere drop compared to the output of the LHC, which will soon produce about that much every day.
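A transfer of this kind is only worth doing if each new copy can be shown to match the old one. Here is a minimal sketch, in Python, of how a checksum comparison during such a migration might look. The helper names `sha256_of` and `migrate_file` are hypothetical, and nothing here describes how CASTOR itself handles transfers.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source, destination):
    """Copy one file to new storage and confirm the copy is bit-identical.

    Hypothetical helper for illustration: raises IOError if the
    checksum of the copy differs from the checksum of the original.
    """
    source, destination = Path(source), Path(destination)
    expected = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    if sha256_of(destination) != expected:
        raise IOError(f"checksum mismatch for {source}")
    # The digest can be recorded and rechecked at the next media upgrade.
    return expected
```

Keeping the returned digest alongside the data means a later migration can tell whether a file has degraded in storage, not just whether the latest copy succeeded.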

More difficult to preserve is the software necessary to make sense of the data. “Clearly, data is useless without the associated software to read and analyse it,” say Holzner and co.

The problem is that computer skills are changing. While much of the original LEP software was written in Fortran, the emphasis today is on C++. How the right kind of Fortran expertise can be preserved for future generations isn’t clear.

Another problem is that much of the high-level software used to analyse the data (user-specific analysis code and plotting macros) was never stored in a central database. Instead, it was kept in personal directories, which are deleted a year after somebody leaves a lab. That code is now lost.

So while future researchers will be able to access the raw data, they may never know exactly how it was processed into the form that appears in scientific publications.

More worrying than any of that is the news that some of the original data have been lost, probably because of wear and tear on the storage tapes due to overuse. These data can never be restored.

Holzner and co attempt to gloss over some of these problems by saying that if there is a need to re-analyse the LEP data, “there are still some collaboration members around to do this.”

That’s not entirely reassuring. The message is that if you want to study the LEP data, you’ve got until the current generation of collaboration members retires. After that, you’re on your own.

Ref: arxiv.org/abs/0912.1803 : Data Preservation at LEP
