Digital Preservation

Software
October 1, 2001

Increasingly, the record of our civilization is becoming digital, from census data to family photos. The Library of Congress alone has 35 terabytes of files. Yet rapid changes in computers and software could render this data unreadable.

Congress recently allocated the library $100 million to look for a way to preserve its files, one of the most ambitious efforts yet to tackle digital obsolescence. “With that money we’ll be able to gather the technical people and the archivists and start to develop a prototype,” says Abby Smith, preservation program officer with the Council on Library and Information Resources, which is working on the project.

Part of the challenge is that computers and software gallop ahead, while digital files remain static. The library’s current solution is to convert files to work with the updated systems every few years, but “every time you convert something, you change it,” says Jeff Rothenberg, researcher at the Rand Corporation in Santa Monica, CA. Rothenberg instead sees a solution in emulation software that can mimic a given hardware platform, allowing one computer to act like an earlier one. To demonstrate the approach’s feasibility, he created a chain of emulators linking a present-day PC to the 1949 EDSAC, one of the first computers. “I was able to run any of the original EDSAC programs that were saved on paper tape,” he says.

Ray Lorie, research fellow at IBM’s Almaden Research Center in San Jose, CA, is working on an approach that creates a digital road map of a document at the time of its creation. Write a document, say, in Adobe Premiere, and the software generates a second file that describes the content and formatting of the original document using a simple code. That code would be readable by a “universal virtual computer,” an emulator that mimics not an earlier machine but a hypothetical, extremely simple computer. “In the future we’d only need some way of interpreting this single virtual computer,” says Lorie.
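To make the idea concrete, here is a minimal sketch of what such a scheme could look like. It is not Lorie's actual design: the instruction set, the `run` function, and the shift-by-3 "obsolete format" are all invented for illustration. The point is only that the decoder travels with the archived data as a program for a very simple machine, so a future reader needs nothing more than an interpreter for that machine.

```python
# Hypothetical toy "universal virtual computer" (not Lorie's real design).
# The instruction names and the stack-machine layout are assumptions.

def run(program, data):
    """Interpret `program` (a list of tuples) against archived `data` (bytes)."""
    stack, out, pc = [], [], 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "PUSH":            # push a constant
            stack.append(args[0])
        elif op == "DUP":           # duplicate the top of the stack
            stack.append(stack[-1])
        elif op == "LOAD":          # pop an index, push data[index]
            stack.append(data[stack.pop()])
        elif op == "LEN":           # push the length of the archived data
            stack.append(len(data))
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "SUB":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "EMIT":          # pop a character code, append to output
            out.append(chr(stack.pop()))
        elif op == "JLT":           # pop b, then a; jump to target if a < b
            b, a = stack.pop(), stack.pop()
            if a < b:
                pc = args[0]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return "".join(out)

# Archived file in a made-up legacy format: every character code shifted by 3.
archived = bytes(ord(c) + 3 for c in "EDSAC")

# Decoder stored alongside the file; assumes the data is non-empty.
DECODER = [
    ("PUSH", 0),   # 0: index i = 0
    ("DUP",),      # 1: loop start, copy i
    ("LOAD",),     # 2: fetch data[i]
    ("PUSH", 3),   # 3
    ("SUB",),      # 4: undo the shift
    ("EMIT",),     # 5: output the character
    ("PUSH", 1),   # 6
    ("ADD",),      # 7: i = i + 1
    ("DUP",),      # 8
    ("LEN",),      # 9
    ("JLT", 1),    # 10: repeat while i < len(data)
    ("HALT",),     # 11
]

decoded = run(DECODER, archived)
print(decoded)
```

In this sketch the only thing a future archivist would have to rebuild is `run`, which handles nine instructions, rather than the original application that wrote the file.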

While the Library of Congress appropriation won’t solve the problem of digital preservation, it will allow for the first large-scale testing of possible solutions like Lorie’s and Rothenberg’s. “The Library of Congress project has a high enough profile that we might be able to get the attention of the technology industry, and to finally get some answers,” says Smith.
