What’s So Hard about Digital Preservation?
The naive view of digital preservation is that it’s merely a question of moving things periodically onto new storage media: making sure you copy your files from eight-inch floppy disks to five-and-a-quarter-inch disks, to three-and-a-half-inch disks, to CDs, and on to the next thing before the old format fades away completely. But moving bits is easy. The problem is that the decoding programs that translate the bits are usually junk within five years, while the languages and operating systems they use are in a state of constant change.
Every piece of software, and every data file, is at its heart written to instruct a given piece of hardware to perform certain tasks. In other words, it is written in the language of a machine, not of humans. Whenever you create a digital thing, be it a document, a database, a program, an image or a piece of music, it is stored in a form that you can’t read. “It’s like it was written in invisible ink,” says Jeff Rothenberg, a researcher at Rand, a think tank in Santa Monica, CA. “As soon as it’s stored it disappears from human eyes, and you need the right resources to render it visible again, just like invisible ink needs some sort of solvent to be read.” Yet rebuilding old hardware, or keeping it around forever to interpret nearly extinct software or formats, is economically prohibitive. When shippers dropped one of Feinstein’s vintage arcade games, shattering it, its original manufacturer calculated the insurance costs to restore the cabinet alone at $150,000, while making new chips for the game (from dies that no longer exist) would have cost millions.
Software companies confront the problem of digital preservation every day as they update their code, making sure it works with the latest hardware and operating systems while ensuring that customers can still open old files for a reasonable amount of time. But without some sort of digital resuscitation, every application, from the original binary code written in the 1940s to WordPerfect to the latest million-dollar database application, eventually stops working, and every data file eventually becomes unreadable. Every application and every file.
The evolution of operating systems, the programs that allow other programs to run, poses yet another challenge. As Microsoft improves Windows, for example, it introduces new guidelines for programmers, known as application programming interfaces (APIs), every few months, adding some features and taking others away. In each new release, some interfaces are “deprecated,” meaning that programmers are advised to stop using them in the software they write. But what does that mean for programs written before the change? Most programs that use deprecated features will keep working for a while, but because they access the underlying architecture less directly than the newer interfaces do, they are likely to run more slowly. How long before they stop working altogether? Most people actively trying to keep old files and applications operational say that five years is pushing it. “Interfaces change continually,” says one Windows developer. “It’s like asking how often the beach changes shape. Sometimes big storms come and nothing looks the same.”
But when programs are painstakingly rewritten to conform to new operating-system guidelines, they eventually become unable to access files created by their own precursors. “I frankly don’t expect to have a version of Quicken in 10 years that will be able to read my tax files from today,” says Gordon Bell, who led the development of some of the first minicomputers as vice president of research and development at Digital Equipment, and who now works as a senior researcher at Microsoft’s Bay Area Research Center. “Especially anything that is database oriented, with a lot of complexity in the data structure, is difficult to move from one generation to the next.”