Even without tackling problems like this, NARA has its hands full. For three years, at NARA’s request, a National Academy of Sciences panel has been advising the agency on its electronic-records program. The panel’s chairman, computer scientist Robert F. Sproull of Sun Microsystems Laboratories in Burlington, MA, says he has urged NARA officials to scale back their ambitions for the ERA, at least at the start. “They are going to the all-singing, all-dancing solution rather than an incremental approach,” Sproull says. “There are a few dozen formats that would cover most of what [NARA] has to do. They should get on with it. Make choices, encourage people submitting records to choose formats, and get on with it. If you become obsessed with getting the technical solution, you will never build an archive.”
Sproull counsels pragmatism above all. He points to Google as an example of how to deploy a workable solution that satisfies most information-gathering needs for most of the millions of people who use it. “What Google says is, ‘We’ll take all comers, and use best efforts. It means we won’t find everything, but it does mean we can cope with all the data,’” Sproull says. Google is not an archive, he notes, but in the Google spirit, NARA should attack the problem in a practical manner. That would mean starting with the few dozen formats that are most common, using whatever off-the-shelf archiving technologies will likely emerge over the next few years.
But this kind of preservation-by-triage may not be an option, says NARA’s Thibodeau. “NARA does not have discretion to refuse to preserve a format,” he says. “It is inconceivable to me that a court would approve of a decision not to preserve e-mail attachments, which often contain the main substance of the communication, because it’s not in a format NARA chose to preserve.”
Meanwhile, the data keep rolling in. After the 9/11 Commission issued its report on the attacks on the World Trade Center and the Pentagon, for example, it shut down and consigned all its records to NARA. A good deal of paper, along with 1.2 terabytes of digital information on computer hard disks and servers, was wheeled into NARA’s College Park facility, where it sits behind a door monitored by a video camera and secured with a black combination lock. Most of the data, which consist largely of word-processing files and e-mails and their attachments, are sealed by law until January 2, 2009. They will probably survive that long without heroic preservation efforts. But “there’s every reason to say that in 25 years, you won’t be able to read this stuff,” warns Thibodeau. “Our present will never become anybody’s past.”
It doesn’t have to be that way. Projects like DSpace are already dealing with the problem. Industry will provide a growing range of partial solutions, and researchers will continue to fill in the blanks. But clearly, in the decades to come, archives such as NARA will need to be staffed by a new kind of professional, an expert with the historian’s eye of an Allen Weinstein but a computer scientist’s understanding of storage technologies and a librarian’s fluency with metadata. “We will have to create a new profession of ‘data curator’ – a combination of scientist (or other data specialist), statistician, and information expert,” says MacKenzie Smith of the MIT Libraries.
The nation’s founding documents are preserved for the ages in their bath of argon gas. But in another 230 years or so, what of today’s electronic records will survive? With any luck, the warnings from Air Force historian Mark and NARA’s Thibodeau will be heeded. And historians and citizens alike will be able to go online and find that NARA made it to the moon, after all.