But for all this promise, NARA faces many problems that researchers haven’t even begun to think about. Consider Weinstein’s discovery of the Hoover marginalia. How could such a tidbit be preserved today? And how can any organization that needs to track information – where it goes, who uses it, and how it’s modified along the way – capture those bit streams and keep them as safe as older paper records? Saving the text of e-mail messages is technically easy; the challenge lies in managing a vast volume and saving only what’s relevant. It’s important, for example, to save the e-mails of major figures like cabinet members and White House personnel without also bequeathing to history trivial messages in which mid-level bureaucrats make lunch arrangements. The filtering problem gets harder as the e-mails pile up. “If you have 300 or 400 million of anything, the first thing you need is a rigorous technology that can deal with that volume and scale,” says Chadduck. More and more e-mails come with attachments, so NARA will ultimately need a system that can handle any type of attached file.
Version tracking is another headache. In an earlier era, scribbled cross-outs and margin notes on draft speeches were a boon to understanding the thinking of presidents and other public officials. Their digital equivalents are harder to recover: to see all the features of a given Microsoft Word document, such as tracked changes, it’s best to open the document using the same version of Word that the document’s creator used. This means that future researchers will need not only a new piece of metadata – what software version was used – but perhaps even the software itself, in order to re-create fonts and other formatting details faithfully. But saving the functionality of software – from desktop programs like Word to the software NASA used to test a virtual-reality model of the Mars Global Surveyor, for example – is a key research problem. And not all software keeps track of how it was actually used. Why might this matter? Consider the 1999 U.S. bombing of the Chinese embassy in Belgrade. U.S. officials blamed the error on outdated maps used in targeting. But how would a future historian probe a comparable matter – to check the official story, for example – when decision-making occurred in a digital context? Today’s planners would open a map generated by GIS software, zoom in on a particular region, pan across to another site, run a calculation about the topography or other features, and make a targeting decision.
If a historian wanted to review these steps, he or she would need information on how the GIS map was used. But “currently there are no computer science tools that would allow you to reconstruct how computers were used in high-confidence decision-making scenarios,” says Peter Bajcsy, a computer scientist at the University of Illinois at Urbana-Champaign. “You might or might not have the same hardware, okay, or the same version of the software in 10 or 20 years. But you would still like to know what data sets were viewed and processed, the methods used for processing, and what the decision was based on.” That way, to stay with the Chinese embassy example, a future historian might be able to independently assess whether the database about the embassy was obsolete, or whether the pilot who dropped the bomb had the right information before he took off. For now, producing such data is only a research proposal of Bajcsy’s. NARA says that if such data is collected in the future, the agency will add it to the list of things needing preservation.