The decay of 5-year-old digital humor may not be cause for mourning, except among scholars of new media. The loss of early Web sites isn’t entirely academic, though. Those who are plunging into startups today should look closely at what succeeded and failed back around 1995, when “.com” was a dirty phrase instead of a lucrative suffix. Take, for example, one of the Web’s most successful companies, Yahoo!, which started as a student home page at Stanford University. Yahoo! gained popularity and, with its more useful organizational scheme, eclipsed the well-established index of the day, NCSA’s What’s New. Of course, many of the sites this early Yahoo! actually linked to are gone. The real woe is that businesspeople hoping to emulate Yahoo!’s success, as well as students of computing and media history, can’t easily see what version 1.0 of Yahoo! looked like and compare it to the rival index. The original Stanford home page, the Web’s first table of contents, is long gone. The Internet Movie Database has been even more drastically transformed. The collection of movie reviews, originally contributed by volunteers who had no financial interest in the films they wrote about, is now owned by Amazon.com and used to market videos.
The disappearance of valuable Web content will not be stopped by simply selecting “Save As….” Despite the digital nature of online information, real archiving comes at a cost. For one thing, sites that are stored for posterity must be maintained in a way that is verifiably legal and respects the copyright and privacy interests of content creators. Legacy browsers have to be kept on hand, too, so one can see the early Web in the way surfers encountered it back around 1994. Finally, magnetic media and even CD-ROMs degrade after decades, and data has to be copied over every few years if the material is to be safely preserved.
The Internet Archive project is taking such factors seriously, although that project has made some curious omissions. For instance, although one of the project’s directors is a librarian, the Archive does not have an archivist on its board. The organization’s recent approach of focusing on a handful of specific sites is sound, but pages of greater importance to the culture and medium of the Web could have been selected. Choosing a few sites, though, is certainly a better idea than the Internet Archive’s original plan to preserve every bit of the Web using data from the company Alexa. That Sisyphean task is one to which the organization remains devoted. The issues of copyright, privacy and access are tractable when specific sites are chosen for preservation. To write the whole Web to a giant array of hard disks, on the other hand, is a showy and largely useless technological gesture.
The Web sites of lasting interest are early versions of innovative business, publishing and artistic ventures, not Ross Perot’s home page. Unless we act to preserve important sites, many of which are already offline, the Web’s origins will become even murkier. Developing the Web intelligently, and trying to understand it, will be made harder by our lack of perspective.
It’s quite possible that the origins of the most technologically advanced worldwide system for publishing and communication, now less than a decade old, will one day be known only through isolated scraps of information and the hazy recollections of aging geeks, trying to recall when they created their first animated GIF, or when they first used a Web conferencing system, or when they visited Yahoo! back in the day, when it was on konishiki.stanford.edu…or was that akebono.stanford.edu?