The challenge of collecting and preserving the Web, or even a representative sample of it, is a daunting one (see “Fire in the Library”). It is not enough to simply capture the information a website contained, be that text, images, or video. We must also preserve something of the experience and activity a site supported. How a site was accessed, who linked to it, and how that changed over time provide important context for critical events such as the recent tsunami in Japan or the events of 9/11, which already seem distant given the speed at which the Web evolves and leaves data behind. No single institution can hope to preserve all of that. It will take the commitment of a critical mass of government institutions, companies, nonprofits, and more to ensure the longevity of our digital heritage, nationally and globally.
Current notions of what the Web represents socially, culturally, politically, economically, legally, and even scientifically vary depending on where you happen to live in the world. The value systems to which you subscribe shape what you see in the Web. This is an advantage when thinking of how to preserve the diversity of experience online. Unfortunately, many factors work against the cross-cultural collaboration needed to preserve the Web’s diversity at scale. Local legislation can hinder attempts to share information; companies can fear negative commercial consequences from providing access to their data; and limited budgets constrain the few organizations, such as the Internet Archive, that are dedicated to preserving the Web.
In a perfect world, this would not be the case. Individuals, governments, universities, libraries, and corporations would all work to preserve the world’s most vibrant cultural medium. Imagine for a moment an approach to preservation that builds on the fundamental strengths of the Internet itself—distributed, ubiquitous, relatively inexpensive, not easily quelled or manipulated by any single actor. “Netizens” from around the globe would work to build a unified Web archive spanning cultural, political, and commercial boundaries. Subject-matter experts would ensure that their spheres were adequately represented; others would confirm that a representative sample across all domains was being collected.