Every year MIT researchers create at least 10,000 papers, data files, images, collections of field notes, and audio and video clips. The research often finds its way into professional journals, but the rest of the material remains squirreled away on personal computers, Web sites, and departmental servers. It’s accessible to only a few right now. And with computers and software evolving rapidly, the time is coming when files saved today will not be accessible to anyone at all.
Until recently there has been no overall plan to archive or preserve such work for posterity. But true to its problem-solving nature, MIT has come up with a solution. In September the Institute launched DSpace, a Web-based institutional repository where faculty and researchers can save their intellectual output and share it with colleagues around the world, now and for centuries to come. The result of a two-year collaboration of the MIT Libraries and Hewlett-Packard, DSpace is built on open-source software and is available to anyone free of charge. More important, many believe this groundbreaking effort will fundamentally change the way scholars disseminate their research findings.
The Case for DSpace
DSpace grew out of the mutual need of MIT’s libraries and researchers to preserve digital work, says Ann Wolpert, director of libraries for the Institute. A few years before the project began, she says, “faculty started coming to the library and saying, ‘I have this stuff on my Web site. I want it to be more secure than it is on my computer. Will you figure out how to take my digitally formatted materials?’” Although the libraries have massive print archives dedicated to preserving a wide range of materials, they had no system for digital preservation. So Wolpert talked with the libraries’ faculty advisory committee and visited departments across campus to determine what was needed.
It “wasn’t but a blink of an eye before it became apparent that this kind of function was essential for educational technology as well as research,” says Wolpert. “If you’re going to spend all this money on online course content, where’s it all going to go?” During the meetings, it became clear that the convergence of faculty needs, the Institute’s commitment to OpenCourseWare, and other campus initiatives made development of a digital archive a natural fit.
The MIT Libraries proposed to Hewlett-Packard that the two organizations form a partnership to develop a multidisciplinary digital repository. The libraries’ needs were a good match with HP’s desire to develop archival storage systems that eventually could be used in any business setting.
“The world is coming to grips with the sheer magnitude of digital content that will be produced over the next decade,” says Michael Bass, the HP project manager of DSpace. “We wanted to get to the bottom of the hard-core problems that are going to keep coming up until people address them.”
The $1.8 million project became part of the five-year, $25 million MIT-HP Alliance, a research effort to develop digital information systems. In the spring of 2000, the project team of HP software developers, MIT administrators, and a faculty advisory committee started to develop the system. The result of their efforts is a one-of-a-kind repository that can store all types of digital files and is accessible from any computer on campus. Every document stored in DSpace has a unique and permanent URL. Materials submitted to the repository are organized within a community: a school, department, lab, or center. Each community sets its standards for DSpace content and decides who will be authorized to upload its documents. Posted material with unrestricted access may be viewed by anyone.
When it first came online, DSpace could store almost a terabyte of data. While that’s enough room to accommodate the information on about 1,500 CD-ROMs, it is not large enough to hold all the work MIT faculty have stored on their own hard drives and CD-ROMs. MIT plans to add storage capacity as demand increases. “We wanted to have enough storage to bootstrap an interesting body of materials,” says Bass, “but we didn’t want to overbuild.”
DSpace is not the only digital archive in the United States, but it does occupy unique ground. “If you look at the landscape of digital repositories, there seem to be two types,” says MacKenzie Smith, associate director for technology for the MIT Libraries and the Institute’s project manager for DSpace. “One concerns library holdings that happen to be in digital format. The other is a preprint archive that is tailored to scholarly papers in a discipline and is a vehicle for getting them out quickly. They are not concerned with long-term preservation.” DSpace, however, is committed to preserving not only published papers, but also their supporting documentation.