The Challenges of Being First
Because DSpace is the first superarchive of its kind, the team had many problems to solve. It wanted to create a repository that would both serve the needs of MIT and other research universities and begin to address questions about long-term data preservation, finding solutions that would be applicable in any arena. The two goals weren't always compatible. "There's tension between wanting the system to work for the libraries and wanting it to work in a general sense for any kind of information or knowledge industry," says Smith. "And there's a tension about wanting to get something out fast and wanting to take advantage of new techniques and new technologies that aren't quite ready for prime time."
Robert Tansley, HP's lead software developer on the team, describes the project as "a lot of little problems that you have to solve all at the same time." The first and most obvious problem was the variety of applications contributors use to create their submissions. "Applications change over time," says Tansley, "so people have different versions of things, different operating systems, and a lot of them don't talk to each other." To address that problem, the libraries developed a way to catalog formats for which MIT has the specifications and is, therefore, able to develop software that converts files to other formats as needed.
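As a rough illustration of the format catalog idea described above, the Python sketch below keeps a registry of formats whose specifications are known and applies a converter when one has been registered. Every name in it (FormatCatalog, register_converter, the sample format labels) is hypothetical and chosen for illustration; none of it is taken from DSpace itself.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Converter = Callable[[bytes], bytes]


class FormatCatalog:
    """Hypothetical catalog of formats whose specifications are known."""

    def __init__(self) -> None:
        self._known: Set[str] = set()
        self._converters: Dict[Tuple[str, str], Converter] = {}

    def add_format(self, name: str) -> None:
        # Record a format for which a specification is on file.
        self._known.add(name)

    def register_converter(self, source: str, target: str, fn: Converter) -> None:
        # A converter can only be written between formats that are cataloged.
        if source not in self._known or target not in self._known:
            raise ValueError("both formats must be in the catalog")
        self._converters[(source, target)] = fn

    def convert(self, data: bytes, source: str, target: str) -> Optional[bytes]:
        # Return the converted bytes, or None when no converter is registered.
        fn = self._converters.get((source, target))
        return fn(data) if fn else None


# Illustrative use: re-encode Latin-1 text as UTF-8.
catalog = FormatCatalog()
catalog.add_format("text/latin-1")
catalog.add_format("text/utf-8")
catalog.register_converter(
    "text/latin-1",
    "text/utf-8",
    lambda raw: raw.decode("latin-1").encode("utf-8"),
)
```

The point of the design is that conversion is possible only where the specification is known, which is why the catalog refuses to register a converter for an unlisted format.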
The team also addressed the search needs of DSpace users. The system had to make it easy for people to find their way through the millions of documents that will end up in DSpace. The developers selected Lucene, an open-source search engine that can index so-called metadata as well as text and can be extended with more sophisticated search capabilities. The team also puzzled over how people from different communities could describe their documents using the conventions of their own disciplines and still provide easy access to users outside those communities. DSpace now uses Dublin Core, an established standard for creating the metadata that describe the documents in DSpace, but the team is looking to future research for a better solution. Through another joint venture, MIT and HP will lead the way in this area of digital archiving. A three-year project will explore how to provide metadata that are customized to specific disciplines but searchable and manageable across the entire system.
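To make the idea of indexing metadata alongside text concrete, here is a minimal Python sketch. It is not Lucene and does not use Lucene's API; it is a toy inverted index, and the sample records, the build_index and search helpers, and the "field:word" query form are all assumptions made for illustration. The field names title, creator, and subject are real Dublin Core elements.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical records: a few Dublin Core elements plus the full text.
records: Dict[str, Dict[str, str]] = {
    "doc-1": {
        "title": "Wave Loads on Offshore Platforms",
        "creator": "A. Researcher",
        "subject": "ocean engineering",
        "text": "We measure wave loads on a scale model.",
    },
    "doc-2": {
        "title": "Pricing Strategy in Emerging Markets",
        "creator": "B. Scholar",
        "subject": "management",
        "text": "A survey of pricing decisions by new entrants.",
    },
}


def build_index(docs: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Index every word twice: once on its own, once qualified by field."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, fields in docs.items():
        for field_name, value in fields.items():
            for word in value.lower().split():
                token = word.strip(".,;:")
                if token:
                    index[token].add(doc_id)
                    index[f"{field_name}:{token}"].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the documents that match every term in the query."""
    hits = [index.get(term.lower(), set()) for term in query.split()]
    return set.intersection(*hits) if hits else set()


index = build_index(records)
```

A plain word such as "wave" matches any field or the body text, while a qualified term such as "subject:management" matches only that metadata element, which is the kind of distinction a metadata-aware search makes possible.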
There were other issues as well. The team had to develop distinct levels of authorization so that specific materials could be open to the general public, restricted to the Institute, or restricted to an even smaller group. The system needed to be flexible enough that each organizational segment of MIT could develop its own method for submitting documents. And it had to interoperate, or share content seamlessly, with other institutional archives. To make DSpace inexpensive to upgrade, Tansley divided it into exchangeable modules that can be replaced as new versions become available.
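The three tiers of access described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Access levels, the User and Item classes, and the can_read function are hypothetical names, not DSpace's actual authorization model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Access(Enum):
    PUBLIC = "public"        # open to anyone
    INSTITUTE = "institute"  # restricted to members of the Institute
    GROUP = "group"          # restricted to one named group


@dataclass(frozen=True)
class User:
    name: str
    is_institute_member: bool = False
    groups: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Item:
    title: str
    access: Access
    group: str = ""  # only meaningful when access is Access.GROUP


def can_read(user: User, item: Item) -> bool:
    """Decide whether a user may read an item under the three tiers."""
    if item.access is Access.PUBLIC:
        return True
    if item.access is Access.INSTITUTE:
        return user.is_institute_member
    return item.group in user.groups


# Illustrative users and items.
visitor = User("visitor")
student = User("student", is_institute_member=True)
lab_member = User("lab member", is_institute_member=True, groups=frozenset({"lab"}))

working_paper = Item("Working paper", Access.PUBLIC)
thesis_draft = Item("Thesis draft", Access.INSTITUTE)
lab_notes = Item("Lab notes", Access.GROUP, group="lab")
```

Keeping the decision in a single function means a new tier could be added in one place without touching the code that stores or displays items.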
Just about every technical challenge presented Smith and her part of the project team with a corresponding policy question. How the material will be made available to future users was one of the biggest issues the team had to tackle. To make sure that documents will be readable on computers of the future, the team developed a list of supported formats that the libraries commit to keeping available and readable. For unusual formats, the libraries guarantee bit preservation, that is, storing the ones and zeroes of the original documents. "If you've got the know-how to reverse engineer it and maybe write a compiler for the year 2050, then you'll be able to do something with that content," says Smith.
Whether to allow for content removal or modification posed another policy dilemma. "In archives, you never get rid of things," says Smith. But faculty wanted a way to suppress early, prepublication manuscripts. As a compromise, the team created a "tombstone" to acknowledge a preliminary document that did exist but is no longer available to the public. The document, however, remains in DSpace.
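The tombstone compromise can be pictured with a short Python sketch. It assumes a simple in-memory store; the Repository and StoredItem classes and their method names are hypothetical and are not DSpace's implementation. Withdrawing an item changes only what the public sees, while the stored content is left untouched.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StoredItem:
    title: str
    content: bytes
    withdrawn: bool = False


class Repository:
    """Hypothetical archive in which nothing is ever deleted."""

    def __init__(self) -> None:
        self._items: Dict[str, StoredItem] = {}

    def deposit(self, item_id: str, item: StoredItem) -> None:
        self._items[item_id] = item

    def withdraw(self, item_id: str) -> None:
        # Mark the item as withdrawn; the content itself stays in the archive.
        self._items[item_id].withdrawn = True

    def public_view(self, item_id: str) -> str:
        item = self._items[item_id]
        if item.withdrawn:
            # The tombstone: acknowledge the item without exposing it.
            return f"'{item.title}' was deposited but is no longer publicly available."
        return item.content.decode("utf-8")

    def archived_content(self, item_id: str) -> bytes:
        # Archive-side access, unaffected by withdrawal.
        return self._items[item_id].content


repo = Repository()
repo.deposit("ms-1", StoredItem("Early manuscript", b"Preliminary results."))
```

After `repo.withdraw("ms-1")`, the public view returns the tombstone notice, while `archived_content` still returns the original bytes.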
Other policy decisions covered what can be put into the online repository, what happens to the materials of a center or lab that closes, how to assign space to individual communities, and what extra services (such as scanning old papers) the library would provide.
Last April, to test the process and gather feedback for improvements before the system went public in September, four representative MIT communities began to submit materials to DSpace. The early adopters were the Department of Ocean Engineering; the Center for Technology, Policy, and Industrial Development; the Laboratory for Information and Decision Systems; and the Sloan School of Management.
Don Lessard, deputy dean of the Sloan School, says the school volunteered to help test the system because "we think DSpace is going to be the key mechanism for maintaining and distributing research. We want a friendly portal into our research for academics, management professionals, and journalists."
Though Lessard was receptive to the new system, he and other early adopters were concerned that some faculty would resist using DSpace. Some communities have been slower than others to begin using the repository, but outreach from DSpace staff and the ease of posting documents have lowered barriers to use. Early adopters also made special efforts to encourage faculty and researchers to submit to DSpace.