Keeping data intact can be just as tricky as transmitting it: ask anyone who has left a personal digital assistant on a train or suffered a hard-drive crash. What’s needed, says Berkeley computer scientist John Kubiatowicz, is a way to spread data around so that we don’t have to carry it physically, but so it’s always available, invulnerable to loss or destruction, and inaccessible to unauthorized people.
That’s the grand vision behind OceanStore, a distributed storage system that’s also being tested on PlanetLab. OceanStore encrypts files (memos and other documents, financial records, digital photos, music, or video clips), then breaks them into overlapping fragments. The system continually moves the fragments and replicates them on nodes around the planet. The original file can be reconstituted from just a subset of the fragments, so it’s virtually indestructible, even if a number of local nodes fail. PlanetLab nodes currently have enough memory to let a few hundred people store their records on OceanStore, says Kubiatowicz. Eventually, millions of nodes would be required to store everyone’s data. Kubiatowicz’s goal is to produce software capable of managing 100 trillion files, or 10,000 files for each of 10 billion people.
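To see how a file can survive the loss of a fragment, consider the simplest possible redundancy scheme. The sketch below is an illustration only: it assumes a single XOR parity fragment, so it tolerates the loss of just one fragment, and the function names are invented for this example. The article does not say which coding scheme OceanStore uses, and a real system would tolerate many more losses.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> tuple[list[bytes], int]:
    """Split data into k equal fragments plus one XOR parity fragment.

    Returns the k + 1 fragments and the original length (needed to
    strip the zero padding again when decoding).
    """
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    fragments.append(reduce(xor_bytes, fragments))  # parity, index k
    return fragments, len(data)


def decode(fragments: dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the data from any k of the k + 1 fragments.

    `fragments` maps fragment index (0..k, where k is the parity)
    to the fragment bytes that survived.
    """
    missing = [i for i in range(k) if i not in fragments]
    if len(missing) > 1 or (missing and k not in fragments):
        raise ValueError("too many fragments lost to rebuild the file")
    full = dict(fragments)
    if missing:
        # XOR of every surviving fragment (data and parity) equals the lost one.
        full[missing[0]] = reduce(xor_bytes, fragments.values())
    return b"".join(full[i] for i in range(k))[:length]


if __name__ == "__main__":
    frags, n = encode(b"vacation photos", k=4)
    # Pretend the node holding fragment 2 has failed.
    survivors = {i: f for i, f in enumerate(frags) if i != 2}
    print(decode(survivors, k=4, length=n))
```

In this toy scheme any four of the five fragments are enough to rebuild the file; losing two is fatal, which is why a planet-scale store would need a stronger code plus replication.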
To keep track of distributed data, OceanStore assigns the fragments of each particular file their own ID code, a very long number called the Globally Unique Identifier. When a file’s owner wants to retrieve the file, her computer tells a node running OceanStore to search for the nearest copies of fragments with the right ID and reassemble them.
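The lookup step can be pictured as follows. This is a rough sketch under loud assumptions: the article does not describe how the search is carried out, so the example uses a single in-memory table, measures "nearest" by round-trip latency, and invents the names `Replica` and `nearest_fragments`. A system spread over millions of nodes would not rely on one central table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    node: str          # hypothetical node name
    latency_ms: float  # assumed measure of how "near" the node is
    index: int         # which fragment of the file this copy holds


def nearest_fragments(
    directory: dict[str, list[Replica]], guid: str
) -> dict[int, Replica]:
    """For every fragment index stored under `guid`, pick the closest copy."""
    best: dict[int, Replica] = {}
    for replica in directory.get(guid, []):
        current = best.get(replica.index)
        if current is None or replica.latency_ms < current.latency_ms:
            best[replica.index] = replica
    return best


if __name__ == "__main__":
    # Hypothetical identifier and node names, for illustration only.
    directory = {
        "guid-1234": [
            Replica("berkeley", 12.0, 0),
            Replica("tokyo", 140.0, 0),
            Replica("zurich", 95.0, 1),
            Replica("princeton", 40.0, 1),
        ]
    }
    for index, replica in sorted(nearest_fragments(directory, "guid-1234").items()):
        print(index, replica.node)
```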
Privacy and security are built in. An owner who wants to retrieve a file must first present a key that has been generated using now-common encryption methods and stored in a password-protected section of her personal computer. This key contains so many digits that it’s essentially impossible for others to guess it and gain unauthorized access. The key provides access to OceanStore directories that map human-readable names (such as “internet.draft”) to fragment ID codes. The ID codes are then used to search OceanStore for the nearest copies of the needed fragments, which are reassembled and decrypted. And there’s one more layer of protection: the ID codes are themselves generated from the data’s contents by a secure cryptographic function at the time the contents are saved. Like encryption keys, the codes are so long (160 binary digits) that even today’s most advanced supercomputers can’t guess or fake them. So if the ID recomputed from data retrieved from OceanStore matches the original ID, the owner can be sure the data itself hasn’t been changed or corrupted.
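The integrity check described above can be sketched in a few lines. One assumption needs flagging: the article does not name the cryptographic function, so the example uses SHA-1 from Python's standard `hashlib` module purely because it produces a 160-bit result. The helper names `content_id` and `is_unaltered` are invented for illustration.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive a 160-bit ID from the data itself (SHA-1 is an assumption)."""
    return hashlib.sha1(data).hexdigest()  # 40 hex digits = 160 bits


def is_unaltered(data: bytes, expected_id: str) -> bool:
    """Recompute the ID of retrieved data and compare it with the stored one."""
    return content_id(data) == expected_id


if __name__ == "__main__":
    original = b"contents of internet.draft"
    # The owner's directory maps a human-readable name to the ID code.
    directory = {"internet.draft": content_id(original)}

    retrieved = original
    print(is_unaltered(retrieved, directory["internet.draft"]))

    tampered = original + b"!"
    print(is_unaltered(tampered, directory["internet.draft"]))
```

Because the ID is computed from the contents, changing even one byte of the stored data produces a different ID, and the mismatch exposes the change.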
Kubiatowicz would like to see OceanStore become a utility similar to DSL or cable Internet service, with consumers paying a monthly access fee. “Say you just got back from a trip and you have a digital camera full of pictures,” he suggests. “One option is to put these pictures on your home computer or write them to CDs. Another option is that you put those pictures into OceanStore. You just copy them to a partition of your hard drive, and the data is replicated efficiently on a global scale.” That option could be available within three to five years, he predicts, but in the interim, two things need to happen. First, his team needs to produce sturdier versions of the OceanStore code. Second, someone needs to provide enough nodes to enlarge the system to a useful scale. That someone is likely to be a private company looking to enter the distributed-storage business, predicts Peterson. “I could imagine OceanStore attracting the next Hotmail-like startup as its first customer,” he says.
Beyond providing distributed, secure storage, OceanStore could eventually make every computer your personal one. At its next level of development, it could store your entire computing environment (your PC desktop, plus all of the applications you’re running and all the documents you have open) across the network and reconstitute it on demand, even if you popped up at an Internet terminal halfway around the world. This capability would be useful to the businessperson on the road, to a doctor who suddenly needs to review a chart, or to a contractor who wants to tweak a blueprint from home. Several companies are working to realize this vision. Intel calls it Internet Suspend/Resume, and Sun researchers are testing several approaches to “desktop mobility.” But PlanetLab could provide the infrastructure that makes such technology possible, by offering a means to manage the large amounts of data, perhaps tens of gigabytes, that personal-computer users might regularly rely on.