Computer scientists at Stanford want to throw away the hard disk in data centers and store information in random access memory (RAM), the more expensive, temporary storage that makes programs run faster.
Today’s hard disks can hold roughly 10,000 times as much information as they did in the mid-1980s, but they can transfer large amounts of data only about 50 times as fast as they could back then. This is a growing bottleneck for data stored on servers in data centers, which businesses rely on more and more as they push their data into cloud computing.
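The size of the bottleneck follows from a simple ratio: the time needed to read a disk from end to end is roughly its capacity divided by its transfer rate. The short Python check below uses only the two growth factors quoted above and, as a simplifying assumption, ignores seek time and other overheads.

```python
# Growth factors quoted above, relative to the mid-1980s.
capacity_growth = 10_000   # how much more data a disk holds
bandwidth_growth = 50      # how much faster it transfers large amounts of data

# Full-disk read time ~ capacity / transfer rate (assumption: seek time ignored),
# so its growth factor is the ratio of the two growth factors.
full_scan_time_growth = capacity_growth / bandwidth_growth

print(f"Reading a whole disk takes about {full_scan_time_growth:.0f}x longer")
```

In other words, although disks got faster, reading everything stored on one now takes on the order of 200 times longer than it did in the mid-1980s.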
For applications that need to manipulate a lot of data very quickly, such as high-frequency stock trading or translating Web pages from one language to another, this delay is a problem, says John Ousterhout, research professor of computer science at Stanford and head of a new project based on the idea, dubbed RAMCloud. “We’re seeing more and more interesting applications that have huge data sets and access that data very intensively,” he says.
Ousterhout’s proposed system is based on dynamic random access memory (DRAM), which stores each bit as an electrical charge on a capacitor. In personal computers, data fetched from a disk or flash drive is held temporarily in DRAM, which gives a program very fast access to it. In a data center, fetching bits from DRAM and sending them over the center’s internal network should be 100 to 1,000 times faster than getting them from a disk.
“You’ll be able to build new kinds of applications that just weren’t possible before,” says Ousterhout. “Can you ever think of a time in the history of technology that improving speed by a thousandfold … happened and nothing changed?”
Other computer scientists are more skeptical. “I was hoping to hear a more convincing argument,” wrote Murat Demirbas, associate professor of computer science and engineering at the State University of New York, Buffalo, in a blog post reviewing Ousterhout’s RAMCloud paper. Demirbas also suggests that using many disks in parallel might be another way to cut retrieval times.
One concern is the potential cost of a RAMCloud. Ousterhout estimates that 2,000 servers could provide 48 terabytes of DRAM storage at $65 per gigabyte. That’s 50 to 100 times more expensive than disks. However, if you look at cost in terms of how many bits you can access per second, DRAM is actually 10 to 100 times cheaper than disk, Ousterhout says. And he projects that by 2020, with improvements in DRAM technology, a RAMCloud could store one to 10 quadrillion bytes at just $6 per gigabyte.
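Ousterhout’s figures can be turned into rough totals with a few lines of Python. This is a back-of-the-envelope sketch, not part of his analysis: it assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000,000 GB) and treats the quoted price per gigabyte as applying to the whole capacity. The disk comparison is not computed because no per-disk prices are given here.

```python
# Figures quoted above; decimal units are an assumption (1 TB = 1,000 GB).
servers = 2_000
capacity_tb = 48
price_per_gb_today = 65   # dollars per gigabyte

capacity_gb = capacity_tb * 1_000
dram_per_server_gb = capacity_gb / servers
total_cost_today = capacity_gb * price_per_gb_today

print(f"DRAM per server: {dram_per_server_gb:.0f} GB")
print(f"Implied total cost: ${total_cost_today / 1e6:.2f} million")

# 2020 projection: one to 10 quadrillion bytes (1 to 10 PB) at $6 per gigabyte.
price_per_gb_2020 = 6
projected_costs = {}
for petabytes in (1, 10):
    cost = petabytes * 1_000_000 * price_per_gb_2020   # 1 PB = 1,000,000 GB
    projected_costs[petabytes] = cost
    print(f"{petabytes} PB in 2020: ${cost / 1e6:.0f} million")
```

Under these assumptions, today’s configuration works out to about 24 GB of DRAM per server and roughly $3.1 million in total, while the projected 2020 systems would cost on the order of $6 million to $60 million for up to 200 times the capacity.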
Ousterhout compares the situation to the 1970s, when hard disks supplanted tape drives as the main storage system for computers, not because they were less expensive but because they made computers run more efficiently. “Disks never got cheaper than tape,” Ousterhout says. “I think the same thing’s going to happen with DRAM.”
Another issue with DRAM is that it’s volatile: it holds information only as long as it has power. RAMCloud would therefore still use disks as backup storage, along with extra copies of data in DRAM, allowing data lost during a crash to be recovered.
Luiz Barroso, a distinguished engineer at Google, says the Stanford group is tackling a very important problem, and he sees some promise. “The economics of current DRAM technology would rule out RAMCloud as the solution for some important big data problems, but it could be compelling for more modest sized workloads,” he says.