Custom versus Commodity
Over the last decade, everything we’ve heard about computers has been about making them smaller, faster, cheaper, more like commodities. Our laptops, for instance, have the same capabilities as a Cray computer from the mid-1970s. Then along comes the monstrous Earth Simulator: it’s the size of four tennis courts, and it cost almost twice as much as its closest competition, the ASCI Q machines. If this is the future of supercomputing, what are we to make of it? The machine doesn’t even boast a particularly new architecture: NEC used a technique called vector computing, which dates from Cray’s earliest days.
But beyond its ungainliness and architectural oddities, the Earth Simulator exemplifies an approach to high-performance computing that is fundamentally different from the one followed by most U.S. computer makers today. The Earth Simulator was designed from the bottom up, from its processors to the communications buses that link processors and memory, to be the world’s fastest computer. When will the U.S. approach of linking general-purpose processors like those that serve up Web pages produce a result that can match the performance of a machine explicitly designed for performance? “My view is that you can’t do it,” says Bell. “I simply don’t see a way, with a general purpose computer, of getting from there to here.”
The performance challenge begins with the processor. The data crunched in scientific computation often take the shape of lists of numbers, the values associated with real-world observations. Traditionally, computers acted on these values sequentially, retrieving them from memory one by one. Then, in the early 1970s, Seymour Cray took an intuitive leap: why not design a computer so that its processors can request an entire list, or “vector,” all at once, rather than waiting for memory to respond to each request in turn? Such a processor would spend more time computing and less time waiting for data from memory. From the mid-1970s through the 1980s, Cray’s vector supercomputers set record after record. But they required expensive specialized chips, and vector computing was, therefore, largely abandoned in the United States after 1990, when the notion of massively parallel systems made from off-the-shelf processors took hold.
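The difference between the two styles of memory access can be made concrete with a toy cost model. The sketch below is only an illustration: the function names, the 10-cycle memory latency, and the one-cycle cost per value are assumptions chosen for the example, not figures for any Cray or NEC machine.

```python
# Toy cost model, counted in abstract "cycles".
# All numbers are illustrative assumptions, not measurements of real hardware.

def scalar_cycles(n_values: int, memory_latency: int, compute_per_value: int = 1) -> int:
    """Sequential access: every value is requested on its own,
    so the processor waits out the full memory latency each time."""
    return n_values * (memory_latency + compute_per_value)


def vector_cycles(n_values: int, memory_latency: int, compute_per_value: int = 1) -> int:
    """Vector access: the whole list is requested at once, so the
    latency is paid a single time and the values then stream in."""
    return memory_latency + n_values * compute_per_value


if __name__ == "__main__":
    n, latency = 64, 10  # hypothetical list length and memory latency
    print("one value at a time:", scalar_cycles(n, latency), "cycles")
    print("whole list at once: ", vector_cycles(n, latency), "cycles")
```

Under these assumed numbers the sequential version spends most of its cycles waiting on memory, while the vector version spends nearly all of them computing, which is the effect Cray was after.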
Vector computing nevertheless remained one of the most efficient ways to handle large-scale simulations, prompting NEC to adopt Cray’s approach when it bid for the government contract to build the Earth Simulator. Once NEC’s architects decided to build for speed rather than standardization, they were free to develop not only specialized processors, but also wider communications pathways between the processors, compounding the hardware’s speed advantage. Many such improvements are built into the NEC SX6, the fundamental building block of the Earth Simulator. “Vector architecture is the best fit for computer simulations of ‘grand challenge’ scientific and engineering problems such as global warming, supersonic-airplane design, and nanoscale physics,” says Makoto Tsukakoshi, a general manager for the Earth Simulator project at NEC.
Yoking together commodity machines with standard commercial networks, on the other hand, shifts the speed burden from hardware to software. Computer scientists must write “parallel programs” that parse problems into chunks, then explicitly control which processors should handle each chunk, all in an effort to minimize the time spent passing bits through communications bottlenecks between processors.
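In miniature, the chunk-and-assign pattern looks something like the sketch below. It is a deliberately simple stand-in: the helper names are hypothetical, Python threads play the role of separate processors, and adding up a list of numbers takes the place of a real simulation. Real parallel codes on commodity clusters must also manage the communication between machines, which is where most of the difficulty lies.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_workers):
    """Parse the problem into at most n_workers contiguous chunks."""
    size = max(1, -(-len(values) // n_workers))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, n_workers=4):
    """Explicitly hand each chunk to a worker, then combine the partial results.

    Threads stand in for processors here (an assumption for illustration);
    they show the structure of the program, not a real speedup.
    """
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))  # one chunk per worker
    return sum(partial_sums)  # the only "communication" step in this toy


if __name__ == "__main__":
    print(parallel_sum(list(range(100)), n_workers=4))
```

Even in this tiny case the programmer, not the machine, decides how the data are divided and how the pieces are recombined.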
Such programming has proved extremely difficult: a straightforward FORTRAN program becomes a noodly mess of code that calls for rewriting and debugging by parallel-programming specialists. “I hope to concentrate my attention on my research rather than on how to program,” says Hitoshi Sakagami, a researcher at Japan’s Himeji Institute of Technology and a Gordon Bell Prize finalist for work using the Earth Simulator. “I don’t consider parallel computers acceptable tools for my research if I’m constantly forced to code parallel programs.”
It’s not laziness that has kept programmers from finding better ways to write parallel code. “People have worked extremely hard trying to develop new application software based on different algorithms to use parallel machines, with little success,” says Jim Decker, principal deputy director of the Office of Science in the Department of Energy. (Decker’s agency is responsible for basic research in areas such as energy and the environment.) Vector machines often employ their own form of parallel processing, but the mathematics for doing so is far less complicated; Earth Simulator scientists, for example, are able to program using a flavor of the classic FORTRAN computer language that takes a much more direct approach.
A supercomputer comprising large numbers of commercial processors isn’t just hard to program. It has become clear that the gains from adding more processors to a commodity system eventually flatten into insignificance as coaxing them to work together grows more difficult. What really got computational scientists’ hearts racing about the Earth Simulator was not the peak (the maximum number of calculations performed per second), which is roughly four times the capacity of the next fastest machine and in itself is impressive enough. Instead, it was the computer’s capability for real problem solving (which, after all, is what scientists care about). The Earth Simulator can crunch computations at up to 67 percent of its peak capacity over a sustained period. In comparison, the massively parallel approach, well, doesn’t compare.
“If you throw enough of these commodity processors into a system, and you’re not overwhelmed by the cost of the communications network to link them together, then you might eventually reach the peak performance of the Earth Simulator,” says Sterling. “But what is rarely reported publicly about these systems is that their sustained performance is frequently below five percent of peak, or even one percent of peak.”
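The arithmetic behind that gap is easy to sketch. The snippet below uses only the percentages quoted above (67 percent for the Earth Simulator, five percent and one percent for commodity systems); the function names are hypothetical, and the Earth Simulator’s peak is normalized to 1.0 rather than taken from any published figure.

```python
def sustained(peak: float, fraction_of_peak: float) -> float:
    """Sustained performance is peak performance scaled by the fraction actually achieved."""
    return peak * fraction_of_peak


def peak_needed_to_match(target_sustained: float, fraction_of_peak: float) -> float:
    """Peak a machine must have to deliver target_sustained at a given fraction of peak."""
    return target_sustained / fraction_of_peak


if __name__ == "__main__":
    # Earth Simulator peak normalized to 1.0 (an assumption for illustration).
    target = sustained(peak=1.0, fraction_of_peak=0.67)
    for fraction in (0.05, 0.01):
        ratio = peak_needed_to_match(target, fraction)
        print(f"at {fraction:.0%} of peak, a machine needs about "
              f"{ratio:.1f} times the Earth Simulator's peak")
```

On these figures, a commodity system sustaining five percent of peak would need roughly 13 times the Earth Simulator’s peak to match its sustained output, and one sustaining one percent would need roughly 67 times.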
Although it’s certainly cheaper to build supercomputers out of commodity parts, many computational scientists suspect that the cost of developing parallel software actually makes it more expensive to run scientific applications on such a machine.
“People have gotten enamored of the low cost for what sounds like a very high level of performance on commodity machines,” says Decker. “But they aren’t really cheaper to build. We need to look at sustained performance, as well as the cost of developing software. Software costs are generally larger than hardware costs, so if there are hardware approaches that make it easy to solve the problem, we’re better off investing in hardware. In hindsight, I believe we would have been better off taking a different path.”