To try to express the speed of modern supercomputers in conventional notation is an exercise in absurdity: a petaflop is one quadrillion floating point operations per second, while the next milestone, an exaflop, is one quintillion (a million trillion) calculations per second. Got that?
The easiest way to compare them is to note that an exaflop is a thousand times as large as a petaflop. An exascale computer would be a thousand times as powerful as the fastest supercomputers on earth today, each of which cost in excess of $100 million to construct.
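The scale comparison above is easy to sanity-check with back-of-the-envelope arithmetic. A minimal sketch (the 10^21-operation workload is a hypothetical round number chosen for illustration, not from the article):

```python
# Back-of-the-envelope comparison of the flop scales discussed above.
PETAFLOP = 10**15  # one quadrillion floating point operations per second
EXAFLOP = 10**18   # one quintillion ("a million trillion") ops per second

ratio = EXAFLOP // PETAFLOP
print(f"An exaflop is {ratio}x a petaflop")  # the thousandfold gap

# How long a hypothetical job needing 10^21 operations would take:
ops = 10**21
print(f"Petascale: {ops / PETAFLOP / 86400:.1f} days")
print(f"Exascale:  {ops / EXAFLOP / 3600:.2f} hours")
```

The same job drops from roughly eleven and a half days to under twenty minutes, which is why the thousandfold jump matters more than the unwieldy names suggest.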
“To me the major happening over the past few years is that we were able to reach petascale using conventional technology,” says Thom Dunning, head of the National Center for Supercomputing Applications. “It was an evolution of a technology path we’ve been on for the past five years. We can’t reach exascale with the same technology path.”
No one knows how we’re going to get to that scale, but just about every high-performance computing scientist seems to view it as inevitable. The Defense Advanced Research Projects Agency (DARPA), the government body tasked with funding the riskiest, most far-out research projects, thinks getting there will require that we “reinvent computing.”
Dunning speculates that a move from conventional architectures (CPUs) to GPUs, while difficult, might be a piece of the puzzle. “It may be that GPUs and other new technology may be useful in the longer term, but it’s going to require a very substantial effort to rewrite all the [current library of] code to make effective use of that technology,” he says.
A hybrid CPU/GPU approach, with the GPUs doing most of the heavy lifting, is exactly the strategy adopted for the world’s newly crowned champion supercomputer, China’s Tianhe-1A.
This approach is risky: GPUs can post explosive peak performance, but their sustained performance tends to be relatively sluggish because their memory systems simply can’t feed them data fast enough. More importantly, existing supercomputer software wasn’t written to take advantage of their novel brand of parallelism, which was originally designed for processing high-end graphics.
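The gap between peak and sustained performance can be made concrete with a simplified roofline-style estimate: attainable throughput is capped by the lesser of raw compute and what the memory system can feed. All the figures below are illustrative placeholders, not the specs of any real GPU:

```python
# Simplified roofline-style model: sustained performance is the minimum of
# peak compute and (memory bandwidth x arithmetic intensity of the kernel).
# Numbers are hypothetical, chosen only to illustrate the bottleneck.

def sustained_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Attainable GFLOP/s for a kernel with the given arithmetic intensity."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

peak = 1000.0  # hypothetical peak compute: 1 teraflop
bw = 150.0     # hypothetical memory bandwidth: 150 GB/s

# A bandwidth-bound kernel (e.g. vector add: ~1 flop per 12 bytes moved)
print(sustained_gflops(peak, bw, 1 / 12))  # a small fraction of peak
# A compute-bound kernel with heavy data reuse (e.g. large matrix multiply)
print(sustained_gflops(peak, bw, 50.0))    # saturates the compute peak
```

A kernel that streams data with little reuse achieves only a sliver of the advertised peak, which is the "sluggish sustained performance" problem in miniature.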
Dunning is doubtful that, at least at first, the Tianhe-1A will have the kind of general utility for scientific applications that more conventional supercomputers possess.
Regardless, those machines are running up against the limits of current technology in terms of operations per watt. Incremental improvements can be wrung from further shrinking of the “process technology” – literally, the size of the individual features on the chip, which is now approaching 22 nanometers in the lab.
But basic physics – quantum tunneling, the difficulty of keeping electrons in their paths once the barriers between features are only hundreds or thousands of atoms thick – is likely to put the kibosh on further improvements by simply endlessly shrinking everything on a chip.
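To get an order-of-magnitude feel for how little room is left, one can divide a 22-nanometer feature by the size of silicon's crystal lattice. A rough sketch (the lattice constant is a standard physical value; the calculation itself is my illustration, not from the article):

```python
# Order-of-magnitude estimate: how many silicon lattice cells span a
# 22 nm chip feature. Silicon's lattice constant is about 0.543 nm.
SI_LATTICE_NM = 0.543

feature_nm = 22.0
cells = feature_nm / SI_LATTICE_NM
print(f"~{cells:.0f} silicon lattice cells across a {feature_nm:.0f} nm feature")
```

At only a few dozen lattice cells per feature, there are not many more halvings available before individual atoms, and the tunneling effects described above, dominate.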
GPUs have the advantage of being a substantially different way of accomplishing the same thing as CPUs, but not every problem is amenable to their variety of parallelism, so the quest continues.
Check out additional posts in this series of conversations with Thom Dunning of NCSA: