Before specialized graphics-processing chips existed, pioneers in the field of visualization used multicore supercomputers to realize data in three dimensions. Today, however, the speed at which supercomputers can process data is rapidly outstripping the speed at which they can input and output that data. Graphics-processing clusters are becoming obsolete.
Researchers at Argonne National Laboratory and elsewhere are working on a solution. Rather than moving massive datasets to a specialized graphics-processing cluster for rendering, which is how things are done now, they are writing software that allows the thousands of processors in a supercomputer to do the visualization themselves.
Tom Peterka and Rob Ross, computer scientists at Argonne National Laboratory, and Hongfeng Yu and Kwan-Liu Ma of the University of California at Davis, have written software for Intrepid, an IBM Blue Gene/P supercomputer, that bypasses the graphics-processing cluster entirely. “It allows us to [visualize experiments] in a place that’s closer to where data reside–on the same machine,” says Peterka. His team’s solution eliminates the time-consuming step of moving the data from the machine that generated it to a secondary computer cluster.
Peterka’s test data, obtained from John Blondin of North Carolina State University and Anthony Mezzacappa of Oak Ridge National Laboratory, represent 30 sequential steps in the simulated explosive death of a star, and are typical of the sort of information a supercomputer like Argonne’s might tackle. Peterka’s largest test with the data maxed out at a three-dimensional resolution of 89 billion voxels (three-dimensional pixels) and resulted in two-dimensional images 4,096 pixels on a side. Processing the data required 32,768 of Intrepid’s 163,840 cores. Two-dimensional images were generated with a parallel volume-rendering algorithm, a classic approach to creating a two-dimensional snapshot of a three-dimensional dataset.
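To make the idea of parallel volume rendering concrete, here is a minimal sketch in Python. It is not the Argonne team's code. It assumes a tiny synthetic volume, a made-up transfer function (transfer), and a view straight down one axis. The volume is cut into slabs, each slab is rendered on its own to a partial image, and the partial images are then blended in depth order. On a real machine each slab would go to a different core; here they are simply rendered one after another.

```python
# Illustrative sketch only: a tiny emission-absorption volume renderer whose
# work is split into slabs, the way a parallel renderer divides a dataset.
# The transfer function, the synthetic data, and the slab split are all
# assumptions made for illustration.

def transfer(value):
    """Hypothetical transfer function: scalar in [0, 1] -> (color, opacity)."""
    return value, 0.5 * value

def render_slab(volume, z_start, z_end):
    """Composite voxels front to back along z for one slab.

    Returns one (premultiplied color, opacity) pair per (y, x) pixel.
    """
    height, width = len(volume[0]), len(volume[0][0])
    image = [[(0.0, 0.0) for _ in range(width)] for _ in range(height)]
    for z in range(z_start, z_end):
        for y in range(height):
            for x in range(width):
                color, alpha = transfer(volume[z][y][x])
                acc_c, acc_a = image[y][x]
                # Each new sample is dimmed by the opacity already in front of it.
                image[y][x] = (acc_c + (1.0 - acc_a) * alpha * color,
                               acc_a + (1.0 - acc_a) * alpha)
    return image

def composite(front, back):
    """Blend two partial images with the 'over' operator, front on top."""
    return [[(fc + (1.0 - fa) * bc, fa + (1.0 - fa) * ba)
             for (fc, fa), (bc, ba) in zip(front_row, back_row)]
            for front_row, back_row in zip(front, back)]

def render_parallel(volume, num_workers):
    """Render each slab independently, then blend the results in depth order."""
    depth = len(volume)
    bounds = [depth * i // num_workers for i in range(num_workers + 1)]
    partials = [render_slab(volume, bounds[i], bounds[i + 1])
                for i in range(num_workers)]
    result = partials[0]
    for partial in partials[1:]:
        result = composite(result, partial)
    return result

def make_volume(n):
    """Synthetic n x n x n test volume with values in [0, 1)."""
    return [[[((x + 2 * y + 3 * z) % 7) / 7.0 for x in range(n)]
             for y in range(n)] for z in range(n)]
```

Because the blending step is associative, render_parallel(make_volume(8), 4) produces the same picture as rendering the whole volume in one pass with render_slab, up to floating-point rounding. That property is what lets the work be spread across many processors.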
Normally, visualization and post-processing of data generated by Intrepid, which, at 557 teraflops, is the world’s seventh-fastest supercomputer, requires a separate graphics-processing cluster known as Eureka. (A teraflop is the equivalent of a trillion floating-point calculations per second.) Built from NVIDIA Quadro Plex S4 GPUs (graphics-processing units), Eureka runs at 111 teraflops. More-powerful supercomputers, in the petaflop range, present even bigger challenges.
“The bigger we go, the more the problem is bounded by [input/output speeds],” says Peterka. Merely writing to disk the amount of data produced by a simulation run on a petaflop supercomputer could take an unreasonable amount of time. The reason is simple: from one generation of supercomputer to the next, storage capacity and storage bandwidth aren’t increasing as quickly as processing speed.
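A back-of-the-envelope calculation shows how quickly write times add up. The sketch below is illustrative only: the voxel count and number of time steps come from the test described earlier, but the bytes per voxel and the disk bandwidth are hypothetical assumptions, not figures reported for Intrepid.

```python
# Back-of-the-envelope sketch. The bytes-per-voxel and bandwidth values are
# hypothetical assumptions chosen for illustration, not measurements.

def write_time_seconds(num_values, bytes_per_value, num_steps,
                       bandwidth_bytes_per_s):
    """Time to write num_steps snapshots at a sustained bandwidth."""
    total_bytes = num_values * bytes_per_value * num_steps
    return total_bytes / bandwidth_bytes_per_s

VOXELS = 89_000_000_000      # largest test resolution mentioned in the article
BYTES_PER_VOXEL = 4          # assumption: one 32-bit value per voxel
TIME_STEPS = 30              # time steps in the test dataset
BANDWIDTH = 10_000_000_000   # assumption: 10 GB/s sustained to disk

seconds = write_time_seconds(VOXELS, BYTES_PER_VOXEL, TIME_STEPS, BANDWIDTH)
print(f"{seconds / 60:.1f} minutes to write all steps")
```

Under these assumptions, doubling the amount of data without doubling the bandwidth doubles the write time, which is the disparity Peterka describes.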
This disparity means that future supercomputing centers simply might not be able to afford separate graphics-processing units. “At petascale, [separate graphics-processing units] are less cost-effective,” says Hank Childs, a computer systems engineer and visualization expert at Lawrence Berkeley National Laboratory. Childs points out that a dedicated visualization cluster, like the one for Argonne’s Intrepid supercomputer, often costs around $1 million, but in the future that cost might increase by a factor of 20.
Pat McCormick, who works on visualization on the world’s fastest supercomputer, the AMD Opteron- and IBM Cell-powered “Roadrunner” at Los Alamos National Laboratory, says that Peterka’s work on direct visualization of data is critical because “these machines are getting so big that you really don’t have a choice.” Existing GPU-based methods of visualization will continue to be appropriate only for certain kinds of simulations, McCormick says.
“If you’re going to consume an entire supercomputer with calculations, I don’t think you have a choice,” says McCormick. “If you’re running at that scale, you’ll have to do the work in place, because it would take forever to move it out, and where else will you be able to process that much data?”
Peterka, McCormick, and Childs envision a future in which supercomputers perform what’s known as in-situ processing: visualizing simulations as they run, rather than after the fact.
“The idea behind in-situ processing is you bypass I/O altogether,” says Childs. “You never write anything to disk. You take visualization routines and link them directly to simulation code and output an image as it happens.”
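The pattern Childs describes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not code from any of the labs mentioned: simulate is a made-up one-dimensional diffusion loop, and render_ascii is a stand-in for a real visualization routine. The point is only the structure: the renderer is called from inside the simulation loop, and the raw state never touches the disk.

```python
# Toy sketch of in-situ visualization. Both functions are hypothetical
# stand-ins: a real code would couple a physics simulation to a real renderer.

def simulate(num_steps, size, on_step):
    """Run a toy 1-D diffusion and hand each state to on_step in memory."""
    state = [0.0] * size
    state[size // 2] = 1.0  # a single hot spot in the middle
    for step in range(num_steps):
        state = [
            0.25 * (state[i - 1] if i > 0 else 0.0)
            + 0.5 * state[i]
            + 0.25 * (state[i + 1] if i < size - 1 else 0.0)
            for i in range(size)
        ]
        # The visualization routine is linked directly into the loop.
        on_step(step, state)

def render_ascii(state):
    """Stand-in renderer: map each value to a character by relative brightness."""
    shades = " .:-=+*#"
    peak = max(state) or 1.0
    return "".join(
        shades[min(len(shades) - 1, int(value / peak * (len(shades) - 1)))]
        for value in state
    )

frames = []
simulate(5, 16, lambda step, state: frames.append(render_ascii(state)))
for frame in frames:
    print(frame)
```

Only the rendered frames survive each step; the full simulation state is overwritten as the loop advances, which is why no intermediate output needs to be stored.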
This approach is not without its pitfalls, however. For one thing, rendering each image would take a second or more, which rules out interacting with three-dimensional models in a natural fashion. For another, interacting with data in this way burns up cycles on the world’s most expensive computers.
“Supercomputers are incredibly valuable resources,” notes Childs. “That someone would do a simulation and then interact with the data for an hour–that’s a very expensive resource to hold hostage for an hour.”
As desktop computers follow supercomputers and GPUs into the world of multiple cores and massively parallel processing, Peterka speculates that there could be a trend away from processors specialized for particular functions. Already, AMD offers an implementation of OpenCL, an open standard for parallel programming, which makes it possible to run code written for a GPU on x86 chips–and vice versa.
Xavier Cavin, founder and CEO of Scalable Graphics, a company that designs software for the largest graphics-processing units used by businesses, points out that the very first parallel volume-rendering algorithm ran on the CPUs of a supercomputer. “After that, people started to use GPUs and GPU clusters to do the same thing,” Cavin says. “And now it comes back to CPUs. It’s come full circle.”