Before specialized graphics-processing chips existed, pioneers in the field of visualization used multicore supercomputers to render data in three dimensions. Today, however, the speed at which supercomputers can process data is rapidly outstripping the speed at which they can read and write that data. Because the standard approach depends on moving simulation output off the supercomputer to a separate machine for rendering, graphics-processing clusters are becoming obsolete.
Researchers at Argonne National Laboratory and elsewhere are working on a solution. Rather than moving massive datasets to a specialized graphics-processing cluster for rendering, as is done now, they are writing software that lets the thousands of processors in a supercomputer do the visualization themselves.
Tom Peterka and Rob Ross, computer scientists at Argonne National Laboratory, and Hongfeng Yu and Kwan-Liu Ma of the University of California at Davis, have written software for Intrepid, an IBM Blue Gene/P supercomputer, that bypasses the graphics-processing cluster entirely. “It allows us to [visualize experiments] in a place that’s closer to where data reside, on the same machine,” says Peterka. His team’s solution eliminates the time-consuming step of moving the data from the machine that generated it to a secondary computer cluster.
Peterka’s test data, obtained from John Blondin of North Carolina State University and Anthony Mezzacappa of Oak Ridge National Laboratory, represent 30 sequential time steps in a simulation of the explosive death of a star, and are typical of the sort of problem a supercomputer like Argonne’s might tackle. Peterka’s largest test used a three-dimensional resolution of 89 billion voxels (three-dimensional pixels) and produced two-dimensional images 4,096 pixels on a side. Processing the data required 32,768 of Intrepid’s 163,840 cores. The two-dimensional images were generated with a parallel volume-rendering algorithm, a classic approach to creating a two-dimensional snapshot of a three-dimensional dataset.
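The article does not describe the internals of Peterka’s renderer. As a hedged illustration of the general technique only, the sketch below shows front-to-back ray compositing, the core step of ray-cast volume rendering, together with the “over” operator that lets each processor render just its own block of voxels and then merge the partial images in depth order. The function names and the `(r, g, b, alpha)` sample layout are hypothetical choices made for this example, and which compositing scheme the Argonne/UC Davis team actually used is not stated here.

```python
# Hypothetical sketch of ray compositing for parallel volume rendering.
# Each sample is (r, g, b, alpha) with non-premultiplied color; results are
# premultiplied color plus accumulated opacity.

def composite_ray(samples):
    """Front-to-back compositing of samples ordered nearest-first."""
    r = g = b = 0.0
    a = 0.0
    for sr, sg, sb, sa in samples:
        w = (1.0 - a) * sa  # share of this sample not hidden by nearer ones
        r += w * sr
        g += w * sg
        b += w * sb
        a += w              # equivalent to a = 1 - (1 - a) * (1 - sa)
    return (r, g, b, a)


def over(front, back):
    """Merge two premultiplied partial results; `front` is nearer the viewer."""
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    t = 1.0 - fa            # light from `back` that passes through `front`
    return (fr + t * br, fg + t * bg, fb + t * bb, fa + t * ba)


# Usage: pretend two processors each own half of the volume along one ray.
ray = [(1.0, 0.2, 0.1, 0.3), (0.2, 0.8, 0.2, 0.5),
       (0.1, 0.1, 0.9, 0.4), (0.9, 0.9, 0.9, 0.2)]
whole = composite_ray(ray)              # one processor sees the entire ray
near_part = composite_ray(ray[:2])      # processor owning the near block
far_part = composite_ray(ray[2:])       # processor owning the far block
merged = over(near_part, far_part)      # combine partial images in depth order
print(whole)
print(merged)                           # matches `whole` up to rounding
```

Because “over” is associative, partial images can be merged in any grouping as long as depth order is preserved; that property is what makes parallel image-compositing schemes such as binary swap possible. Real renderers also add refinements omitted here, such as early ray termination, which would make the split and unsplit results differ slightly.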
Normally, visualization and post-processing of data generated by Intrepid, which, at 557 teraflops, is the world’s seventh-fastest supercomputer, require a separate graphics-processing cluster known as Eureka. (A teraflop is a trillion floating-point operations per second.) Built from NVIDIA Quadro Plex S4 GPUs (graphics-processing units), Eureka runs at 111 teraflops. More-powerful supercomputers, in the petaflop range (a petaflop is 1,000 teraflops), present even bigger challenges.
“The bigger we go, the more the problem is bounded by [input/output speeds],” says Peterka. Merely writing to disk the output of a simulation run on a petaflop supercomputer could take prohibitively long. The reason is simple: from one generation of supercomputer to the next, storage capacity and storage bandwidth aren’t increasing as quickly as processing speed.
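The arithmetic behind Peterka’s point can be sketched in a few lines: the time to write a step’s output is its size divided by the sustained storage bandwidth, so if output grows with processing speed while bandwidth grows more slowly, writing eats a growing share of each step. Every number below (output size, bandwidth, growth rates, compute time per step) is a hypothetical assumption chosen to show the trend, not a measurement of Intrepid, Eureka, or any other machine.

```python
# All figures are hypothetical, chosen only to illustrate the trend.
COMPUTE_SECONDS_PER_STEP = 60.0  # assumed constant: problem size grows with the machine
OUTPUT_BYTES_GEN0 = 1e12         # assumed 1 TB of output per time step
BANDWIDTH_GEN0 = 50e9            # assumed 50 GB/s sustained aggregate storage bandwidth
COMPUTE_GROWTH = 10.0            # assumed per-generation growth in compute (and output size)
BANDWIDTH_GROWTH = 3.0           # assumed per-generation growth in storage bandwidth


def write_seconds(num_bytes, bandwidth):
    """Time to write num_bytes at a sustained bandwidth given in bytes/s."""
    return num_bytes / bandwidth


def io_fraction(compute_seconds, num_bytes, bandwidth):
    """Fraction of each step's wall-clock time spent writing output."""
    io = write_seconds(num_bytes, bandwidth)
    return io / (compute_seconds + io)


for gen in range(4):
    data = OUTPUT_BYTES_GEN0 * COMPUTE_GROWTH ** gen
    bw = BANDWIDTH_GEN0 * BANDWIDTH_GROWTH ** gen
    t = write_seconds(data, bw)
    f = io_fraction(COMPUTE_SECONDS_PER_STEP, data, bw)
    print(f"generation {gen}: write {t:9.1f} s per step, {f:.0%} of step time")
```

Under these assumed rates, write time per step grows by a factor of 10/3 each generation while compute time stays flat, which is why rendering the data where it already resides, instead of writing it out first, becomes more attractive as machines grow.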