Breaking a Supercomputing Speed Record
By fine-tuning its software and hardware, IBM has boosted the speed of a supercomputer by nearly 600 percent.
This week, IBM announced that it has smashed the speed record for data-sharing across a network within a supercomputer. Previously, files could zip from a supercomputer’s storage devices to its processors at about 15 gigabytes per second, a record that IBM had established, says Rama Govindaraju, an IBM distinguished engineer who worked on the recent project. Now, with advanced file-management software and some system tweaks, his group has boosted that rate to 102 gigabytes per second (the equivalent of downloading about 25,000 songs from the Web every second).
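The figures above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal gigabytes (1 gigabyte = 1,000 megabytes) and reads the 25,000-song comparison as a per-second figure, neither of which IBM spells out.

```python
OLD_RATE_GB_S = 15    # previous record, gigabytes per second
NEW_RATE_GB_S = 102   # new record, gigabytes per second
SONGS_PER_SECOND = 25_000  # assumption: the song comparison is per second

# How many times faster the new rate is than the old one.
speedup = NEW_RATE_GB_S / OLD_RATE_GB_S

# Percentage increase over the old rate (the "nearly 600 percent" figure).
percent_increase = (NEW_RATE_GB_S - OLD_RATE_GB_S) * 100 / OLD_RATE_GB_S

# Implied size of one song in megabytes, assuming 1 GB = 1,000 MB.
song_size_mb = NEW_RATE_GB_S * 1000 / SONGS_PER_SECOND

print(f"speedup: {speedup:.1f}x")
print(f"increase: {percent_increase:.0f} percent")
print(f"implied song size: {song_size_mb:.2f} MB")
```

Under these assumptions the new rate is a little under seven times the old one, and each song works out to roughly four megabytes, a plausible size for a compressed music file.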
This file-managing feat, which they call “Project Fastball,” was conducted using IBM’s General Parallel File System software on the world’s third-fastest supercomputer, ASC Purple, which resides at Lawrence Livermore National Laboratory (LLNL) in Livermore, CA. ASC Purple is a giant collection of thousands of storage devices and processors that house and crunch 1.6 petabytes of data (1.6 million gigabytes). It’s used to run data-intensive simulations that model nuclear reactions and to assess the safety of nuclear stockpiles.
But scientific research is not the only field that can benefit from faster-flowing data, Govindaraju says. “What this does is enable a new class of applications to become possible,” he says. Indeed, the breakthrough could be used with any large system of data that needs to be accessed or processed. Thus, it could be highly desirable to an increasing number of organizations that store data on remote, networked server farms throughout the world. In particular, it might benefit the health-care industry, video-on-demand providers, and homeland security workers.
At the heart of the breakthrough is the General Parallel File System (GPFS), IBM software that has been available since 2001. A supercomputer’s file system (GPFS is one of a handful of options) is, in essence, similar to the folder system on a home computer that allows one to type a file name or path and locate a particular file, says Steve Scott, CTO at Cray, the Seattle-based supercomputing company.
Unlike most PCs, however, a supercomputer has many processors that perform tasks simultaneously (in parallel) and need to be constantly fed data from files located throughout the thousands of storage devices in the system. GPFS efficiently manages the flow of data at the supercomputer scale.
The chore of directing the data traffic to and from processors and storage devices is challenging in such large systems – and is also one of the limiting factors in computational efficiency, says Eng Lim Goh, senior vice president and CTO of Mountain View, CA-based Silicon Graphics, Inc. “The thickness of the connection between the processors will decide how long it takes for a huge piece of data to get through,” he says. If the data gets backed up because the bandwidth isn’t large enough, “you are not computing, and efficiency keeps dropping.”
GPFS addresses this issue by breaking up the files into chunks that range in size between 256 and 1,024 kilobytes, depending on system resources, and storing these chunks across all of the disks in the file system. To access a file, GPFS initiates multiple pathways in parallel. This parallel file system differs from a so-called “distributed” file system, in which data is transferred through a single path. Moreover, because the system disperses the data traffic, it is effectively self-healing: if one pathway fails, data immediately flows to another one.
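The striping idea described above can be illustrated with a toy sketch. This is not GPFS code: the "disks" are plain in-memory dictionaries, the function names `stripe` and `read_striped` are made up for illustration, and the round-robin placement is an assumption. Only the 256-kilobyte chunk size comes from the range quoted above.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 256 * 1024  # bytes; low end of the chunk-size range quoted above


def stripe(data: bytes, num_disks: int, block_size: int = BLOCK_SIZE):
    """Split data into blocks and place them round-robin across the disks.

    Each hypothetical disk is a dict mapping block index -> block bytes.
    """
    disks = [dict() for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        disks[index % num_disks][index] = data[offset:offset + block_size]
    return disks


def read_striped(disks, num_blocks: int) -> bytes:
    """Fetch all blocks concurrently, then reassemble them in order."""
    def fetch(index: int) -> bytes:
        return disks[index % len(disks)][index]

    # One worker per disk, so several blocks are in flight at once.
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        blocks = list(pool.map(fetch, range(num_blocks)))  # map keeps order
    return b"".join(blocks)


# Usage: a 1 MiB payload becomes four blocks, one per disk.
payload = bytes(range(256)) * 4096
disks = stripe(payload, num_disks=4)
num_blocks = -(-len(payload) // BLOCK_SIZE)  # ceiling division
restored = read_striped(disks, num_blocks)
assert restored == payload
```

In a real system each fetch would travel over a separate network path to a separate storage device, which is where the bandwidth gain comes from; the sketch shows only the bookkeeping and does not model the failover behavior.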
Because IBM manages each component of the supercomputer at LLNL, it has been able to finely tune the hardware – processors, networking devices, and storage controllers – to work optimally with GPFS, and thus achieve the record 102 gigabytes per second. “Few companies have ownership of all the components,” Govindaraju says, “so IBM is uniquely positioned to explore the breadth and depth [of the technology].”
But even companies that don’t have IBM systems in their data centers can speed up their supercomputers by using GPFS. Govindaraju says that because the technology is open source, customers without IBM components can incorporate it into their systems and modify it appropriately.
In one example of such an application, IBM is working with the University of Pennsylvania on a project called the National Digital Mammography Archive, an effort to store digitized images of mammograms and make them easily accessible to doctors around the world. Govindaraju gives this example of how it works: a woman has a mammogram in Pennsylvania, then moves to California, where she has another scan. Her doctor in California wants to compare the two results, which the doctor can do by accessing the mammography database. The physician notices a change between the mammograms and can compare it with changes in thousands of other mammograms from around the world via the database. The doctor can also quickly search for the course of treatment that was most effective in similar cases and prescribe it to the patient.
Further, Govindaraju adds, all personal identifying information can be stripped off, so that privacy is maintained. The database could also be used to train medical students to make more informed treatment decisions. “What started off as a simple file-sharing experiment is spawning much more powerful applications, to make the medical system more efficient,” he says.