It’s official: China’s next supercomputer, the petascale Dawning 6000, will be constructed exclusively with home-grown microprocessors. Weiwu Hu, chief architect of the Loongson (also known as “Godson”) family of CPUs at the Institute of Computing Technology (ICT), a division of the Chinese Academy of Sciences, also confirms that the supercomputer will run Linux. This is a sharp departure from China’s last supercomputer, the Dawning 5000a, which debuted at number 11 on the list of the world’s fastest supercomputers in 2008, and was built with AMD chips and ran Windows HPC Server.
The arrival of Dawning 6000 will be an important landmark for the Loongson processor family, which to date has been used only in inexpensive, low-power netbooks and nettop PCs. When the Dawning 5000a was initially announced, it too was meant to be built with Loongson processors, but the Dawning Information Industry Company, which built the computer, eventually went with AMD chips, citing a lack of support for Windows, and the ICT’s failure to deliver a sufficiently powerful chip in time.
The Dawning 6000 will be completed by mid-2010 at the latest, says Hu, and could be up and running as early as the end of 2010. It is the second time that a representative from the ICT has promised a supercomputer built entirely using Loongson processors.
The Loongson project began in 2001 as part of China's 10th Five-Year Plan. All of the chips in the Loongson family are based on the MIPS instruction set, which was originally developed in the 1980s and has since fallen out of favor in desktop and server computers, although it is still used in many embedded devices. Currently, the Top 500 list is dominated by x86 chips, with non-x86 CPUs powering less than 15 percent of the systems on the list.
“This is a very high-performance MIPS architecture where, when it’s run in a cluster configuration, it becomes very powerful,” says Art Swift, vice president of marketing at Sunnyvale, CA-based MIPS Technologies, which developed the MIPS architecture.
A paper published in 2009 proposes Loongson 3 chips with up to 16 cores each, which could be linked into clusters to achieve extremely high performance. Tom Halfhill, an analyst at Microprocessor Report, calculates that in this configuration, reaching the petaflop mark (one quadrillion floating-point operations per second) could require as few as 782 16-core chips.
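Halfhill's figure implies a rough throughput target for each chip. Assuming a petaflop means 10^15 floating-point operations per second, divided evenly across 782 chips of 16 cores each and ignoring any efficiency losses from interconnect or software, the arithmetic works out roughly as follows:

```latex
\[
\frac{10^{15}\ \text{FLOPS}}{782\ \text{chips}} \approx 1.28 \times 10^{12}\ \text{FLOPS per chip},
\qquad
\frac{1.28 \times 10^{12}\ \text{FLOPS}}{16\ \text{cores}} \approx 8.0 \times 10^{10}\ \text{FLOPS per core}.
\]
```

In other words, each chip would need to deliver about 1.28 teraflops, or roughly 80 gigaflops per core, for 782 chips to reach a petaflop.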
Halfhill says the Loongson 3 is little different from the current-generation chip, the Loongson 2F, which is already available in consumer PCs. The main differences are that the Loongson 3 includes hardware translation of x86 instructions (the instruction set used by most Intel and AMD microprocessors), and that it incorporates multiple cores, from four up to a proposed 16, each capable of executing instructions independently. Conspicuously absent from the Loongson 3 is hardware multithreading, which allows a single core to run several threads (independent streams of instructions) at once. (Both Intel and Sun have already incorporated multithreading into some of their chips.)
Generations 2 and 3 of the Loongson use the same general-purpose core, but the Loongson 3 ties more cores together. A quad-core Loongson 3 chip currently exists as a prototype, and a final, 65-nanometer version of the chip was "taped out" in late December, meaning its design has been finalized for manufacturing by STMicroelectronics.
While the quad-core Loongson 3 could find applications in everything from desktop PCs to set-top boxes (the chip incorporates additional instructions designed specifically to speed up multimedia playback), an eight-core version will likely be needed for the proposed petascale supercomputer. That version will combine four regular cores with four "GStera" coprocessors designed specifically for mathematically intensive calculations. These coprocessors are significant because they should excel at workloads such as LINPACK, the linear-algebra benchmark used to measure the world's fastest supercomputers and to determine their ranking (and their owners' bragging rights) in the Top 500 list.
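LINPACK ranks machines by how quickly they solve a large, dense system of linear equations Ax = b. The toy Python sketch below is only an illustration, not the actual HPL code used for Top 500 submissions: it solves a small random system by Gaussian elimination with partial pivoting, then converts the elapsed time into a floating-point rate, assuming the classic LINPACK operation count of 2/3 n^3 + 2n^2. The function names `solve_dense` and `linpack_style_rate` are hypothetical.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a is an n x n list of lists and b a list of length n; both are copied
    into an augmented matrix, so the caller's data is left untouched.
    """
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for k in range(n):
        # Partial pivoting: bring the row with the largest |entry| in column k up.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            factor = m[i][k] / pivot
            row_i, row_k = m[i], m[k]
            for j in range(k, n + 1):
                row_i[j] -= factor * row_k[j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def linpack_style_rate(n, seed=0):
    """Time one n x n dense solve and report a LINPACK-style FLOP rate.

    Assumes the classic LINPACK operation count 2/3 n^3 + 2 n^2, credited
    regardless of how many operations this code actually performs.
    Returns (flops_per_second, max_residual).
    """
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against a zero timer reading
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    # Largest |(a x - b)_i|, a sanity check that the answer is actually correct.
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return flops / elapsed, residual
```

Calling `linpack_style_rate(200)` returns a rate and a residual. In pure Python the rate will be many orders of magnitude below a supercomputer's, which is why real LINPACK runs rely on tuned linear-algebra libraries spread across thousands of processors.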
Jack Dongarra, the computer scientist who introduced the LINPACK benchmark, says that the proposed architecture of the Dawning 6000, with general-purpose cores coupled to coprocessors for certain types of mathematical calculations, follows a standard supercomputer design.
The quad-core Loongson 3 already incorporates two 64-bit floating-point units in each of its cores, so in theory it could serve as the commodity building block of a supercomputer. However, says Dongarra, it would take vastly more of these cores to achieve the same processing power.
Intel remains unfazed by the prospect of a new, state-sponsored contender in the field of high-performance computing. "Measuring competitive impact for a product that does not exist [yet] is always problematic, and we generally refrain from doing so," says Chuck Mulloy, a spokesperson for Intel. "In our entire history there has never been a time when we didn't face a competitor. We don't expect that to change; in fact we welcome it."
Dongarra cautions that it's pointless to speculate about the performance of the forthcoming Dawning 6000 until benchmarks have been run, not least because the MIPS architecture is nonstandard in high-performance computing. "While I wish them well, I see a lot of challenges to making the whole system work," says Dongarra. These challenges include adapting software to run on the Dawning.
Halfhill, who has traveled to the ICT in Beijing to report on the birth of the Loongson 3, believes that whatever the performance of the system, it’s only a matter of time before China builds a home-grown chip competitive with those produced in the West. “Technically there’s nothing to stop them from doing world-class processors,” he says. “They’ve got architects and computer scientists just as smart as ours.”