Cooler Supercomputers

"Superclusters" of supercomputers are getting too hot. SGI's Eng Lim Goh wants to solve that problem.

  • Tuesday, February 14, 2006
  • By Wade Roush

Thanks to advances in the speed of supercomputer simulations, complex phenomena such as weather systems, protein folding, and nuclear explosions are becoming easier to model and understand. But only a small part of this speedup is due to faster processors. Instead, the most common way to reach supercomputing capacities is to assemble hundreds or thousands of separate machines in clusters. When yoked together, such a cluster shares a single memory and can perform massive simulations in parallel by breaking up the work into many small parts.

Even this approach, however, has its limits. For one thing, the larger the memory, the more likely some parts of it will fail during a computation. Also, the more machines that are assembled into a cluster, the more heat they produce. Indeed, in some large computing centers, ventilation and air conditioning systems, fans, and liquid cooling systems are hard-pressed to keep machines from overheating.

Mountain View, CA-based Silicon Graphics, also known as SGI, builds some of the world's largest supercomputing clusters. The fourth-fastest one in the world, for instance, is Columbia, a system that SGI built for NASA Ames Research Center in 2004. Columbia includes 20 SGI Altix "superclusters," each with 512 processors, for a total of 10,240 processors that share a 20-terabyte memory. Cooling this behemoth (which NASA uses to model problems involving large amounts of data, such as climate change, magnetic storms, and designs for hypersonic aircraft) is currently a very low-tech affair: it's accomplished mainly by blowing air past the processors at high speed.

Eng Lim Goh, a computer scientist and chief technical officer at SGI, says one NASA administrator told him, "'I spent millions of dollars on your supercomputer just so we could run simulations that replace our wind tunnel -- and you gave us a new wind tunnel.'"

Goh is now the leader of Project Ultraviolet, SGI's effort to develop its next generation of superclusters. The chips that SGI is designing for Ultraviolet will run applications faster -- yet use less electricity and produce less heat. Technology Review interviewed Goh about the project on February 2.

Technology Review: What are the goals of Project Ultraviolet?

Eng Lim Goh: Ultraviolet is where I've been spending 80 percent of my time for the last three years, with the goal of having a system shipped by the end of this decade. We're building ASICs [application-specific integrated circuits] to accelerate certain memory functions and to make applications run cooler; and those have a long development cycle, typically two and a half years.

We started with what we have -- basically, a system capable of huge memory. We build huge systems, managing up to 512 processors sharing up to tens of terabytes of memory. The advantage of such systems is that you can load huge databases in memory without the ten times slowdown when you have to get data from a disk. You want the ability to hold all the data in memory and zip around at a high speed, which is important for advanced business analytics and intelligence.

Our ASICs fit below the [Intel] Itanium processor, with memory below each of these, and they talk to each other to give a virtual, single view of all the memory to the user and the operating system. We make sure to use this low-cost, off-the-shelf memory. However, along with this came off-the-shelf reliability. So, in Ultraviolet we are putting in features to make memory more reliable. For example, there are intelligent agents in our chipset that can go out and scrub unused memory, to force parts that were about to fail to do so during the scrubbing process, not the application process. The agents quickly deallocate that memory, very much like a bad disk.
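
The scrubbing idea Goh describes can be illustrated with a toy model. The Python sketch below is not SGI's design: the `Page` and `MemoryPool` classes, the write-and-read-back test, and the single-bit fault model are all hypothetical, chosen only to show the principle that idle memory is exercised first, so a weak part fails during scrubbing rather than under an application, and is then taken out of service.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One simulated memory page; `faulty` stands in for marginal hardware."""
    index: int
    faulty: bool = False
    data: int = 0

    def write(self, value: int) -> None:
        # Assumed fault model: a faulty page flips bit 0 of whatever is stored.
        self.data = value ^ 1 if self.faulty else value

    def read(self) -> int:
        return self.data


@dataclass
class MemoryPool:
    """Hypothetical pool that tracks which pages are in use or retired."""
    pages: list
    in_use: set = field(default_factory=set)
    retired: set = field(default_factory=set)

    def scrub(self, pattern: int = 0xA5) -> list:
        """Test every idle page and retire any that fails the read-back."""
        newly_retired = []
        for page in self.pages:
            if page.index in self.in_use or page.index in self.retired:
                continue  # never disturb application memory or retest bad pages
            page.write(pattern)
            if page.read() != pattern:
                self.retired.add(page.index)
                newly_retired.append(page.index)
        return newly_retired

    def allocate(self) -> int:
        """Hand out the first page that is neither in use nor retired."""
        for page in self.pages:
            if page.index not in self.in_use and page.index not in self.retired:
                self.in_use.add(page.index)
                return page.index
        raise MemoryError("no healthy pages left")


# Four pages, one of them (index 2) about to fail.
pool = MemoryPool([Page(i, faulty=(i == 2)) for i in range(4)])
pool.in_use.add(0)                 # page 0 belongs to a running application
retired_now = pool.scrub()         # only the idle pages 1, 2, and 3 are tested
handed_out = [pool.allocate() for _ in range(2)]
print(retired_now, handed_out)
```

In this toy run the scrubber retires page 2, and the next two allocations skip it. Real hardware would more likely detect failing memory through error-correcting codes than through a software pattern test; the pattern test is used here only to keep the sketch self-contained.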

TR: Some of what you're doing, when you talk about agents, sounds like what IBM calls "autonomic computing."

ELG: People use different names for making computing more self-healing. We were thinking of whether we should use "autonomic memory," or "self-healing systems," like IBM and other vendors. But we got a bit concerned because that sets a really high expectation.

Comments

Guest (Tim)

  • 2189 Days Ago
  • 02/14/2006

Idiotic and bad use of Energy

Build these clusters in Antarctica and use some of the waste heat to warm the scientists there.

Guest (dbs)

  • 2189 Days Ago
  • 02/14/2006

Re: bad use of Energy

You don't have the right order of magnitude for the power dissipated by these computers. It's measured in megawatts, not kilowatts. It wouldn't just warm the scientists, it would start melting the ice cap.

Guest (Daniel Velázquez)

  • 2189 Days Ago
  • 02/14/2006

Liquid Nitrogen

Easier to cool a computer of that magnitude using it.
