Cooler Supercomputers

  • Tuesday, February 14, 2006
  • By Wade Roush

TR: What about the heat problem? I assume the systems people will build using your next-generation systems will have even more than 512 processors, all in one room, putting out huge amounts of heat.

ELG: We can force the heat out of the racks with faster fans, but then the computer room becomes very difficult to cool. The only way to deal with more heat is to move the heat faster. There are computer rooms where if you open up a floorboard, they tell you 'Don't put your foot in there,' because the air down there is moving at 100 kilometers per hour.

So the other part of what we want to explore with Ultraviolet is how to reduce this heat and how to deal with applications that are not scaling [that is, do not run as fast as expected when running on more processors in parallel]. These two are related. Let's say an application runs for 100 seconds on a single processor. And let's say that on 100 processors it runs ten times faster -- it runs for 10 seconds. That is a great improvement -- but you're using 100 times as many processors to get there. As such, you're only 10 percent efficient; the application is using ten times more energy and putting out ten times more heat than it needs to.
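The arithmetic in this example can be checked with a short sketch. The function names are illustrative, and the energy figure assumes every processor draws the same constant power for the whole run, which is a simplification.

```python
def parallel_efficiency(serial_seconds, parallel_seconds, processors):
    """Speedup divided by processor count (1.0 means perfect scaling)."""
    speedup = serial_seconds / parallel_seconds
    return speedup / processors


def relative_energy(serial_seconds, parallel_seconds, processors):
    """Processor-seconds used by the parallel run, relative to the serial run.

    Assumption: each processor draws the same constant power while the job runs.
    """
    return (parallel_seconds * processors) / serial_seconds


# Figures from the interview: 100 s on 1 processor, 10 s on 100 processors.
efficiency = parallel_efficiency(100, 10, 100)
energy = relative_energy(100, 10, 100)
print(f"efficiency: {efficiency:.0%}, energy vs. serial run: {energy:.0f}x")
```

Under that constant-power assumption, ten times the processor-seconds means ten times the energy and therefore ten times the heat, which matches the figures quoted above.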

TR: So how do you make applications run cooler?

ELG: One big part is the way we break problems down into pieces and the way we allocate those to the processors. We did an analysis of about 50 customer applications to see what was going wrong with these applications when they're running in parallel. We identified four or five major areas.

One is communications latency [delays]. The problem is that most applications require constant synchronization, to make sure every process is ready before the next step of the computation. This synchronization uses a lot of time. It's like having six people trying to stand in a straight line -- they have to check with each other. With 60, or 600, or 6,000 people, it takes exponentially longer to get in a straight line.

Second after latency is the communications bandwidth issue. Sometimes you want to transfer a lot of data and the thickness of the connection between the processors will then decide how long it takes for that huge piece of data to get through. If you're waiting, you're not computing. That's another area where efficiency drops.
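A common first-order way to separate the latency and bandwidth effects is to model the time for one message as a fixed delay plus size divided by bandwidth. The sketch below uses that model; the one-microsecond latency and one-gigabyte-per-second link are made-up illustrative numbers, not figures for any SGI system.

```python
def transfer_seconds(message_bytes, latency_seconds, bandwidth_bytes_per_second):
    """First-order cost of one message: fixed latency plus size over bandwidth."""
    return latency_seconds + message_bytes / bandwidth_bytes_per_second


# Hypothetical link: 1 microsecond latency, 1 GB/s bandwidth.
LATENCY = 1e-6
BANDWIDTH = 1e9

small = transfer_seconds(8, LATENCY, BANDWIDTH)      # one 8-byte value
large = transfer_seconds(1e9, LATENCY, BANDWIDTH)    # a 1 GB block

print(f"small message: {small:.3e} s (dominated by latency)")
print(f"large message: {large:.3e} s (dominated by bandwidth)")
```

In this model, small synchronization messages are limited almost entirely by latency, while bulk transfers are limited almost entirely by how wide the connection is.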

The third area is load imbalance, which is a huge problem. Say you want to model the weather in your area. You assume that the volume of air in your area is a huge cube, and you divide that into eight sub-cubes and you distribute those sub-cubes to different processors. On a day when the weather is homogeneous across the big cube, the load on those processors may be balanced; but if there is local turbulence in one of the sub-cubes, there will be processors that sit waiting while other processors finish.
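The weather example can be sketched as follows, assuming all processors must finish their sub-cube before the next time step begins. The work values are invented for illustration: one unit per calm sub-cube and three units for the turbulent one.

```python
def step_time(work_per_processor):
    """A time step ends only when the slowest processor finishes."""
    return max(work_per_processor)


def load_balance_efficiency(work_per_processor):
    """Fraction of total processor time spent computing rather than waiting."""
    busy = sum(work_per_processor)
    available = len(work_per_processor) * max(work_per_processor)
    return busy / available


# Hypothetical work per sub-cube, in arbitrary time units.
calm_day = [1.0] * 8                 # homogeneous weather
turbulent_day = [1.0] * 7 + [3.0]    # local turbulence in one sub-cube

for label, work in (("calm", calm_day), ("turbulent", turbulent_day)):
    print(f"{label}: step time {step_time(work)}, "
          f"efficiency {load_balance_efficiency(work):.0%}")
```

With these numbers, seven of the eight processors sit idle for two-thirds of each step on the turbulent day, yet they still draw power and produce heat while they wait.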

The fourth area is when an application needs a piece of data and the data is not in the processor's own cache, and it has to go out to memory. When it goes out to memory there is a huge latency impact.
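The cost of going out to memory is often summarized with the standard average-memory-access-time formula: hit time plus miss rate times miss penalty. The sketch below applies it with hypothetical cycle counts chosen only to show the shape of the effect.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical figures: 1-cycle cache hit, 200-cycle trip to main memory.
for miss_rate in (0.0, 0.02, 0.10):
    cycles = average_access_cycles(1, miss_rate, 200)
    print(f"miss rate {miss_rate:.0%}: about {cycles:.1f} cycles per access")
```

Even a small miss rate multiplies the average access cost when the penalty is large, which is why reducing memory latency appears alongside communications latency in the list of problem areas.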

So these would be the tenets of Ultraviolet design [more reliable memory, less communications latency, more communications bandwidth, better load balancing, and less memory latency]. Say you have an application that is topping out at 128 processors, because it is bottlenecking on communications latency. This chip we're designing is going to drastically reduce latency, which will now allow this application to run on more processors. Or, if you're still running that same application on 128 processors, you should perform better and create less heat.

Caption for home page image: A view from the top: Bridges connect nodes of the 20-node SGI Altix supercomputer housed at the NASA Advanced Supercomputing facility.

Home page image courtesy of NASA Ames Research Center/Tom Trower

Guest (Tim)

  • 2192 Days Ago
  • 02/14/2006

Idiotic and bad use of Energy

Build these clusters in Antarctica and use some of the waste heat to warm the scientists there.

Guest (dbs)

  • 2192 Days Ago
  • 02/14/2006

Re: bad use of Energy

You don't have the right order of magnitude for the power dissipated by these computers. It's measured in megawatts, not kilowatts. It wouldn't just warm the scientists, it would start melting the ice cap.

Guest (Daniel Velázquez)

  • 2192 Days Ago
  • 02/14/2006

Liquid Nitrogen

It would be easier to cool a computer of that magnitude with liquid nitrogen.
