Netbook Chips Create a Low-Power Cloud

A “fast array of wimpy nodes” could replace behemoth server infrastructure.

Christopher Mims

April 16, 2009

Using a cluster of the same processors that normally show up in netbooks and similar mobile devices, researchers have created a powerful server architecture that draws less power than a lightbulb.

**Small wonder:** Each node in the “fast array of wimpy nodes” (FAWN) has a single 500-megahertz AMD Geode processor, 256 megabytes of RAM, and a single four-gigabyte compact flash card. The largest FAWN cluster built to date consists of 21 nodes.

The architecture, dubbed a “fast array of wimpy nodes,” or FAWN, offers a way to decrease by an order of magnitude the amount of power used by the computational infrastructure of Internet giants like Google, Microsoft, Amazon, eBay, Facebook, and others. If the predictions of its inventors are borne out, it could have a significant impact on both the bottom line and the environmental impact of cloud computing.

Power now accounts for up to 50 percent of the cost of operating data centers, and in the United States, its cost per kilowatt-hour is increasing. Even relative newcomers like Facebook spend up to $1 million a month on electricity, and the Environmental Protection Agency (EPA) projects that by 2011, data centers in the United States could use up to 100 billion kilowatt-hours of electricity a year, for a total annual cost of $7.4 billion, with an estimated emissions impact of 59 million metric tons of CO2.
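The EPA figures imply a unit price and an emissions rate that the projection does not state directly. The short sketch below derives both by simple division; the variable and function names are illustrative, and the results are only as good as the assumption that the cost and emissions totals refer to the same 100 billion kilowatt-hours.

```python
# Figures quoted from the EPA projection cited in the article.
PROJECTED_KWH = 100e9          # kilowatt-hours per year
PROJECTED_COST_USD = 7.4e9     # dollars per year
PROJECTED_CO2_TONNES = 59e6    # metric tons of CO2 per year


def implied_price_per_kwh(cost_usd, kwh):
    """Average electricity price implied by a total cost and total usage."""
    return cost_usd / kwh


def implied_co2_kg_per_kwh(co2_tonnes, kwh):
    """Average emissions per kilowatt-hour, converted from tonnes to kg."""
    return co2_tonnes * 1000 / kwh


price = implied_price_per_kwh(PROJECTED_COST_USD, PROJECTED_KWH)
emissions = implied_co2_kg_per_kwh(PROJECTED_CO2_TONNES, PROJECTED_KWH)

print(f"implied price: ${price:.3f} per kWh")
print(f"implied emissions: {emissions:.2f} kg of CO2 per kWh")
```

On these numbers the projection works out to roughly 7.4 cents and a little over half a kilogram of CO2 per kilowatt-hour.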

FAWN, which is described in an as-yet-unpublished paper by David Andersen and his team at Carnegie Mellon University, tackles this problem with a combination of relatively slow processors (the kind used in netbooks and other mobile devices) and flash memory (the kind that stores data in digital cameras and USB drives). The somewhat counterintuitive result is an architecture whose performance per watt is a hundred times better than that of traditional servers, which use faster (but much more energy-hungry) processors and disk-based storage.

The exceptional performance of FAWN is limited to certain kinds of problems–random access of small bits of information–but this kind of input/output-intensive task is exactly what strains the existing infrastructure of Web companies like Facebook.

“When you go to Facebook.com, the home page has hundreds of individual data elements on it, which get translated into hundreds of internal lookups,” says Andersen. Requests for those hundreds of elements, which include friends’ updates, the number of messages in an inbox, and more, are handed off to a specialized piece of software, called memcached, that stores relevant data in RAM. Memcached prevents Facebook’s disk-based databases from being overwhelmed by a fire hose of millions of simultaneous requests for small chunks of information. Amazon, which has more or less the same problem as Facebook with its shopping cart and custom recommendations, uses a similar piece of custom-built software, called Dynamo, to perform nearly the same function.
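The role Andersen describes for memcached, answering repeated small lookups from RAM so that the disk-based database is not hit every time, can be sketched in a few lines. This is a toy illustration of the general look-aside caching idea, not memcached's actual API and not Facebook's deployment; the class name, the key format, and the dictionary standing in for the database are all hypothetical.

```python
class LookAsideCache:
    """Toy in-RAM cache placed in front of a slower backing store."""

    def __init__(self, backing_store):
        self._ram = {}                # stands in for the RAM-resident cache
        self._store = backing_store   # stands in for the disk-based database
        self.store_reads = 0          # how often the slow store was consulted

    def get(self, key):
        # Fast path: the value is already held in RAM.
        if key in self._ram:
            return self._ram[key]
        # Slow path: read from the backing store once, then remember it.
        self.store_reads += 1
        value = self._store[key]
        self._ram[key] = value
        return value


# Hypothetical data elements of the kind a home page might need.
database = {"inbox_count:42": 3, "status:42": "at the lab"}
cache = LookAsideCache(database)

for _ in range(100):
    cache.get("inbox_count:42")
    cache.get("status:42")

print(f"200 lookups served with {cache.store_reads} database reads")
```

Only the first request for each key reaches the database; every later request is served from memory, which is the effect that keeps the "fire hose" of small requests away from disk.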

One way that FAWN replaces software like memcached and Dynamo is by conquering what computer scientists call the memory wall, which is the huge disparity between the rate at which disk-based storage can feed data to a CPU and the rate at which a CPU, which is much faster, can chew through that data. (Andersen points out that modern CPUs use an enormous number of transistors trying to guess what data to expect, fetching data in advance or caching it in memory to make sure that the chip always has a steady supply of bits to process.)

There are two ways to get around the memory wall: the first is to increase the performance of a system’s memory, and the second is simply to slow down its CPU. FAWN does both: flash memory has much faster random access than disk-based storage, and FAWN’s slower processors require less power and waste fewer transistors trying to guess what’s coming next.

FAWN is composed of many individual nodes, each with a single 500-megahertz AMD Geode processor (the same chip used in the first One Laptop Per Child $100 laptop) with 256 megabytes of RAM and a single four-gigabyte compact flash card. The largest FAWN cluster built to date, consisting of 21 nodes, draws a maximum of 85 watts under real-world conditions.

Each FAWN node performs 364 queries per second per watt, which is a hundred times better than can be accomplished by a traditional disk-based system working on an input/output-intensive task, such as gathering all the disparate bits of information required to display a Facebook or FriendFeed page or a Google search result.
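A few numbers follow from the figures quoted for the cluster. The sketch below works them out; it assumes, which the article does not confirm, that the 85-watt maximum is spread evenly over the 21 nodes and that the 364 queries per second per watt holds when the whole cluster is drawing that maximum. The constant names are illustrative.

```python
# Figures quoted in the article.
NODES = 21
CLUSTER_MAX_WATTS = 85.0
QUERIES_PER_SEC_PER_WATT = 364.0
ADVANTAGE_OVER_DISK = 100.0   # "a hundred times better"

# Average draw per node, assuming the 85 W is shared evenly.
watts_per_node = CLUSTER_MAX_WATTS / NODES

# Cluster throughput, assuming the per-watt figure holds at maximum draw.
cluster_queries_per_sec = QUERIES_PER_SEC_PER_WATT * CLUSTER_MAX_WATTS

# Efficiency implied for the traditional disk-based system.
disk_queries_per_sec_per_watt = QUERIES_PER_SEC_PER_WATT / ADVANTAGE_OVER_DISK

print(f"average draw per node: {watts_per_node:.1f} W")
print(f"implied cluster throughput: {cluster_queries_per_sec:,.0f} queries/s")
print(f"implied disk-based efficiency: {disk_queries_per_sec_per_watt:.2f} queries/s/W")
```

Under those assumptions each node draws about four watts, and the comparison implies that a disk-based system manages fewer than four queries per second for each watt it consumes.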

This kind of performance may have applications beyond the data center, says Steven Swanson, an assistant professor in the department of computer science and engineering at the University of California, San Diego. Swanson’s own high-performance, flash-memory-based server, called Gordon, which currently exists only as a simulation, is similar to FAWN in its architecture but was designed with scientific applications as well as data centers in mind.

Swanson’s goal is to exploit the unique qualities of flash memory to handle problems that are currently impossible to address with anything other than the most powerful and expensive supercomputers on earth–systems with up to a petabyte of RAM. “We work with the San Diego Supercomputing Center on large genomics and bioinformatics patterns,” says Swanson. “We want to do queries very quickly, and if the data graphs won’t fit in RAM, they get very slow, which means you have to give up fidelity in the simulation.”

FAWN is “the right direction to push,” says Niraj Tolia, a researcher in the Exascale Computing Lab at HP Labs. “The days are gone when we simply looked at raw performance as a metric,” he adds.

Currently, FAWN is not suitable for CPU-intensive tasks such as processing video, but Andersen says that future iterations will use the more powerful Atom processors (which Swanson is also contemplating for his Gordon system). Having been designed for netbooks, these more powerful processors draw the same amount of power as the AMD chips–about four watts each. Throw in a power supply and some networking equipment, and “you could very easily run a small website on one of these servers, and it would draw 10 watts,” says Andersen–a tenth of what a typical Web server draws.
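Andersen's comparison implies a draw of about 100 watts for a typical Web server. As a rough sketch of what that difference means over a year of continuous operation, the code below converts both figures to kilowatt-hours. The electricity price is an assumption taken from the EPA projection quoted earlier in the article ($7.4 billion for 100 billion kilowatt-hours), and cooling and other data-center overheads are ignored.

```python
HOURS_PER_YEAR = 24 * 365

FAWN_SERVER_WATTS = 10.0       # Andersen's estimate for a small website
TYPICAL_SERVER_WATTS = 100.0   # implied by "a tenth of what a typical Web server draws"

# Assumed average price, derived from the EPA figures quoted in the article.
ASSUMED_USD_PER_KWH = 7.4e9 / 100e9


def annual_kwh(watts):
    """Energy used in a year by a device drawing a constant number of watts."""
    return watts * HOURS_PER_YEAR / 1000


fawn_kwh = annual_kwh(FAWN_SERVER_WATTS)
typical_kwh = annual_kwh(TYPICAL_SERVER_WATTS)
saved_kwh = typical_kwh - fawn_kwh
saved_usd = saved_kwh * ASSUMED_USD_PER_KWH

print(f"10 W server: {fawn_kwh:.1f} kWh per year")
print(f"100 W server: {typical_kwh:.1f} kWh per year")
print(f"difference: {saved_kwh:.1f} kWh, about ${saved_usd:.2f} per server per year")
```

The saving per machine is modest; it becomes significant only when multiplied across the tens of thousands of servers that large data centers run.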

The next generation of FAWN is something that Andersen hopes the largest users of data centers will investigate. “I would love it if we could get Facebook or Google or Microsoft to start building clusters with this,” he says.
