Vivek Pai’s new method for storing Web content could make Internet access more affordable around the world.
February 24, 2009
Closing the divide: Students surf the Web at Ghana’s Kokrobitey Institute, a conference center with an Internet connection only about four times as fast as dial-up. The link is enhanced by Princeton’s low-cost, low-power HashCache technology, which stores frequently accessed Web content.

Throughout the developing world, scarce Internet access is a more conspicuous and stubborn aspect of the digital divide than a dearth of computers. “In most places, networking is more expensive–not only in relative terms but even in absolute terms–than it is in the United States,” says Vivek Pai, a computer scientist at Princeton University. Often, even universities in poor countries can afford only low-bandwidth connections; individual users receive the equivalent of a fraction of a dial-up connection. To boost the utility of these connections, Pai and his group created HashCache, a highly efficient method of caching–that is, storing frequently accessed Web content on a local hard drive instead of using precious bandwidth to retrieve the same information repeatedly.

Despite the Web’s protean nature, a surprising amount of its content doesn’t change often or by very much. But current caching technologies require not only large hard disks to hold data but also lots of random-access memory (RAM) to store an index that contains the “address” of each piece of content on the disk. RAM is expensive relative to hard-disk capacity, and it works only when supplied with electricity–which, like bandwidth, is often both expensive and scarce in the developing world.

HashCache abolishes the index, slashing RAM and electricity requirements by roughly a factor of 10. It starts by transforming the URL of each stored Web “object”–an image, graphic, or block of text on a Web page–into a shorter number, using a bit of math called a hash function. While most other caching systems do this, they also store each hash number in a RAM-hogging table that correlates it with a hard-disk memory address. Pai’s technology can skip this step because it uses a novel hash function: the number that the function produces defines the spot on the disk where the corresponding Web object can be found. “By using the hash to directly compute the location, we can get rid of the index entirely,” Pai says.
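The idea of computing a disk location straight from the hash can be illustrated with a minimal Python sketch. It is not Pai’s implementation: the class name `IndexFreeCache`, the fixed slot size, the slot count, the use of SHA-1, and the in-memory buffer standing in for a hard disk are all assumptions made for illustration. The sketch keeps no lookup table in RAM; the hash of the URL picks the slot, and a copy of the hash stored in the slot itself lets a read detect when a different object occupies that spot.

```python
import hashlib
import io

# All sizes below are illustrative assumptions, not HashCache's real parameters.
SLOT_SIZE = 4096   # bytes reserved on "disk" for each cached object
NUM_SLOTS = 1024   # number of fixed-size slots in the cache file
DIGEST_LEN = 20    # length of a SHA-1 digest in bytes
LEN_FIELD = 4      # bytes used to record the object's length


class IndexFreeCache:
    """Hypothetical cache that finds objects by hashing, with no RAM index."""

    def __init__(self, storage=None):
        # A BytesIO buffer stands in for a preallocated file on a hard disk.
        if storage is None:
            storage = io.BytesIO(bytes(SLOT_SIZE * NUM_SLOTS))
        self.storage = storage

    @staticmethod
    def _digest(url):
        return hashlib.sha1(url.encode("utf-8")).digest()

    @staticmethod
    def _offset(digest):
        # The hash value itself determines where the object lives.
        slot = int.from_bytes(digest[:8], "big") % NUM_SLOTS
        return slot * SLOT_SIZE

    def put(self, url, body):
        if len(body) > SLOT_SIZE - DIGEST_LEN - LEN_FIELD:
            raise ValueError("object too large for one slot")
        digest = self._digest(url)
        self.storage.seek(self._offset(digest))
        # Store the digest alongside the data; a colliding URL simply overwrites.
        self.storage.write(digest + len(body).to_bytes(LEN_FIELD, "big") + body)

    def get(self, url):
        digest = self._digest(url)
        self.storage.seek(self._offset(digest))
        header = self.storage.read(DIGEST_LEN + LEN_FIELD)
        if header[:DIGEST_LEN] != digest:
            return None  # empty slot, or the slot holds a different URL
        length = int.from_bytes(header[DIGEST_LEN:], "big")
        return self.storage.read(length)


cache = IndexFreeCache()
cache.put("http://example.org/logo.png", b"image-bytes")
print(cache.get("http://example.org/logo.png"))
```

The trade-off in this simplified version is that two URLs landing on the same slot evict one another, and every lookup costs a disk read even on a miss; a production design would need some way to soften both costs.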

To be sure, some RAM is still needed, but only enough to run the hash function and to actually retrieve a specific Web object, Pai says. Though still at a very early stage of development, HashCache is being field-tested at the Kokrobitey Institute in Ghana and Obafemi Awolowo University in Nigeria.

The technology ends a long drought in fundamental caching advances, says Jim Gettys, a coauthor of the HTTP specification that serves as the basis of Internet communication. While it’s increasingly feasible for a school in a poor country to buy hundreds of gigabytes of hard-disk memory, Gettys says, those same schools–if they use today’s best available software–can typically afford only enough RAM to support tens of gigabytes of cached content. With HashCache, a classroom equipped with pretty much any kind of computer, even castoff PCs, could store and cheaply access one terabyte of Web data. That’s enough to store all of Wikipedia’s content, for example, or all the coursework freely available from colleges such as Rice University and MIT.

Even with new fiber-optic cables connecting East Africa to the Internet, thousands of students at some African universities share connections that have roughly the same speed as a home DSL line, says Ethan Zuckerman, a fellow at the Berkman Center for Internet and Society at Harvard University. “These universities are extremely bandwidth constrained,” he says. “All their students want to have computers but almost never have sufficient bandwidth. This innovation makes it significantly cheaper to run a very large caching server.”

Pai plans to license HashCache in a way that makes it free for nonprofits but leaves the door open to future commercialization. And that means that it could democratize Internet access in wealthy countries, too.
