The Kingdom of Openness
There is another company that has perfected the art of running massive numbers of computers with a comparatively tiny staff. That company is Akamai. Akamai isn’t a household word now, but it did make the front pages when the company went public in November 1999 with what was, at the time, the fourth most successful initial public offering in history. Akamai’s stock soared and made billionaires of its founders. In the years that followed, however, Akamai fell on hard times. It wasn’t just the dot-com crash that caused significant layoffs and the abandonment of the company’s California offices: Akamai’s cofounder and chief technology officer Danny Lewin was aboard American Airlines Flight 11 on September 11 and was killed when the plane was flown into the World Trade Center. Company morale was devastated.
Akamai’s network operates on the same complexity scale as Google’s. Although Akamai has only 14,000 machines, those servers are located in 2,500 different locations scattered around the globe. The servers are used by companies like CNN and Microsoft to deliver Web pages. Just as Google’s servers are used by practically everyone on the Internet today, so are Akamai’s.
Because of their scale, both Akamai and Google have had to develop tools and techniques for managing these machines, debugging performance problems, and handling errors. This isn’t software that a company can buy off the shelf; it requires laborious in-house development. It is, in fact, software that is one of Akamai’s key competitive advantages.
Yes, a few other organizations are also running large clusters of computers. Both NASA’s Ames Research Center and Virginia Tech have large clusters devoted to scientific computing. But there are key differences between these systems and the clusters that both Google and Akamai have created. The scientific systems are located in a single place, not spread all over the world. They are generally not directly exposed to the Internet. And perhaps most importantly, the scientific systems are not providing a commodity service to hundreds of millions of Internet users every day: Google and Akamai must deliver 100 percent uptime. It’s easy to go out and buy 10,000 computers: all you need is cash. It’s much harder to make those computers all work together as a single service that supports millions of simultaneous users.
To be fair, there are important differences between Google and Akamai, differences that ensure that Google won’t be breaking into Akamai’s business anytime soon, nor Akamai moving into Google’s. Both companies have developed infrastructure for running massively parallel systems, but the applications that they are running on top of those systems are different. Google’s primary application is a search engine. Akamai, by contrast, has developed a system for delivering Web pages, streaming media, and a variety of other standard Internet protocols.
Another important difference, says Christy, “is that Akamai has had a very hard time creating a clear business model that works, whereas Google has been unbelievably successful.” Akamai has thus started looking for new ways that it can sell services that only a massive distributed network can deliver. Struggling for profitability, the company has been aggressively looking for new opportunities for its technology. This might be the reason that Akamai, unlike Google, was willing to be interviewed for this article.
“We started with basic bit delivery: objects, photos, banners, ads,” says Tom Leighton, Akamai’s chief scientist. “We do it locally. Make it fast. Make it reliable. Make the sites better.”
Now Akamai is developing techniques for letting customers run their applications directly on the company’s distributed servers. Leighton says that 25 of Akamai’s largest customers have done this. The system can handle sudden surges, making it ideal for cases where it is impossible to anticipate demand.
For example, says Leighton, Akamai’s network was used to handle a keyboard giveaway contest sponsored by Logitech. Thinking that its contest might be popular, Logitech created an elaborate series of rules, assuring that only so many keyboards would be given away to every state and within any given time period. But Logitech grossly underestimated how many people would click in to the contest. In the past, such underestimates have caused highly publicized Internet events like the Victoria’s Secret webcast to crash, frustrating millions of Web surfers and embarrassing the company. But not this time: Logitech’s contest ran on the Akamai network without a hitch.
Of course, Logitech could have tried to build the system itself. It could have designed and tested a server capable of handling 100 simultaneous users. That server might cost $5,000. Then Logitech could have bought 20 of those servers for $100,000 and put them in a data center. But a single data center could get congested, so it might make more sense to put 10 of them in one data center on the East Coast and 10 in another data center on the West Coast. Still, that system could only handle 2,000 simultaneous users: it might be better to buy 100 servers, for a total cost of $500,000, and put them at 10 different data centers. But even if they had done this, the engineers at Logitech would have had no way of knowing whether the system would actually work when it was put to the test, and they would have invested a huge amount of money in engineering that wouldn’t have been needed after the event.
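The build-it-yourself arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the per-server capacity and price come from the figures quoted in the text, while the function names and the linear-scaling assumption (no overhead for load balancing or data-center costs) are hypothetical simplifications.

```python
import math

# Figures quoted in the article.
USERS_PER_SERVER = 100    # simultaneous users one server can handle
COST_PER_SERVER = 5_000   # US dollars per server


def fleet(servers: int) -> tuple[int, int]:
    """Return (simultaneous users supported, hardware cost in USD).

    Assumes capacity and cost scale linearly with server count,
    which is a simplification made for this sketch.
    """
    return servers * USERS_PER_SERVER, servers * COST_PER_SERVER


def servers_needed(peak_users: int) -> int:
    """Smallest number of servers that covers a given peak (hypothetical helper)."""
    return math.ceil(peak_users / USERS_PER_SERVER)


if __name__ == "__main__":
    for n in (1, 20, 100):
        users, cost = fleet(n)
        print(f"{n:>3} servers: {users:>6,} users, ${cost:,}")
```

The catch the article describes is visible in `servers_needed`: it needs a peak figure as input, and that is exactly the number Logitech could not predict in advance.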
And contests aren’t the only thing that can run on Akamai’s network. Practically any program written in the Java programming language can run on the company’s infrastructure. The system can handle mortgage applications, catalogs, and electronic shopping carts. Akamai even runs the backend for Apple’s iTunes 99-cent music service.
Perhaps because Akamai is so proud of the system that it has built, the company is very open about the network’s technical details. Its network operations center in Cambridge, MA, has a glass wall allowing visitors to see a big screen with statistics. When I visited the company in January, the screen said that Akamai was serving 591,763 hits per second, with 14,372 CPUs online, 14,563 gigahertz of total processing power, and 650 terabytes of total storage. On April 14, the number had jumped to a peak rate of 900,000 hits per second and 43.71 billion requests delivered in a 24-hour period. (Akamai wouldn’t disclose the number of CPUs online because that number is part of its quarterly earnings report, to be released on April 28. “But it hasn’t changed much,” the company’s spokesperson told me.)
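To put those figures in proportion, here is a rough back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the variable names are illustrative, and the January and April readings are separate snapshots, so the derived values should be read as approximations.

```python
# April 14 figures quoted in the article.
REQUESTS_PER_DAY = 43.71e9       # requests delivered in a 24-hour period
PEAK_HITS_PER_SECOND = 900_000   # peak rate on the same day

# January snapshot from the operations-center screen.
CPUS_ONLINE = 14_372
TOTAL_GIGAHERTZ = 14_563

SECONDS_PER_DAY = 24 * 60 * 60

# Average request rate implied by the daily total.
average_hits_per_second = REQUESTS_PER_DAY / SECONDS_PER_DAY

# How far the peak sits above the daily average.
peak_to_average = PEAK_HITS_PER_SECOND / average_hits_per_second

# Average clock speed per CPU in the January snapshot.
gigahertz_per_cpu = TOTAL_GIGAHERTZ / CPUS_ONLINE

if __name__ == "__main__":
    print(f"average rate:    {average_hits_per_second:,.0f} hits/second")
    print(f"peak / average:  {peak_to_average:.2f}")
    print(f"clock per CPU:   {gigahertz_per_cpu:.2f} GHz")
```

On these numbers the daily total works out to an average of roughly half a million hits per second, so the quoted peak is a little under twice the average load, and the January fleet averages about one gigahertz per CPU.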