Technology Review - Published By MIT

Google and Akamai: Cult of Secrecy vs. Kingdom of Openness

The king of search is tapping into what may be the largest grid of computers on the planet. And it remains extraordinarily secretive about its core technologies, perhaps because it senses a potential competitor in dot-com-era flameout Akamai.

By Simson Garfinkel

April 21, 2004

"You should never trust this number," said Martin Farach-Colton, a professor of computer science at Rutgers University, speaking a little more than a year ago. "People make a big deal about it, and it's not true."

Farach-Colton was giving a public lecture about his two-year sabbatical working at Google. The number that he was disparaging was in the middle of his PowerPoint slide:

  • 150 million queries/day

The next slide had a few more numbers:

  • 1,000 queries/sec (peak)
  • 10,000+ servers
  • More than 4 tera-ops/sec at daily peak
  • Index: 3 billion Web pages 
  • 4 billion total docs
  • 4+ petabytes disk storage

A few people in the audience started to giggle: the Google figures didn't add up.

I started running the numbers myself. Let's see: "4 tera-ops/sec" means 4,000 billion operations per second; a top-of-the-line server can do perhaps two billion operations per second, so that translates to perhaps 2,000 servers, not 10,000. Four petabytes is 4 x 10^15 bytes of storage; spread that over 10,000 servers and you'd have 400 gigabytes per server, which again seems wrong, since Farach-Colton had previously said that Google puts two 80-gigabyte hard drives into each server.
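The back-of-the-envelope check can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: the two billion operations per second per server and the two 80-gigabyte drives are the figures quoted in the text, decimal units are assumed throughout, and the variable names are illustrative.

```python
# Unit multipliers (decimal units assumed, not powers of two).
GIGA = 10**9
TERA = 10**12
PETA = 10**15

# Figures taken from the slide and from the surrounding text.
total_ops_per_sec = 4 * TERA            # "4 tera-ops/sec"
ops_per_server = 2 * GIGA               # assumed top-of-the-line server
claimed_servers = 10_000                # "10,000+ servers"
total_storage_bytes = 4 * PETA          # "4+ petabytes"
disk_per_server_bytes = 2 * 80 * GIGA   # two 80-gigabyte drives per server

# Servers implied by the compute figure alone.
servers_needed = total_ops_per_sec // ops_per_server

# Storage each machine would need if there were only 10,000 of them.
storage_per_server_gb = total_storage_bytes // claimed_servers // GIGA

# Servers implied by the storage figure at 160 GB per machine.
servers_for_storage = total_storage_bytes // disk_per_server_bytes

print(servers_needed, storage_per_server_gb, servers_for_storage)
```

Under these assumptions the compute figure implies about 2,000 servers, while the storage figure at 160 gigabytes per machine implies about 25,000. Neither matches the 10,000 on the slide.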

And then there is that issue of 150 million queries per day. If the system is handling a peak load of 1,000 queries per second, that translates to a peak rate of 86.4 million queries per day, or perhaps 40 million queries per day if you assume that the system spends only half its time at peak capacity. No matter how you crank the math, Google's statistics are not self-consistent.
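The query arithmetic can be sketched the same way. The function name and the `fraction_at_peak` parameter are illustrative, and the model is deliberately crude: the system is assumed to run at its peak rate for some fraction of the day and to sit idle the rest of the time.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def queries_per_day(peak_qps, fraction_at_peak=1.0):
    """Daily total if the system runs at peak_qps for the given
    fraction of the day and handles nothing the rest of the time."""
    return peak_qps * SECONDS_PER_DAY * fraction_at_peak


# Ceiling implied by a true peak of 1,000 queries per second.
ceiling = queries_per_day(1_000)

# Same peak, but only half the day spent at peak.
half_time = queries_per_day(1_000, fraction_at_peak=0.5)

# Average rate that the 150-million-per-day figure would require.
claimed_per_day = 150_000_000
implied_average_qps = claimed_per_day / SECONDS_PER_DAY

print(ceiling, half_time, round(implied_average_qps))
```

The 150 million figure requires an average rate well above the stated peak of 1,000 queries per second, which is the inconsistency the audience was laughing at.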

"These numbers are all crazily low," Farach-Colton continued. "Google always reports much, much lower numbers than are true."

Whenever somebody from Google puts together a new presentation, he explained, the PR department vets the talk and hacks down the numbers. Originally, he said, the slide with the numbers said that 1,000 queries/sec was the "minimum" rate, not the peak. "We have 10,000-plus servers. That's plus a lot."

Just as Google's search engine comes back instantly and seemingly effortlessly with a response to any query that you throw at it, hiding the true difficulty of the task from users, the company also wants its competitors kept in the dark about the difficulty of the problem. After all, if Google publicized how many pages it has indexed and how many computers it has in its data centers around the world, search competitors like Yahoo!, Teoma, and Mooter would know how much capital they had to raise in order to have a hope of displacing the king at the top of the hill.

Google has at times had a hard time keeping its story straight. When vice president of engineering Urs Hoelzle gave a talk about Google's Linux clusters at the University of Washington in November of 2002, he repeated that figure of 1,000 queries per second, but he said that the measure was made at 2:00 a.m. on December 25, 2001. His point, obvious to everybody in the room, was that even by November 2002, Google was doing a lot more than 1,000 queries per second. Just how many more, though, was anybody's guess.

The facts may be seeping out. Last Thanksgiving, the New York Times reported that Google had crossed the 100,000-server mark. If true, that means Google is operating perhaps the largest grid of computers on the planet. "The simple fact that they can build and operate data centers of that size is astounding," says Peter Christy, co-founder of the NetsEdge Research Group, a market research and strategy firm in Silicon Valley. Christy, who has worked in the industry for more than 30 years, is astounded by the scale of Google's systems and the company's competence in operating them. "I don't think that there is anyone close."

It's this ability to build and operate incredibly dense clusters that is, as much as anything else, the secret of Google's success. And the reason, explains Marissa Mayer, the company's director of consumer Web products, has to do with the way that Google started at Stanford.

Instead of getting a few fast computers and running them to the max, Mayer explained at a recruiting event at MIT, founders Sergey Brin and Larry Page had to make do with hand-me-downs from Stanford's computer science department. They would go to the loading dock to see who was getting new computers, then ask if they could have the old, obsolete machines that the new ones were replacing. Thus, from the very beginning, Brin and Page were forced to develop distributed algorithms that ran on a network of not-very-reliable machines.

Today this philosophy is built into the company's DNA. Google buys the cheapest computers that it can find and crams them in racks and racks in its six (or more) data centers. "PCs are reasonably reliable, but if you have a thousand of them, one is going to fail every day," said Hoelzle. "So if you can just buy 10 percent extra, it's still cheaper than buying a more reliable machine."
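Hoelzle's rule of thumb can be turned into a rough estimate. The sketch below assumes, as a simplification that is not from the talk, that every machine fails independently at the same constant daily rate of one in a thousand; the function name is illustrative, and the 100,000-server figure is the New York Times number cited earlier.

```python
def expected_failures_per_day(machines, daily_failure_rate=1 / 1000):
    """Expected number of machines failing per day, assuming each one
    fails independently with the same daily probability."""
    return machines * daily_failure_rate


# "If you have a thousand of them, one is going to fail every day."
per_thousand = expected_failures_per_day(1_000)

# The same rate applied to the reported 100,000-server fleet.
per_hundred_thousand = expected_failures_per_day(100_000)

print(per_thousand, per_hundred_thousand)
```

At that rate a fleet of 100,000 machines would see on the order of a hundred failures a day, which is why the software has to treat failure as routine.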

Working at Google, an engineer told me recently, is the nearest you can get to having an unlimited amount of computing power at your disposal.
