Splitting Up Search

Distributing a search engine’s index around the world could make it faster and more efficient, researchers say.

Searching the Web could become faster for users and much more efficient for search companies if search engines were split up and distributed around the world, according to researchers at Yahoo.

Currently, search engines are based on a centralized model, explains Ricardo Baeza-Yates, a researcher at Yahoo’s Labs in Barcelona, Spain. This means that a search engine’s index (the core database that lists the location and relative importance of information stored across the Web), along with additional data such as cached copies of content, is replicated within several data centers at different locations. The tendency among search companies, says Baeza-Yates, has been to operate a relatively small number of very large data centers across the globe.

Baeza-Yates and his colleagues devised another way: a “distributed” approach, with both the search index and the additional data spread out over a larger number of smaller data centers. With this approach, smaller data centers would contain locally relevant information and a small proportion of globally replicated data. Many search queries common to a particular area could be answered using the content stored in a local data center, while other queries would be passed on to different data centers.

“Many people have talked about this in the past,” says Baeza-Yates. But there was resistance, he says, because many assumed that such an approach would be too slow or expensive. It was also unclear how to ensure that each query got the best global result and not just the best that the local center had to offer. A few start-up companies have even launched peer-to-peer search engines that harness the power of users’ own machines. But this approach hasn’t proven very scalable.

To achieve a workable distributed system, Baeza-Yates and colleagues designed it so that statistical information about page rankings could be shared between the different data centers. This would allow each data center to run an algorithm that compares its results with those of others. If another data center gave a statistically better result, the query would be forwarded to it.
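The routing idea can be illustrated with a small sketch. This is not the researchers’ algorithm, which the article does not spell out. It assumes, purely for illustration, that the shared statistic is the best ranking score each data center holds for each query term, and that a query’s score is the sum of its term scores. The class `DataCenter`, the functions `score_bound` and `route`, the site names, and all of the scores are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataCenter:
    """A hypothetical data center holding a small local index."""
    name: str
    # term -> {document id -> ranking score}; a stand-in for a real index
    index: dict[str, dict[str, float]] = field(default_factory=dict)

    def best_local(self, query: str) -> tuple[str | None, float]:
        """Return the top-scoring local document and its score."""
        totals: dict[str, float] = {}
        for term in query.split():
            for doc, score in self.index.get(term, {}).items():
                totals[doc] = totals.get(doc, 0.0) + score
        if not totals:
            return None, 0.0
        doc = max(totals, key=totals.get)
        return doc, totals[doc]

    def term_maxima(self) -> dict[str, float]:
        """Assumed shared statistic: the best score per term at this site."""
        return {t: max(p.values()) for t, p in self.index.items() if p}


def score_bound(stats: dict[str, float], query: str) -> float:
    """Upper bound on the score any document at the remote site could reach."""
    return sum(stats.get(term, 0.0) for term in query.split())


def route(query: str, local: DataCenter, remotes: list[DataCenter]):
    """Answer locally unless shared statistics say a remote site could do better."""
    best_site = local.name
    best_doc, best_score = local.best_local(query)
    for remote in remotes:
        # In a real system these statistics would already be cached locally.
        bound = score_bound(remote.term_maxima(), query)
        if bound > best_score:
            # Forward the query; keep the remote answer only if it really wins.
            doc, score = remote.best_local(query)
            if score > best_score:
                best_site, best_doc, best_score = remote.name, doc, score
    return best_site, best_doc


barcelona = DataCenter("barcelona", {
    "tapas": {"bcn/tapas-bars": 0.9},
    "weather": {"bcn/forecast": 0.6},
})
hongkong = DataCenter("hongkong", {
    "dim": {"hk/dim-sum": 0.8},
    "sum": {"hk/dim-sum": 0.7},
    "weather": {"hk/forecast": 0.5},
})

print(route("tapas", barcelona, [hongkong]))    # answered locally
print(route("dim sum", barcelona, [hongkong]))  # forwarded to the remote site
```

Because the bound can never be lower than the true best score at the remote site, a query is kept local only when no other site could beat the local answer. That is one simple way to address the concern that a local center might return a result that is merely the best it has to offer.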

The group put the distributed approach to the test in a feasibility study, using real search data. They present their findings this week at the Association for Computing Machinery’s Conference on Information and Knowledge Management in Hong Kong, where they will receive the award for the best paper.

“We wanted to prove that we could achieve the same performance [as the centralized model] without it costing too much,” says Baeza-Yates. In fact, they found that their approach could reduce the overall costs of operating a search engine by as much as 15 percent without compromising the quality of the answers.

“It’s a valid approach,” says Bruce Maggs, a professor of computer science at Duke University in Durham, NC, and vice president of research at Akamai, a Web content delivery and caching company based in Cambridge, MA. Fully replicating a database at multiple sites, as search companies typically do now, is inefficient, Maggs says, since only a small proportion of data is accessed at each site. A distributed approach “also saves considerably on everything else in the same proportion, such as capital costs and real estate,” he says. This is because, overall, the number of servers required goes down.

For users, the advantage would be quicker search results, because most answers would come from a data center that’s geographically closer. A small number of results would take longer than normal, but only 20 to 30 percent longer, says Baeza-Yates. “On average, most queries will be faster,” he says.

Maggs says the performance improvement would need to be high enough to counteract any delay in those search queries that have to be sent further afield.

Another trade-off is that more users would get different results, depending on where they were, than is currently the case, says Peter Triantafillou, a researcher at the University of Patras in Greece who studies large-scale search. This already happens to some extent even under a centralized model, he says, but it could be a bigger concern if many more searches were inconsistent.

However, with search engine data centers already housing tens of thousands of servers, it’s questionable whether they can continue to grow and still function efficiently, Triantafillou says. “Will they be able to go to hundreds of thousands or millions?” he says. Just the practicality of installing the cabling and optics in and out of such facilities would pose serious problems, he says.

The distributed approach remains a long-term aim, Baeza-Yates admits. “But for the Internet,” he adds, “long-term is only about five years.”
