"It's a valid approach," says Bruce Maggs, a professor of computer science at Duke University in Durham, NC, and vice president of research at Akamai, a Web content delivery and caching company based in Cambridge, MA. Fully replicating a database at multiple sites, as search companies typically do now, is inefficient, Maggs says, since only a small proportion of data is accessed at each site. A distributed approach "also saves considerably on everything else in the same proportion, such as capital costs and real estate," he says. This is because, overall, the number of servers required goes down.
For users, the advantage would be quicker search results. This is because most answers would come from a data center that's geographically closer. A small number of results would take longer than normal, but only 20 to 30 percent longer, says Baeza-Yates. "On average, most queries will be faster," he says.
Maggs says the performance improvement would need to be high enough to counteract any delay in those search queries that have to be sent further afield.
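The trade-off Maggs describes can be put as a weighted average. The sketch below is only an illustration, not a model from the researchers: the function names and the millisecond figures are assumptions, with only the "30 percent longer" remote penalty taken from the upper end of the range Baeza-Yates cites.

```python
def expected_latency(p_local: float, local_ms: float, remote_ms: float) -> float:
    """Average latency when a fraction p_local of queries is answered locally."""
    if not 0.0 <= p_local <= 1.0:
        raise ValueError("p_local must be between 0 and 1")
    return p_local * local_ms + (1.0 - p_local) * remote_ms


def break_even_local_fraction(baseline_ms: float, local_ms: float, remote_ms: float) -> float:
    """Local fraction at which the distributed average equals the baseline.

    Solves p * local + (1 - p) * remote = baseline for p.
    """
    if not local_ms < baseline_ms < remote_ms:
        raise ValueError("expected local_ms < baseline_ms < remote_ms")
    return (remote_ms - baseline_ms) / (remote_ms - local_ms)


if __name__ == "__main__":
    baseline = 100.0          # hypothetical latency of today's replicated setup
    local = 80.0              # hypothetical latency from a nearby data center
    remote = baseline * 1.3   # 30 percent longer than normal

    # Hypothetical case: 90 percent of queries answered locally.
    print(expected_latency(0.9, local, remote))
    # Share of local answers needed just to match the baseline.
    print(break_even_local_fraction(baseline, local, remote))
```

Under these made-up numbers, the distributed design comes out ahead only once the share of locally answered queries exceeds the break-even fraction, which is the condition Maggs is pointing to.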
Another trade-off is that more users would get different results, depending on where they were, than is currently the case, says Peter Triantafillou, a researcher at the University of Patras in Greece who studies large-scale search. This already happens to some extent even under a centralized model, he says, but it could be a bigger concern if many more searches were inconsistent.
However, with search engine data centers already housing tens of thousands of servers, it's questionable whether they can continue to grow and still function efficiently, Triantafillou says. "Will they be able to go to hundreds of thousands or millions?" he says. Just the practicality of installing the cabling and optics in and out of such facilities would pose serious problems, he says.
The distributed approach remains a long-term aim, Baeza-Yates admits. "But for the Internet," he adds, "long-term is only about five years."
bagapiev
2 Comments
Distributed Index
Great article; all these points are very true. For disclosure, I am the founder of Wowd, a distributed search & discovery startup that uses a distributed cloud (or p2p) approach to distribute the index completely across user desktops.
I was definitely puzzled when I saw in the article that "this (p2p) approach hasn't proven very scalable". On the contrary, our approach is all about scalability, and the scale of our system is limited only by the number of users, something that cannot be said about distributed data centers (one still needs many of them).
In fact, it is quite natural to ask why the idea of distribution, which they rightly point to, should stop at the boundary of data centers. One can distribute the index across user desktops, with many benefits: great geographic distribution, since on average there will be users very close by to serve answers; massive replication; and natural proximity of users and their attention data to the system resources (on the edge of the network). Our system is in its very early stages, but we are already seeing the benefits of this massive geographic diversity.
One need go no further than BitTorrent to see the performance benefits of massively distributed systems. Of course, the key difference is that BitTorrent is a read-only CDN, but we are addressing that with our DHTFS (DHT-based file system), which allows writes.
In summary, the article makes great points, but they can be naturally extended much further than just distributed data centers, which is something we are in the process of doing :)