Deysarkar says the approach significantly reduces costs for 80legs, allowing the company to offer its service for far less than would be possible if it used a data center, or even a cloud-computing service such as Amazon Web Services.
Daniel Tunkelang, cofounder of the search company Endeca, based in Cambridge, MA, says that a good Web crawling service could be useful for startups that want to focus on building the search experience rather than on collecting the data. But Tunkelang says the success of 80legs may depend on how easy it is for users to customize the crawl. “The big question is, how adaptive and programmable is the crawl?” he says.
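To give a rough sense of what a "programmable" crawl could mean in practice, the sketch below lists the kinds of settings a customer might want to control. The field names are hypothetical illustrations, not drawn from 80legs's actual service.

```python
# Hypothetical crawl-job specification (illustrative only; not 80legs's API).
crawl_job = {
    "seed_urls": ["http://example.com/"],       # where the crawl starts
    "max_depth": 3,                             # how many links deep to follow
    "max_pages": 100_000,                       # overall page budget for the job
    "url_filter": r"^http://example\.com/.*",   # which URLs the crawler may visit
    "extract": ["title", "links", "text"],      # what data to return for each page
}
```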
Tunkelang also notes that it’s important for a Web crawler to capture as much information as possible. For example, the path a crawler took to arrive at a particular page can provide a search company with useful information about the contents of that page.
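To make that idea concrete, here is a minimal sketch, not taken from 80legs, of a breadth-first crawler that remembers the chain of links it followed to reach each page; the function name crawl_with_paths and its parameters are illustrative only. The recorded path could then be stored alongside the page itself as extra context for a search engine.

```python
# Minimal sketch of a breadth-first crawler that records, for every page it
# discovers, the chain of URLs that led to it.
from collections import deque
from urllib.parse import urljoin
import re
import urllib.request

def crawl_with_paths(seed, max_pages=50):
    paths = {seed: [seed]}          # page URL -> list of URLs that led to it
    queue = deque([seed])
    while queue and len(paths) < max_pages:
        url = queue.popleft()
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue                # skip pages that fail to download
        for href in re.findall(r'href="([^"]+)"', html):
            link = urljoin(url, href)
            if link.startswith("http") and link not in paths:
                paths[link] = paths[url] + [link]   # remember how we got here
                queue.append(link)
    return paths
```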
A service such as 80legs could also be useful for university researchers. “Crawling at large scale is indeed an expensive hurdle to cross for experimental search projects in academia, which often are lacking large-scale infrastructure,” says Kevin Chang, an associate professor of computer science at the University of Illinois at Urbana-Champaign.
Chang thinks the distributed nature of 80legs is “an interesting direction and sounds promising [for lowering] the cost of crawling.” At the same time, he agrees that a lot depends on how efficiently the system operates and how effectively users can customize what data they want to process.
80legs plans to launch a marketplace where nontechnical users will be able to purchase applications that control how its crawlers behave. Partner companies will also be able to sell access to such applications.