A Web Spider for Everyone

(Page 2 of 2)

  • Friday, September 25, 2009
  • By Erica Naone

Deysarkar says the approach significantly reduces costs for 80legs, allowing the company to offer its service for far less than would be possible if it used a data center, or even a cloud-computing service such as Amazon Web Services.

Daniel Tunkelang, cofounder of the search company Endeca, based in Cambridge, MA, says that a good Web crawling service could be useful for startups that want to focus on building the search experience rather than on collecting the data. But Tunkelang says the success of 80legs may depend on how easy it is for users to customize the crawl. "The big question is, how adaptive and programmable is the crawl?" he says.

Tunkelang also notes that it's important for a Web crawler to capture as much information as possible. For example, the path a crawler took to arrive at a particular page can provide a search company with useful information about the contents of that page.
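
As a rough illustration of the point about crawl paths, the sketch below does a breadth-first crawl over a small in-memory link graph and keeps, for every page, the chain of links by which it was first reached. Nothing here comes from 80legs: the `LINKS` graph, the page names, and the `crawl_with_paths` function are hypothetical, and a real crawler would fetch pages over the network rather than read them from a dictionary.

```python
from collections import deque

# Hypothetical link graph standing in for fetched pages (an assumption,
# not real site data): each key is a page, each value its outgoing links.
LINKS = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widgets"],
    "/blog": ["/blog/widget-review", "/products/widgets"],
    "/products/widgets": [],
    "/blog/widget-review": ["/products/widgets"],
}


def crawl_with_paths(links, seed):
    """Breadth-first crawl that records the path by which each page was first reached."""
    paths = {seed: [seed]}   # page -> list of pages leading to it, seed first
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in paths:          # visit each page only once
                paths[target] = paths[page] + [target]
                queue.append(target)
    return paths


paths = crawl_with_paths(LINKS, "/")
for page, path in paths.items():
    print(page, "<-", " > ".join(path))
```

In this toy graph the widgets page is first reached through the products section, so a search company could plausibly treat it as a product page rather than a blog post. Storing only the first path found is a simplification; a fuller design might keep every inbound path.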

A service such as 80legs could also be useful for university researchers. "Crawling at large scale is indeed an expensive hurdle to cross for experimental search projects in academia, which often are lacking large-scale infrastructure," says Kevin Chang, an associate professor of computer science at the University of Illinois at Urbana-Champaign.

Chang thinks the distributed nature of 80legs is "an interesting direction and sounds promising [for lowering] the cost of crawling." At the same time, he agrees that a lot depends on how efficiently the system operates and how effectively users can customize what data they want to process.

80legs plans to launch a market where nontechnical users will be able to purchase applications that can control how a crawler functions. Partner companies will also be able to sell access to applications that control 80legs's crawlers.

erbium

340 Comments

  • 871 Days Ago
  • 09/27/2009

Great

Another Majestik pounding away at websites? When a bot hits my sites, I find it is a treasure trove of IP address ranges for server farms that I can also ban. Real customers and visitors generally do not come from server farms.

Having them come from end-user ranges will make it somewhat harder to ban more than a single IP address at a time. Every one of these search engines thinks it has the God-given right to pound away at your website and to ignore your robots file.

And copy your content for god-knows-what use.

IncrediBill on WebmasterWorld seems to express my views best with his endless rants against stupid bots, most of which never produce a product that end users can see, or which take the information for unspecified, hidden commercial use or for harvesting email addresses for spam.

unitedcolors

1 Comment

  • 869 Days Ago
  • 09/29/2009

On-Demand Web Spider

The 80legs.com technology seems very interesting. Another service, BuildaSearch.com, offers very similar services, but at a fixed cost and without the distributed computers. I would need to test 80legs.com to determine which on-demand spidering system works better and produces better results.

alexiaalline

3 Comments

  • 853 Days Ago
  • 10/15/2009

b12

I think your reviews are extremely well done and your choice of language makes them even more interesting. An occasional bit of irony or an elegant twist in the phrase is a welcome relief.
