
As the quantity of information on the Internet continues to grow, so does the challenge of processing it all and making it useful. A startup called 80legs, based in Houston, TX, hopes that an inexpensive, distributed Web crawling service can help startups mine the Web for information without having to build the giant server farms used by major search engines. The company launched this week at DEMO, a conference in San Diego that showcases new companies.

Web crawlers, or spiders, are programs that automatically visit pages on the Internet; they can be used to index those pages and to gather bits of information from them. Search engines, for example, use crawlers to keep track of where information lives on the Web. But the scale of the Web means that comprehensive crawling consumes enormous processing power, which usually requires building huge data centers to run the software.
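To make the mechanics concrete, the sketch below is a minimal breadth-first crawler written in Python using only the standard library. It is an illustration of the general technique, not 80legs's own code; the seed URL and page budget are arbitrary, and a production crawler would also honor robots.txt and rate limits.

    # A minimal breadth-first crawler, for illustration only (not 80legs's implementation).
    # It fetches pages, extracts their links, and stops after a fixed page budget.
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collect the href targets of <a> tags found on a page."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed_url, max_pages=100):
        """Visit up to max_pages pages reachable from seed_url, breadth first."""
        seen = {seed_url}
        queue = deque([seed_url])
        fetched = 0
        while queue and fetched < max_pages:
            url = queue.popleft()
            try:
                html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
            except Exception:
                continue  # skip pages that fail to download or decode
            fetched += 1
            parser = LinkExtractor()
            parser.feed(html)
            for link in parser.links:
                absolute = urljoin(url, link)
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
            yield url, html  # hand each page to whatever processing the caller wants

    if __name__ == "__main__":
        # Crawl a handful of pages and report their sizes.
        for page_url, page_html in crawl("https://www.technologyreview.com/", max_pages=10):
            print(page_url, len(page_html), "bytes")

The hard part is not the loop above but running it at Web scale, which is where the processing power comes in.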

80legs hopes to make this technology more accessible to small companies and individuals by leasing access to its crawling infrastructure and letting customers pay only for what they crawl.

Web crawling technology is also crucial for semantic sites and services designed to process natural-language queries. While 80legs expects to attract users interested in search and semantic applications, CEO Shion Deysarkar says that early testers of the service also included customers with less technical interests. Some market researchers, for example, use 80legs to uncover mentions of specific companies or topics across the Web.

A user can start a Web crawl through 80legs's Web-based interface. A form on the company's site lets users set parameters for the project and upload any custom code needed to control how the crawler does its job. For example, a user might want the crawler to find images and check them against a database of copyrighted ones. Deysarkar says his company's crawlers can process up to two billion pages a day. The company charges $2 for every million pages crawled, plus a fee of three cents per hour of processing used.
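Those prices make the cost of a job easy to estimate. The short Python snippet below works one hypothetical example; the job size and processing hours are invented for illustration.

    # Back-of-the-envelope cost using the prices quoted in the article:
    # $2 per million pages crawled plus three cents per hour of processing.
    PRICE_PER_MILLION_PAGES = 2.00     # dollars
    PRICE_PER_PROCESSING_HOUR = 0.03   # dollars

    def crawl_cost(pages, processing_hours):
        """Estimated charge for crawling `pages` pages with `processing_hours` of compute."""
        return pages / 1_000_000 * PRICE_PER_MILLION_PAGES + processing_hours * PRICE_PER_PROCESSING_HOUR

    # Hypothetical job: 50 million pages plus 2,000 hours of custom processing.
    print(crawl_cost(50_000_000, 2_000))  # -> 160.0 dollars

At those rates, even a crawl of tens of millions of pages costs a few hundred dollars rather than the price of a data center.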

Many startups struggle to find the funding needed to build large data centers, but 80legs didn't need one to construct its Web crawling infrastructure. The company instead runs its software on a distributed network of personal computers, much like the networks used for projects such as SETI@home. That network is assembled by Plura Processing, which rents it to 80legs. Plura gets computer users to contribute unused processing power in exchange for access to games, donations to charities, and other rewards.

