
Untangling Web Searches

A more discerning way to find information

Web searches often evoke a two-part reaction. First: Wow, that was fast! Followed sadly by: But none of this is what I want. Lightning-quick online searches typically lead Web users into piles of documents that are, to be kind, of dubious reliability. Unlike the carefully catalogued stacks in a library, the Web often appears to be untouched by human judgment.

This chaos has been the price Web users pay for an open system to which anyone can contribute. But it is an unnecessary price, says Jon M. Kleinberg, a professor of computer science at Cornell University. Kleinberg has devised an approach for sifting the contents of the Web that could go a long way toward solving what he calls the Web’s “abundance problem.”

Kleinberg’s technique relies on the premise that despite the jumbled appearance of the Web, critical thinking is in fact woven throughout it. Every time a page’s creator includes a link to another site, that is a vote of confidence in the linked-to page. Thus a rough measure of a site’s value can be derived by counting how many other sites link to it. “The Web is explicitly annotated with precisely the type of human judgment that we need in order to formulate a notion of authority,” says Kleinberg.
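The "vote of confidence" idea amounts to counting incoming links. The sketch below assumes a toy crawl stored as a dictionary that maps each page to the pages it links to; the function name `count_inlinks` and the example URLs are hypothetical, chosen only for illustration.

```python
from collections import Counter

def count_inlinks(links):
    """Count how many distinct pages link to each page.

    links: hypothetical toy crawl, mapping a page to the list of pages it links to.
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):   # each linking page votes at most once
            if target != source:      # ignore a page linking to itself
                counts[target] += 1
    return counts

links = {
    "a.example": ["yahoo.example", "jaguar-cars.example"],
    "b.example": ["yahoo.example", "jaguar-cars.example"],
    "c.example": ["yahoo.example"],
}
print(count_inlinks(links).most_common())
```

In this toy data the general-purpose directory page collects the most votes, even though it says nothing specific about any topic.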

This story is part of our July/August 1998 Issue

But this measure needs to be refined, because if it were used alone, the Yahoo search directory and the Netscape homepage would come out near the top every time. “We need a way to throw those pages out,” Kleinberg explains. The solution? Kleinberg applies a second level of filtering that assigns higher value to pages that include lots of links to other sites that are themselves relevant to the search.
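The second level of filtering can be sketched as a pair of mutually reinforcing scores: a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. This is the hub-and-authority idea Kleinberg published as HITS, but the code below is a minimal sketch, not IBM's implementation. It assumes the link graph has already been narrowed to pages related to the search, and the function name `hits`, the fixed iteration count, and the example pages are all illustrative assumptions.

```python
import math

def hits(links, iterations=50):
    """Simplified hub/authority iteration (HITS-style sketch).

    links: maps each page to the pages it links to; assumed to be
    already restricted to pages relevant to one query.
    """
    links = {src: set(ts) for src, ts in links.items()}  # drop duplicate links
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalize each score vector to unit length so values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

links = {
    "h1": ["cat-facts", "big-cats"],
    "h2": ["cat-facts", "big-cats"],
    "h3": ["cat-facts"],
}
hub, auth = hits(links)
```

Here `cat-facts` ends up with the highest authority score, and `h1`, which links to both authorities, outranks `h3` as a hub.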

By viewing the Web through its linkages and not merely by keywords, Kleinberg’s search algorithm solves another common search problem. A conventional Web search on the word “jaguar,” for example, generates an unsorted roster of sites, most related to the sports car or to an obsolete computer with the same name. Information on the jungle cat that inspired these brands, however, is harder to come by. Kleinberg’s system automatically groups hit lists into “communities” of sites that reference one another, in this case providing a list subdivided into sites related to cars, computers and cats.
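To make the notion of a "community" concrete, the sketch below groups pages that are connected by links in either direction. This is a deliberately simplified stand-in: connected components are not the technique the article describes (Kleinberg's published method separates communities using additional eigenvectors of the link structure), and on a real crawl most pages would fall into one giant component. The function name `link_communities` and the example URLs are hypothetical.

```python
from collections import defaultdict

def link_communities(links):
    """Group pages into clusters connected by links in either direction.

    Simplified stand-in for community detection: connected components
    of the undirected link graph.
    """
    neighbors = defaultdict(set)
    for src, targets in links.items():
        neighbors.setdefault(src, set())  # keep pages with no outgoing links
        for t in targets:
            neighbors[src].add(t)
            neighbors[t].add(src)
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk collecting every page reachable from `start`.
        stack, group = [start], []
        seen.add(start)
        while stack:
            page = stack.pop()
            group.append(page)
            for nxt in neighbors[page]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups

links = {
    "jaguar-club.example": ["jaguar-parts.example"],
    "jaguar-parts.example": ["jaguar-club.example"],
    "big-cats.example": ["rainforest.example"],
    "retro-computers.example": ["jaguar-computer-faq.example"],
}
groups = link_communities(links)
```

With this toy data the pages split into three groups, one each for cars, cats and computers.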

Kleinberg developed the algorithm while at IBM’s Almaden Research Center in San Jose, Calif., which still owns it. For now, the enhanced searching tool remains experimental, but IBM researchers are shopping it around to companies that run online search services, including AltaVista operator Digital Equipment Corp. Widespread availability is “inevitable,” says Prabhakar Raghavan, manager of computer science principles at IBM Almaden. “This is a great idea whose time will surely come.”
