Technology Review

A Better Way to Rank Expertise Online

New software distinguishes between experts and spammers, showing who can be trusted.

  • Friday, July 31, 2009
  • By Brittany Sauser

Websites where users can organize and share information are flourishing, but it can be hard to know which users and information to trust. Now a team of European researchers has developed an algorithm that ranks the expertise of users and can spot those who are using a site only to spam.

The technique works in a way similar to Amazon's reputation engine or the ratings of Wikipedia pages, but it evaluates users based on a new set of criteria that makes intuitive assumptions about experts.

The algorithm draws on a method applied in ranking Web pages, but takes it an interesting step further, says Jon Kleinberg, a professor of computer science at Cornell University in Ithaca, NY, who was not involved with the work. "It distinguishes between 'discoverers' and 'followers,'" Kleinberg says, "focusing on users who are the first to tag something that subsequently becomes popular."

The new work focuses on collaborative tagging systems such as Delicious, a social bookmarking website, and Flickr, a photo-sharing site. These sites let users add relevant keywords to "tag" Web links or photos and then share them. Normally, users are ranked by how frequently or how recently they add content to the system. "It's quantity over quality, so the more you do, the more credit you get," says Michael Noll, a researcher in computer science at Hasso Plattner Institute in Potsdam, Germany, and a researcher on the new software. "But the fact is [that] quantity does not imply quality."
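To make the conventional approach concrete, here is a minimal Python sketch of quantity-based ranking. The function name rank_by_quantity and the sample data are hypothetical and not taken from Delicious or Flickr. The function simply counts how many items each user has tagged.

```python
from collections import Counter


def rank_by_quantity(taggings):
    """Return users ordered by how many items they tagged, most active first."""
    counts = Counter(user for user, _item in taggings)
    return [user for user, _count in counts.most_common()]


# Hypothetical activity log of (user, tagged item) pairs.
taggings = [
    ("alice", "research-paper"),
    ("alice", "tutorial"),
    ("bob", "research-paper"),
    ("spammer", "ad-1"), ("spammer", "ad-2"), ("spammer", "ad-3"),
    ("spammer", "ad-4"), ("spammer", "ad-5"),
]
print(rank_by_quantity(taggings))  # ['spammer', 'alice', 'bob']
```

Because only volume counts, the account that posts the most lands at the top regardless of what it posts, which is the weakness Noll describes.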

The conventional approach also leaves the system very vulnerable to Web spammers, says Ciro Cattuto, a researcher at the Complex Network and Systems Group of the Institute for Scientific Interchange Foundation in Italy. Spammers adapt to the social behavior of other users, Cattuto says, so they see the most popular tags and start loading advertising content with those tags. To combat this, you need an algorithm that can search, rank, and present information in a usable way, says Cattuto. "The new method performs better than anything currently available: spammers rank very low, their content is not exposed, and eventually they stop polluting the system."

The new algorithm is called Spamming-resistant Expertise Analysis and Ranking (SPEAR). It is based on HITS, a well-known information-retrieval algorithm that ranks Web pages by analyzing the links between them, much as search engines like Google do. Like HITS, SPEAR is a method of "mutual reinforcement," says Kleinberg. In other words, the algorithm evaluates users and content together: expert users are the ones who identify the most important content, while important content is that which is identified by the most expert users. "The result is a way of identifying both expert users and high-quality content," he says.
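The mutual reinforcement Kleinberg describes, combined with extra credit for "discoverers," can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' implementation: the function names, the sample data, and in particular the square-root credit given to earlier taggers are choices made for the example.

```python
import math


def normalize(scores):
    """Scale scores so they sum to 1 (left unchanged if all are zero)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else dict(scores)


def rank_experts(taggings, iterations=50):
    """HITS-style mutual reinforcement over (user, item, time) taggings.

    Assumes each user tags a given item at most once.
    Returns (expertise, quality): two dicts of scores that each sum to 1.
    """
    # Order each item's taggers from earliest to latest.
    by_item = {}
    for user, item, time in taggings:
        by_item.setdefault(item, []).append((time, user))

    # Assumed credit rule: sqrt(1 + number of users who tagged the item later),
    # so "discoverers" earn more than "followers".
    credit = {}
    for item, events in by_item.items():
        events.sort()
        for position, (_time, user) in enumerate(events):
            later = len(events) - position - 1
            credit[(user, item)] = math.sqrt(1 + later)

    expertise = {user: 1.0 for user, _item in credit}
    quality = {item: 1.0 for item in by_item}

    for _ in range(iterations):
        # Expert users are those credited on high-quality items.
        new_expertise = dict.fromkeys(expertise, 0.0)
        for (user, item), c in credit.items():
            new_expertise[user] += c * quality[item]
        expertise = normalize(new_expertise)

        # High-quality items are those tagged by expert users.
        new_quality = dict.fromkeys(quality, 0.0)
        for (user, item), c in credit.items():
            new_quality[item] += c * expertise[user]
        quality = normalize(new_quality)

    return expertise, quality


# Hypothetical data: three users tag one link in turn; a spammer piles on
# last and also tags two ad pages that nobody else touches.
taggings = [
    ("alice", "link", 1), ("bob", "link", 2), ("carol", "link", 3),
    ("spammer", "link", 4), ("spammer", "ad1", 5), ("spammer", "ad2", 6),
]
expertise, quality = rank_experts(taggings)
print(sorted(expertise, key=expertise.get, reverse=True))
```

On this toy data the spammer is the most active account yet ends up last, behind alice, bob, and carol, because arriving late on a popular link earns little credit and the ad pages attract no other taggers. A real system would also need to handle repeated taggings and much larger, noisier data.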

Comments

joelsapp

  • 08/04/2009

A Better Way to Rank Expertise Online

OK, so I understand how they identify quality contributors, except for the part about a trendsetter sharing something that ultimately becomes popular.

So how do they rank popularity?

If they rank it by how many times an item is opened, then this model will fail. If they rank it by how often the item is tagged, this can be spammed as well.

terrycojones

  • 03/11/2010

Where to store the rankings

This is great - very interesting. The perfect place to store such a ranking is in FluidDB (see http://doc.fluidinfo.com/fluidDB/), which is designed to let anyone add any information about anything to any of its objects. For example, FluidDB has info on about 0.5M Twitter users (a small number, I know) and a SPEAR ranking could be added to these objects. It could then immediately be searched on, combined with other data, etc. FluidDB is designed as an always-writable database for exactly this kind of thing. An app, http://tickery.net, is putting lots of Twitter information into FluidDB and its advanced tab would allow queries on a SPEAR ranking, but any other application could also query the SPEAR ranking via the FluidDB API.

Sorry for so many words, and for sounding like an advertisement! I hope this will sound interesting.

Regards & congrats on the results.

Terry Jones (terry -at- fluidinfo com)
