Websites where users can organize and share information are flourishing, but it can be hard to know which users and information to trust. Now a team of European researchers has developed an algorithm that ranks the expertise of users and can spot those who are using a site only to spam.
The technique works in a way similar to Amazon’s reputation engine or the ratings of Wikipedia pages, but it evaluates users based on a new set of criteria that makes intuitive assumptions about experts.
The algorithm draws on a method applied in ranking Web pages, but takes it an interesting step further, says Jon Kleinberg, a professor of computer science at Cornell University in Ithaca, NY, who was not involved with the work. “It distinguishes between ‘discoverers’ and ‘followers,’” Kleinberg says, “focusing on users who are the first to tag something that subsequently becomes popular.”
The new work focuses on collaborative tagging systems such as Delicious, a social bookmarking website, and Flickr, a photo-sharing site. These sites let users add relevant keywords to “tag” Web links or photos and then share them. Normally, users are ranked by how frequently or how recently they add content to the system. “It’s quantity over quality, so the more you do, the more credit you get,” says Michael Noll, a computer science researcher at the Hasso Plattner Institute in Potsdam, Germany, who worked on the new software. “But the fact is [that] quantity does not imply quality.”
The conventional approach also leaves the system vulnerable to Web spammers, says Ciro Cattuto, a researcher at the Complex Network and Systems Group of the Institute for Scientific Interchange Foundation in Italy. Spammers adapt to the social behavior of other users, Cattuto says: they see which tags are most popular and start loading advertising content under those tags. To combat this, you need an algorithm that can search, rank, and present information in a usable way, says Cattuto. “The new method performs better than anything currently available: spammers rank very low, their content is not exposed, and eventually they stop polluting the system.”
The new algorithm is called Spamming-resistant Expertise Analysis and Ranking (SPEAR). It is based on HITS, a well-known information-retrieval algorithm that ranks Web pages by analyzing the links between them, the same broad approach that search engines like Google rely on. Like HITS, SPEAR is a method of “mutual reinforcement,” says Kleinberg. In other words, the algorithm scores users and content together: expert users are the ones who identify the most important content, while important content is that which is identified by the most expert users. “The result is a way of identifying both expert users and high-quality content,” he says.
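
To make the idea of mutual reinforcement concrete, here is a minimal Python sketch of a HITS-style loop over tagging data. It is an illustration under stated assumptions, not the researchers' implementation: the function names `rank_users` and `normalize`, the sample data, and in particular the square-root credit given to early taggers are hypothetical choices made here to capture the “discoverer” versus “follower” distinction.

```python
from math import sqrt


def normalize(scores):
    """Scale scores so they sum to 1 (keeps the iteration numerically stable)."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def rank_users(taggings, iterations=50):
    """Score users and items from (user, item, timestamp) tuples.

    Assumption for illustration: a user's credit for an item grows with the
    number of users who tagged that item afterwards.
    """
    # Group the taggers of each item so they can be put in chronological order.
    by_item = {}
    for user, item, timestamp in taggings:
        by_item.setdefault(item, []).append((timestamp, user))

    # weight[(user, item)] is larger for earlier taggers ("discoverers").
    weight = {}
    for item, entries in by_item.items():
        entries.sort()
        for position, (_, user) in enumerate(entries):
            followers = len(entries) - 1 - position
            weight[(user, item)] = sqrt(1 + followers)

    expertise = {user: 1.0 for user, _ in weight}
    quality = {item: 1.0 for _, item in weight}

    for _ in range(iterations):
        # Item quality: weighted sum of the expertise of the users who tagged it.
        new_quality = dict.fromkeys(quality, 0.0)
        for (user, item), w in weight.items():
            new_quality[item] += w * expertise[user]
        # User expertise: weighted sum of the quality of the items they tagged.
        new_expertise = dict.fromkeys(expertise, 0.0)
        for (user, item), w in weight.items():
            new_expertise[user] += w * new_quality[item]
        quality = normalize(new_quality)
        expertise = normalize(new_expertise)

    return expertise, quality


# Hypothetical data: "alice" tags first, "spam" piles onto popular items late
# and also tags an item ("c") that nobody else cares about.
sample = [
    ("alice", "a", 1), ("bob", "a", 2), ("carol", "a", 3), ("spam", "a", 4),
    ("alice", "b", 1), ("bob", "b", 2), ("spam", "b", 3),
    ("spam", "c", 1),
]
expertise, quality = rank_users(sample)
ranking = sorted(expertise, key=expertise.get, reverse=True)
```

In this toy data set the account that simply tags the most items does not come out on top: “alice,” who tagged both popular items first, is ranked above “bob,” and both are ranked above the late-arriving “spam” account. Without the timing-based weights, the loop reduces to a plain HITS-like scheme in which the volume of tagging counts for more.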