Technology Review
New software distinguishes between experts and spammers, showing who can be trusted.
Websites where users can organize and share information are flourishing, but it can be hard to know which users and information to trust. Now a team of European researchers has developed an algorithm that ranks the expertise of users and can spot those who are using a site only to spam.
The technique works in a way similar to Amazon's reputation engine or the ratings of Wikipedia pages, but it evaluates users based on a new set of criteria that makes intuitive assumptions about experts.
The algorithm draws on a method applied in ranking Web pages, but takes it an interesting step further, says Jon Kleinberg, a professor of computer science at Cornell University in Ithaca, NY, who was not involved with the work. "It distinguishes between 'discoverers' and 'followers,'" Kleinberg says, "focusing on users who are the first to tag something that subsequently becomes popular."
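The discoverer/follower distinction can be illustrated with a small sketch. This is not the researchers' code: the function name `discoverer_credit`, the `(user, document, timestamp)` event format, and the choice to count later taggers as "followers" are all assumptions made for illustration. It also assumes each user tags a given document at most once.

```python
from collections import defaultdict


def discoverer_credit(events):
    """Count, for each (user, document) pair, how many users tagged later.

    `events` is an iterable of (user, document, timestamp) tuples.
    This input format is hypothetical, chosen only for the example.
    """
    by_doc = defaultdict(list)
    for user, doc, ts in events:
        by_doc[doc].append((ts, user))

    credit = {}
    for doc, taggers in by_doc.items():
        taggers.sort()  # earliest tagger first; ties broken by user name
        total = len(taggers)
        for position, (_, user) in enumerate(taggers):
            # Everyone after this position counts as a follower of this user.
            credit[(user, doc)] = total - position - 1
    return credit


events = [
    ("alice", "d1", 1),
    ("bob", "d1", 2),
    ("carol", "d1", 3),
    ("bob", "d2", 1),
]
credit = discoverer_credit(events)
```

Here the first user to tag a document that later attracts many taggers earns the most credit, while the last tagger earns none.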
The new work focuses on collaborative tagging systems such as Delicious, a social bookmarking website, and Flickr, a photo-sharing site. These sites let users add relevant keywords to "tag" Web links or photos and then share them. Normally, users are ranked by how frequently or how recently they add content to the system. "It's quantity over quality, so the more you do, the more credit you get," says Michael Noll, a researcher in computer science at Hasso Plattner Institute in Potsdam, Germany, and a researcher on the new software. "But the fact is [that] quantity does not imply quality."
The conventional approach also leaves the system very vulnerable to Web spammers, says Ciro Cattuto, a researcher at the Complex Network and Systems Group of the Institute for Scientific Interchange Foundation in Italy. Spammers adapt to the social behavior of other users, Cattuto says, so they see the most popular tags and start loading advertising content with those tags. To combat this, you need an algorithm that can search, rank, and present information in a usable way, says Cattuto. "The new method performs better than anything currently available--spammers rank very low, their content is not exposed, and eventually they stop polluting the system."
The new algorithm is called Spamming-resistant Expertise Analysis and Ranking (SPEAR) and is based on HITS, a well-known information-retrieval algorithm that ranks Web pages by analyzing the links between them. Like HITS, SPEAR is a method of "mutual reinforcement," says Kleinberg. In other words, the algorithm evaluates popular users and popular content and declares expert users to be the ones who identify the most important content, while important content is that which is identified by the most expert users. "The result is a way of identifying both expert users and high-quality content," he says.
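The mutual-reinforcement idea described above can be sketched as a short iteration. This is a minimal HITS-style illustration, not the SPEAR implementation itself: the function names, the `weights` dictionary format, and the fixed iteration count are assumptions. A weight of 1 per tagging action gives plain mutual reinforcement; larger weights for early taggers would be one hypothetical way to reward discoverers.

```python
def _normalise(scores):
    """Scale scores so they sum to 1 (left unchanged if the sum is zero)."""
    total = sum(scores.values())
    if total == 0:
        return scores
    return {key: value / total for key, value in scores.items()}


def mutual_reinforcement(weights, iterations=50):
    """Iteratively score users and documents against each other.

    `weights` maps (user, document) to a positive number describing how
    much credit that tagging action carries (hypothetical input format).
    Returns (expertise, quality), two dicts whose values each sum to 1.
    """
    users = sorted({user for user, _ in weights})
    docs = sorted({doc for _, doc in weights})
    expertise = {user: 1.0 for user in users}
    quality = {doc: 1.0 for doc in docs}

    for _ in range(iterations):
        # A document is important if expert users tagged it.
        new_quality = {doc: 0.0 for doc in docs}
        for (user, doc), w in weights.items():
            new_quality[doc] += w * expertise[user]
        # A user is expert if they tagged important documents.
        new_expertise = {user: 0.0 for user in users}
        for (user, doc), w in weights.items():
            new_expertise[user] += w * new_quality[doc]
        quality = _normalise(new_quality)
        expertise = _normalise(new_expertise)
    return expertise, quality


weights = {
    ("alice", "d1"): 1.0,
    ("bob", "d1"): 1.0,
    ("alice", "d2"): 1.0,
    ("carol", "d2"): 1.0,
    ("spam", "d3"): 1.0,  # tags only content that nobody else tags
}
expertise, quality = mutual_reinforcement(weights)
```

In this toy data set, the user who tags both widely shared documents ends up with the highest score, and the account whose content nobody else touches sinks toward zero.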
This is great - very interesting. The perfect place to store such a ranking is in FluidDB (see http://doc.fluidinfo.com/fluidDB/), which is designed to let anyone add any information about anything to any of its objects. For example, FluidDB has info on about 0.5M Twitter users (a small number, I know), and a SPEAR ranking could be added to these objects. It could then immediately be searched on, combined with other data, etc. FluidDB is designed as an always-writable database for exactly this kind of thing. An app, http://tickery.net, is putting lots of Twitter information into FluidDB, and its advanced tab would allow queries on a SPEAR ranking, but any other application could also query the SPEAR ranking via the FluidDB API.
Sorry for so many words, and for sounding like an advertisement! I hope this will sound interesting.
Regards & congrats on the results.
Terry Jones (terry -at- fluidinfo com)
joelsapp
A Better Way to Rank Expertise Online
OK, so I understand how they identify quality creators, except for the part about a trendsetter sharing something that ultimately becomes popular.
So how do they rank popularity?
If they rank it by how many times an item is opened, then this model will fail. If they rank it by how many times the item is tagged, this can be spammed as well.