Technology Review - Published By MIT

arXiv blog

The Physics arXiv Blog produces daily coverage of the best new ideas from an online forum called the Physics arXiv on which scientists post early versions of their latest ideas. Contact me at KentuckyFC @ arxivblog.com

Monday, March 30, 2009

BlogRank v PageRank

The flaw behind the latest approach to blog search

Use a search engine to hunt for a decent blog on an unusual topic and the chances are that you'll find little of real value.

The problem is that blogs are not knitted together with hyperlinks in the same way as the rest of the web, which is why traditional search algorithms do a poor job of ranking them. For example, the famous Google algorithm, PageRank, works by counting the number of links that point to a webpage. The more a page receives, the higher it is ranked. But it also assumes that links from popular sites are more valuable, so a ranking is also weighted according to the popularity of the linking pages.
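To make the idea concrete, here is a minimal sketch of the link-counting scheme just described, written as a simple repeated update over a toy web of four pages. It is an illustration rather than Google's actual implementation: the function name, the toy graph, the iteration count and the damping factor of 0.85 (a commonly quoted default) are all assumptions that do not come from the paper under discussion.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages from a dict mapping each page to the pages it links to.

    Illustrative sketch only; damping and iterations are assumed values.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank among the pages it links to,
                # so links from highly ranked pages are worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "a" and "b" link to "c", which links to "d".
toy_web = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
scores = pagerank(toy_web)
```

In this toy web, "c" outranks "a" because it receives two links, and "d" outranks "a" because its single link comes from the well-linked "c".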

That works pretty well for ordinary web pages but fails miserably for most blogs because the links they receive offer little indication of their quality or content. For example, there is no way to distinguish between a link made with pleasure or in anger. And blogs generally receive fewer links than other types of page, even when they are widely read and influential among a specific but small group of people.

So how to improve blog search? Apostolos Kritikopoulos and pals at the Athens University of Economics and Business in Greece say the key is to bolster the PageRank approach with other information about how blogs may be related. They say, for example, that it is possible to exploit the blog's topic as determined by tags and also to exploit the contribution made by authors and commenters and the way in which these individuals add content to more than one blog.
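One way to picture this is as a graph in which blogs are connected not only by hyperlinks but also by shared tags and shared people. The sketch below is a guess at the general shape of such a scheme, not the formula from the paper: the function names, the toy blogs and the three weights (`w_link`, `w_tag`, `w_people`) are hypothetical, and the ranking step is an ordinary weighted PageRank-style update with an assumed damping factor of 0.85.

```python
from itertools import combinations


def combined_edges(blogs, w_link=1.0, w_tag=0.5, w_people=0.5):
    """Build weighted edges from hyperlinks, shared tags and shared people.

    blogs maps a blog name to {"links": list, "tags": set, "people": set}.
    The weights are hypothetical tuning knobs, not values from the paper.
    """
    weights = {}

    def add(src, dst, weight):
        if src != dst and weight > 0:
            weights[(src, dst)] = weights.get((src, dst), 0.0) + weight

    # Ordinary hyperlinks, as in plain PageRank.
    for name, info in blogs.items():
        for target in info["links"]:
            if target in blogs:
                add(name, target, w_link)

    # Extra two-way connections for each shared tag, author or commenter.
    for a, b in combinations(sorted(blogs), 2):
        shared_tags = len(blogs[a]["tags"] & blogs[b]["tags"])
        shared_people = len(blogs[a]["people"] & blogs[b]["people"])
        weight = w_tag * shared_tags + w_people * shared_people
        add(a, b, weight)
        add(b, a, weight)
    return weights


def weighted_rank(blogs, weights, damping=0.85, iterations=50):
    """PageRank-style update in which rank flows in proportion to edge weight."""
    names = sorted(blogs)
    n = len(names)
    out_total = {name: 0.0 for name in names}
    for (src, _dst), weight in weights.items():
        out_total[src] += weight

    rank = {name: 1.0 / n for name in names}
    for _ in range(iterations):
        new_rank = {name: (1.0 - damping) / n for name in names}
        for (src, dst), weight in weights.items():
            new_rank[dst] += damping * rank[src] * weight / out_total[src]
        # Blogs with no outgoing edges spread their rank evenly.
        dangling = sum(rank[name] for name in names if out_total[name] == 0.0)
        for name in names:
            new_rank[name] += damping * dangling / n
        rank = new_rank
    return rank


# Hypothetical data: two niche blogs share a tag and two commenters
# but have no hyperlinks at all.
toy_blogs = {
    "big": {"links": [], "tags": {"gadgets"}, "people": {"ann"}},
    "fan": {"links": ["big"], "tags": {"gadgets"}, "people": {"bob"}},
    "niche1": {"links": [], "tags": {"knitting"}, "people": {"cat", "dee"}},
    "niche2": {"links": [], "tags": {"knitting"}, "people": {"cat", "dee"}},
}
link_only = combined_edges(toy_blogs, w_tag=0.0, w_people=0.0)
combined = combined_edges(toy_blogs)
link_rank = weighted_rank(toy_blogs, link_only)
combined_rank = weighted_rank(toy_blogs, combined)
```

With links alone, the two niche blogs are invisible to each other. Once shared tags and commenters count as connections, they reinforce one another and their scores rise, which is the kind of effect the extra information is meant to produce.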

Taking these factors into consideration radically changes the kind of rankings that a blog search throws up. Kritikopoulos and co show how different their approach is by ranking the top 1000 blogs in a dataset from 2006 using PageRank and then using their own algorithm called BlogRank. These lists (and another ranking method) share only 139 common entries.

The top 3 PageRanked blogs are:

  1. BoingBoing
  2. Engadget
  3. grahame.livejournal.com/

The top 3 BlogRank blogs are:

  1. nocapital.blogspot.com/
  2. pseudomanitou.livejournal.com/
  3. tbogg.blogspot.com/
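Counting how many entries two or more rankings share is a simple set intersection. A minimal sketch, using the two top-3 lists above as input (the function name is an assumption, and the full top-1000 lists are not reproduced here):

```python
def common_entries(rankings, k):
    """Return the entries that appear in the top k of every ranking given."""
    top_sets = [set(ranking[:k]) for ranking in rankings]
    return set.intersection(*top_sets)


pagerank_top = ["BoingBoing", "Engadget", "grahame.livejournal.com/"]
blogrank_top = [
    "nocapital.blogspot.com/",
    "pseudomanitou.livejournal.com/",
    "tbogg.blogspot.com/",
]
shared = common_entries([pagerank_top, blogrank_top], k=3)
print(len(shared))  # prints 0: the two top-3 lists have nothing in common
```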


What this tells us about how BlogRank would work for actual keyword searches isn't clear.

But it is apparent that the team will have its work cut out getting the kind of information that it claims gives BlogRank an advantage. For example, it's not always possible to get hold of a full list of commenters on many blogs. And even when it is possible, identifying authors and commenters is tricky because they often hide their identity (ahem).

That may be BlogRank's fatal failing.

Kritikopoulos and co conclude that their experimental results are "quite encouraging". Maybe. They also say: "Much more experimental evaluation of our method, as well as tuning of its parameters is needed."

That's for sure. It may be some time before we get a search engine that does half as good a job for blogs as it does for the rest of the web.

Ref: arxiv.org/abs/0903.4035: BlogRank: Ranking Weblogs Based on Connectivity and Similarity Features


Comments

  • Bloggers Can't be Trusted
    The problem with links from blogs and blog trustworthiness in general goes back to the time when Google was testing its Trustmark algorithms alongside its nodal magnitude algos.
    Bloggers got wind of text link spamming and certain large blogs cleaned Google out of a load of Adsense money and forced it to totally change the mechanisms it uses for its PPC lifeblood revenue.
    Consequently Google does not trust blogs for backlinks - unless they come from blogs that Google has decided to rank as trustworthy!

    TheSystem
    04/03/2009