arXiv blog

Scientist Finds PageRank-Type Algorithm from the 1940s

Google's PageRank algorithm was developed in 1998. But a project to trace the history of such algorithms reveals an example from the 1940s.

kfc 02/17/2010

  • 8 Comments

The PageRank algorithm is a key part of Google's method of ranking web pages in search results. It uses the network of links between web pages to determine their value and, famously, judges a page to be important if it is linked to by other important pages.

One crucial feature of this idea is that it requires an iterative approach to constantly re-evaluate the value of a page as the importance of others varies. Iterative ranking algorithms have since become an important part of network theory.

PageRank was developed in 1998 by Google's founders Sergey Brin and Larry Page and its impact has been such that it's easy to forget that the approach was not entirely novel. Massimo Franceschet at the University of Udine in Italy points out that the idea has been successfully exploited a number of times in 20th century science, even before Brin and Page were born. Today, he presents a short history of iterative ranking algorithms and charts their evolution prior to Google's emergence.

He begins in reverse chronological order with the work of Jon Kleinberg, a computer scientist at Cornell University, who developed an almost identical approach to PageRank, just a few years earlier. Brin and Page even reference his work in their famous paper introducing PageRank.

Kleinberg called his algorithm Hypertext Induced Topic Search or HITS and it treated web pages as "hubs" and "authorities". It used the circular definition that authorities are pages that are pointed to by hubs and hubs are pages that point to authorities and requires an iterative approach to solve.

In the heady days of the dotcom boom in the late 20th century, before Google became so successful, Kleinberg's work received considerable media coverage.

Franceschet also examines the work of Gabriel Pinski and Francis Narin who developed a way of ranking journals. Their rule was that a journal is important if it is cited by other important journals. Like PageRank and HITS, this requires an iterative method to exploit the structure of links between journals to come up with a ranking.

Long before this, however, Charles H Hubbell at the University of Califronia , Santa Barbara, was analysing social networks in a similar way. In 1965, he published a technique for determining the importance of individuals based on the importance of the people who endorse them. This again has the characteristic circular definition and iterative solution. Hubbell is acknowledged by many including Kleinberg as a pioneer in iterative ranking theory.

But the big surprise is Franceschet's discovery of an even earlier forerunner to PageRank in the work of the Harvard economist Wassily Leontief. In 1941, Leontief published a paper in which he divides a country's economy into sectors that both supply and receive resources from each other, although not in equal measure. One important question is: what is the value of each sector when they are so tightly integrated? Leontief's answer was to develop an iterative method of valuing each sector based on the importance of the sectors that supply it. Sound familiar? In 1973, Leontief was awarded the Nobel Prize in economics for this work.

What's clear is that the ideas behind PageRank have a venerable history but the surprise is that they date back to at least the 1940s. It'll be interesting to see if anybody can find any similar work that predates this.

Ref:arxiv.org/abs/1002.2858: PageRank: Stand On The Shoulders Of Giants

TRSF: Read the Best New Science Fiction inspired by today’s emerging technologies.

Print

Close Comments

To comment, please sign in or register

Forgot my password

GrantRobertson

3 Comments

  • 721 Days Ago
  • 02/17/2010

Aristocracy

Ever read anything by any of the Bronte sisters? All of aristocracy has functioned in the exact same manner as "PageRank" for hundreds, if not thousands, of years. One's social standing was based on who one knew. And their social standing was based on who they knew, and so on. All any of these researchers have done is codify what the Bronte sisters knew and clearly illustrated back in the 1800s.

Reply

imd

2 Comments

  • 721 Days Ago
  • 02/17/2010

Re: Aristocracy

When "all you do" is codify and formalize intuitive knowledge, you're doing a service to mankind.

Reply

bravid

6 Comments

  • 720 Days Ago
  • 02/18/2010

Re: Aristocracy

So are you saying Sergey Brin and Larry Page are actually the Bronte sisters?

Reply

eionmac

1 Comment

  • 711 Days Ago
  • 02/27/2010

Almanachs

The aristocracy almanacs ranked these from the 14th century on on same  basis as linked pages, who you knew and who they know and how important. The Bronte sisters had in in their mind but could also if wanted purchase an almanac

Reply

vigna

2 Comments

  • 700 Days Ago
  • 03/10/2010

This is a bit of a red herring

Well, actually Leontief's theory just shows how to define a sustainable economy through the fixpoint of a linear operator. I see no connection with PageRank, except that PageRank is a fixpoint, too. But then Markov's work (1906) predates PageRank even more, as any work about fixed point of  linear operators. Rather, Seeley (1949) introduced reputation as a recursive concept using exactly PageRank's formulation, and is the oldest currently known instance of the idea. Kats (1953) and Wei (1952) did essentially the same using different starting points. They are psychologists and sociologists, but they can all be found in Wasserman and Faust's book about social networks (1994), which makes this news a bit late and quite a red herring. A rather detailed mathematical account can be found here: http://arxiv.org/abs/0912.0238

Reply

franceschet

1 Comment

  • 696 Days Ago
  • 03/14/2010

Heavily smoked (herring)?

In fact, the connections are striking:

a) A Leontief input-output table is nothing more than a matrix;
b) Leontief closed system (which excludes exogenous demand and profit) is essentially Pinski and Narin bibliometric method, which is endorsed by Larry Page in PageRank patent;
c) The stochastic reformulation of Leontief closed system (along the route that Geller traced for Pinski and Narin technique) is a weighted teleportation-free version of the PageRank algorithm.

Furthermore:

a) The solution to the stochastic reformulation of Leontief closed system is the leading eigenvector of a stochastic matrix (of normalized production coefficients), in analogy with the solution of the PageRank problem (the leading eigenvector of Google matrix);
b) Individual solution scores correspond to total revenues of industries. In particular, the revenue of an industry B depends on the revenue of industries A that produce products for B weighted by the proportion of product that A produces for B. Hence, highly remunerated sectors are those that receive inputs from other highly remunerated industries with low propensity to differentiate their outputs among the other industries. Sounds familiar? It's PageRank logic!

Details on: http://arxiv.org/abs/1002.2858

Massimo Franceschet
http://users.dimi.uniud.it/~massimo.franceschet/

Reply

vigna

2 Comments

  • 665 Days Ago
  • 04/14/2010

Re: Heavily smoked (herring)?

These connections are very vague, and mostly related to the mathematical tools involved, which are quite common. They are no stronger, say, than those with Markov's work (1906!), as observed also by Massimo Marchiori (of HyperSearch fame). Seeley's paper, on the other hand, has the specific purpose of estimating reputation recursively starting from directed endorsement.

Reply

Advertisement

Kry4ma

1 Comment

  • 419 Days Ago
  • 12/16/2010

Good information

It is really an interesting and good information. I translated the important part of it into Bulgarian in my forum and put a link to this website there:

http://kry4ma.com/index.php/topic,2193.0.html

Thank you so much for your project!

Reply

Bio

The Physics arXiv Blog produces daily coverage of the best new ideas from an online forum called the Physics arXiv on which scientists post early versions of their latest ideas. Contact me at KentuckyFC @ arxivblog.com

Follow The Physics arXiv Blog on Twitter

Subscribe to the arXiv blog RSS Feed

Advertisement
Advertisement

Facebook

Advertisement