Emerging Technology from the arXiv

The Database for Individuals Who Have Transcended Linguistic, Temporal, and Geographic Boundaries

A new list of the most culturally influential individuals since 4,000 B.C. could help revolutionize the study of human culture.

  • March 9, 2015

Various organizations and individuals have attempted to list the most important people in history. These rankings are notoriously difficult to justify because they are often biased by language, geography, gender, and so on.

That’s not just a problem for list collectors. Social scientists are also interested in these kinds of rankings because they provide insight into the way humans transmit nongenetic information from one generation to the next. This information includes science, literature, and other works of art but also beliefs, tastes, expectations, and so on.

The systematic study of this kind of cultural production has the potential to revolutionize our understanding of humanity. But the lack of good databases on which to base this work is a significant drawback.

Today, Amy Zhao Yu and co at MIT’s Media Lab attempt to transcend these limitations by creating a list with cultural input from almost every country and from over 280 languages.

They call this new database “Pantheon 1.0: a manually curated dataset of individuals that have transcended linguistic, temporal, and geographic boundaries.” And they expect it to kick-start the study of culture with the same kind of data mining techniques that are revolutionizing other areas of sociology and anthropology.

Their approach is straightforward. They download over 11,000 biographies from the online data repository Freebase and link them to their pages in 277 language versions of Wikipedia.

They then create a list of occupations subdivided by cultural domain. For example, the occupation ‘magician’ is a type of media personality in the category of public figure.

Next, they allocate a single occupation to each person on the list, even if they might be known for more than one thing. “For example, Barack Obama is a politician (although he is also listed as a writer on Freebase), and Shaquille O’Neal is a basketball player (although he is also listed as an actor on Freebase),” say Yu and co. “The challenge of fairly distributing the cultural impact of polymaths will be left for future consideration.”

Having created this database, the next question is how to rank the individuals on it. Yu and co do this in two ways. The first is simply to count the number of Wikipedia language editions that have an article about that person. The thinking is that this is an indication of the extent to which this person’s impact has been felt across cultural boundaries.
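The first ranking method amounts to a simple count and sort. Here is a minimal sketch in Python, assuming each person maps to the set of Wikipedia language codes that carry an article about them. The function name and the sample data are hypothetical and are not taken from the Pantheon dataset.

```python
from typing import Dict, List, Set, Tuple


def rank_by_language_editions(
    editions: Dict[str, Set[str]]
) -> List[Tuple[str, int]]:
    """Rank people by how many language editions have an article about them.

    Ties are broken alphabetically by name so the ordering is deterministic.
    """
    counts = [(name, len(langs)) for name, langs in editions.items()]
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data, for illustration only.
sample = {
    "Person A": {"en", "es", "fr", "de"},
    "Person B": {"en", "pt"},
    "Person C": {"en", "zh", "ja"},
}

ranking = rank_by_language_editions(sample)
for name, n_editions in ranking:
    print(f"{name}: {n_editions} language editions")
```

A count like this treats every language edition equally, which is one reason a second, adjusted measure is useful.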

Yu and co call the second method the Historical Popularity Index. For this they take the number of Wikipedia language editions that mention a person and adjust it for factors such as the number of page views, the concentration of page views in different languages, the number of non-English page views, and the time since that person was born.
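To give a feel for how such an adjusted index might combine these factors, here is a toy sketch in Python. It is not the formula Yu and co use. The choice of terms, the equal weighting, and the use of entropy to measure how concentrated the page views are across languages are all assumptions made for illustration, as are the function names. Total page views are left out for brevity.

```python
import math
from typing import Dict


def effective_languages(views_by_lang: Dict[str, int]) -> float:
    """Entropy-based 'effective number' of languages (an assumed measure).

    Equals the number of languages when views are spread evenly,
    and approaches 1 when views are concentrated in a single language.
    """
    total = sum(views_by_lang.values())
    if total == 0:
        return 1.0
    entropy = 0.0
    for views in views_by_lang.values():
        if views > 0:
            share = views / total
            entropy -= share * math.log(share)
    return math.exp(entropy)


def toy_popularity_score(views_by_lang: Dict[str, int],
                         years_since_birth: float) -> float:
    """Illustrative composite score; NOT the Historical Popularity Index."""
    if not views_by_lang:
        raise ValueError("need at least one language edition")
    n_languages = len(views_by_lang)
    non_english = sum(v for lang, v in views_by_lang.items() if lang != "en")
    return (
        math.log(n_languages)                            # breadth across editions
        + math.log(effective_languages(views_by_lang))   # spread of page views
        + math.log1p(non_english)                        # non-English attention
        + math.log1p(years_since_birth)                  # time since birth
    )


# Hypothetical page-view counts, for illustration only.
spread_out = {"en": 500, "fr": 500}
concentrated = {"en": 990, "fr": 10}
print(toy_popularity_score(spread_out, 100) > toy_popularity_score(concentrated, 100))
```

In this sketch, two people with the same number of language editions and the same total page views can still score differently: the one whose readership is spread more evenly across languages comes out ahead.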

That produces some interesting rankings. Of people born before 500 AD, the person with the most Wikipedia language mentions is Jesus, followed by Confucius. However, the two people who head the Historical Popularity Index are Aristotle and Plato.

For the most recent period—1900 to 1950—the person mentioned in the most Wikipedia language versions is the Brazilian singer Hebe Camargo while the Marxist revolutionary Che Guevara tops the Historical Popularity Index. Adolf Hitler tops the previous period of 1850 to 1899 on both counts (the period refers to birthdates).

Others who head the lists for the periods in which they were born include Muhammad, Leonardo da Vinci, Isaac Newton, William Shakespeare, and Mozart.

Yu and co are quick to point out the biases in their database. For example, using data from Wikipedia introduces an important bias because Wikipedia editors are not a representative sample of the world population. And using biographies as a proxy for cultural production only works when there is a clear link between an individual and a specific piece of cultural expression but otherwise tends to exclude group efforts.

That’s an interesting piece of work that produces many potential talking points—these kinds of rankings are always fascinating to pore over. And it opens up the possibility of studying the evolution of cultural production and how this kind of nongenetic information has swept through different parts of the world at different times.

It may not be perfect, but it is an interesting first step toward a new kind of data-based science.

Ref: arxiv.org/abs/1502.07310 : Pantheon: A Dataset for the Study of Global Cultural Production
