Wikipedia, the encyclopaedia that anybody can edit, is one of the more extraordinary collective efforts of the crowd. Wikipedia’s own estimate is that it has some 77,000 contributors working on more than 22 million articles in 285 languages. The largest edition, the English version, alone offers over 4 million articles.
So it’s not surprising that disputes arise over the wording of these articles. Indeed, the controversy can sometimes reach war-like proportions with one editor changing the wording and another immediately changing it back again.
These so-called edit wars can be used to identify controversial topics, but an interesting question is how controversy varies across languages and cultures. Because it straddles so many languages and cultures, Wikipedia is in the perfect position to provide some answers.
Today, Taha Yasseri at the University of Oxford in the UK and a few pals have ranked the most controversial topics in 10 different languages according to the intensity of the editing wars they generate.
The result is a fascinating insight into the way conflicts emerge in different languages and how they are resolved. Yasseri and co also reveal the controversies that are common across language groups and how they vary around the world.
These guys begin by defining what they mean by a controversy. In Wikipedia, the editorial history of every article is easily accessible but the number of changes is by no means a measure of controversy; it may simply indicate a rapidly changing topic.
Instead, Yasseri and co focus on “reverts”, edits in which one author completely undoes changes made by another and so returns the article to an earlier version. Reverts are relatively common in Wikipedia and may not necessarily be indicative of controversy.
So Yasseri and co look for “mutual reverts”, in which one editor reverts another’s work and vice versa, so both editors are undoing each other’s changes.
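To make the idea concrete, here is a minimal Python sketch of how reverts and mutual reverts might be picked out of an article's edit history. It is an illustration only, not the authors' code. It assumes a revert shows up as a revision whose text exactly matches an earlier version, and it treats the editor of the first revision after that earlier version as the one being reverted. The `Revision` record and both function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one revision: who saved it and the full article text.
Revision = namedtuple("Revision", ["editor", "text"])


def find_reverts(history):
    """Return (reverter, reverted) pairs from a chronological list of revisions.

    Assumption: a revert is a revision whose text is identical to an earlier
    version, with at least one other revision in between. The editor of the
    first intervening revision is treated as the one whose work was undone.
    """
    reverts = []
    last_seen = {}  # article text -> index of the latest revision with that text
    for j, rev in enumerate(history):
        k = last_seen.get(rev.text)
        if k is not None and k < j - 1:
            reverted = history[k + 1].editor
            if reverted != rev.editor:  # ignore editors undoing their own work
                reverts.append((rev.editor, reverted))
        last_seen[rev.text] = j
    return reverts


def mutual_revert_pairs(reverts):
    """Return the set of editor pairs who have each reverted the other."""
    directed = set(reverts)
    return {frozenset((a, b)) for a, b in directed if (b, a) in directed}


history = [
    Revision("alice", "v1"),
    Revision("bob", "v2"),
    Revision("alice", "v1"),  # alice restores v1, undoing bob
    Revision("bob", "v2"),    # bob restores v2, undoing alice
    Revision("carol", "v3"),
    Revision("alice", "v2"),  # alice restores v2, undoing carol
]
reverts = find_reverts(history)
mutual = mutual_revert_pairs(reverts)
```

In this toy history, alice and bob form a mutual revert pair, while alice's revert of carol is one-sided and so is not counted. A real pipeline would probably compare hashes of the text rather than the text itself, but the logic would be the same.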
That leads to a relatively straightforward definition of controversiality: “The controversiality of an article is defined by summing the weights of all mutually reverting editor pairs, excluding the topmost pair, and multiplying this number by the total number of editors involved in the article,” say Yasseri and co.
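The quoted definition can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation. The quote does not say how a pair's weight is computed, so the sketch assumes it is the smaller of the two editors' edit counts on the article. It also assumes the "editors involved" are those who appear in at least one mutually reverting pair. The function name and the sample data are hypothetical.

```python
def controversiality(mutual_pairs, edit_counts):
    """Score an article from its mutually reverting editor pairs.

    mutual_pairs: list of (editor_a, editor_b) tuples, one per mutual pair.
    edit_counts:  dict mapping each editor to their number of edits.

    Assumptions (not spelled out in the quoted definition):
      * a pair's weight is the smaller of the two editors' edit counts;
      * "editors involved" means editors appearing in some mutual pair.
    """
    if not mutual_pairs:
        return 0
    weights = [min(edit_counts[a], edit_counts[b]) for a, b in mutual_pairs]
    weights.remove(max(weights))  # exclude the topmost (heaviest) pair
    editors = {editor for pair in mutual_pairs for editor in pair}
    return len(editors) * sum(weights)


# Toy data, invented purely for illustration.
edit_counts = {"alice": 50, "bob": 30, "carol": 5, "dave": 12}
pairs = [("alice", "bob"), ("alice", "carol"), ("bob", "dave")]
score = controversiality(pairs, edit_counts)
```

With these made-up numbers the pair weights are 30, 5 and 12. Dropping the heaviest pair leaves 17, and multiplying by the four editors involved gives a score of 68. An article with only one feuding pair scores zero, because that single pair is the topmost one and is excluded.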
They then went through each language version of Wikipedia searching for mutual reverts and calculating the controversiality of the articles these reverts are associated with.
That gives a simple list of the most controversial articles in each language. In English, the top 10 most controversial articles include the following:
- George W Bush
- List of World Wrestling Entertainment, Inc. employees
- Global Warming
- United States
- Race and intelligence
But it is also possible to group the languages into three sets: 1) English, German, French, Spanish; 2) Czech, Hungarian, Romanian; 3) Arabic, Persian, Hebrew. Yasseri and co then compared the lists from each group to see which topics overlapped.
They say that, in general, major religions and religious figures as well as articles related to anti-Semitism and Israel are highly contested in many languages. In particular: “The articles Israel, Adolf Hitler, The Holocaust and God are highly contested in all the three language sets,” they report.
However, while a small number of topics seem to be controversial in most languages, most of the controversial articles are language dependent. They may be controversial in one language but not in another. The Islas Malvinas/Falkland Islands article in the Spanish Wikipedia is one example.
That’s an interesting insight into the topics that different language communities consider worth fighting about.
Yasseri and co have plans for the future. They say their measure of controversiality clearly varies with time as the nature of the topic and the editors working on it changes. So they plan to study this dynamic aspect to see how the patterns of controversy change over time. It’ll be interesting to see what emerges.
Ref: arxiv.org/abs/1305.5566: The Most Controversial Topics In Wikipedia: A Multilingual And Geographical Analysis