
Wikipedia’s Secret Multilingual Workforce

Wikipedia’s various language editions often carry entirely different content. Now one researcher has identified a small band of multilingual editors who are working to change that.

Wikipedia aims to provide free online access to all human knowledge. A cursory look at its vital statistics appears to indicate that it's well on its way to achieving that. The project has 77,000 active contributors working on over 22 million articles in 285 languages. All this attracts some 500 million unique visitors a month.

And yet a look beyond these figures reveals a subtle but important problem: there is surprisingly little overlap between the content in different language editions. No one edition contains all the information found in other language editions. And the largest language edition, English, contains only 51 per cent of the articles in the second largest edition, German.  

This problem is known as self-focus bias, and it places a significant limit on the access to knowledge that Wikipedia provides. It means that a reader of any single edition gets access not only to a mere fraction of human knowledge but to a mere fraction of Wikipedia's own articles.

There is a group of people who could change this, says Scott Hale at the University of Oxford in the UK. He believes that people who edit Wikipedia in more than one language are the key. "Such multilingual users may serve an important function in diffusing information across different language editions of the project," he says.

But do they actually play this role? Today, Hale reveals the results of his study of multilingual editors of Wikipedia. He says they turn out to be a small but important minority of editors who play a crucial role in helping to reduce the level of self-focus bias in each edition.

Hale began by collecting the edits made to Wikipedia between 8 July and 9 August this year. Wikipedia broadcasts these edits in near real time over Internet Relay Chat. He excluded minor edits and those made by bots and unregistered users. That left 3.5 million significant edits by 55,000 editors.
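The filtering step can be sketched in a few lines of Python. This is a minimal illustration, not Hale's code: the `Edit` record and its field names (`minor`, `bot`, `anonymous`) are hypothetical stand-ins for whatever the IRC feed actually reports, and the sketch assumes an editor uses the same username in every language edition.

```python
from dataclasses import dataclass


# Hypothetical record for one edit seen on the feed; field names are assumptions.
@dataclass(frozen=True)
class Edit:
    user: str        # editor name (an IP address for unregistered users)
    wiki: str        # language edition, e.g. "en" or "de"
    title: str       # article title within that edition
    minor: bool      # flagged as a minor edit
    bot: bool        # made by a bot account
    anonymous: bool  # made by an unregistered user


def significant_edits(edits):
    """Keep only edits that are not minor, not by bots and not anonymous."""
    return [e for e in edits if not (e.minor or e.bot or e.anonymous)]


def count_editors(edits):
    """Count distinct editors, matched by username across editions."""
    return len({e.user for e in edits})
```

Applied to the real feed, `significant_edits` would yield the pool of edits left after the exclusions described above, and `count_editors` the number of editors behind them.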

Hale then looked for editors who were active in more than one language edition and found more than 8,000 of them, or about 15 per cent of the total. It was these multilingual editors that he studied further.
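Identifying multilingual editors amounts to grouping edits by user and counting distinct editions. The sketch below assumes edits arrive as hypothetical `(user, wiki)` pairs and that usernames match across editions. The share quoted above follows from simple arithmetic: 8,000 / 55,000 ≈ 0.145, or roughly 15 per cent.

```python
from collections import defaultdict


def multilingual_editors(edits):
    """Return users who edited in more than one language edition.

    `edits` is an iterable of (user, wiki) pairs; matching editors across
    editions by username is an assumption of this sketch.
    """
    editions = defaultdict(set)
    for user, wiki in edits:
        editions[user].add(wiki)  # record every edition each user touched
    return {user for user, wikis in editions.items() if len(wikis) > 1}


def multilingual_share(edits):
    """Fraction of all editors who are multilingual."""
    users = {user for user, _wiki in edits}
    return len(multilingual_editors(edits)) / len(users)
```

Note that repeated edits in the same edition do not count toward being multilingual; only the number of distinct editions matters.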

It turns out that some editions have more multilingual editors than others, and in general smaller editions have a higher percentage of multilingual editors. The most notable outliers were Esperanto and Malay, which had unusually high proportions of multilingual editors, while the Japanese edition had significantly fewer multilingual editors than its size would suggest.

Significantly, these multilingual editors are more active than their monolingual counterparts, making on average 2.3 times as many edits.

What's more, almost half of the articles added by multilingual editors are not edited at all by monolingual editors. Multilinguals also tend to edit the same articles in different languages. That's significant because it suggests they are carrying content from one edition to another.
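The two comparisons in the preceding paragraphs, relative activity and articles untouched by monolinguals, can be expressed as simple aggregations. This is an illustrative sketch under stated assumptions, not the study's method: edits are hypothetical `(user, wiki, title)` triples, the set of multilingual editors is given, both editor groups are assumed non-empty, and "articles added by multilingual editors" is approximated as articles they edited during the window.

```python
from collections import Counter, defaultdict


def activity_ratio(edits, multilingual):
    """Mean edits per multilingual editor divided by mean per monolingual editor."""
    per_user = Counter(user for user, _wiki, _title in edits)
    multi = [n for u, n in per_user.items() if u in multilingual]
    mono = [n for u, n in per_user.items() if u not in multilingual]
    return (sum(multi) / len(multi)) / (sum(mono) / len(mono))


def share_untouched_by_monolinguals(edits, multilingual):
    """Fraction of articles edited by multilinguals that no monolingual edited."""
    editors = defaultdict(set)
    for user, wiki, title in edits:
        editors[(wiki, title)].add(user)  # an article is a (wiki, title) pair
    touched = [users for users in editors.values() if users & multilingual]
    untouched = [users for users in touched if users <= multilingual]
    return len(untouched) / len(touched)
```

Treating an article as a `(wiki, title)` pair keeps each language version separate; linking versions of the same article across editions would need interlanguage-link data, which this sketch does not model.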

“This suggests that multilingual users are making unique contributions not duplicated by monolingual users and that in many cases multilingual users are working on the same article in multiple languages,” says Hale.

That’s interesting work. “Overall, this study shows multilingual users play a unique role on Wikipedia editing articles different to those edited by monolingual users,” concludes Hale.

And that's an important job. If Wikipedia is to tackle the problem of self-focus bias, it will need more editors like them. But just where they will come from is another question altogether.

Ref: arxiv.org/abs/1312.0976: Multilinguals and Wikipedia Editing
