Emerging Technology from the arXiv

The Shadowy World of Wikipedia's Editing Bots

Much of the editing work on Wikipedia is too mind-numbingly repetitive for humans, so automated bots do it instead. But keeping track of automated editing has always been hard … until now.

February 13, 2014

In a little over a decade, Wikipedia has evolved from an Internet experiment into a global crowdsourcing phenomenon. Today, this online encyclopedia provides free access to more than 30 million articles in 287 languages.

Less well known is Wikidata, an information repository designed to share basic facts for use on different language versions of Wikipedia. Wikidata therefore plays a crucial role in lubricating the flow of information between these online communities.

Maintaining all this data is a difficult job. It requires significant editing and polishing, mostly involving mindless, repetitive tasks such as formatting links and sources but also adding basic facts.

So, much of this kind of work is automated. Behind the scenes, bots continually scan Wikipedia and Wikidata pages, polishing the content for human consumption.

But that raises some important questions. How much bot activity is there? What are these bots doing, and how does their activity compare with that of humans?

Today, we get an answer thanks to the work of Thomas Steiner at Google’s German operation in Hamburg. Steiner has created an application that monitors editing activity across all 287 language versions of Wikipedia and on Wikidata. And he publishes the results in real time online so that anybody can see exactly how many bots and humans are editing any of these sites at any instant.

For example, at the time of writing, across all language versions of Wikipedia there are 10,407 edits being carried out by bots and 11,148 by human Wikipedians. So that’s roughly a 48/52 split between bots and humans.
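The split quoted above is simple arithmetic on the two edit counts, and it can be checked with a few lines of Python. This is only a sketch of the calculation, not code from Steiner's application; the function name `bot_human_split` is made up for illustration.

```python
def bot_human_split(bot_edits: int, human_edits: int) -> tuple[float, float]:
    """Return (bot share, human share) as percentages of all edits."""
    total = bot_edits + human_edits
    if total == 0:
        raise ValueError("no edits to compare")
    bot_share = 100 * bot_edits / total
    return bot_share, 100 - bot_share


# Edit counts quoted in the article for all language versions of Wikipedia.
bot_pct, human_pct = bot_human_split(10_407, 11_148)
print(f"bots {bot_pct:.1f}%, humans {human_pct:.1f}%")  # bots 48.3%, humans 51.7%
```

With the figures quoted here, bots account for about 48.3 percent of edits and humans for about 51.7 percent.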

But a closer look at the data reveals some interesting variations. For example, only 5 percent of the edits to the English language version of Wikipedia are being done by bots right now. By contrast, 94 percent of the edits to the Vietnamese version are by bots.

And on Wikidata, 77 percent of the 15,000 edits are being done by bots.
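A per-language breakdown like this amounts to counting bot and human edits separately for each site. The sketch below shows one way such a tally might work. It assumes each edit arrives as a hypothetical `(wiki, is_bot)` pair, which is not necessarily how Steiner's application represents edits, and the sample events are invented to reproduce the 5 percent and 94 percent figures rather than taken from real data.

```python
from collections import defaultdict


def bot_share_by_wiki(events):
    """Map each wiki to the percentage of its edits made by bots.

    `events` is an iterable of (wiki, is_bot) pairs; this record shape
    is an assumption made for illustration.
    """
    totals = defaultdict(lambda: [0, 0])  # wiki -> [bot edits, all edits]
    for wiki, is_bot in events:
        totals[wiki][1] += 1
        if is_bot:
            totals[wiki][0] += 1
    return {wiki: 100 * bots / total for wiki, (bots, total) in totals.items()}


# Invented sample: 1 bot edit out of 20 on "en", 47 out of 50 on "vi".
sample = (
    [("en", True)] + [("en", False)] * 19
    + [("vi", True)] * 47 + [("vi", False)] * 3
)
shares = bot_share_by_wiki(sample)
print(shares)  # {'en': 5.0, 'vi': 94.0}
```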

Steiner’s page also lists the most active bots. Wikipedia and Wikidata have long recognized the damage that bots can do and so have strict guidelines about their behavior. Wikidata even lists bots with approved tasks.

What’s curious about the automated edits on Wikidata is that the most active bots are not on this list. For example, at the time of writing, a bot called Succubot is making 5,797 edits to Wikidata entries and yet appears to be unknown to Wikidata. What is this bot doing?

Steiner’s page will give administrators a useful window into this seemingly shadowy behavior. In truth, any nefarious activity is usually spotted quickly and the perpetrator blocked. But this kind of oversight will still be hugely useful.

What’s more, Steiner has open-sourced the code so that anybody can use it to study the behavior of bots and humans in more detail.

An interesting corollary is that bots are becoming much more capable at producing articles of all kinds. The first Wikipedia bot, which was developed in 2002, automatically created entries for U.S. towns using a simple text template.

Today, there are automated feeds that produce stories about financial results and sporting results using simple templates: “Team A” beat “Team B” by “X amount” today in a match played at “Venue Y.” All that’s required is to cut and paste the relevant information into the correct places.
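The template approach described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the code behind any real news feed; the field names and the sample match are hypothetical.

```python
# Sentence template with named slots, modeled on the example in the text.
TEMPLATE = "{winner} beat {loser} by {margin} today in a match played at {venue}."


def render_result(result: dict) -> str:
    """Fill the template's slots from a dictionary of match details."""
    return TEMPLATE.format(**result)


# Hypothetical match data, invented for illustration.
match = {
    "winner": "Team A",
    "loser": "Team B",
    "margin": "two goals",
    "venue": "Venue Y",
}
story = render_result(match)
print(story)  # Team A beat Team B by two goals today in a match played at Venue Y.
```

Any structured data source with the same four fields could feed this template, which is why such stories are cheap to produce in bulk.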

It’s not hard to see how this could become much more sophisticated. And while this kind of automated writing can be hugely useful, particularly for Wikipedia and its well-documented problems with manpower, it could also be used maliciously.

So ways of monitoring automated changes to text are likely to become more important in future.

Ref: arxiv.org/abs/1402.0412: Bots vs. Wikipedians, Anons vs. Logged-Ins
