Emerging Technology from the arXiv

Can Automated Editorial Tools Help Wikipedia's Declining Volunteer Workforce?

An algorithm that assesses the quality of Wikipedia articles could reassure visitors and help focus editors on entries that need improving, say the computer scientists who developed it.

October 31, 2013

Wikipedia's volunteer model has produced many high-quality articles on a huge range of topics in over 200 languages. But there are also articles of poor quality and dubious veracity.

This raises an important question for visitors to the site: how reliable is a given article on Wikipedia?

Today, we get an answer thanks to the work of Xiangju Qin and Pádraig Cunningham at University College Dublin in Ireland. These guys have developed an algorithm that assesses the quality of Wikipedia pages based on the authoritativeness of the editors involved and the longevity of the edits they have made.

“The hypothesis is that pages with significant contributions from authoritative contributors are likely to be high-quality pages,” they say. Given this information, visitors to Wikipedia should be able to judge the quality of any article much more accurately.

Various groups have previously studied the quality of Wikipedia articles and how to measure it. The novelty of this work lies in combining existing measures in a new way.

Qin and Cunningham begin with a standard way of measuring the longevity of an edit. The idea is that a high-quality edit is more likely to survive future revisions. They compute this by combining the size of an edit made by a given author with how long that edit lasts through subsequent revisions.
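To make the idea concrete, here is a minimal sketch of a size-weighted longevity score. The article does not give the exact formula, so the scoring rule below is an assumption: each edit is credited with its size multiplied by the fraction of a fixed window of later revisions in which its text survived, and an author's score is the sum over their edits. The `Edit` class, `window` parameter and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    author: str
    size: int      # characters added by this edit
    survived: int  # later revisions in which the added text was still present


def edit_longevity(edit: Edit, window: int) -> float:
    """Hypothetical rule: size times the fraction of the next `window` revisions survived."""
    if window <= 0:
        raise ValueError("window must be positive")
    # Survival beyond the window earns no extra credit.
    return edit.size * min(edit.survived, window) / window


def author_longevity(edits: list[Edit], window: int) -> dict[str, float]:
    """Sum the longevity-weighted size of every edit made by each author."""
    totals: dict[str, float] = {}
    for e in edits:
        totals[e.author] = totals.get(e.author, 0.0) + edit_longevity(e, window)
    return totals


edits = [Edit("alice", 200, 10), Edit("bob", 500, 1), Edit("alice", 100, 5)]
print(author_longevity(edits, window=10))
```

Under this rule a large edit that is quickly overwritten (Bob's 500 characters) can score lower than smaller edits that stick around (Alice's).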

Vandalism is a common problem on Wikipedia. To get around this, Qin and Cunningham ignore all anonymous contributions and also take an average measure of quality, which tends to reduce the impact of malicious edits.
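Anonymous Wikipedia edits are attributed to an IP address rather than an account name, so one simple way to drop them is to discard any editor whose name parses as an IP address. The sketch below does that with Python's standard `ipaddress` module and then averages per-edit quality scores for each remaining editor. Where the per-edit scores come from, and the function names, are assumptions for illustration; the article does not say exactly what Qin and Cunningham average over.

```python
import ipaddress
from statistics import mean


def is_anonymous(username: str) -> bool:
    """Treat an editor as anonymous if the 'username' is really an IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def average_quality(scores: list[tuple[str, float]]) -> dict[str, float]:
    """Average hypothetical per-edit quality scores for each registered editor."""
    per_author: dict[str, list[float]] = {}
    for author, score in scores:
        if is_anonymous(author):
            continue  # skip IP editors entirely
        per_author.setdefault(author, []).append(score)
    # Averaging keeps one bad edit from dominating an editor's overall score.
    return {author: mean(values) for author, values in per_author.items()}


print(average_quality([("Alice", 0.8), ("Alice", 0.4), ("192.0.2.7", 0.0), ("Bob", 0.6)]))
```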

Next, they measure the authority of each editor. Wikipedia is well known for having a relatively small number of dedicated editors who play a fundamental role in the community. These people help to maintain various editorial standards and spread this knowledge throughout the community.

To model this community, Qin and Cunningham build a network in which two editors are linked if they have both co-authored the same article. Inevitably, then, more experienced editors tend to be better connected in the network.
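A minimal sketch of building such a co-authorship network, assuming the input is a mapping from article title to the set of (non-anonymous) editors who contributed to it. The network is undirected and stored as an adjacency dictionary; the function name and input format are hypothetical.

```python
from itertools import combinations


def coauthor_graph(article_editors: dict[str, set[str]]) -> dict[str, set[str]]:
    """Link every pair of editors who contributed to the same article."""
    graph: dict[str, set[str]] = {}
    for editors in article_editors.values():
        for editor in editors:
            graph.setdefault(editor, set())  # keep editors even if they have no co-authors
        for a, b in combinations(sorted(editors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


g = coauthor_graph({"Dublin": {"alice", "bob", "carol"}, "Ireland": {"alice", "dave"}})
print(sorted(g["alice"]))
```

Here `alice`, who worked on both articles, ends up with three neighbours while `dave` has only one, which is the "experienced editors are better connected" effect in miniature.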

There are various ways of measuring authority. Qin and Cunningham count the number of other editors a given editor is linked to. They assess the proportion of shortest paths across the network that pass through a given editor. And they use an iterative PageRank-type algorithm to measure authority. (PageRank is the Google algorithm in which a webpage is considered important if other important webpages point to it.)
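The three measures described above correspond to standard network centralities: degree, betweenness and PageRank. The sketch below computes all three on an undirected adjacency dictionary of editors. Betweenness uses Brandes' algorithm for unweighted graphs (unnormalized, each unordered pair counted once), and PageRank uses plain power iteration with a damping factor of 0.85. These are textbook versions chosen for illustration, not necessarily the exact variants or parameters used in the paper.

```python
from collections import deque

Graph = dict[str, set[str]]


def degree(graph: Graph) -> dict[str, int]:
    """Number of co-authors each editor is linked to."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def betweenness(graph: Graph) -> dict[str, float]:
    """Brandes' algorithm: share of shortest paths between other pairs passing through v."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack: list[str] = []
        preds: dict[str, list[str]] = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was visited from both ends.
    return {v: c / 2 for v, c in bc.items()}


def pagerank(graph: Graph, damping: float = 0.85, max_iter: int = 100, tol: float = 1e-12) -> dict[str, float]:
    """Power iteration: an editor is important if linked to other important editors."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(max_iter):
        # Rank held by isolated editors is spread evenly so the total stays 1.
        dangling = sum(rank[v] for v in graph if not graph[v])
        new = {
            v: (1 - damping) / n
            + damping * (sum(rank[u] / len(graph[u]) for u in graph[v]) + dangling / n)
            for v in graph
        }
        if sum(abs(new[v] - rank[v]) for v in graph) < tol:
            return new
        rank = new
    return rank


star = {"h": {"x", "y", "z"}, "x": {"h"}, "y": {"h"}, "z": {"h"}}
print(degree(star)["h"], betweenness(star)["h"], pagerank(star)["h"] > pagerank(star)["x"])
```

On a small "star" network where one editor `h` has worked with three others who never worked together, `h` scores highest on all three measures.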

Finally, Qin and Cunningham combine these metrics of longevity and authority to produce a measure of article quality.
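The article does not spell out how the two metrics are combined. One plausible reading of the hypothesis that "pages with significant contributions from authoritative contributors are likely to be high-quality pages" is a weighted sum: each contributor's surviving contribution to a page, multiplied by that contributor's authority. The sketch below assumes exactly that, and also ranks articles to surface the weakest ones for review; both functions are hypothetical.

```python
def article_quality(contrib: dict[str, float], authority: dict[str, float]) -> float:
    """Assumed combination: sum of each editor's longevity score weighted by authority."""
    # Editors with no authority score (e.g. filtered-out anonymous editors) add nothing.
    return sum(longevity * authority.get(editor, 0.0) for editor, longevity in contrib.items())


def lowest_quality(articles: dict[str, dict[str, float]], authority: dict[str, float], k: int) -> list[str]:
    """Return the k article titles with the lowest combined score."""
    scores = {title: article_quality(c, authority) for title, c in articles.items()}
    return sorted(scores, key=scores.get)[:k]


articles = {"Dublin": {"alice": 250.0, "bob": 50.0}, "Cork": {"bob": 40.0}}
authority = {"alice": 0.5, "bob": 0.25}
print(lowest_quality(articles, authority, k=1))
```

An unnormalized sum like this favours long articles with a lot of surviving text; dividing by the page's total longevity would instead give a contribution-weighted average authority. The article does not say which, if either, the authors use.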

To test the effectiveness of their assessment, they used their algorithm to score over 9,000 articles that had already been rated by Wikipedia editors. They say that edit longevity by itself is already a good indicator of an article's quality. However, taking into account the authority of the editors generally improves the assessment.

They conclude that “articles with significant contributions from authoritative contributors are likely to be of high quality, and that high-quality articles generally involve more communication and interaction between contributors.”

There are some limitations, of course. A common type of edit, known as a revert, changes an article to its previous version, thereby entirely removing an edit. This is often used to get rid of vandalism. “At present, we do not have any special treatment to deal with reverted edits that do not introduce new content to a page,” admit Qin and Cunningham. So there’s work to be done in future.
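Because a revert restores an earlier version exactly, a common heuristic in Wikipedia research is to hash the full text of each revision and flag a revision as an "identity revert" when its hash matches an earlier, non-adjacent revision. The sketch below illustrates that heuristic with SHA-1 from Python's `hashlib`. It is not part of Qin and Cunningham's method; the function name and the list-of-strings input are assumptions.

```python
import hashlib


def find_identity_reverts(revisions: list[str]) -> list[tuple[int, int]]:
    """Return (revert_index, restored_index) pairs: revision i exactly restores earlier revision j."""
    last_seen: dict[str, int] = {}
    reverts: list[tuple[int, int]] = []
    for i, text in enumerate(revisions):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        # j == i - 1 would be a no-op save, not a revert of anyone's edit.
        if j is not None and j < i - 1:
            reverts.append((i, j))  # revisions j+1 .. i-1 were undone
        last_seen[digest] = i
    return reverts


history = ["A", "A B", "A B VANDAL", "A B", "A B C"]
print(find_identity_reverts(history))
```

Edits undone this way could, for example, be excluded or penalized when computing longevity, which is the kind of special treatment the authors say is still missing.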

However, the new approach could be a useful tool in a Wikipedia editor’s armory. Qin and Cunningham suggest that it could help to identify new articles that are of relatively good quality and also identify existing articles that are of particularly low quality and so require further attention.

With the well-documented decline in Wikipedia’s volunteer workforce, automated editorial tools are clearly of value in reducing the workload for those who remain. A broader question is how good these tools can become and how they should be used in these kinds of crowd-sourced endeavors.

Ref: arxiv.org/abs/1206.2517: Assessing The Quality Of Wikipedia Pages Using Edit Longevity And Contributor Centrality