Emerging Technology from the arXiv

Data Mining Reveals How Social Coding Succeeds (And Fails)

Collaborative software development can be hugely successful or fail spectacularly. An analysis of the metadata associated with these projects is teasing apart the difference.

  • September 4, 2014

The process of developing software has undergone a huge transformation in the last decade or so. One of the key changes has been the evolution of social coding websites, such as GitHub and BitBucket.

These allow anyone to start a collaborative software project that other developers can contribute to on a voluntary basis. Millions of people have used these sites to build software, sometimes with extraordinary success.

Of course, some projects are more successful than others. And that raises an interesting question: what are the differences between successful and unsuccessful projects on these sites?

Today, we get an answer from Yuya Yoshikawa at the Nara Institute of Science and Technology in Japan and a couple of pals at the NTT Laboratories, also in Japan. These guys have analysed the characteristics of over 300,000 collaborative software projects on GitHub to tease apart the factors that contribute to success. Their results provide the first insights into social coding success from this kind of data mining.

A social coding project begins when a group of developers outline a project and begin work on it. These are the “internal developers” and have the power to update the software in a process known as a “commit”. The number of commits is a measure of the activity on the project.

External developers can follow the progress of the project by “starring” it, a form of bookmarking on GitHub. The number of stars is a measure of the project’s popularity. These external developers can also request changes, such as additional features and so on, in a process known as a pull request.

Yoshikawa and co begin by downloading the data associated with over 300,000 projects from the GitHub website. This includes the number of internal developers, the number of stars a project receives over time and the number of pull requests it gets.

The team then analyse the effectiveness of the project by calculating factors such as the number of commits per internal team member, the popularity of the project over time, the number of pull requests that are fulfilled and so on.
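For a rough feel of what such measures look like, here is a minimal sketch in Python. The `Project` record and its field names are hypothetical stand-ins for the metadata described above, and the two ratios are simply the obvious reading of "commits per internal team member" and "pull requests that are fulfilled", not necessarily the exact definitions the team uses.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical fields standing in for the GitHub metadata described above.
    internal_members: int
    commits: int
    stars: int
    pull_requests_received: int
    pull_requests_merged: int


def commits_per_member(p: Project) -> float:
    """Total commits divided by the number of internal developers."""
    if p.internal_members < 1:
        raise ValueError("a project needs at least one internal developer")
    return p.commits / p.internal_members


def pull_request_acceptance(p: Project) -> float:
    """Fraction of received pull requests that were merged (0.0 if none arrived)."""
    if p.pull_requests_received == 0:
        return 0.0
    return p.pull_requests_merged / p.pull_requests_received


# Invented numbers, only to exercise the functions.
example = Project(
    internal_members=4,
    commits=200,
    stars=35,
    pull_requests_received=10,
    pull_requests_merged=7,
)
print(commits_per_member(example))
print(pull_request_acceptance(example))
```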

The results provide a fascinating insight into the nature of social coding. Yoshikawa and co say the number of internal developers on a project plays a significant role in its success. “Projects with larger numbers of internal members have higher activity, popularity and sociality,” they say.

However, there is a downside to large projects as well. One measure of the efficiency of a project is the number of commits per internal team member. Yoshikawa and co say the data shows that the most efficient projects involve a single person working alone.

As a project grows, efficiency is roughly constant in projects with between two and 60 members but falls sharply after this. “We conclude that it is undesirable to involve more than 60 developers in a project if we want the project members to work efficiently,” they say.
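The comparison across team sizes can be sketched along these lines. The three size bands mirror the ones mentioned above (a single developer, two to 60 members, more than 60), but the function names and the toy figures are invented for illustration and are not the study's data.

```python
from collections import defaultdict
from statistics import mean


def size_bucket(members: int) -> str:
    """Assign a project to one of the three team-size bands discussed above."""
    if members == 1:
        return "solo"
    if members <= 60:
        return "2-60"
    return "60+"


def mean_efficiency_by_bucket(projects):
    """projects: iterable of (internal_members, commits) pairs.

    Returns the average commits per member for each team-size band.
    """
    buckets = defaultdict(list)
    for members, commits in projects:
        buckets[size_bucket(members)].append(commits / members)
    return {name: mean(values) for name, values in buckets.items()}


# Invented toy data: (internal_members, commits).
toy = [(1, 120), (1, 80), (5, 250), (40, 2000), (100, 1000)]
result = mean_efficiency_by_bucket(toy)
print(result)
```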

The team also study how work is distributed between internal members. In general, teams with more evenly distributed work are more likely to have higher activity.
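How evenly work is shared can be put into a single number in several ways. The sketch below uses the Gini coefficient over each member's commit count. That choice is an assumption made here for illustration, not something the study is known to use, and the function name is hypothetical.

```python
def gini(commit_counts):
    """Gini coefficient of per-member commit counts.

    0.0 means the work is perfectly evenly shared; the value rises towards
    (n - 1) / n as a single member does all of the work.
    """
    values = sorted(commit_counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form on sorted values:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1.
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


even_team = [10, 10, 10, 10]    # everyone commits the same amount
lopsided_team = [0, 0, 0, 40]   # one member does everything
print(gini(even_team))
print(gini(lopsided_team))
```

On this measure a lower score means more evenly distributed work, so the finding above would correspond to low-scoring teams tending to have higher activity.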

And when projects receive requests for changes from external developers, those that fulfil these requests faithfully are likely to be more popular.

They also look at which types of project are most popular. Unsurprisingly, they say that software designed to run on Apple’s various products has the highest popularity.

That is an interesting insight into an increasingly common form of software development. GitHub alone says it has 6 million registered users.

Of course, these guys have found only correlations, and an important question is one of causation. It is possible, for example, that the positive correlations they have found are the result of hidden variables that this study does not reveal.

The best way to find out is for somebody to put into practice the lessons learnt in this study and see whether they work. There is certainly good reason to think that many of their conclusions are related to good practice.

Over to the developers!

Ref: arxiv.org/abs/1408.6012 : Collaboration on Social Media: Analyzing Successful Projects on Social Coding
