Skip to Content

Data Mining Reveals How Social Coding Succeeds (And Fails)

Collaborative software development can be hugely successful or fail spectacularly. An analysis of the metadata associated with these projects is teasing apart the difference.

The process of developing software has undergone huge transformation in the last decade or so. One of the key changes has been the evolution of social coding websites, such as GitHub and BitBucket.

These allow anyone to start a collaborative software project that other developers can contribute to on a voluntary basis. Millions of people have used these sites to build software, sometimes with extraordinary success.

Of course, some projects are more successful than others. And that raises an interesting question: what are the differences between successful and unsuccessful projects on these sites?

Today, we get an answer from Yuya Yoshikawa at the Nara Institute of Science and Technology in Japan and a couple of pals at the NTT Laboratories, also in Japan. These guys have analysed the characteristics of over 300,000 collaborative software projects on GitHub to tease apart the factors that contribute to success. Their results provide the first insights into social coding success from this kind of data mining.

A social coding project begins when a group of developers outline a project and begin work on it. These are the “internal developers” and have the power to update the software in a process known as a “commit”. The number of commits is a measure of the activity on the project.

External developers can follow the progress of the project by “starring” it, a form of bookmarking on GitHub. The number of stars is a measure of the project’s popularity. These external developers can also request changes, such as additional features and so on, in a process known as a pull request.

Yoshikawa and co begin by downloading the data associated with over 300,000 projects from the GitHub website. This includes the number of internal developers, the number of stars a project receives over time and the number of pull requests it gets.

The team then analyse the effectiveness of the project by calculating factors such as the number of commits per internal team member, the popularity of the project over time, the number of pull requests that are fulfilled and so on.

The results provide a fascinating insight into the nature of social coding. Yoshikawa and co say the number of internal developers on a project plays a significant role in its success. “Projects with larger numbers of internal members have higher activity, popularity and sociality,” they say.

However, there is a downside to large projects as well. One measure of the efficiency of a project is the number of commits per internal team member. Yoshikawa and co say the data shows that the most efficient projects involve a single person working alone.

As a project grows, efficiency is roughly constant in projects with between two and 60 members but falls sharply after this. “We conclude that it is undesirable to involve more than 60 developers in a project if we want the project members to work efficiently,” they say.

The team also study how work is distributed between internal members. In general, teams with more evenly distributed work are more likely to have higher activity.

And when projects receive requests for changes from external developers, those that fulfil these requests faithfully are likely to be more popular.

They also measured the types of projects that are more popular. Unsurprisingly, they say that software designed to run on Apple’s various products have the highest popularity.

That is an interesting insight into an increasingly common form of software development. GitHub alone says it has 6 million registered users.

Of course, but these guys have found correlations and an important question is one of causation. It is possible, for example, that the positive correlations they have found are the result of some hidden variables that are not revealed in this study.

The best way to find out is for somebody to put into practice the lessons learnt in this study and see whether they work. There is certainly good reason to think that many of their conclusions are related to good practice.

Over to the developers!

Ref: : Collaboration on Social Media: Analyzing Successful Projects on Social Coding

Keep Reading

Most Popular

transplant surgery
transplant surgery

The gene-edited pig heart given to a dying patient was infected with a pig virus

The first transplant of a genetically-modified pig heart into a human may have ended prematurely because of a well-known—and avoidable—risk.

open sourcing language models concept
open sourcing language models concept

Meta has built a massive new language AI—and it’s giving it away for free

Facebook’s parent company is inviting researchers to pore over and pick apart the flaws in its version of GPT-3

Muhammad bin Salman funds anti-aging research
Muhammad bin Salman funds anti-aging research

Saudi Arabia plans to spend $1 billion a year discovering treatments to slow aging

The oil kingdom fears that its population is aging at an accelerated rate and hopes to test drugs to reverse the problem. First up might be the diabetes drug metformin.

images created by Google Imagen
images created by Google Imagen

The dark secret behind those cute AI-generated animal images

Google Brain has revealed its own image-making AI, called Imagen. But don't expect to see anything that isn't wholesome.

Stay connected

Illustration by Rose WongIllustration by Rose Wong

Get the latest updates from
MIT Technology Review

Discover special offers, top stories, upcoming events, and more.

Thank you for submitting your email!

Explore more newsletters

It looks like something went wrong.

We’re having trouble saving your preferences. Try refreshing this page and updating them one more time. If you continue to get this message, reach out to us at with a list of newsletters you’d like to receive.