Tuesday, May 19, 2009
How to Find Bugs in Giant Software Programs
A study of the distribution of bugs within large software programs should make it easier to find errors.
The efficiency of software development projects is largely determined by the way coders spot and correct errors.
But identifying bugs efficiently can be a tricky business when the components of a program can together contain millions of lines of code. Now Michele Marchesi from the University of Cagliari and a few pals have come up with a deceptively simple way of allocating error-correction resources efficiently.
First, a little about the way most projects are run. The days when programmers worked on huge monolithic programs are long gone (for the most part, anyway). Instead, large projects are now broken down into independent units that can be coded separately and then made to talk to each other when the system runs as a whole.
Marchesi and pals have analysed Eclipse, a large open-source Java project, and found that the sizes of its component programs follow a lognormal distribution. In other words, Eclipse (and, by extension, perhaps any large project) is made up of lots of small programs and only a few big ones.
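For readers who want to see what "lognormal" means in practice: a quantity is lognormally distributed when its logarithm follows a normal (bell-curve) distribution. The minimal Python sketch below is not taken from the study; it fits the two lognormal parameters to a list of program sizes using the standard maximum-likelihood estimates (the mean and standard deviation of the log sizes). The sizes are randomly generated, hypothetical values, as are the default parameters mu=4.5 and sigma=1.2.

```python
import math
import random
import statistics


def fit_lognormal(sizes):
    """Maximum-likelihood estimates (mu, sigma) of a lognormal fit.

    mu and sigma are the mean and population standard deviation of ln(size).
    """
    logs = [math.log(s) for s in sizes]
    return statistics.fmean(logs), statistics.pstdev(logs)


def synthetic_sizes(n, mu=4.5, sigma=1.2, seed=0):
    """Hypothetical program sizes in lines of code, drawn from a lognormal."""
    rng = random.Random(seed)
    # Round to whole lines and keep at least 1 line so log() is defined.
    return [max(1, round(rng.lognormvariate(mu, sigma))) for _ in range(n)]


if __name__ == "__main__":
    sizes = synthetic_sizes(10_000)
    mu, sigma = fit_lognormal(sizes)
    print(f"fitted mu={mu:.2f}, sigma={sigma:.2f}")
    # The long right tail pulls the mean well above the median.
    print("median LOC:", statistics.median(sizes))
    print("mean LOC:  ", round(statistics.fmean(sizes), 1))
```

With a tail like this the mean size always sits above the median, which is the "lots of small programs, a few big ones" pattern in numerical form.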
So how are errors distributed among these programs? It would be easy to assume that the errors are evenly distributed per 1000 lines of code, regardless of the size of the program.
Not so, say Marchesi and co. Their study of the Eclipse code indicates that errors are much more likely in big programs. In fact, the largest 20 per cent of programs contained over 60 per cent of the bugs.
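The "20 per cent of programs, 60 per cent of bugs" figure is a concentration measure. Here is a minimal sketch of how one might compute it, assuming each program is represented by a hypothetical (lines_of_code, bug_count) pair; it is not the authors' code, and the numbers in it are made up. It also reports the share of code held by the same programs, because comparing the two shares is what tests the "even bugs per 1000 lines" assumption: if the largest fifth held 60 per cent of the bugs but also 60 per cent of the code, their bug density would be no higher than average.

```python
from typing import List, Tuple

# Each entry is a hypothetical (lines_of_code, bug_count) pair for one program.
Program = Tuple[int, int]


def top_share(programs: List[Program], fraction: float = 0.2) -> Tuple[float, float]:
    """Return (bug_share, loc_share) held by the largest `fraction` of programs."""
    if not programs:
        raise ValueError("need at least one program")
    ranked = sorted(programs, key=lambda p: p[0], reverse=True)  # biggest first
    k = max(1, round(len(ranked) * fraction))  # how many programs count as "top"
    top = ranked[:k]
    total_loc = sum(loc for loc, _ in programs)
    total_bugs = sum(bugs for _, bugs in programs)
    loc_share = sum(loc for loc, _ in top) / total_loc
    bug_share = sum(bugs for _, bugs in top) / total_bugs if total_bugs else 0.0
    return bug_share, loc_share


def defects_per_kloc(programs: List[Program]) -> float:
    """Overall defect density: bugs per 1000 lines of code."""
    return 1000 * sum(b for _, b in programs) / sum(loc for loc, _ in programs)


if __name__ == "__main__":
    programs = [(5000, 30), (3000, 12), (800, 2), (600, 3), (400, 1),
                (300, 1), (200, 0), (150, 1), (100, 0), (50, 0)]
    bugs, loc = top_share(programs)
    print(f"top 20%: {bugs:.0%} of bugs, {loc:.0%} of code")
    # prints "top 20%: 84% of bugs, 75% of code"
```

In this made-up example the bug share (84%) exceeds the code share (75%), so the big programs really are buggier per line; had the two shares been equal, the concentration would simply reflect where the code is.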
That points to a clear strategy for identifying the most errors as quickly as possible in a software project: just focus on the biggest programs.
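As a rough illustration of "focus on the biggest programs", the sketch below turns the heuristic into a review plan: rank programs by size and take them largest first until a review budget, measured in lines of code, runs out. The program names, sizes and the budget_loc parameter are all hypothetical; the study does not prescribe this procedure.

```python
from typing import Dict, List


def review_order(sizes: Dict[str, int], budget_loc: int) -> List[str]:
    """Return programs to inspect, biggest first, within a LOC review budget.

    `sizes` maps a hypothetical program name to its size in lines of code;
    `budget_loc` is how many lines the team can afford to review.
    The walk stops at the first program that no longer fits.
    """
    chosen: List[str] = []
    remaining = budget_loc
    for name, loc in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if loc > remaining:
            break  # strictly biggest-first: don't skip ahead to smaller programs
        chosen.append(name)
        remaining -= loc
    return chosen


if __name__ == "__main__":
    sizes = {"parser": 4200, "ui": 900, "net": 2500, "util": 300, "io": 1200}
    print(review_order(sizes, budget_loc=8000))  # ['parser', 'net', 'io']
```

Stopping at the first program that does not fit keeps the plan faithful to the biggest-first idea; a variant that skips oversized programs and keeps filling the budget with smaller ones would use the time more fully but drift toward the small programs the heuristic says to deprioritise.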
Simple, really.
Ref: arxiv.org/abs/0905.2288: The Distribution of Program Sizes and Its Implications: An Eclipse Case Study
Comments
primprim
05/19/2009
On another note, I really hope this study didn't cost too much money, because I'm sure that if they had just asked any software developer, they would have been told the same thing. LOL! Of course the bugs are in the app with the most code. Duh! :)
nssarg2
05/19/2009
bobbybobtheb...
05/19/2009
"It would be easy to assume that the errors are evenly distributed per 1000 lines of code, regardless of the size of the program."
I would assume that the more complex a program is (which might be reflected in a high LOC count), the buggier the program can / will be.
metah
05/20/2009
jwilty
05/20/2009
z0rr0
05/20/2009
mikey386
05/20/2009
ArthurDent
05/24/2009
Well, no, that would probably be the paper on the "Wolf Trap algorithm" in CACM in the '80s. But this is close; what's more, unless the article is reporting the results rather wildly incorrectly, it's replicating results known since the '70s.
chasrmartin
05/24/2009
PS
Software bugs usually become known through some external manifestation, which tends to indicate either a general location or a logic path to follow to isolate and correct them. It's tracking cause and effect backwards. One doesn't approach a problem via, "OK. X is broken, so let's go look at all the large elements first because we know they tend to have more errors." That would just be silly.
VegasGuy
05/24/2009