Technology Review - Published By MIT

arXiv blog

The Physics arXiv Blog produces daily coverage of the best new ideas from an online forum called the Physics arXiv on which scientists post early versions of their latest ideas. Contact me at KentuckyFC @ arxivblog.com

Tuesday, May 19, 2009

How to Find Bugs in Giant Software Programs

A study of the distribution of bugs within large software programs should make it easier to find errors.

The efficiency of software development projects is largely determined by the way coders spot and correct errors.

But identifying bugs efficiently can be a tricky business when the various components of a program can contain millions of lines of code. Now Michele Marchesi from the University of Cagliari and a few pals have come up with a deceptively simple way of efficiently allocating resources to error correction.

First, a little about the way that most projects are run. The days when programmers worked on huge single monolithic programs are long gone (for the most part anyway). Instead, large projects are now broken down into independent units that can be coded separately and then made to talk to each other when the system runs as a whole.

Marchesi and pals have analysed a database of Java programs called Eclipse and found that the size of these programs follows a lognormal distribution. In other words, the database and, by extension, any large project is made up of lots of small programs but only a few big ones.
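
To get a feel for what a lognormal size distribution looks like, here is a minimal sketch in Python. The parameters `mu` and `sigma` are arbitrary illustrative choices, not values from the paper, and the function names are hypothetical.

```python
import random
import statistics


def simulate_sizes(n, mu=5.0, sigma=1.2, seed=0):
    """Draw n program sizes (in lines) from a lognormal distribution.

    mu and sigma are made-up parameters for illustration only.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def share_held_by_largest(values, top_fraction=0.2):
    """Fraction of the total held by the largest top_fraction of items."""
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


sizes = simulate_sizes(10_000)
median_size = statistics.median(sizes)
mean_size = statistics.mean(sizes)
top_share = share_held_by_largest(sizes)

# A few very large programs pull the mean well above the median.
print(f"median size: {median_size:.0f} lines")
print(f"mean size:   {mean_size:.0f} lines")
print(f"share of all lines in the largest 20%: {top_share:.0%}")
```

The gap between the mean and the median is the "lots of small programs but only a few big ones" pattern in numerical form.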

So how are errors distributed among these programs? It would be easy to assume that the errors are evenly distributed per 1000 lines of code, regardless of the size of the program.

Not so, say Marchesi and co. Their study of the Eclipse database indicates that errors are much more likely in big programs. In fact, in their study, the top 20 per cent of the largest programs contained over 60 per cent of the bugs.
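
A figure like "the top 20 per cent contain over 60 per cent of the bugs" can be computed for any project that records a size and a bug count per program. The sketch below is one way to do it, not the authors' method; the function name and the sample data are invented for illustration.

```python
def bug_share_of_largest(modules, top_fraction=0.2):
    """Share of all bugs found in the largest top_fraction of modules.

    modules: list of (lines_of_code, bug_count) pairs.
    """
    if not modules:
        raise ValueError("need at least one module")
    # Rank modules from largest to smallest by lines of code.
    ranked = sorted(modules, key=lambda m: m[0], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    total_bugs = sum(bugs for _, bugs in ranked)
    if total_bugs == 0:
        return 0.0
    return sum(bugs for _, bugs in ranked[:k]) / total_bugs


# Hypothetical data: (lines of code, bugs reported).
sample = [(120, 1), (300, 1), (80, 0), (2500, 9), (450, 2)]
share = bug_share_of_largest(sample)
print(f"largest 20% of modules hold {share:.0%} of the bugs")
```

With five modules, the top 20 per cent is the single largest one, so the result is simply that module's bug count divided by the total.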

That points to a clear strategy for identifying the most errors as quickly as possible in a software project: just focus on the biggest programs.
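
As a rough sketch of that strategy, the Python below ranks programs by size and tracks what fraction of the known bugs has been covered after reviewing each one in turn. The module names, sizes and bug counts are all hypothetical, and in a real project the bug counts would of course not be known in advance.

```python
from itertools import accumulate


def review_order(sizes):
    """Return module names sorted from largest to smallest."""
    return sorted(sizes, key=sizes.get, reverse=True)


def cumulative_bug_share(order, bugs):
    """Fraction of all bugs covered after reviewing each module in order."""
    total = sum(bugs.values())
    if total == 0:
        return [0.0 for _ in order]
    running = accumulate(bugs[name] for name in order)
    return [found / total for found in running]


# Made-up example data: lines of code and bugs per module.
sizes = {"parser": 4200, "ui": 900, "util": 150, "io": 600, "config": 80}
bugs = {"parser": 12, "ui": 3, "util": 0, "io": 4, "config": 1}

order = review_order(sizes)
curve = cumulative_bug_share(order, bugs)
for name, covered in zip(order, curve):
    print(f"{name:8s} {covered:.0%} of bugs covered so far")
```

If the curve climbs steeply over the first few entries, reviewing largest-first is paying off; a flat curve would suggest that size is a poor guide for that code base.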

Simple really.

Ref: arxiv.org/abs/0905.2288: The Distribution of Program Sizes and Its Implications: An Eclipse Case Study

Comments

  • consequences
    Would the effect not be somewhat destroyed after applying this rule, making future applications of the rule less and less useful?

    primprim
    05/19/2009
    Posts:1
    Avg Rating:
    3/5
    • Re: consequences
      Not true.  Today's software developers see the value in modular approaches, that is, breaking things down into small pieces for development.  This reduces complexity and increases maintainability.  Our applications are NOT getting smaller, but bigger.  So our focus has changed from writing extremely large applications to writing applications where functionality can be reduced to a simple DLL (dynamically linked library) or web service and then providing an application that makes use of those core services or APIs.

      On another note, I really hope this study didn't cost too much money.  Because I'm sure if they would have just asked any software developer they would have told them the same thing. LOL! Of course, the bugs are in the app with the most code.  Duh!  :)

      nssarg2
      05/19/2009
      Posts:2
      Avg Rating:
      3/5
      • Re: consequences
        kfc seemed to imply that the *density* of errors is larger in larger programs, which may not be that obvious at first

        bobbybobtheb...
        05/19/2009
        Posts:1
        Avg Rating:
        4/5
  • Really?
    I don't agree with the statement below.

    "It would be easy to assume that the errors are evenly distributed per 1000 lines of code, regardless of the size of the program."
    I would assume that the more complex a program is (which might be translated by a high LOC) the more buggy the program can / will be.

    metah
    05/20/2009
    Posts:1
  • Large != More important
    This theory completely neglects the fact that some modules within a program are significantly more important to overall function than others.  Many small bugs in the GUI are less important than one bug in the security core if you are an electronic medical record, and vice versa if you are Apple.

    jwilty
    05/20/2009
    Posts:1
    Avg Rating:
    3/5
  • Been there..
    The concept is already well understood, and fully exploited in other sectors. Credit cards, cell phone contracts, health insurance plans, tax laws all have intentional, designed complexity to create consumer bugs. Each customer error generates new surcharges, fees and ultimately, more unearned profits. Giant systems are not without benefits, to some.

    z0rr0
    05/20/2009
    Posts:53
    Avg Rating:
    4/5
  • A Corollary Would Be
    that program size should be kept as small as possible.  Unfortunately this usually results in reduced performance.

    mikey386
    05/20/2009
    Posts:1
  • Wow amazing
    I may be missing something here, but this seems to me like the most unsurprising result ever published in any computer science discipline.

    ArthurDent
    05/24/2009
    Posts:1
    Avg Rating:
    5/5
    • Re: Wow amazing
      I may be missing something here, but this seems to me like the most unsurprising result ever published in any computer science discipline.

      Well, no, that would probably be the paper on the "Wolf Trap algorithm" in the 80's in CACM. But this is close; what's more, unless the article is reporting the results rather wildly incorrectly, it's replicating results known since the 70's.

      chasrmartin
      05/24/2009
      Posts:1
      Avg Rating:
      5/5
  • Where to Begin
    Large program size, never mind "giant", is indicative of poor engineering. (And yes, I do believe in software engineering rather than "coding" as a far superior approach to software product development.) Any project that allows such constructs should expect low quality (high bug count) results. In software, prevention is the best, and least expensive, medicine. Once the coding has begun it's too late. And finding errors after some behemoth has been written is by far the worst scenario. So, yes, this is not a particularly interesting, unexpected or useful outcome of a study whose time came and passed long ago.

    PS

    Software bugs usually become known through some external manifestation which tends to indicate either a general location or logic path to follow to achieve isolation and correction. It's cause and effect tracking backwards. One doesn't approach a problem via, "OK. X is broken, so let's go look at all the large elements first because we know they tend to have more errors." That would just be silly. 

    VegasGuy
    05/24/2009
    Posts:1
MIT Massachusetts Institute of Technology © 2009 Technology Review. All Rights Reserved.