Technology Review - Published By MIT

Googling for Code

Will Google's new search tool improve software?

By Kate Greene

Tuesday, October 10, 2006

Last week, Google launched a tool for sifting through the billions of lines of software code available on the Web. The free resource could help programmers design new software projects, test code, and fix bugs, says Tom Stocky, product manager at Google. And, ultimately, it might help them build better products.

An efficient code-searching tool is invaluable for programmers, from computer science students to professionals, Stocky says. At the foundation of all software are lines of instructions that direct the program to perform certain tasks, such as searching through a list or rearranging values. Although such tasks are common, the code for them varies by programming language and can differ slightly depending on the application. Being able to search for code allows people to find worked-out solutions to these common problems as well as solutions to more obscure coding challenges.

"The first thing someone does when writing a new piece of software is to search for existing things that are related," says Stocky. In the past, some programmers have used Google's standard search bar to find code, but it's inefficient because a lot of code resides in databases inaccessible to it.

Programmers can also turn to online code repositories, Stocky says, but the previous methods of downloading source code and searching it for specific functions are time-consuming. Instead, Google Code Search crawls and indexes open-source archives that contain code in file formats, such as .zip and .tar, that general Web crawlers can't investigate. In addition, the tool indexes code held in repositories managed by two common version-control systems, CVS and Subversion. One of the goals of Google Code Search, Stocky says, is to make the searching process easier: "We're trying to give people one place where they can do that quickly."
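
To illustrate what looking inside such archives involves, here is a minimal sketch in Python. It is not Google's implementation, which the article does not describe; the function name `search_archive` and the choice to scan line by line are assumptions made for illustration. It opens a .zip or .tar archive held in memory and reports every line that matches a pattern.

```python
import io
import re
import tarfile
import zipfile


def search_archive(data: bytes, pattern: str):
    """Yield (member_name, line_number, line) for each matching line.

    `data` holds the raw bytes of a .zip or .tar archive. The function
    name and return shape are hypothetical, chosen for this sketch.
    """
    regex = re.compile(pattern)
    buffer = io.BytesIO(data)

    members = []
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            for name in archive.namelist():
                if not name.endswith("/"):  # skip directory entries
                    members.append((name, archive.read(name)))
    else:
        buffer.seek(0)
        # "r:*" lets tarfile detect gzip/bzip2 compression on its own.
        with tarfile.open(fileobj=buffer, mode="r:*") as archive:
            for info in archive.getmembers():
                if info.isfile():
                    members.append((info.name, archive.extractfile(info).read()))

    for name, content in members:
        # Source files are assumed to be text; undecodable bytes are replaced.
        text = content.decode("utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield name, number, line
```

A real indexer would build a searchable index once rather than rescanning every archive per query; the sketch only shows why a crawler that treats an archive as an opaque download never sees the code inside it.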

The Google Code Search tool lets people search for code using not only keywords, as in a typical Google search, but also "regular expressions," which match patterns within words. For instance, a search for "do." would return "dot" or "dog," Stocky says (the period matches any single character). Using this, "programmers can create really advanced queries that can search for obscure function definitions," he says. In addition, searches can be narrowed down to one of 33 different programming languages and 18 different licenses.
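
In common regular-expression syntax, a period stands for any single character, while a question mark makes the preceding character optional. The short Python sketch below shows the difference and then tries one pattern aimed at function definitions. It uses Python's `re` module, which is an assumption: the article does not say which regular-expression dialect Google Code Search accepts, and the word list and source lines are made up for illustration.

```python
import re

words = ["d", "do", "dot", "dog", "dogs", "doe"]

# "do." means "d", "o", then exactly one more character of any kind.
three_letters = [w for w in words if re.fullmatch(r"do.", w)]

# "do?" means "d" followed by an optional "o".
optional_o = [w for w in words if re.fullmatch(r"do?", w)]

# A rough pattern for C-style definitions of functions whose names
# contain "sort": optional "static", a return type, the name, then "(".
definition = re.compile(r"^\s*(?:static\s+)?\w+\s+(\w*sort\w*)\s*\(")

source_lines = [
    "static int cmp(const void *a, const void *b);",
    "void merge_sort(int *a, int n) {",
    "    qsort(a, n, sizeof *a, cmp);",
    "int sorted_insert(struct node *head) {",
]

# Keep only the captured function names from lines that match.
found = [m.group(1) for line in source_lines if (m := definition.match(line))]

print(three_letters)  # ['dot', 'dog', 'doe']
print(optional_o)     # ['d', 'do']
print(found)          # ['merge_sort', 'sorted_insert']
```

The call to `qsort` is skipped because the pattern requires a return type before the name, which is the kind of distinction a plain keyword search cannot make.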

Of course there's the issue of ownership and licensing. "For each piece of code, we do our best to detect licenses," Stocky says; but in some cases, a license can't be identified. "For anyone who didn't mean for their code to be posted publicly, we have methods for them to remove it." It is similar to the way Google handles Web pages or images whose owners would like them to be unsearchable.

Stocky adds that the tool could actually help to prevent code plagiarism. By searching for code you've written, he says, you could see who has implemented it and how.

Comments

  • bugs?
    And once you find some code, how do you know if it is correct? Is there some sort of rating system, like for eBay vendors or Amazon books?

    ms
    10/10/2006
    Posts:141
    Avg Rating:
    4/5
    • Re: bugs?
      I don't know what Google does, though they seem to imply that the PageRank of the site where they downloaded the archive plays a role.

      For Krugle, we use the containing project's score to adjust the static (non-query specific) boost for files. The project score depends on factors like downloads, references on tech web pages, hoster reputation, etc.

      This doesn't weed out all the cruft, but it does increase the odds that a highly ranked file is coming from a popular project, which in turn is a fuzzy indicator of code quality.

      kkrugler
      10/10/2006
      Posts:2
  • Not a Silver Bullet
    Hi - Ken Krugler here. My quote in this article seems to imply that I think code search is "the silver bullet"...which isn't the case. Like Frederick Brooks, I also don't believe there is such a thing.

    What I do believe, and know from using Krugle internally, is that code search can make programmers better. Every time I quickly find a working example of code using an API that I'm struggling with, or a component that implements functionality I need, then I'm working faster, and writing better code.

    This isn't the solution to all programming problems, but it is a solution that will grow stronger over time as the quantity and quality of publicly available code continues to increase.

    kkrugler
    10/10/2006
    Posts:2
  • And not a complete solution
    I think Mr. Krugler has pointed out one great use of Google Code Search, that is, for examples of how to use an existing API. However, I don't believe that this is a complete solution for finding solutions to similar problems. I also agree that there is no silver bullet. On top of this, I believe that people need to be better educated about software engineering practices in order to really make a difference (myself included). Fred Brooks seemed to advocate this as well.

    It seems that Google has now provided a way to find security flaws in existing code as well. Take this article: http://www.securityfocus.com/news/11417.  An example from it: a search like 'todo +security' allows you to find existing security flaws in open source software.  It would be nice to think that in the long run this will help improve software, but will people continue writing bad code without the proper training?

    Dale Beermann

    beermann
    10/13/2006
    Posts:1

MIT Massachusetts Institute of Technology © 2010 Technology Review. All Rights Reserved.