Last week, Google launched a tool for sifting through the billions of lines of software code available on the Web. The free resource could help programmers design new software projects, test code, and fix bugs, says Tom Stocky, product manager at Google. And, ultimately, it might help them build better products.
An efficient code-searching tool is invaluable for programmers, from computer science students to professionals, Stocky says. At the foundation of all software are lines of instructions that direct the program to perform certain tasks, such as searching through a list or rearranging values. Although tasks such as these are common, the code for them varies by programming language and can differ slightly depending on the application. Being able to search for code allows people to find worked-out solutions to these common problems as well as solutions to more obscure coding challenges.
“The first thing someone does when writing a new piece of software is to search for existing things that are related,” says Stocky. In the past, some programmers have used Google’s standard search bar to find code, but that approach is inefficient because a lot of code resides in databases that the general search engine cannot reach.
Programmers can also turn to repositories of online code, Stocky says, but the previous methods for downloading software code and searching for specific functions are time-consuming. Instead, Google Code Search crawls and indexes open-source archives that contain code in file formats, such as .zip and .tar, that general Web crawlers can’t investigate. In addition, the tool indexes code held in repositories managed by two common version-control systems, CVS and Subversion. One of the goals of Google Code Search, Stocky says, is to make the searching process easier: “We’re trying to give people one place where they can do that quickly.”
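To make the idea of looking inside archives concrete, here is a minimal Python sketch that builds a small .tar archive in memory and scans each file in it for a pattern. It illustrates the general technique only and says nothing about how Google's crawler or index is actually built; the file names, the sample contents, and the helper functions `make_sample_tar` and `search_tar` are all hypothetical.

```python
import io
import re
import tarfile


def make_sample_tar() -> bytes:
    """Build a tiny in-memory .tar archive (hypothetical sample files)."""
    files = {
        "proj/sort.py": "def bubble_sort(values):\n    return sorted(values)\n",
        "proj/README": "No code here.\n",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def search_tar(archive_bytes: bytes, pattern: str):
    """Return (file name, line number, line) for every matching line."""
    regex = re.compile(pattern)
    hits = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            extracted = tar.extractfile(member)
            if extracted is None:
                continue
            text = extracted.read().decode("utf-8", errors="replace")
            for line_no, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    hits.append((member.name, line_no, line.strip()))
    return hits


results = search_tar(make_sample_tar(), r"def\s+\w+_sort")
print(results)
```

A real service would presumably build an index ahead of time rather than rescan every archive for each query; the linear scan here is only the simplest way to show that code packed inside an archive can be searched once it is unpacked.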
The Google Code Search tool lets people search for code using not only keywords, as in a typical Google search, but also “regular expressions,” which match patterns within words. For instance, a search for “do.” (where the period stands for any single character) would return “dot” or “dog,” Stocky says. Using this, “programmers can create really advanced queries that can search for obscure function definitions,” he says. In addition, searches can be narrowed down to one of 33 different programming languages and 18 different licenses.
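As a rough illustration of pattern searches like these, the sketch below uses Python's standard `re` module. In that syntax a period stands for any single character, so `do.` matches "dot" and "dog", while a question mark makes the preceding character optional, so `do?` matches only "d" or "do". The word list, the sample source text, and the function-definition pattern are assumptions made for this example, and Google Code Search's own query syntax may differ in its details.

```python
import re

words = ["dot", "dog", "do", "d", "done"]

# `.` matches exactly one arbitrary character.
any_char = re.compile(r"do.")
# `?` makes the preceding `o` optional.
optional_o = re.compile(r"do?")

# fullmatch requires the whole word to fit the pattern.
full_any = [w for w in words if any_char.fullmatch(w)]
full_opt = [w for w in words if optional_o.fullmatch(w)]
print(full_any)  # ['dot', 'dog']
print(full_opt)  # ['do', 'd']

# A pattern of the kind one might use to hunt for function definitions
# (hypothetical sample source, Python-style `def` lines).
source = (
    "def parse(x):\n"
    "    pass\n"
    "\n"
    "class A:\n"
    "    def run(self):\n"
    "        pass\n"
)
names = re.findall(r"^[ \t]*def\s+(\w+)\s*\(", source, flags=re.MULTILINE)
print(names)  # ['parse', 'run']
```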
Of course there’s the issue of ownership and licensing. “For each piece of code, we do our best to detect licenses,” Stocky says; but in some cases, a license can’t be identified. “For anyone who didn’t mean for their code to be posted publicly, we have methods for them to remove it.” It is similar to the way Google handles Web pages or images whose owners would like them to be unsearchable.
Stocky adds that the tool could actually help to prevent code plagiarism. By searching for code you’ve written, he says, you could see who has implemented it and how.
Google isn’t the first company to offer code search. Santa Monica, CA-based Koders was launched in April 2005, and Krugle in Menlo Park, CA, went live in February 2006. Although the features of these engines differ–Krugle, for instance, allows people to search for code by project, unlike Google’s tool–their goal is the same: to allow programmers to reuse code that’s already been written, to make better software more quickly.
The rising popularity of code search is important, says Ken Krugler, founder and chief technology officer of Krugle. In surveying programmers, his company found that 20-27 percent of their time was spent searching for reusable code. “Everyone talks of code reuse as being the silver bullet to the problems of improving the software creation process,” he says. “To me, search is a key part of that.”
Google Code Search began as an internal tool for the company’s engineers, many of whom already participate in open-source software projects, Stocky says. The engineers were constantly searching for chunks of free code to complete their software, and used the tool to do it.
In fact, open-source developers have been using the general Google search to try to find code for a while, says Karsten Wade, a senior developer at Red Hat, a provider of open-source technology. Google’s Code Search tool “gives a friendly face to code snippets,” he says, adding that it will likely spur open-source development further by allowing more code to be found more easily. People can simply post pieces of code or how-to programs on their blogs, he says, and Google will find them. Moreover, an increase in code sharing could produce other benefits, he says, such as helping people find common mistakes.
Google Code Search currently resides in Google Labs, where the company’s latest product ideas are tested. The tool isn’t perfect, admits Stocky: it can’t yet find all the source code that’s available (Google has a form that allows people to submit code it has missed). The company plans to add support for more repositories of source code. But aside from improving code access, it’s not clear how exactly the new tool will evolve. “We want to get a lot of feedback to know what features people want that aren’t there,” says Stocky. “I’ve thought a lot about the potential directions to go, but [we want to know] what people are asking for.”