A new Web-based effort promises to track the sources of congressional earmarks, compile databases of the Twitter posts of state lawmakers, and add sharper perspective to the Obama administration’s open-government efforts.
“Government puts out a ton of data that is really interesting about what it does, but people can’t understand it,” says Clay Johnson, director of Sunlight Labs, an arm of the open-government group Sunlight Foundation, based in Washington, DC.
The foundation has already tapped open-source developers to help process the often-fragmented and cryptic data released by the government. “We are doing anything we can to celebrate the opening of this data–and making it so it’s useful,” says Johnson.
Now the group is raising an army of Web-based volunteers to go through all the information contained in those releases. Congressional earmarks–funds for projects inserted anonymously as line items in various bills, without any hearings or reviews–are a big initial focus. In 2004, members of Congress wrote more than 14,000 earmarks costing more than $50 billion. Technically, it’s already possible to find the sources of earmarks, but doing so means visiting all 535 congressional websites and reading the earmark requests posted there as PDFs.
The new Sunlight Labs transparency corps invites users to log in and join the effort to analyze this information collaboratively. Users are presented with the PDFs and prompted to read them carefully and then enter the pertinent information–the date and dollar amount of a request, name of the requester, description of the project, and so on–into fields on the screen. The entries then become part of a searchable database.
Another corps project aims to track the Twitter statements of all state lawmakers. Volunteers who log on are asked to seek out and enter the Twitter addresses of state senators and representatives; Sunlight Labs will seek verification from three or four people that the address is correct, and then start recording the lawmakers’ tweets (the short messages they send through Twitter).
The project is still in its infancy, but once the data is in hand, it should become possible to identify the words each lawmaker uses most often and to search their statements by topic and date. It may even be possible to compare their tweets with statements made in other contexts, such as speeches recorded in the Federal Register.
The project’s debut roughly coincided with the White House’s launch, earlier this month, of an effort to chart the progress of information-technology (IT) projects in federal agencies. The new IT Dashboard, an online tool open to the public, lets users see which IT projects are under way, check their status, and send feedback to the chief information officers of the agencies involved.
The tool revealed particularly startling delays at the Department of Veterans Affairs, prompting the White House to halt 45 over-budget or behind-schedule projects there for review. “We were able to catch these contracts, in part, thanks to our new tool,” Vivek Kundra, the White House chief information officer, wrote on his blog earlier this month.