A View from Kate Greene
Who Benefits from AOL's Released Search Logs?
Privacy takes a hit. But statisticians are having a field day.
Last week, AOL’s PR team cringed as the world learned that the company had publicly posted search terms from 650,000 AOL users. They posted the search log on a research site, and subsequently took it down after a flurry of coverage in the blogosphere. Nonetheless, a number of sites have reposted the log.
While specific names of AOL users weren’t linked directly to searched terms, there was no guarantee of anonymity: AOL had assigned each user a number, and users often searched for their own names and hometowns. The New York Times was able to track down one AOL user who talked with a reporter about her searches for “numb fingers” and “dog that urinates on everything.”
The data dissemination led privacy advocates to trumpet the dangers of search companies storing people’s queries. At the same time, though, other people – Internet researchers, statisticians, sociologists, and political scientists – silently cheered.
Before the AOL release, all major search engines had kept their data from the public eye. This meant that researchers interested in the activities of users of search engines had to either rely on speculative data from open, infrequently used search engines, or make educated guesses. The AOL search log, which contains more than 30 million search terms, could thus provide some missing insight into how people use the Web, says Matt Hindman, a political scientist at Arizona State University in Phoenix. A better understanding of Web dynamics has implications for political campaigns, education, and an entire economy built on advertising through Web searches. “For researchers like me,” Hindman says, “that’s exciting.”
Shortly after AOL’s goof, a site called AOL Stalker was created. Its main draw is that it allows people to search through the AOL database and view user searches as well as other search data. The author of the site has also posted the first in a series of basic data analyses. This initial number-crunching examines how well the rank of search results can predict a page’s click-through rate – in other words, it shows how well results match what people want to find. According to the analysis, in 47 percent of searches, people didn’t click on any of the presented results. While the revelation that nearly half of all AOL searches don’t go anywhere isn’t earth-shattering, further analysis could provide insight into how to make search engines more useful or guide advertisers in their ad placements.
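For readers curious what this kind of number-crunching might look like, here is a minimal Python sketch. It is not the AOL Stalker author's code. It assumes a tab-separated log with the columns AnonID, Query, QueryTime, ItemRank, and ClickURL, where a row with an empty ItemRank means the search produced no click. The sample rows and the function name `summarize_clicks` are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in an assumed tab-separated layout (not real AOL data).
SAMPLE_LOG = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1\tpizza recipe\t2006-03-01 10:00:00\t1\thttp://example.com/a\n"
    "1\tpizza recipe\t2006-03-01 10:00:00\t3\thttp://example.org/b\n"
    "1\tdog training\t2006-03-02 09:30:00\t\t\n"
    "2\tweather\t2006-03-02 11:15:00\t\t\n"
    "2\tlocal news\t2006-03-03 08:00:00\t1\thttp://example.net/c\n"
)


def summarize_clicks(log_text):
    """Return (number of searches, share with no click, clicks per result rank)."""
    reader = csv.DictReader(io.StringIO(log_text), delimiter="\t")
    # Rows sharing a user, query, and timestamp are treated as one search
    # (an assumption about how multiple clicks are logged).
    searches = {}
    for row in reader:
        key = (row["AnonID"], row["Query"], row["QueryTime"])
        ranks = searches.setdefault(key, [])
        if row["ItemRank"]:  # an empty rank means nothing was clicked
            ranks.append(int(row["ItemRank"]))

    total = len(searches)
    no_click = sum(1 for ranks in searches.values() if not ranks)
    clicks_by_rank = Counter(r for ranks in searches.values() for r in ranks)
    share_no_click = no_click / total if total else 0.0
    return total, share_no_click, clicks_by_rank


total, share_no_click, clicks_by_rank = summarize_clicks(SAMPLE_LOG)
print(f"{total} searches, {share_no_click:.0%} with no click")
for rank, count in sorted(clicks_by_rank.items()):
    print(f"rank {rank}: {count} click(s)")
```

Grouping rows into searches before counting matters here: a search with several clicks would otherwise be counted several times and push the no-click share down.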
As giddy as this sort of data makes statistics hounds, the creator of AOL Stalker, at least, still seems mindful of the sensitive nature of the information. The site’s creator lets anyone request that certain information be hidden from the site’s search engine if it’s too revealing. As noted in the fine print: “If you find any data that actually makes it possible to identify a user, please let us know using the contact form, and we’ll remove those references.” – By Kate Greene