THE DISSOLUTION OF PRIVACY
Almost every computer-science student takes a course in algorithms. Algorithms are sets of specified, repeatable rules or procedures for accomplishing tasks such as sorting numbers; they are, so to speak, the engines that make programs run. Unfortunately, innovations in algorithms are not subject to Moore’s law, and progress in the field is notoriously sporadic. “There are certain areas in algorithms we basically can’t do better and others where creative work will have to be done,” Ullman says. Sifting through large surveillance databases for information, he says, will essentially be “a problem in research in algorithms. We need to exploit some of the stuff that’s been done in the data-mining community recently and do it much, much better.” Working with databases requires users to have two mental models. The first is a model of the data. Teasing out answers to questions from the popular search engine Google, for example, is easier if users grasp the varieties and types of data on the Internet (Web pages with words and pictures, whole documents in a multiplicity of formats, downloadable software and media files) and how they are stored. In exactly the same way, extracting information from surveillance databases will depend on a user’s knowledge of the system. “It’s a chess game,” Ullman says. “An unusually smart analyst will get things that a not-so-smart one will not.”
Second, and more important according to Spafford, effective use of big surveillance databases will depend on having a model of what one is looking for. This factor is especially crucial, he says, when trying to predict the future, a goal of many commercial and government projects. For this reason, what might be called reactive searches, which scan recorded data for specific patterns, are generally much more likely to obtain useful answers than proactive searches, which seek to get ahead of things. If, for instance, police in the Washington sniper investigation had been able to tap into a pervasive network of surveillance cameras, they could have tracked people seen near the crime scenes until they could be stopped and questioned: a reactive process. But it is unlikely that police would have been helped by proactively asking surveillance databases for the names of people in the Washington area with the requisite characteristics (family difficulties, perhaps, or military training and a recent penchant for drinking) to become snipers.
In many cases, invalid answers are harmless. If Victoria’s Secret mistakenly mails 1 percent of its spring catalogs to people with no interest in lingerie, the price paid by all parties is small. But if a national terrorist-tracking system that screens a population of hundreds of millions has the same 1 percent error rate, it will produce millions of false alarms, wasting huge amounts of investigators’ time and, worse, labeling many innocent U.S. citizens as suspects. “A 99 percent hit rate is great for advertising,” Spafford says, “but terrible for spotting terrorism.”
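The arithmetic behind Spafford’s point can be made concrete with a small sketch. It assumes, purely for illustration, a screened population of 290 million, a hypothetical 1,000 genuine terrorists among them, and that the system’s 1 percent error rate applies equally to innocent people (wrongly flagged) and to real suspects (wrongly cleared). None of these figures comes from the article; they only show how a rare target swamps an accurate-sounding system with false alarms.

```python
def screening_outcomes(population: int, true_targets: int, error_rate: float) -> dict:
    """Expected outcomes of screening everyone with a fixed error rate.

    Assumption (illustrative only): the same error_rate applies both to
    innocent people (false alarms) and to real targets (missed).
    """
    innocents = population - true_targets
    false_alarms = innocents * error_rate      # innocent people wrongly flagged
    hits = true_targets * (1 - error_rate)     # real targets correctly flagged
    # Fraction of all flagged people who are actually targets.
    precision = hits / (hits + false_alarms)
    return {"false_alarms": false_alarms, "hits": hits, "precision": precision}


if __name__ == "__main__":
    # Hypothetical inputs: 290 million people, 1,000 real targets, 1% error rate.
    r = screening_outcomes(290_000_000, 1_000, 0.01)
    print(f"false alarms: {r['false_alarms']:,.0f}")
    print(f"genuine hits: {r['hits']:,.0f}")
    print(f"share of flagged people who are real targets: {r['precision']:.4%}")
```

Under these assumptions the system flags roughly 2.9 million innocent people alongside 990 genuine suspects, so only a few hundredths of a percent of those flagged are real targets. The same 1 percent error costs a catalog mailer almost nothing, because a wasted catalog is cheap.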
Because no system can achieve a 100 percent success rate, the best analysts can do is try to reduce the likelihood that surveillance databases will identify blameless people as possible terrorists. By making the criteria for flagging suspects more stringent, officials can raise the bar so that fewer ordinary citizens are wrongly fingered. Inevitably, however, that will also mean that “borderline” terrorists, those who don’t match all the search criteria but still have lethal intentions, might be overlooked. For both types of error, the potential consequences are alarming.
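The trade-off between the two kinds of error can be sketched with a toy screening rule. The sketch assumes, hypothetically, that each person receives a numeric risk score and is flagged when that score meets a chosen threshold; the scores and labels below are invented for illustration and do not describe any real system. Raising the threshold plays the role of “making the criteria more stringent.”

```python
from typing import List, Tuple

# Hypothetical (risk_score, is_actual_threat) pairs, invented for illustration only.
RECORDS: List[Tuple[float, bool]] = [
    (0.20, False), (0.40, False), (0.55, False), (0.70, False), (0.90, False),
    (0.60, True), (0.85, True), (0.95, True),
]


def count_errors(records: List[Tuple[float, bool]], threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) when flagging score >= threshold."""
    # Innocent people whose score clears the bar are wrongly flagged.
    false_positives = sum(1 for score, threat in records if score >= threshold and not threat)
    # Real threats whose score falls below the bar are missed.
    false_negatives = sum(1 for score, threat in records if score < threshold and threat)
    return false_positives, false_negatives


if __name__ == "__main__":
    for t in (0.5, 0.8, 0.92):
        fp, fn = count_errors(RECORDS, t)
        print(f"threshold={t}: {fp} innocent flagged, {fn} threats missed")
```

With this made-up data, a lenient threshold of 0.5 flags three innocent people and misses no one; a strict threshold of 0.92 flags no innocents but lets two of the three threats through. Moving the bar shifts errors from one column to the other rather than eliminating them.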