THE DISSOLUTION OF PRIVACY
Almost every computer-science student takes a course in algorithms.
Algorithms are sets of specified,
repeatable rules or procedures for accomplishing tasks such as sorting numbers;
they are, so to speak, the engines that
make programs run. Unfortunately, innovations in algorithms are not subject to
Moore's law, and progress in the field is
notoriously sporadic. "There are certain
areas in algorithms we basically can't do
better and others where creative work
will have to be done," Ullman says. Sifting
through large surveillance databases for
information, he says, will essentially be "a
problem in research in algorithms. We
need to exploit some of the stuff that's
been done in the data-mining community
recently and do it much, much better."
Working with databases requires
users to have two mental models. One is a model of the data. Teasing out answers
to questions from the popular search
engine Google, for example, is easier if
users grasp the varieties and types of
data on the Internet—Web pages with
words and pictures, whole documents in
a multiplicity of formats, downloadable
software and media files—and how they
are stored. In exactly the same way,
extracting information from surveillance
databases will depend on a user's knowledge of the system. "It's a chess game,"
Ullman says. "An unusually smart analyst
will get things that a not-so-smart one
will not."
Second, and more important, according to Spafford, effective use of big surveillance databases will depend on having
a model of what one is looking for. This
factor is especially crucial, he says, when
trying to predict the future, a goal of
many commercial and government projects. For this reason, what might be called
reactive searches that scan recorded data
for specific patterns are generally much
more likely to obtain useful answers than
proactive searches that seek to get ahead
of things. If, for instance, police in the
Washington sniper investigation had been able to tap into a pervasive network of
surveillance cameras, they could have
tracked people seen near the crime scenes
until they could be stopped and questioned: a reactive process. But it is unlikely
that police would have been helped by
proactively asking surveillance databases
for the names of people in the Washington area with the requisite characteristics
(family difficulties, perhaps, or military
training and a recent penchant for drinking) to become snipers.
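To make the reactive case concrete, here is a minimal sketch of the kind of pattern query it implies: given a log of camera sightings, list everyone recorded near at least two known crime scenes. The function name `seen_at_multiple_scenes`, the `(person_id, location)` log format, and the sample data are all hypothetical illustrations, not a description of any real police system. The point is that the reactive query has a definite pattern to match; the proactive question ("who might become a sniper?") offers no comparable pattern to code.

```python
from collections import defaultdict

def seen_at_multiple_scenes(sightings, scenes, min_scenes=2):
    """Hypothetical reactive query: who was recorded near at least
    `min_scenes` distinct known crime scenes?"""
    scenes_by_person = defaultdict(set)
    for person_id, location in sightings:
        if location in scenes:
            # Track distinct scenes only, so repeat sightings don't count twice.
            scenes_by_person[person_id].add(location)
    return sorted(p for p, locs in scenes_by_person.items()
                  if len(locs) >= min_scenes)

# Hypothetical camera log of (person_id, location) pairs.
sightings = [("A", "scene1"), ("A", "scene2"), ("B", "scene1"),
             ("C", "mall"), ("A", "mall"), ("D", "scene2"), ("D", "scene3")]
print(seen_at_multiple_scenes(sightings, {"scene1", "scene2", "scene3"}))
```

Here persons A and D are returned for questioning; B, seen at only one scene, and C, seen at none, are not.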
In many cases, invalid answers are
harmless. If Victoria's Secret mistakenly
mails 1 percent of its spring catalogs to
people with no interest in lingerie, the
price paid by all parties is small. But if a
national terrorist-tracking system has
the same 1 percent error rate, it will produce millions of false alarms, wasting
huge amounts of investigators' time and,
worse, labeling many innocent U.S.
citizens as suspects. "A 99 percent hit rate is
great for advertising," Spafford says, "but
terrible for spotting terrorism."
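The arithmetic behind Spafford's remark can be sketched in a few lines. The inputs below are assumptions chosen for illustration: a screened population of 290 million (roughly the U.S. population when this article appeared), a hypothetical 1,000 genuine suspects, and the same 1 percent error rate applied, for simplicity, both to innocent people wrongly flagged and to real suspects missed. The function `screening_outcomes` is a hypothetical helper.

```python
def screening_outcomes(population, true_targets, error_rate):
    """Expected counts for a screen that errs at `error_rate`.

    Simplifying assumption: innocents are wrongly flagged, and real
    targets are missed, at the same rate.
    """
    innocents = population - true_targets
    false_alarms = innocents * error_rate       # innocent people flagged
    hits = true_targets * (1 - error_rate)      # real targets flagged
    precision = hits / (hits + false_alarms)    # share of flags that are real
    return false_alarms, hits, precision

# Hypothetical inputs: 290 million screened, 1,000 real suspects, 1% error.
false_alarms, hits, precision = screening_outcomes(290_000_000, 1_000, 0.01)
print(f"false alarms: {false_alarms:,.0f}")
print(f"real suspects flagged: {hits:,.0f}")
print(f"chance a flagged person is a real suspect: {precision:.4%}")
```

Under these assumptions the system raises nearly 2.9 million false alarms while catching about 990 real suspects, so a flagged person is almost certainly innocent. Because genuine targets are so rare, even a small error rate swamps the true hits.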
Because no system can have a success rate of 100 percent, analysts can try to
decrease the likelihood that surveillance
databases will identify blameless people as
possible terrorists. By making the criteria
for flagging suspects more stringent, officials can raise the bar, and fewer ordinary
citizens will be wrongly fingered.
Inevitably, however, that will also mean
that the "borderline" terrorists—those
who don't match all the search criteria
but still have lethal intentions—might be
overlooked as well. For both types of error,
the potential consequences are alarming.
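The trade-off between the two types of error can be illustrated with a toy model. Purely as an assumption for this sketch, each record is scored by how many of four search criteria it matches, and anyone at or above a chosen threshold is flagged. The `Person` records and the `errors_at_threshold` helper are hypothetical; the data are invented to include one "borderline" threat who matches only two criteria.

```python
from dataclasses import dataclass

@dataclass
class Person:
    criteria_matched: int  # how many of the (hypothetical) 4 criteria match
    is_threat: bool        # ground truth, which a real system would not know

def errors_at_threshold(people, threshold):
    """Count both error types when flagging anyone matching >= threshold criteria."""
    false_positives = sum(1 for p in people
                          if p.criteria_matched >= threshold and not p.is_threat)
    false_negatives = sum(1 for p in people
                          if p.criteria_matched < threshold and p.is_threat)
    return false_positives, false_negatives

# Hypothetical toy records: five innocents, one borderline threat (2 matches),
# and one threat who matches every criterion.
people = [Person(0, False), Person(1, False), Person(2, False), Person(2, False),
          Person(3, False), Person(2, True), Person(4, True)]

for threshold in range(1, 5):
    fp, fn = errors_at_threshold(people, threshold)
    print(f"threshold {threshold}: {fp} innocents flagged, {fn} threats missed")
```

Raising the threshold from 1 to 4 cuts wrongly flagged innocents from four to none, but from a threshold of 3 upward the borderline threat slips through. No setting in this toy example eliminates both errors at once.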