Sweeney means that U.S. government watch lists, besides containing common names like Ted Kennedy, depend on variations of a phonetic algorithm called Soundex. As she says, “Soundex is an old patent that’s been used for a long time, whenever they have two databases where they’re trying to match up records.” Indeed, Soundex dates back to a time when Hollerith punch cards were the newest thing in computing technology. Developed to index and retrieve soundalike surnames with different spellings (like Smith and Smyth) scattered throughout an alphabetical list, Soundex was first used so that U.S. government clerks could retroactively analyze the 1890 U.S. census results. Soundex works by keeping the first letter of a name, dropping all vowels (along with h, w, and y), assigning a digit to each of the next three consonants (similar-sounding consonants such as s and c share a digit, and adjacent consonants with the same digit are coded once), and then discarding any remaining consonants. Names with fewer than three coded consonants are padded with zeros. In this way, the algorithm reduces every name to a letter followed by three digits.
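The steps above can be sketched in a few lines of Python. This is a minimal sketch of the American Soundex variant, assuming ASCII surnames; the helper name `soundex` is hypothetical, and the rule that h and w do not separate consonants with the same digit (while vowels and y do) is the choice made here. The government systems Sweeney describes may use other variations.

```python
# American Soundex digit groups. Letters absent from this table
# (the vowels plus H, W, and Y) receive no digit.
SOUNDEX_DIGITS = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the four-character Soundex code of a surname (hypothetical helper)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's own digit still suppresses an identical digit right
    # after it (e.g. the F in "Pfister").
    prev = SOUNDEX_DIGITS.get(first, "")
    for c in letters[1:]:
        code = SOUNDEX_DIGITS.get(c, "")
        if code and code != prev:
            digits.append(code)
        # Vowels and Y reset the previous digit, so a repeated digit after them
        # is coded again; H and W do not, so it is coded only once.
        if c not in "HW":
            prev = code
    # Keep three digits, padding with zeros when fewer were found.
    return (first + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for surname in ["Laden", "Lydon", "Lawton", "Leedham"]:
        print(surname, soundex(surname))  # every line ends in L350
```

Running it shows the collision discussed next: all four surnames reduce to the same code, because only the d/t and n/m sounds survive the encoding.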
Consequently, Soundex assigns the name Laden the code L350, just as it does Lydon, Lawton, and Leedham. In other words, the algorithm is so deficient for identification purposes that it confuses al Qaeda’s Osama bin Laden with the Sex Pistols’ Johnny Rotten (John Lydon). To see for yourself how poorly Soundex performs, go to nofly.s3.com, where S3 Matching Technologies has combined the algorithm with a list of potential-terrorist names recorded in U.S. government databases. “The U.S. government obviously updates its lists every day, so we don’t suggest this is up-to-date,” says James Moore, a company spokesperson. “But we got the best available data on who’d be on terrorist watch lists from various private intelligence agencies.” Running Soundex against S3 Matching Technologies’ version of the watch list reveals that the names Jesus Christ and George Bush resemble terrorists’ names closely enough to be assigned to the no-fly or selectee list.
How does the U.S. government rationalize using such error-prone technology for its watch lists? Sweeney says, “Whomever I ask–whether it’s DHS, DARPA, the Department of Justice–everybody essentially says, ‘We’re just going to plow ahead.’ At the DOJ, the answer I get is, ‘It’ll get solved when we use biometrics.’ Their belief is that the current problem will disappear because you’ll show your driver’s license and match your fingerprint against your fingerprint’s stored image on your license.” Sweeney half-seriously proposes a hypothetical solution to the watch-list problem. “I’ve told ChoicePoint that they ought to go into the watch-list business.”
Alongside LexisNexis and Acxiom, ChoicePoint is one of the big three data-brokerage corporations and in many ways the most interesting of them. Evan Hendricks, editor-publisher of the Washington-based Privacy Times, says, “Though most Americans don’t know about ChoicePoint, it’s a company that knows a lot about hundreds of millions of Americans.” Would ChoicePoint have at least four data points (name, address, Social Security number, and birth date) for almost every adult U.S. citizen, and therefore enough information to tell apart, say, any five people whose names produce the same Soundex code? Hendricks answers, “That’s certainly true. So would the three main credit-reporting companies.” However, Hendricks continues, whereas the big three credit-reporting agencies (Experian, TransUnion, and Equifax) calculate individuals’ credit scores, ChoicePoint defines itself as a data-aggregation company that sells actionable intelligence to both industry and government, with credit-related information only a subset of what it offers.
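Hendricks’s point, that a few extra data points per person are enough to separate people whose names collide under Soundex, can be illustrated with a toy comparison. This is a minimal sketch under stated assumptions: the `Person` type, the two matching functions, and every birth date below are hypothetical placeholders, and nothing here describes how ChoicePoint or any government watch list actually matches records. It compares matching on the Soundex code alone against matching on the code plus birth date.

```python
from typing import List, NamedTuple


class Person(NamedTuple):
    surname: str
    soundex: str     # Soundex code of the surname (L350 for all four names in the text)
    birth_date: str  # hypothetical ISO 8601 date, made up for illustration


# Hypothetical watch-list entry and passenger list; all dates are placeholders.
watch_entry = Person("Laden", "L350", "1900-01-01")
passengers = [
    Person("Lydon", "L350", "1901-02-02"),
    Person("Lawton", "L350", "1902-03-03"),
    Person("Leedham", "L350", "1903-04-04"),
    Person("Laden", "L350", "1900-01-01"),
]


def soundex_only_hits(entry: Person, people: List[Person]) -> List[str]:
    """Flag everyone whose surname shares the entry's Soundex code."""
    return [p.surname for p in people if p.soundex == entry.soundex]


def composite_hits(entry: Person, people: List[Person]) -> List[str]:
    """Flag only people matching on both the Soundex code and the birth date."""
    key = (entry.soundex, entry.birth_date)
    return [p.surname for p in people if (p.soundex, p.birth_date) == key]


if __name__ == "__main__":
    print(soundex_only_hits(watch_entry, passengers))  # all four passengers
    print(composite_hits(watch_entry, passengers))     # only the Laden record
```

Adding more fields (address, Social Security number) to the composite key narrows matches further; the sketch uses birth date alone only to keep the comparison small.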