An Anti-Junk Arsenal
As one of the most daunting computer science problems to come along in years, the spam jam has triggered the Internet’s version of a Manhattan Project. Hundreds of software whizzes are forming teams and companies in search of the ultimate way to halt mass proliferation (see “Seven Ways of Sifting Spam”). At the first-of-its-kind Spam Conference at MIT in January, the overcapacity crowd of almost 600 was speckled with PhDs writing scientific journal entries, young programmers wearing beards and backpacks, and P.R. pros touting the latest anti-spam services and software. The scene struck some participants as rather pathetic. “There are some very bright people here,” The World’s Shein told the conferees, “and what are you spending your time doing? Blocking penis enlargement ads.”
Despite deep divisions among this assemblage on who has the best tools for eradicating spam, there’s broad consensus on one point: if there’s one thing worse than a piece of junk e-mail, it’s the prospect that a spam filter will stop a legitimate message from reaching its recipient. That’s why there are two important numbers one needs to know about the spam filters now in use or under development: the filtration percentage (the proportion of junk mail blocked) and the false-positive rate (the proportion of normal mail blocked). A 95 percent filtration rate is considered good, according to Paul Judge, head of the Anti-Spam Research Group, started in February as a new branch of the Internet Research Task Force, a professional society. Many filters claim even higher filtration rates, he says, but those tend to run the risk of unacceptable false-positive rates of 0.1 percent or higher, meaning that at least one in 1,000 normal messages would be lost.
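To make the two numbers concrete, here is a minimal Python sketch of how they might be computed from a tally of a filter's decisions. The function name and the sample counts are illustrative assumptions, not figures from any filter discussed here; the counts are simply chosen to reproduce the 95 percent and 0.1 percent benchmarks cited above.

```python
def filter_rates(spam_blocked, spam_total, legit_blocked, legit_total):
    """Return (filtration_rate, false_positive_rate) as fractions.

    filtration rate     = junk messages blocked / all junk messages
    false-positive rate = normal messages blocked / all normal messages
    """
    if spam_total <= 0 or legit_total <= 0:
        raise ValueError("totals must be positive")
    filtration = spam_blocked / spam_total
    false_positive = legit_blocked / legit_total
    return filtration, false_positive


# Hypothetical tally: 950 of 1,000 junk messages blocked,
# and 1 of 1,000 normal messages blocked by mistake.
filtration, false_positive = filter_rates(950, 1000, 1, 1000)
print(f"filtration rate: {filtration:.1%}")
print(f"false-positive rate: {false_positive:.2%}")
```

The two rates have different denominators, which is why a filter can score well on one and poorly on the other.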
Spam fighters are relentlessly adding new weapons to their arsenal. San Francisco-based Brightmail maintains one of the most widely used filters, which has been installed on corporate e-mail servers as well as the user networks of EarthLink, Verizon, Comcast, and Microsoft’s Hotmail. The filter processes about 10 percent of the world’s e-mail flow, says Enrique Salem, the company’s CEO. Brightmail has set up more than one million randomly generated “decoy” e-mail addresses, such as Dxodt19@anydomain.com. Since no human is attached to these accounts, no one can possibly claim that their owners ever authorized a marketer to communicate with them. Within days, weeks, or sometimes months, these phony addresses will begin receiving spam.
How can an e-mail address that’s neither listed nor used start receiving spam? The answer is the “dictionary attack.” So-called spambots not only harvest e-mail addresses posted on Web sites but connect to the major Internet service providers and systematically send standard address verification requests to guessed addresses, beginning with “aaa, aab, aac,” or by trying “DrDebra25a, DrDebra25b, DrDebra25c.” Such programs are often included with spam kits sold by organized syndicates. Whenever these programs fail to receive a “user unknown” type of message in reply, they add that address to a list of valid addresses, to be sold to other spammers (see “Spreading Spam,” below).
An Internet service provider can sometimes detect such a breach and throw the attacker off the system, but the attacker will attempt to connect seconds or minutes later, from a seemingly different Internet location. According to the Spamhaus Project, a U.K.-based volunteer organization funded by a British Web hosting company, earlier this year both Hotmail and MSN were buffeted by such an attack at the rate of three to four tries per second, round the clock, for at least five months straight. (Microsoft, which runs both of the targeted services, says it has identified the alleged perpetrators and is pursuing legal action in U.S. district court in San Jose, CA.)
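One way a provider might notice this kind of probing is to count how many “user unknown” replies it sends to each connecting source within a short time window. The Python sketch below illustrates that idea only; the class name, the threshold of 20 failures, and the 60-second window are all hypothetical, and nothing here describes how Hotmail, MSN, or any other service actually detects attacks.

```python
from collections import defaultdict, deque


class DictionaryAttackDetector:
    """Flag sources that trigger many 'user unknown' replies in a short window.

    max_failures and window_seconds are illustrative assumptions.
    """

    def __init__(self, max_failures=20, window_seconds=60.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        # source address -> timestamps of recent 'user unknown' replies
        self._failures = defaultdict(deque)

    def record_unknown_user(self, source, timestamp):
        """Record one failed lookup; return True if the source looks abusive."""
        recent = self._failures[source]
        recent.append(timestamp)
        # Discard failures that have aged out of the sliding window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_failures


# Simulate one source guessing addresses at roughly three tries per second.
detector = DictionaryAttackDetector()
for attempt in range(30):
    if detector.record_unknown_user("203.0.113.5", attempt * 0.3):
        print(f"flagged after {attempt + 1} failed lookups")
        break
```

Because the counter is keyed by source address, it resets whenever the attacker reconnects from a seemingly different Internet location, which is exactly the weakness described above.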
Brightmail’s decoy method is aimed at minimizing the damage of such attacks. When the in-box of Dxodt19@hotmail.com receives a message, Brightmail’s software compresses that message into a unique 512-bit “signature,” which is added to the database of known spam. The database is updated constantly, and a new version of it is transmitted several times per hour to Brightmail’s more than 600 corporate customers. Any message that comes reasonably close to matching a known spam signature is automatically flagged as unsolicited. Eventually these pieces of presumed junk are deleted en masse. “It’s like a sting operation,” Salem says.
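Brightmail's actual signature algorithm is not described here, so the following Python sketch swaps in a well-known stand-in, a SimHash-style fingerprint, to show how a message could be reduced to 512 bits in a way that lets near-copies match. The function names, the word-level tokenizing, and the 128-bit distance threshold are all assumptions made for illustration.

```python
import hashlib
import re

SIGNATURE_BITS = 512  # matches the signature size quoted above


def signature(text):
    """Reduce a message to a 512-bit SimHash-style fingerprint (an int)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    votes = [0] * SIGNATURE_BITS
    for word in words:
        digest = hashlib.sha512(word.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big")
        # Each word votes +1 or -1 on every bit position.
        for bit in range(SIGNATURE_BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    sig = 0
    for bit, vote in enumerate(votes):
        if vote > 0:
            sig |= 1 << bit
    return sig


def hamming_distance(a, b):
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def is_known_spam(text, known_signatures, max_distance=128):
    """True if the message is 'reasonably close' to any known signature.

    max_distance is a hypothetical threshold, not a published value.
    """
    sig = signature(text)
    return any(hamming_distance(sig, k) <= max_distance for k in known_signatures)


# A decoy in-box receives this message, so its signature joins the database.
decoy_hit = ("Lose weight fast with our amazing new herbal pills order today "
             "and save fifty percent on your first bottle")
database = {signature(decoy_hit)}

variant = decoy_hit.replace("today", "now") + " !!!"
normal = ("Hi Sam, the quarterly budget meeting moved to Thursday at three, "
          "please bring the revised forecast")
print(is_known_spam(variant, database), is_known_spam(normal, database))
```

Unlike an ordinary cryptographic hash, where changing one word scrambles the whole output, this kind of fingerprint moves only a little when the message changes a little. That is also its limit: a heavily reworded message lands far from every stored signature and slips through.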
Brightmail excels in its extremely low false-positive rate. It will block only about one in a million legitimate messages, for a rate of 0.0001 percent. The big shortcoming of this kind of filtering is that it doesn’t do a terribly good job of actually blocking junk. A new piece of spam, or even a significant twist on an old spam, will probably make it through. Indeed, Brightmail’s Salem claims only a 92 percent filtration rate, and large customers such as Microsoft and EarthLink peg the actual rate at more like 70 percent. That’s why Brightmail is used only as a rough filter, and why it doesn’t come close to tackling the overall problem.
A spammer harvests valid e-mail addresses using a “dictionary attack.”