The proliferation of junk e-mail is threatening to overwhelm the Internet. Software companies are rushing to build defenses, but will the new technologies do more harm than good?
Operating 20 computers in an abandoned schoolhouse in Rockford, IL, Jay Nelson worked with relatives to set up more than a dozen shell companies, renting equipment and Web hosting services using aliases such as “Art Fudge.” Nelson and his associates then “hacked into AOL e-mail accounts,” states one legal motion filed by AOL, and overwhelmed members with links to pornographic Web sites such as pamsplayhouse.com.
In 1999, AOL won a court injunction barring Nelson from such activities and fining him $1.9 million; nonetheless, he and his colleagues subsequently sent another billion e-mail messages, triggering 25 percent of AOL’s spam-related customer complaints over the next two years.
Alan Ralsky, by contrast, seems almost respectable. While trying to overcome a past littered with fraud convictions, a court-ordered fine, personal bankruptcy, and a brief jail stint, Ralsky in 1997 heard about a new Internet opportunity. Repudiating pornography to his wife, Ralsky rented mailing lists and set up servers in his basement, according to media interviews he gave last year. Pitching mortgages, vacations, and online pharmacies and casinos on behalf of others, he boasted of thousands of dollars per week in sales commissions. After moving into a $740,000 house in a Detroit suburb, Ralsky set up another basement operation that was soon spewing tens of thousands of messages per hour, relayed through servers in Dallas and in Canada, China, Russia, and India. In 2001, Verizon Internet Services sued Ralsky, charging him with unauthorized use of its network.
Nelson and Ralsky are just two of the many faces behind spam. But according to Jon Praed, an attorney with the Internet Law Group, an Arlington, VA, firm hired by the plaintiffs in both of these cases, big-time spammers have a common profile. “They have not been successful in anything else,” he says. “They are hackers gone bad, or they are crooks gone geek.” They also sit at the center of far-flung conspiracies to conceal their actions. (Neither Nelson nor Ralsky returned phone calls from Technology Review.)
The spam crisis is hardly a secret. But few could have imagined it would get this bad this fast. More than 13 billion unwanted e-mail messages swamp the Internet per day, worldwide. This tsunami of time-wasting junk will be a $10 billion drag on worker productivity this year in the United States alone, according to San Francisco-based Ferris Research. In a perverse analogy to Moore’s Law of microchip processing power, the number of daily spam messages is doubling roughly every 18 months, according to the Radicati Group, a Palo Alto, CA, market research firm specializing in electronic messaging. Having risen from 8 percent of all e-mail in 2000 to more than 40 percent by the end of 2002, spam has now reached a majority, according to studies from several anti-spam software companies. Conceivably, spam could soon represent 90 percent of all e-mail, says David Heckerman, who heads the Machine Learning and Applied Statistics group at Microsoft Research, which is working on anti-spam technologies. If that happens, he says, “a lot of people will just stop using e-mail.”
“Spammers are gaining control of the Internet,” says Barry Shein, president of Brookline, MA-based The World, which started in 1989 as the first commercial provider of dial-up Internet service. Shein has been spending an increasing number of nights and weekends (the witching hours for spammers) trying to block barrages of spam that appear so suddenly that they threaten to overwhelm his service. He’s constantly adding new spammers to a “blacklist” used to block all e-mail from rogue Internet addresses, but that’s a Band-Aid. “They change their network identities every couple of hours,” and then sometimes launch “revenge attacks,” Shein says. And spammers are ever alert to fresh prey: according to a study conducted by the Federal Trade Commission, someone who uses a brand new e-mail address in an online chat room could get hit with spam as quickly as nine minutes later.
The problem could easily grow beyond anyone’s control. “Our concern is not so much for the porn and the herbal Viagra as it is for the legitimate businesses,” says John Mozena, cofounder of the Coalition against Unsolicited Commercial E-mail (CAUCE), an advocacy group. “There are 24 million small businesses in the U.S. If just 1 percent got your e-mail address and sent you one message per year, you’d have 657 additional messages in your in-box every day. That is our nuclear-winter scenario.”
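Mozena’s figure is easy to check. The short sketch below simply redoes his arithmetic with the numbers he quotes; the variable names are illustrative, and the one-message-per-year assumption is his.

```python
# Figures quoted by Mozena; the names are illustrative only.
SMALL_BUSINESSES = 24_000_000
SHARE_SENDING = 0.01              # "just 1 percent"
MESSAGES_PER_SENDER_PER_YEAR = 1  # one message per business per year

messages_per_year = SMALL_BUSINESSES * SHARE_SENDING * MESSAGES_PER_SENDER_PER_YEAR
messages_per_day = messages_per_year / 365

print(int(messages_per_day))  # whole messages per day
```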
To avert such a catastrophe, electronic warriors are fighting the scourge of spam using three principal tactics. The first involves the rapid adoption of spam-blocking-and-filtering software by consumers, corporate networks, and Internet service providers. Anti-spam software is expected to grow into a $2.4 billion industry by 2007, up from about $650 million now, according to a Radicati Group forecast. But that alone won’t win the war. The second, newer approach involves instituting more drastic changes in the way e-mail and the Internet work, perhaps imposing new costs to send messages or developing the ability to trace e-mail messages like phone calls.
The third tactic is a legal one, involving not only better law enforcement and prosecution of spammers but even a ban on all unsolicited commercial e-mail. To beat back the persistent, rising tide of spam, it’s probably necessary to engage on all three fronts at once. “We move based on what we anticipate from the enemy, and then the enemy reacts,” says Microsoft’s Heckerman. “We’re already up five levels of prediction.” Everyone expects further escalation, while hoping that e-mail as we know it won’t be destroyed in the process.
An Anti-Junk Arsenal
As one of the most daunting computer science problems to come along in years, the spam jam has triggered the Internet’s version of a Manhattan Project. Hundreds of software whizzes are forming teams and companies in search of the ultimate way to halt mass proliferation (see “Seven Ways of Sifting Spam”). At the first-of-its-kind Spam Conference at MIT in January, the overcapacity crowd of almost 600 was speckled with PhDs writing scientific journal entries, young programmers wearing beards and backpacks, and P.R. pros touting the latest anti-spam services and software. The scene struck some participants as rather pathetic. “There are some very bright people here,” The World’s Shein told the conferees, “and what are you spending your time doing? Blocking penis enlargement ads.”
Despite deep divisions among this assemblage on who has the best tools for eradicating spam, there’s broad consensus on one point: if there’s one thing worse than a piece of junk e-mail, it’s the prospect that a spam filter will stop a legitimate message from reaching its recipient. That’s why there are two important numbers one needs to know about the spam filters now in use or under development: the filtration percentage (the proportion of junk mail blocked) and the false-positive rate (the proportion of normal mail blocked). A 95 percent filtration rate is considered good, according to Paul Judge, head of the Anti-Spam Research Group, started in February as a new branch of the Internet Research Task Force, a professional society. Many filters claim even higher filtration rates, he says, but those tend to run the risk of unacceptable false-positive rates of 0.1 percent or higher, meaning that one in 1,000 normal messages would be lost.
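The two numbers are simple ratios, and a short sketch makes the definitions concrete. The function name and the message counts below are hypothetical, chosen only to reproduce the 95 percent and 0.1 percent thresholds quoted above.

```python
def filter_rates(spam_blocked, spam_total, legit_blocked, legit_total):
    """Return (filtration rate, false-positive rate) as fractions."""
    filtration = spam_blocked / spam_total        # share of junk mail blocked
    false_positive = legit_blocked / legit_total  # share of normal mail blocked
    return filtration, false_positive

# Hypothetical counts: 950 of 1,000 spam messages blocked,
# 1 of 1,000 legitimate messages blocked by mistake.
filtration, false_positive = filter_rates(
    spam_blocked=950, spam_total=1000,
    legit_blocked=1, legit_total=1000,
)
print(f"filtration: {filtration:.1%}, false positives: {false_positive:.1%}")
```

Note that the two rates are measured against different totals, which is why a filter can look excellent on one and poor on the other.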
Spam fighters are relentlessly adding new weapons to their arsenal. San Francisco-based Brightmail maintains one of the most widely used filters, which has been installed on corporate e-mail servers as well as the user networks of EarthLink, Verizon, Comcast, and Microsoft’s Hotmail. The filter processes about 10 percent of the world’s e-mail flow, says Enrique Salem, the company’s CEO. Brightmail has set up more than one million randomly generated “decoy” e-mail addresses, such as Dxodt19@anydomain.com. Since no human is attached to these accounts, no one can possibly claim that their owners ever authorized a marketer to communicate with them. Within days, weeks, or sometimes months, these phony addresses will begin receiving spam.
How can an e-mail address that’s neither listed nor used start receiving spam? The answer is the “dictionary attack.” So-called spambots not only harvest e-mail addresses posted on Web sites but connect to the major Internet service providers and systematically send standard address verification requests to guessed addresses, beginning with “aaa, aab, aac,” or by trying “DrDebra25a, DrDebra25b, DrDebra25c.” Such programs are often included with spam kits sold by organized syndicates. Whenever these programs fail to receive a “user unknown” type of message in reply, they add that address to a list of valid addresses, to be sold to other spammers (see “Spreading Spam,” below).
An Internet service provider can sometimes detect such a breach and throw the attacker off the system, but the attacker will attempt to connect seconds or minutes later, from a seemingly different Internet location. According to the Spamhaus Project, a U.K.-based volunteer organization funded by a British Web hosting company, earlier this year both Hotmail and MSN were buffeted by such an attack at the rate of three to four tries per second, round the clock, for at least five months straight. (Microsoft, which runs both of the targeted services, says it has identified the alleged perpetrators and is pursuing legal action in U.S. district court in San Jose, CA.)
Brightmail’s decoy method is aimed at minimizing the damage of such attacks. When the in-box of Dxodt19@hotmail.com receives a message, Brightmail’s software compresses that message into a unique 512-bit “signature,” which is added to the database of known spam. The database is updated constantly, and a new version of it is transmitted several times per hour to Brightmail’s more than 600 corporate customers. Any message that comes reasonably close to matching a known spam signature is automatically flagged as unsolicited. Eventually these pieces of presumed junk are deleted en masse. “It’s like a sting operation,” Salem says.
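The general idea of a signature database can be sketched in a few lines. This is not Brightmail’s algorithm, which the company does not describe here; it is a minimal illustration that assumes a crude normalization step followed by a SHA-512 hash (which happens to be 512 bits long). All function names are hypothetical. Matching messages that are only “reasonably close” would take a fuzzier fingerprint than a plain hash.

```python
import hashlib
import re

known_spam = set()  # signatures of messages seen at decoy addresses

def signature(message: str) -> str:
    """Hypothetical signature: SHA-512 of the text after crude normalization."""
    # Lowercase and collapse every run of non-alphanumerics to one space,
    # so trivial punctuation and spacing changes do not alter the hash.
    normalized = re.sub(r"[^a-z0-9]+", " ", message.lower()).strip()
    return hashlib.sha512(normalized.encode("utf-8")).hexdigest()

def record_decoy_hit(message: str) -> None:
    """Anything arriving at a decoy address is presumed to be spam."""
    known_spam.add(signature(message))

def is_known_spam(message: str) -> bool:
    return signature(message) in known_spam

record_decoy_hit("Make MONEY from home!!! Click here.")
print(is_known_spam("make money from home -- click here"))
print(is_known_spam("Lunch on Friday?"))
```

A sketch like this also shows the weakness described next: change a single word and the hash no longer matches.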
Brightmail excels in its extremely low false-positive rate. It will block only about one in a million legitimate messages, for a rate of 0.0001 percent. The big shortcoming of this kind of filtering is that it doesn’t do a terribly good job of actually blocking junk. A new piece of spam, or even a significant twist on an old spam, will probably make it through. Indeed, Brightmail’s Salem claims only a 92 percent filtration rate, and large customers such as Microsoft and EarthLink peg the actual rate at more like 70 percent. That’s why Brightmail is only used as a rough filter, and why it doesn’t come close to tackling the overall problem.
A spammer harvests valid e-mail addresses using a “dictionary attack.”
Seeking a more perfect form of relief, tens of thousands of users have downloaded open-source filters (most popularly, SpamAssassin) or purchased commercialized versions such as McAfee’s SpamKiller. A collection of statistically valid rules created by humans, these “heuristic” filters stand guard at the user’s in-box and scan every incoming message for tip-off terms such as “Viagra,” “V1AGRA,” or even “V*I*A*G*R*A,” plus improbable return addresses, strange symbols, embedded graphics, and fraudulent routing information, indicating the message is of dubious origins. After applying hundreds of rules, the filter scores each message, discarding those whose scores exceed a threshold value. SpamAssassin and SpamKiller typically exhibit filtration rates higher than 95 percent and false-positive rates of about 0.1 percent, according to Matt Sergeant of MessageLabs, a maker of SpamAssassin improvements.
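A toy version of such a rule-based scorer looks like this. The three rules, their weights, and the threshold are invented for illustration and are not taken from any real filter; actual rule sets run to hundreds of entries.

```python
import re

# Hypothetical rules: (pattern, score). All values are assumptions.
RULES = [
    # "Viagra" with optional symbols between letters and 1 standing in for i
    (re.compile(r"v[\W_]*[i1][\W_]*a[\W_]*g[\W_]*r[\W_]*a", re.I), 3.0),
    (re.compile(r"click here", re.I), 1.0),
    (re.compile(r"[!$]{3,}"), 1.0),  # runs of strange symbols
]
THRESHOLD = 3.5  # assumed cutoff

def spam_score(message: str) -> float:
    """Sum the scores of every rule whose pattern appears in the message."""
    return sum(score for pattern, score in RULES if pattern.search(message))

def is_spam(message: str) -> bool:
    """Discard only messages whose score exceeds the threshold."""
    return spam_score(message) > THRESHOLD

print(spam_score("V*I*A*G*R*A for less, click here"))
print(is_spam("Meeting moved to 3pm"))
```

Raising the threshold trades filtration for fewer false positives, which is the tension the next paragraph describes.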
This relatively high false-positive rate, however, is troubling to some users. After all, much legitimate e-mail has some of the same traits as spam. Sergeant concedes that newsletters that were requested by users will occasionally be discarded. That flaw has led to novel solutions such as collaborative filters, in which users vote as to which messages should be deemed spam.
SpamNet, from San Francisco-based Cloudmark, is one example of a program that deploys democracy in this way. An add-on to Microsoft’s Outlook e-mail program, SpamNet starts filtering spam automatically upon installation. If enough trusted users designate a message as spam, that message ends up in the spam folders of Cloudmark’s entire base of 420,000 users. “When a new person joins, they get the benefit of the community,” says Vipul Ved Prakash, Cloudmark’s founder and chief scientist. False positives are rarer under this approach, and users also have the option of clicking “unblock” on any messages in their spam folders. But there are drawbacks: SpamNet demands a higher level of user vigilance, and it requires that Cloudmark’s remote servers examine all incoming e-mail before passing it on.
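The voting logic can be sketched as follows. This is an illustration of the collaborative idea only, not Cloudmark’s implementation: the class name, the fixed report threshold, and the use of an opaque message fingerprint are all assumptions.

```python
from collections import defaultdict

REPORTS_NEEDED = 3  # assumed number of trusted reports before blocking

class CommunityFilter:
    """Hypothetical collaborative filter keyed by message fingerprint."""

    def __init__(self, trusted_users):
        self.trusted = set(trusted_users)
        self.reports = defaultdict(set)    # fingerprint -> users who reported it
        self.unblocked = defaultdict(set)  # user -> fingerprints they unblocked

    def report(self, user, fingerprint):
        # Only trusted users' votes count toward the community verdict.
        if user in self.trusted:
            self.reports[fingerprint].add(user)

    def unblock(self, user, fingerprint):
        # A personal override; it does not change anyone else's result.
        self.unblocked[user].add(fingerprint)

    def is_spam_for(self, user, fingerprint):
        if fingerprint in self.unblocked[user]:
            return False
        return len(self.reports[fingerprint]) >= REPORTS_NEEDED

community = CommunityFilter({"ann", "bob", "cy"})
for reporter in ("ann", "bob", "cy"):
    community.report(reporter, "msg-1")
print(community.is_spam_for("dana", "msg-1"))
```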
To fend off spam that penetrates other defenses, computer scientists have turned to the 18th-century probability theory of English mathematician Thomas Bayes. Published in 1763, two years after his death, Bayes’s “Essay towards Solving a Problem in the Doctrine of Chances” provides a blueprint for determining the likelihood of future events. Since one person’s spam can be another person’s invitation to a pleasurable afternoon, Bayesian spam filters learn over time what each individual considers unwanted e-mail. When a user deletes several unopened messages about mortgage refinancing, for instance, a Bayesian filter learns to discard e-mail with that kind of terminology. If you typically do read such messages, however, the filter will take note of that and consider it normal e-mail.
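For readers who want to see the mechanics, here is a minimal sketch of a textbook naive Bayes classifier trained on one user’s mail. It is not the code of any product named in this article; the class name, the add-one smoothing, and the four training messages are assumptions made for illustration.

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Textbook naive Bayes over words, with add-one smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label is "spam" for mail the user discards, "ham" for mail they read
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            n_words = sum(self.words[label].values())
            # Smoothed prior, then one smoothed likelihood term per word
            score = math.log((self.messages[label] + 1) / (total + 2))
            for word in text.lower().split():
                count = self.words[label][word]
                score += math.log((count + 1) / (n_words + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability of spam
        return 1 / (1 + math.exp(log_score["ham"] - log_score["spam"]))

mail_filter = NaiveBayesFilter()
mail_filter.train("refinance your mortgage now", "spam")
mail_filter.train("low mortgage rates click here", "spam")
mail_filter.train("lunch meeting tomorrow", "ham")
mail_filter.train("notes from the meeting", "ham")
print(mail_filter.spam_probability("mortgage rates") > 0.5)
```

Because the counts come from each user’s own mail, a reader who opens mortgage offers would train the same code to the opposite verdict.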
Because Bayesian filters can be trained, their effectiveness improves over time, typically attaining filtration rates of 99.8 percent, along with a false-positive rate of a mere 0.05 percent. “If everyone’s filter has different probabilities of different messages getting through, it makes it harder for the spammers,” says Paul Graham, an independent Cambridge, MA, programmer. Last August, a link to Graham’s article “A Plan for Spam” on slashdot.org jump-started a rush to Bayesian filtering. These kinds of filters, Graham says, will break the business model of the spammer. It costs about $200, he continues, to send one million messages, an endeavor that typically yields about 100 responses. If those 100 people spend an average of $2 each, the spammer breaks even. The goal, Graham says, is to drive response rates down to around one in a million so that “it would no longer be economical for a spammer to consider such a business proposition.”
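Graham’s break-even argument reduces to one line of arithmetic. The sketch below uses only the figures he quotes; the function and constant names are illustrative.

```python
COST_PER_MILLION = 200.0    # dollars to send one million messages (Graham's figure)
REVENUE_PER_RESPONSE = 2.0  # average dollars spent per respondent (Graham's figure)

def profit_per_million(responses_per_million):
    """Spammer's profit, in dollars, on a mailing of one million messages."""
    return responses_per_million * REVENUE_PER_RESPONSE - COST_PER_MILLION

print(profit_per_million(100))  # today's typical response rate: break-even
print(profit_per_million(1))    # Graham's target rate: a loss on every mailing
```

At 100 responses the mailing nets nothing; at one response per million it loses $198.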
Microsoft Research has taken this probabilistic approach even further. Standard, so-called naive Bayesian filters treat each word or feature in an e-mail independently, but Microsoft claims its new filter, which is offered as an option in MSN 8 software, learns probabilities for words, phrases, and other distinguishing characteristics that commonly appear together. It might flag messages containing the phrase “make money from home” and “click here” that are sent from servers based in Hong Kong and that have random characters in the subject line. Microsoft’s Heckerman claims that, by correlating patterns, his filter exhibits an even lower rate of false positives.
The monkey wrench is that spam is not an inanimate adversary, but rather a tool of wily and willful humans. In fact, the very effectiveness of spam filters may actually be making the problem worse. If half of a batch of spam gets thrown into the digital garbage can, the spammer will tend to respond by sending twice as much spam the next time. “As you put more filters in place, spammers become more determined, and the spam will increase,” says the Anti-Spam Research Group’s Judge, who is the chief technology officer at CipherTrust, an Alpharetta, GA-based provider of e-mail security systems.
To balance the higher volume, Judge says, spammers simply find ways to lower their costs, such as enlisting servers based in China or India, where labor is cheap. What’s more, as spammers mount a counterattack against Bayesian methods, spam is tending to look more and more like non-spam. For example, a message that says, “Hi Jim, have you seen the party pictures? Take a look!” may not raise red flags, because it doesn’t contain any obvious spam terms. When spam begins to look exactly like messages from friends and colleagues, filters may fail.
Crippling the Attackers
That’s why anti-spam researchers are cooking up more-systematic treatments. Referring to spam as a “plague,” Mark Petrovic, vice president of R&D at Internet service provider EarthLink, notes that today’s e-mail system was designed 20 years ago for small numbers of people who already knew one another. “The possibility of sending body part enlargement ads was unheard of,” he says. Stemming the tide of spam, he says, will “require a cooperative solution to augment the basic way e-mail works.”
The most widespread of these measures is a blacklist of the sort used by Shein and other Internet service providers. Also maintained by startups such as SpamCop and NetBlocks, and by nonprofits such as CAUCE and Spamhaus, blacklists are collections of Internet Protocol addresses, domain names, and server farms that have been implicated in spewing spam; any mail originating from these tainted places will be blocked. But blacklists are imprecise: they often fail to keep pace with spammers, who constantly falsify their network locations, while sometimes blocking legitimate users. Indeed, blacklists sometimes halt e-mail from entire countries with high spam rates. E-mail originating in China and South Korea, in particular, has periodically been blocked from much of the Internet.
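A blacklist lookup is simple, which is part of its appeal. The sketch below assumes a list of blocked network ranges and domain names; the entries are reserved documentation addresses and placeholder domains, not real spam sources, and the function name is hypothetical.

```python
import ipaddress

# Hypothetical blacklist entries (reserved example ranges and domains).
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
BLOCKED_DOMAINS = {"bulk-offers.example"}

def is_blacklisted(sender_ip: str, sender_domain: str) -> bool:
    """Block mail whose source address or domain appears on the list."""
    address = ipaddress.ip_address(sender_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return True
    return sender_domain.lower() in BLOCKED_DOMAINS

print(is_blacklisted("192.0.2.17", "newsletter.example"))
print(is_blacklisted("198.51.100.5", "friend.example"))
```

The first lookup shows the imprecision: listing a whole range blocks every sender inside it, including a legitimate newsletter that happens to share the network.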
The inverse of the blacklist is the white list, a preauthorized address book maintained by users. An option in AOL 8.0, for instance, causes any message from senders not on the high-priority list to be discarded. This method also tends to trash e-mail you might want, though, and requires a high degree of maintenance; every time you make a new contact, you have to add a name to the white list. Aside from these drawbacks for their users, blacklists and white lists also are “wreaking havoc” on legitimate mass e-mailers, says Paul Soltoff, CEO of SendTec, a direct-marketing firm. After all, many companies (Technology Review among them) send out electronic newsletters and other promotional materials. These aren’t as obnoxious as the come-ons that most of us consider spam, and yet they are just as vulnerable to being blocked through the widespread use of blacklists and white lists.
Another drastic anti-spam measure strikes at the heart of the Internet’s culture: imposing new costs on sending e-mail. “Paying to send e-mail may be anathema to almost everybody,” says Robert Hettinga of Internet Bearer Underwriting, a startup in Boston. “But eventually, bits of money will be attached to e-mail messages.” Just as paper mail requires postage, e-mail would require e-stamps. A charge of one-tenth of a cent per e-mail, for instance, would hardly be noticeable to ordinary users but would levy a $1,000 tax on someone sending a million messages at once. Any piece of e-mail sent without an e-stamp would be automatically blocked. Others favor imposing a cost not in dollars but in the sender’s computer time. Your PC would have to solve a quick mathematical problem for each message it transmits, barely affecting senders of normal quantities of e-mail but crippling a spammer’s microprocessor. Such a “computational cost” approach is being developed at Microsoft Research and in an open-source effort called Camram (see “Making Spam Expensive,” TR April 2003).
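The “computational cost” idea can be illustrated with a small proof-of-work sketch. It is not the scheme under development at Microsoft Research or in Camram; it assumes a generic puzzle in which the sender must find a number that makes a SHA-256 hash of the recipient and date begin with a set count of zero bits. The difficulty, the stamp format, and the function names are all hypothetical.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 12  # assumed; each extra bit doubles the sender's work

def _has_leading_zero_bits(digest: bytes, bits: int) -> bool:
    """True if the first `bits` bits of the digest are all zero."""
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0

def mint_stamp(recipient: str, date: str, bits: int = DIFFICULTY_BITS) -> str:
    """Sender side: try nonces until the stamp's hash is small enough."""
    for nonce in count():
        stamp = f"{recipient}:{date}:{nonce}"
        if _has_leading_zero_bits(hashlib.sha256(stamp.encode()).digest(), bits):
            return stamp

def verify_stamp(stamp: str, recipient: str, bits: int = DIFFICULTY_BITS) -> bool:
    """Recipient side: a single hash checks the work was done for this address."""
    if not stamp.startswith(recipient + ":"):
        return False
    return _has_leading_zero_bits(hashlib.sha256(stamp.encode()).digest(), bits)

stamp = mint_stamp("alice@example.com", "2003-07-01")
print(verify_stamp(stamp, "alice@example.com"))
```

The asymmetry is the point: minting takes thousands of hash attempts on average at this setting, while checking takes one, and because the stamp names its recipient it cannot be reused across a mailing list.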
The World’s Shein proposes an Internet market trade association, which would be an “e-mail clearinghouse,” run by a group of e-mail providers. Such an organization would sell legitimate bulk mailers special license codes in return for royalties based on the size of the mailings they are sending. Spammers who buck the system would be tracked down and sued by clearinghouse lawyers using funds set aside from the royalty pool. “The goal is to monetize the processing of bulk e-mail,” Shein says. He derives the idea from the long-established model by which radio stations and performers pay royalties to songwriters based on the formulas of another clearinghouse: the American Society of Composers, Authors, and Publishers. Elements of such a plan are already being adopted by the big three e-mail providers (Microsoft, Yahoo!, and AOL), who announced in April that they are banding together to develop a way of creating a white list for legitimate marketers. The group has yet to announce whether participating marketers will pay to maintain a new infrastructure, but Shein guesses that things are heading that way.
For such a plan to work, future e-mail will have to be traceable. The telephone system has survived, in part, because there have always been ways to track phone calls back to their sources and find those who abuse the network. “Filtering e-mail without being able to establish identity is essentially futile,” says EarthLink’s Petrovic. He cites the problem of spam masquerading as real e-mail. “If my wife says, ‘I’d like to spend some time with you this evening,’ I will react differently than if a stranger says the same thing. I need to know who is talking to me before I can evaluate the meaning of the message.” Indeed, Petrovic adds, the anonymity of e-mail is central to the spam phenomenon. If we cannot determine who is sending messages, all other spam-blocking measures will ultimately fail.
Establishing such traceability would require fundamental changes to the basic protocol that governs all e-mail transmission. Called the Simple Mail Transfer Protocol, or SMTP, it is the 20-year-old language that virtually all e-mail software speaks in order to move messages around the Internet. If all network providers switch to an “authenticated SMTP,” as EarthLink’s Petrovic calls it, only an e-mail with a verified return address and from a valid domain name would be able to get to its desired recipient.
The Legal Front
Technology alone will never win the war. Ninety percent of spam is sent by fewer than 200 people, according to Mozena of CAUCE, the anti-spam coalition. That represents an astounding degree of concentration, but virtually everyone who fights spam for a living agrees it is roughly correct. The implication is clear: spam is a crime-fighting problem akin to the prosecution of the small number of malicious hackers who crack into networks. “These are human beings generating these messages,” Mozena says. “It’s not as if the Internet is broken. You can’t address social problems solely with technical means.” He believes that the spam plague is a criminal-justice dilemma that can be eradicated only with the active participation of legislatures and courts.
New laws, though, have yet to make much of a dent. Last year, the European Parliament passed a directive suggesting that member countries require marketers to ask permission from users before sending pitches through e-mail. So far, Austria, Denmark, Finland, Germany, Greece, Italy, and Norway have enacted such “opt-in” anti-spam legislation. But since so much spam is sent from the United States through Asia-based servers, these laws have had little effect. In 2000, the U.S. House of Representatives voted 427 to 1 to pass an anti-spam bill. But instead of including a strict opt-in provision, the bill required consumers to request the removal of their addresses from each marketer’s e-mail list. After privacy advocates denounced this “opt-out” bill as useless, it died without reaching the Senate. At least two spam bills are now alive in Congress, but there is still no consensus among lawmakers on whether the government can effectively outlaw spam, or even on whether it should.
In April, the Federal Trade Commission held a conference to help decide how best to approach this crisis. Brian Huseman, an FTC staff attorney, says the commission has prosecuted spammers who have sold bogus wares, failed to live up to their claims, impersonated legitimate organizations, or engaged in other deceptive practices. But since the agency is mainly charged with prosecuting fraud cases, it is powerless against spam that sells legitimate products. “There is no federal law that prohibits unsolicited commercial e-mail,” Huseman says.
Until such a law is passed, lawyers will continue to rely on precedents from similar cases, says Jon Praed of the Internet Law Group. He believes that indiscriminate mass e-mailing is “already illegal in all 50 states” based on centuries-old common law that prohibits unauthorized use of someone else’s property: in this case, computer networks.
Armed with this argument, AOL pursued porn spammer Jay Nelson, both before and after he and his cohorts violated the 1999 court order. Since spam cases can be prosecuted anywhere damage occurs, AOL chose its hometown district court in Alexandria, VA. In October 2002, the judge held the coconspirators in contempt and awarded AOL $6.9 million in damages and fees on top of the original $1.9 million finding, according to court documents. That figure was topped in May when EarthLink won a $16.4 million judgment against Howard Carmack, a Buffalo, NY, spammer; a week later, he was arrested on charges of identity theft. Praed says spammers cannot skirt the payments by filing bankruptcy, and that the plaintiff can “hound” the guilty parties until the money is collected, preventing them from buying houses and cars. “We need to make the spammers realize they made a mistake and to discourage others from doing it,” he says.
Detroit-based spammer Alan Ralsky, however, remains active. Instead of spending more time and money bringing Ralsky to court, Verizon last October decided to settle its case against the man that some call “the spam king.” In return for Ralsky’s paying an undisclosed sum and promising to avoid Verizon’s network, the lawsuit was dropped, leaving Ralsky firmly in business.
Furious anti-spam activists posted Ralsky’s home and e-mail addresses online, and soon he was deluged with piles of printed catalogues and junk mail. Yet he appears undeterred and continues to add to his list of 250 million e-mail addresses. According to his own statements, he is finding new ways to obscure his identity, laundering his Internet location data through servers in Romania and obscure parts of China. Spamhaus and CAUCE consider the 57-year-old Ralsky one of the top five spammers worldwide. “I’ll never quit,” he told the Detroit Free Press. “I like what I do. This is the greatest business in the world.”
The war on spam won’t be won until guys like him are somehow forced to change their minds.