Technology Review - Published By MIT

July 2003

Spam Wars

By Evan I. Schwartz

Smarter Shields

Seeking a more perfect form of relief, tens of thousands of users have downloaded open-source filters (most popularly, Spam Assassin) or purchased commercial products such as McAfee's SpamKiller. These "heuristic" filters, built from collections of statistically valid rules written by humans, stand guard at the user's in-box and scan every incoming message for tip-off terms such as "Viagra," "V1AGRA," or even "V*I*A*G*R*A," as well as improbable return addresses, strange symbols, embedded graphics, and fraudulent routing information, any of which suggests the message is of dubious origin. After applying hundreds of rules, the filter scores each message and discards those whose scores exceed a threshold value. Spam Assassin and SpamKiller typically exhibit filtration rates higher than 95 percent and false-positive rates of about 0.1 percent, according to Matt Sergeant of MessageLabs, a company that develops improvements to Spam Assassin.
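
The scoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not Spam Assassin's actual rule set: the rule names, regular expressions, weights, and the threshold of 5.0 are all hypothetical values chosen for the example. Each rule that fires adds its weight, and a message is discarded when the total exceeds the threshold.

```python
import re

# Hypothetical rules: (name, compiled pattern, weight). Real filters ship
# hundreds of rules with tuned scores; these four only illustrate the idea.
RULES = [
    # "Viagra", "V1AGRA", "V*I*A*G*R*A": letters may be separated by
    # non-word characters, and the digit 1 may stand in for the letter I.
    ("drug_name", re.compile(r"v\W*[i1]\W*a\W*g\W*r\W*a", re.IGNORECASE), 3.0),
    # A From: address with a long run of digits just before the @.
    ("numeric_sender", re.compile(r"^From:.*\d{5,}@", re.IGNORECASE | re.MULTILINE), 1.5),
    # An embedded HTML image.
    ("embedded_image", re.compile(r"<img\s", re.IGNORECASE), 1.0),
    # A subject line containing no lowercase letters at all.
    ("all_caps_subject", re.compile(r"^Subject:[^a-z\n]*[A-Z][^a-z\n]*$", re.MULTILINE), 1.2),
]

THRESHOLD = 5.0  # hypothetical cutoff; scores above it are treated as spam


def score_message(raw):
    """Return (total score, names of the rules that fired) for a raw message."""
    total, hits = 0.0, []
    for name, pattern, weight in RULES:
        if pattern.search(raw):
            total += weight
            hits.append(name)
    return total, hits


def is_spam(raw):
    """Discard decision: True when the summed rule weights exceed THRESHOLD."""
    score, _ = score_message(raw)
    return score > THRESHOLD
```

Raising the threshold trades more missed spam for fewer false positives; lowering it does the reverse.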

This relatively high false-positive rate, however, troubles some users. After all, much legitimate e-mail shares some of the same traits as spam. Sergeant concedes that newsletters users have requested will occasionally be discarded. That flaw has led to novel solutions such as collaborative filters, in which users vote on which messages should be deemed spam.

SpamNet, from San Francisco-based Cloudmark, is one example of a program that deploys democracy in this way. An add-on to Microsoft's Outlook e-mail program, SpamNet starts filtering spam automatically upon installation. If enough trusted users designate a message as spam, that message ends up in the spam folders of Cloudmark's entire base of 420,000 users. "When a new person joins, they get the benefit of the community," says Vipul Ved Prakash, Cloudmark's founder and chief scientist. False positives are rarer under this approach, and users also have the option of clicking "unblock" on any messages in their spam folders. But there are drawbacks: SpamNet demands a higher level of user vigilance, and it requires that Cloudmark's remote servers examine all incoming e-mail before passing it on.
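
A community-voting filter of the kind described above might look roughly like the following sketch. Cloudmark's actual protocol is not described in this article; the fingerprinting scheme (a SHA-256 hash of the normalized message body), the `votes_needed` threshold, and the class and method names are all hypothetical.

```python
import hashlib
import re


def fingerprint(body):
    """Hash the body after lowercasing and collapsing whitespace, so trivially
    reformatted copies of the same message share one fingerprint."""
    normalized = re.sub(r"\s+", " ", body.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class CommunityFilter:
    """Hypothetical shared spam list driven by votes from trusted users."""

    def __init__(self, trusted_users, votes_needed=3):
        self.trusted = set(trusted_users)
        self.votes_needed = votes_needed  # trusted votes required to block for everyone
        self.votes = {}       # fingerprint -> set of trusted users who flagged it
        self.unblocked = {}   # user -> set of fingerprints that user rescued

    def report_spam(self, user, body):
        if user in self.trusted:  # votes from untrusted users are ignored
            self.votes.setdefault(fingerprint(body), set()).add(user)

    def unblock(self, user, body):
        self.unblocked.setdefault(user, set()).add(fingerprint(body))

    def is_spam_for(self, user, body):
        fp = fingerprint(body)
        if fp in self.unblocked.get(user, set()):
            return False  # this user's "unblock" overrides the community verdict
        return len(self.votes.get(fp, set())) >= self.votes_needed
```

Hashing the whole body means a spammer who varies a single word produces a new fingerprint; a production system would need a fuzzier signature, which this sketch does not attempt.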

To fend off spam that penetrates other defenses, computer scientists have turned to the 18th-century probability theory of English mathematician Thomas Bayes. Published in 1763, two years after his death, Bayes's "Essay towards Solving a Problem in the Doctrine of Chances" provides a blueprint for determining the likelihood of future events. Since one person's spam can be another person's invitation to a pleasurable afternoon, Bayesian spam filters learn over time what each individual considers unwanted e-mail. When a user deletes several unopened messages about mortgage refinancing, for instance, a Bayesian filter learns to discard e-mail with that kind of terminology. If you typically do read such messages, however, the filter will take note of that and consider it normal e-mail.
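
The per-user learning described above can be illustrated with a textbook multinomial naive Bayes classifier. This is a sketch under stated assumptions, not the formula from Graham's essay or from any product: tokenization is a simple regular expression, probabilities use add-one (Laplace) smoothing, and the labels passed to `train` are assumed to come from the user's own actions (for example, treating a message deleted unopened as spam). The class and method names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Per-user multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message the user has labeled "spam" or "ham"."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior: share of this user's training mail with the label.
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            label_total = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Each word contributes independently: the "naive" assumption.
                count = self.word_counts[label][word]
                log_p += math.log((count + 1) / (label_total + len(vocab) + 1))
            log_scores[label] = log_p
        # Convert the two log scores into a normalized P(spam | message).
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

Because each user owns a separate instance, a user who discards refinancing offers and one who reads them will train filters that score the same message "refinance your mortgage today" on opposite sides of 0.5.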

Because Bayesian filters can be trained, their effectiveness improves over time; they typically attain filtration rates of 99.8 percent, along with a false-positive rate of a mere 0.05 percent. "If everyone's filter has different probabilities of different messages getting through, it makes it harder for the spammers," says Paul Graham, an independent programmer based in Cambridge, MA. Last August, a link to Graham's article "A Plan for Spam" on slashdot.org jump-started a rush to Bayesian filtering. These kinds of filters, Graham says, will break the spammer's business model. It costs about $200, he continues, to send one million messages, an endeavor that typically yields about 100 responses. If those 100 people spend an average of $2 each, the spammer breaks even. The goal, Graham says, is to drive response rates down to around one in a million so that "it would no longer be economical for a spammer to consider such a business proposition."
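
The break-even arithmetic Graham describes can be checked with a small calculation. The figures ($200 per million messages, about 100 responses per million, $2 per response) come from the article; the linear cost model and the function names are assumptions made for illustration.

```python
def campaign_profit(messages, cost_per_million, responses_per_million, dollars_per_response):
    """Revenue minus sending cost, assuming both scale linearly with volume."""
    millions = messages / 1_000_000
    cost = millions * cost_per_million
    revenue = millions * responses_per_million * dollars_per_response
    return revenue - cost


def break_even_responses_per_million(cost_per_million, dollars_per_response):
    """Responses per million messages needed for revenue to cover cost."""
    return cost_per_million / dollars_per_response


# Figures quoted in the article: $200 per million, ~100 responses, $2 each.
print(campaign_profit(1_000_000, 200, 100, 2))  # 0.0: break-even
# Graham's target response rate of about one in a million.
print(campaign_profit(1_000_000, 200, 1, 2))    # -198.0: a loss
```

At one response per million, each respondent would have to spend $200 for the campaign to break even, a hundredfold increase over the article's $2 figure.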

Microsoft Research has taken this probabilistic approach even further. Standard, so-called naïve Bayesian filters treat each word or feature in an e-mail independently, but Microsoft claims its new filter, which is offered as an option in MSN 8 software, learns probabilities for words, phrases, and other distinguishing characteristics that commonly appear together. It might, for example, flag messages that contain the phrases "make money from home" and "click here," are sent from servers based in Hong Kong, and have random characters in the subject line. Microsoft's Heckerman claims that, by correlating patterns, his filter exhibits an even lower rate of false positives.
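
The article does not describe how Microsoft's filter represents co-occurring characteristics, so the following is only a hypothetical sketch of the general idea: besides single words, it emits two-word phrases, metadata features, and conjunctions of them, giving a learner separate evidence for combinations such as "click here" arriving from a Hong Kong server. The `sender_country` argument is assumed to come from some external lookup, and the test for a random-looking subject token (six or more word characters mixing letters and digits) is a crude stand-in.

```python
import re

# Hypothetical test for a random-looking token: 6+ word characters that mix
# letters and digits, e.g. "x7k2p9q".
RANDOM_TOKEN = re.compile(r"(?=.*\d)(?=.*[A-Za-z])\w{6,}")


def extract_features(subject, body, sender_country):
    words = re.findall(r"[a-z0-9']+", body.lower())
    features = set(words)                                     # single words
    phrases = {f"{a} {b}" for a, b in zip(words, words[1:])}  # two-word phrases
    features |= phrases

    meta = {f"country={sender_country}"}                      # sender metadata
    if any(RANDOM_TOKEN.fullmatch(tok) for tok in subject.split()):
        meta.add("subject=random")
    features |= meta

    # Conjunctions: each phrase paired with each metadata feature, plus pairs
    # of metadata features, so a combination can carry its own weight.
    features |= {f"{p}&{m}" for p in phrases for m in meta}
    ordered = sorted(meta)
    features |= {f"{a}&{b}" for i, a in enumerate(ordered) for b in ordered[i + 1:]}
    return features
```

Even a classifier that treats its features independently can then weight a combination on its own, because the combination is itself a feature.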

The monkey wrench is that spam is not an inanimate adversary, but rather a tool of wily and willful humans. In fact, the very effectiveness of spam filters may actually be making the problem worse. If half of a batch of spam gets thrown into the digital garbage can, the spammer will tend to respond by sending twice as much spam the next time. "As you put more filters in place, spammers become more determined, and the spam will increase," says the Anti Spam Research Group's Judge, who is the chief technology officer at CipherTrust, an Alpharetta, GA-based provider of e-mail security systems.
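
Judge's point can be made quantitative with a simple model, assumed here for illustration: if filters block a fixed share of every batch and the spammer wants the same number of messages delivered, the required volume is the target divided by the share that gets through. Blocking half doubles the volume, as in the example above; at the 95 and 99.8 percent filtration rates cited earlier, the multipliers are 20 and 500. The function name is hypothetical.

```python
import math
from fractions import Fraction


def volume_needed(delivered_target, blocked_share):
    """Messages a spammer must send so that `delivered_target` get through.

    Assumes filters block a fixed share of every batch regardless of size.
    Pass the share as a string such as "0.95" (or a Fraction) so the
    arithmetic is exact rather than subject to float rounding.
    """
    blocked = Fraction(blocked_share)
    if not 0 <= blocked < 1:
        raise ValueError("blocked_share must be at least 0 and less than 1")
    return math.ceil(delivered_target / (1 - blocked))


for share in ("0.5", "0.95", "0.998"):
    print(share, volume_needed(1_000_000, share))
```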

To absorb the higher volume, Judge says, spammers simply find ways to cut costs, such as enlisting servers based in China or India, where labor is cheap. What's more, as spammers mount a counterattack against Bayesian methods, spam is tending to look more and more like non-spam. For example, a message that says, "Hi Jim, have you seen the party pictures? Take a look!" may not raise red flags, because it doesn't contain any obvious spam terms. When spam begins to look exactly like messages from friends and colleagues, filters may fail.

This article is from the July/August 2003 issue of Technology Review.
