Don’t Break E-Mail To Save It
By Vipul Ved Prakash
July 23, 2003
I couldn’t agree more with David Crocker’s post “Controlling Spam: Ready, Fire, Aim.” A lot of people seem to be in a rush to blame the spam deluge on the lack of authentication in the venerable Simple Mail Transfer Protocol (SMTP) standard. The truth is that SMTP authentication would, by itself, do almost nothing to alleviate the spam problem. It’s important to understand what SMTP authentication really adds to the equation, how it might be useful in the fight against spam, and what its limitations are.
There are two kinds of authentication. One is domain-based authentication, a mechanism that connects each sender of an e-mail to his or her domain name. Several such schemes have been proposed recently, most notably Reverse MX and SPF, which assure the recipient that the e-mail actually originated from the sender’s domain name. For example, if you receive mail from an address at wonderland.com, and wonderland.com participates in one of the domain authentication schemes, you can be sure that the e-mail actually originated from wonderland.com.
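To make the domain-based idea concrete, here is a minimal sketch of the check a receiving server might perform. It is not the actual Reverse MX or SPF algorithm: the `PUBLISHED_SENDERS` table is a hypothetical stand-in for records a domain would publish in DNS, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical stand-in for the sending networks a domain would publish in DNS.
PUBLISHED_SENDERS = {
    "wonderland.com": ["192.0.2.0/28"],
}

def domain_authenticates(sender_domain: str, connecting_ip: str) -> bool:
    """Return True if connecting_ip is inside a network published by sender_domain."""
    ip = ipaddress.ip_address(connecting_ip)
    networks = PUBLISHED_SENDERS.get(sender_domain, [])
    return any(ip in ipaddress.ip_network(net) for net in networks)

print(domain_authenticates("wonderland.com", "192.0.2.7"))    # inside the published range
print(domain_authenticates("wonderland.com", "203.0.113.9"))  # outside the published range
```

Note that this sketch returns False for a domain that publishes nothing at all. A real deployment would have to treat "does not participate" differently from "failed the check."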
The second kind of authentication is identity authentication with digital signatures, in which the sender digitally signs the e-mail so that the recipient can verify it with the help of the sender’s public key. PGP has been used for identity authentication for the last 10 years by the cryptography and security community, but it has not been widely integrated into e-mail applications.
Both kinds of authentication have one thing in common: they allow us to trust the “From” field. But what would we do next? Should we “whitelist” known correspondents and quarantine the mail from unknown people in a folder labeled something like “potential spam”? Clearly, that just shifts the problem from the Inbox to the “potential spam” folder; we’d still have to sift through the spam to get at legit messages. We could issue challenge/responses to unknown but authenticated senders. Several challenge/response tools exist today (Spam Arrest, Matador, etc.) that respond to unknown senders with a graphical or analytical question to establish that they are indeed human (as opposed to spam-spewing automata). Such systems are flawed without authentication, however, since they end up sending challenges to whatever fake address the spammer puts in the From field. Authentication would benefit these systems, but challenge/response interactions alter the e-mail experience (and don’t work for mailing lists), and they are perhaps the least desired “solution” to spam. We could use network origination information coupled with the sender’s e-mail address as a vector in a statistical spam filtration system. Cloudmark does this to some extent already, and SPF-based systems intend to do this as well; the technique turns out to be a decent metric for prioritizing and filtering e-mail. However, it does not “solve” the spam problem any better than the filtration technologies available today.
A major change to a well-established protocol will invariably have an extended adoption timeline. Spam filtration systems would have to choose a cut-off point after which they start using authentication for filtration purposes. If this adoption cut-off is 95 percent (for legitimate mail), then the false positive rate of the system would be 5 percent, because the remaining 5 percent of legitimate mail would arrive unauthenticated and be rejected. I doubt we are going to see 100 percent adoption very quickly. For this reason, I believe that e-mail authentication would be useful only in a statistical context, and not as a deterministic function to drop e-mail on the floor.
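The arithmetic behind that cut-off can be sketched in a few lines. The function name is mine, and the sketch assumes, for illustration, that every legitimate message from an adopting sender authenticates successfully and that every unauthenticated message is dropped.

```python
def false_positive_rate(adoption: float) -> float:
    """Fraction of legitimate mail dropped if all unauthenticated mail is rejected.

    adoption is the fraction of legitimate mail that arrives authenticated.
    Assumption: adopting senders never fail authentication.
    """
    if not 0.0 <= adoption <= 1.0:
        raise ValueError("adoption must be between 0 and 1")
    return 1.0 - adoption

for adoption in (0.50, 0.90, 0.95, 0.99):
    rate = false_positive_rate(adoption)
    print(f"adoption {adoption:.0%}: false positive rate {rate:.0%}")
```

Even at 99 percent adoption, a deterministic filter would still lose one legitimate message in a hundred, which is why the signal seems better suited to a statistical role.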
My perspective on the design of spam filtration solutions centers on exploiting the various constraints of the spammer. One thing we don’t talk about enough is the fact that spammers have rather serious constraints. They have to send out a marketing message (containing the same meme) to millions of people from (at most) a few thousand different IP addresses. They have to do this in a relatively short period of time. They have to differentiate their content from other spam. They have to defeat existing spam filtration systems.
A successful anti-spam solution will be able to leverage one or more of these constraints to differentiate spam from legitimate mail. Cloudmark’s product, SpamNet, for example, leverages the “same meme to millions of people” constraint by allowing the first few recipients to identify the meme and share the knowledge with other intended recipients before they receive the message. Bayesian classifiers, on the other hand, leverage the fact that marketing messages are not statistically representative of the mail a person receives. Blacklists, like RBL and SpamCop, leverage the fact that spammers have a limited number of IP addresses from which to send their spam. Cloudmark’s Authority product looks for “mutations” in a message crafted to defeat anti-spam systems.
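The “same meme to millions of people” idea can be illustrated with a toy shared catalog. This is not how SpamNet is implemented; the class, its method names, and the report threshold are all hypothetical, and the fingerprint is a plain SHA-256 of the normalized body, which a spammer could defeat with trivial mutations.

```python
import hashlib
from collections import defaultdict

class SharedCatalog:
    """Toy catalog of message fingerprints reported as spam by early recipients."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold          # distinct reporters needed (assumed value)
        self.reporters = defaultdict(set)   # fingerprint -> set of reporting users

    @staticmethod
    def fingerprint(body: str) -> str:
        # Normalize case and whitespace so trivial reformatting hashes the same.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, user: str, body: str) -> None:
        self.reporters[self.fingerprint(body)].add(user)

    def is_spam(self, body: str) -> bool:
        return len(self.reporters.get(self.fingerprint(body), ())) >= self.threshold

catalog = SharedCatalog(threshold=2)
catalog.report("user1", "Buy CHEAP pills now")
catalog.report("user2", "buy cheap   pills now")
print(catalog.is_spam("Buy cheap pills NOW"))   # same meme, already reported twice
print(catalog.is_spam("Lunch on Thursday?"))    # never reported
```

Later recipients only need to look up a fingerprint, so the cost of identifying the meme is paid once by the first few people who see it.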
Some of these constraints make for good differentiators and others don’t. IP addresses, for example, are a bad differentiator. A spammer can use a major Internet service provider to send out spam, and blocking the ISP’s IP address would block all the legit mail originating from the ISP. However, knowledge of the IP address combined with another constraint (e.g., meme X originating from IP Y) can be a good differentiator. SMTP authentication would provide us with another constraint to exploit. Like the IP address, SMTP authentication makes for a poor differentiator by itself (especially domain-based authentication), but it could be useful when combined with other constraints.
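A small, made-up example shows why the combination works better than the IP address alone. The mail batch, the meme labels, and the rule functions below are all hypothetical; the IP address is from a range reserved for documentation and stands in for a large ISP’s mail server.

```python
# (meme, sending IP, is_spam) for a hypothetical batch of mail from one ISP server
MAIL = [
    ("meme-x", "192.0.2.10", True),
    ("meme-x", "192.0.2.10", True),
    ("newsletter", "192.0.2.10", False),
    ("family-photos", "192.0.2.10", False),
]

def block_by_ip(meme: str, ip: str) -> bool:
    """Rule using the IP address alone."""
    return ip == "192.0.2.10"

def block_by_pair(meme: str, ip: str) -> bool:
    """Rule using the combined constraint: meme X originating from IP Y."""
    return (meme, ip) == ("meme-x", "192.0.2.10")

def evaluate(rule):
    """Return (spam messages caught, legitimate messages blocked) for a rule."""
    caught = sum(1 for meme, ip, spam in MAIL if spam and rule(meme, ip))
    lost = sum(1 for meme, ip, spam in MAIL if not spam and rule(meme, ip))
    return caught, lost

print("IP only:  ", evaluate(block_by_ip))
print("meme + IP:", evaluate(block_by_pair))
```

Both rules catch the two spam messages, but the IP-only rule also blocks both legitimate messages from the same ISP, while the combined rule blocks none.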
But do we really need to introduce another constraint to defeat spam? I don’t believe that we do. If we develop clever systems that can effectively leverage existing constraints, we can solve the problem without requiring authentication. Still, if I had to choose, I would be more supportive of identity-based authentication techniques that employ digital signatures, since such schemes would be based on identifying individual users (instead of domains). As far as I know, no such scheme has yet been proposed for fighting spam, but one can’t be far off.
Barry Shein talks about augmenting e-mail to use payment systems. Assuming such a system ever gets widely deployed, it will change e-mail a lot more radically than authentication would. The tradeoff is more apparent in this case: is the addition of another constraint that helps us solve the spam problem worth radically changing the way e-mail functions? I think not.
I am a fan of the current Internet e-mail architecture. If e-mail had strong authentication or payment systems built into it, I doubt it would have been as wildly successful as it is today. David Crocker’s warning is well grounded: Let’s not go off and change e-mail, especially not without an understanding of what can be done with the resources we have today.
Here’s a list of three rules (modeled on the most important features of e-mail) that anti-spam software should strive to follow:
1) Ability to send and receive e-mail from a stranger. (Whitelisting, payment systems, and challenge/response break this rule.)
2) Ability to send and receive pseudo-anonymous e-mail. (Domain-based authentication breaks this rule.)
3) E-mail should be free. (Payment systems break this rule.)
If we can solve the spam problem while maintaining these three features of e-mail, it will be a much sweeter victory.