New software developed at the Georgia Institute of Technology can identify spam before it hits the mail server. The system, known as SNARE (Spatio-temporal Network-level Automatic Reputation Engine), scores each incoming e-mail based on a variety of new criteria that can be gleaned from a single packet of data. The researchers involved say the automated system puts less of a strain on the network and minimizes the need for human intervention while achieving the same accuracy as traditional spam filters.
Separating spam from legitimate e-mail, also known as ham, isn’t easy. That’s partly because of the sheer volume of messages that need to be processed and partly because of e-mail expectations: users want their e-mail to arrive minutes, if not seconds, after it was sent. Analyzing the content of every e-mail might be a reliable method for identifying spam, but it takes too long, says Nick Feamster, an assistant professor at Georgia Tech who oversaw the SNARE research. Letting spam flow into our in-boxes unfiltered isn’t a sensible option, either. According to a report released by the e-mail security firm MessageLabs, spam accounted for 90.4 percent of all e-mail sent in June.
“If you’re not concerned about spam, I would suggest you turn off your spam filter for about an hour and see what happens,” says Sven Krasser, senior director of data-mining research at McAfee. The Santa Clara, CA, company provided raw data for analysis by the Georgia Tech team.
The team analyzed 25 million e-mails collected by TrustedSource.org, an online service developed by McAfee to collate data on trends in spam and malware. Using this data, the Georgia Tech researchers discovered several characteristics that could be gleaned from a single packet of data and used to efficiently identify junk mail. For example, their research revealed that ham tends to come from computers that have a lot of channels, or ports, open for communication. Bots, automated systems that are often used to send out reams of spam, tend to keep open only the e-mail port, known as the Simple Mail Transfer Protocol port.
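The open-port observation can be expressed as a pair of simple features. The sketch below is only an illustration, not SNARE's actual code: the function name `port_features` and the idea of receiving a ready-made list of open ports are assumptions, and the only fact taken from outside the article is that SMTP conventionally uses TCP port 25.

```python
SMTP_PORT = 25  # conventional TCP port for the Simple Mail Transfer Protocol


def port_features(open_ports):
    """Summarize a sender's open ports as two illustrative features.

    `open_ports` is a hypothetical input: an iterable of port numbers
    already known to be open on the sending machine. How that list is
    obtained is outside the scope of this sketch.
    """
    ports = set(open_ports)
    return {
        # How many distinct ports the sender has open.
        "open_port_count": len(ports),
        # True when the SMTP port is the only one open, the pattern
        # the article associates with spam-sending bots.
        "smtp_only": ports == {SMTP_PORT},
    }
```

A sender with ports 25, 80, and 443 open would yield a count of three and `smtp_only` set to false, while a machine exposing only port 25 would be flagged. A real classifier would treat this as one weak signal among many rather than as a verdict.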
Furthermore, the researchers found that by plotting the geodesic distance between the locations of the sender’s and receiver’s Internet Protocol (IP) addresses, measured along the curved surface of the earth, they could help determine whether the message was junk. (Just as every house has a street address, every computer on the Internet has an IP address, and that address can be mapped to a geographic area.) Spam, the researchers found, tends to travel farther than ham. Spammers also tend to have IP addresses that are numerically close to those of other spammers.
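Both measurements are easy to sketch. The example below is a rough illustration under stated assumptions, not the researchers' implementation: it approximates geodesic distance with the haversine formula on a spherical earth, assumes the sender's and receiver's coordinates have already been looked up from their IP addresses, and treats "numerically close" as the absolute difference between two IPv4 addresses read as integers. The function names are hypothetical.

```python
import ipaddress
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; treats the earth as a sphere


def great_circle_km(lat1, lon1, lat2, lon2):
    """Approximate surface distance in km between two points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula for the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def ip_numeric_distance(addr_a, addr_b):
    """Absolute difference between two IPv4 addresses read as integers."""
    a = int(ipaddress.IPv4Address(addr_a))
    b = int(ipaddress.IPv4Address(addr_b))
    return abs(a - b)
```

Two addresses in the same small block, such as `192.0.2.1` and `192.0.2.9`, differ by only 8, whereas addresses from unrelated networks typically differ by millions. The geographic lookup itself, which this sketch leaves out, depends on a geolocation database and is only as accurate as that data.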
Dean Malmgren, a PhD candidate at Northwestern University whose work includes developing new methods for identifying spam, says he finds the research interesting. But he wonders how robust SNARE will be once its methodology is widely known. IP addresses, he notes, are easy to fake. So, if spammers got wind of how SNARE works, they might, for example, use a fake IP address close to the recipient’s.