Researchers have shed new light on the methods by which spammers harvest e-mail addresses from the Web and relay bulk messages through multiple computers. They say that the findings could provide additional ammunition in the fight against junk e-mail campaigns.
The problem of unwanted e-mail messages, or spam, continues to vex computer users and security professionals. Currently, more than 90 percent of the e-mail messages traversing the Internet appear to be spam, according to information released in June by the e-mail security firm MessageLabs.
In one paper scheduled to be presented this week at the Conference on E-mail and Anti-Spam, in Mountain View, CA, researchers from Indiana University studied how spammers obtain the e-mail addresses in the first place. The researchers used a variety of techniques to match the programs that cull e-mail addresses from Web pages to the resulting spam. “We are basically trying to figure out how spammers get your address–the addresses of people that they try to victimize,” says Craig Shue, a graduate student at Indiana University who now works at Oak Ridge National Laboratory.
The study involved exposing 22,230 unique e-mail addresses on the Web over a five-month period and watching for spam sent to those addresses. The researchers found that an e-mail address included in a comment posted to a website had a much higher probability of resulting in spam. While only four e-mail addresses submitted to 70 websites during registration resulted in spam, half of the e-mail addresses posted to popular sites resulted in spam.
The researchers also set up a website on their own domain and waited for their pages to be crawled. Each visitor to the website would see a different e-mail address, a strategy that the researchers hoped would gauge how often programs that automatically crawl sites are operated by spammers. “We are giving out a unique e-mail address to every visitor to our webpage,” Shue says. “If we ever get an e-mail to that address, we know that the crawler gave that e-mail address to a spammer.”
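The unique-address idea Shue describes can be illustrated with a short Python sketch. This is not the researchers' code; the class name `HoneypotAddressBook`, its methods, and the `example.org` domain are hypothetical, and the sketch assumes that each page request is handed a freshly generated random address that is recorded alongside the visitor's IP address and user agent.

```python
import secrets


class HoneypotAddressBook:
    """Hypothetical helper: one unique e-mail address per page visit."""

    def __init__(self, domain):
        self.domain = domain.lower()
        self._visits = {}  # address -> (visitor_ip, user_agent)

    def issue(self, visitor_ip, user_agent):
        """Create a fresh address and remember which visitor saw it."""
        token = secrets.token_hex(8)  # 16 random lowercase hex characters
        address = f"{token}@{self.domain}"
        self._visits[address] = (visitor_ip, user_agent)
        return address

    def trace(self, recipient):
        """Return the visitor who was shown this address, or None."""
        return self._visits.get(recipient.lower())


book = HoneypotAddressBook("example.org")  # placeholder domain
shown = book.issue("192.0.2.10", "ExampleCrawler/1.0")

# Later, if a message arrives for `shown`, the lookup identifies
# the visit during which the address was harvested.
print(book.trace(shown))
```

Because every address is shown exactly once, any message arriving at it can be tied to a single visit, which is what lets a crawler be linked to the spam that follows.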
The researchers also found that the programs that crawl the Web looking for e-mail addresses, dubbed spamming crawlers, have characteristics that could make them easier to detect. For example, the part of the network from which a crawler operates tends to be a good predictor of whether it is a legitimate crawler, such as those used by Google or other search engines, or a spamming crawler. “It may be feasible to block a small number of [network numbers] associated with spammer Web crawlers to eliminate the harvesting of e-mail addresses on a site,” the Indiana University researchers wrote.
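A site operator could apply that kind of blocking with a check along the following lines. This is a minimal sketch, not the method from the paper: it assumes the suspect networks are expressed as CIDR prefixes, the function name `is_blocked` is hypothetical, and the prefixes listed are reserved documentation ranges standing in for real data.

```python
import ipaddress

# Placeholder prefixes (reserved documentation ranges), not real spammer networks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip, networks=BLOCKED_NETWORKS):
    """Return True if the client address falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


print(is_blocked("192.0.2.55"))   # inside the first placeholder prefix
print(is_blocked("203.0.113.9"))  # outside both prefixes
```

A Web server would run such a check before serving a page and refuse requests from matching addresses, leaving other visitors unaffected.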