Anonymity on the Internet can be both a blessing and a curse. While the ability to hide behind anonymous proxies and fast-changing Internet protocol (IP) addresses has enabled freer speech in nations with repressive regimes, the same technologies allow cybercriminals to hide their tracks and pass off malicious code and spam as legitimate communications.
In a paper to be presented next week at SIGCOMM 2009 in Barcelona, Spain, three researchers from Microsoft’s research center in Mountain View, CA, demonstrate a way to remove the shield of anonymity from such shadowy attackers. Using a new software tool, the three computer scientists were able to identify the machines responsible for malicious activity, even when the host’s IP address changed frequently.
“What we are really trying to get at is the host responsible for an attack,” said Yinglian Xie, a member of the Microsoft team. “We are not trying to track those identifiers but associate them with a particular host.”
The prototype system, dubbed HostTracker, could result in better defenses against online attacks and spam campaigns. Security firms could, for example, build a better picture of which Internet hosts should be blocked from sending traffic to their clients, and cybercriminals would have a harder time camouflaging their activities as legitimate traffic.
Xie and her colleagues, Fang Yu and Martin Abadi, analyzed a month’s worth of data (330 gigabytes) collected from a large e-mail service provider, in an attempt to determine which users were responsible for sending out spam. To trace the origins of multiple spam outbreaks, the scientists studied records including more than 550 million user IDs, 220 million IP addresses, and time stamps for events such as sending a message or logging into an account.
Tracing the origins of messages, a key task for tracking spam and other kinds of Internet attack, involved reconstructing relationships between account IDs and the hosts from which users connected to the e-mail service. To do this, the researchers grouped together all the IDs accessed from different hosts over a certain time period. The HostTracker software then combed through this data to resolve any conflicts. For example, sometimes more than one user appeared to originate from the same IP address, or a single user had multiple IP addresses during overlapping periods of time.
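To make the idea concrete, here is a minimal Python sketch of how the two kinds of conflict described above might be detected in a log of events. It is an illustration only, not HostTracker's actual algorithm: the event layout `(user_id, ip, timestamp)`, the function names, and the rule that a user "occupies" an address from the first to the last time it is seen there are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_bindings(events):
    """Collapse (user_id, ip, timestamp) events into one time span per (user_id, ip) pair.

    Assumption for this sketch: a user occupies an address from the first
    to the last time the pair is seen in the log.
    """
    spans = {}
    for user_id, ip, ts in events:
        first, last = spans.get((user_id, ip), (ts, ts))
        spans[(user_id, ip)] = (min(first, ts), max(last, ts))
    return spans


def overlaps(a, b):
    """Return True if two closed intervals (start, end) share any point in time."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_conflicts(spans):
    """Return (shared_ips, roaming_users).

    shared_ips: addresses used by two or more users during overlapping spans.
    roaming_users: users seen at two or more addresses during overlapping spans.
    """
    by_ip = defaultdict(list)
    by_user = defaultdict(list)
    for (user_id, ip), span in spans.items():
        by_ip[ip].append(span)
        by_user[user_id].append(span)

    shared_ips = {
        ip for ip, group in by_ip.items()
        if any(overlaps(a, b) for a, b in combinations(group, 2))
    }
    roaming_users = {
        user for user, group in by_user.items()
        if any(overlaps(a, b) for a, b in combinations(group, 2))
    }
    return shared_ips, roaming_users
```

Calling `find_conflicts(build_bindings(events))` on a list of log events returns the addresses and users that would need further resolution. A real system would have to go on to decide which of those conflicts reflect proxies, shared machines, or address reassignment, which this sketch does not attempt.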
HostTracker resolves the conflicts by cross-referencing the data to identify proxy servers, which allow several hosts to appear as a single IP address, and to determine when a guest was using a legitimate host. “The fact that we are able to trace malicious traffic to the proxy itself is an improvement because we are able to pinpoint the exact origin,” Xie said.
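The article does not spell out how HostTracker recognizes a proxy. As a rough stand-in, the Python sketch below flags any address from which several distinct user IDs appear within the same time window. The function name `likely_proxies`, the one-hour window, and the three-user threshold are hypothetical choices for illustration, not values from the researchers' work.

```python
from collections import defaultdict


def likely_proxies(events, window=3600, min_users=3):
    """Flag addresses that look like proxies.

    events: iterable of (user_id, ip, timestamp) with timestamps in seconds.
    An address is flagged if at least `min_users` distinct user IDs are seen
    from it inside a single fixed window of `window` seconds.
    Both parameters are illustrative assumptions.
    """
    users_per_bucket = defaultdict(set)
    for user_id, ip, ts in events:
        # Bucket each event by address and by which fixed window it falls in.
        users_per_bucket[(ip, ts // window)].add(user_id)

    return {
        ip for (ip, _bucket), users in users_per_bucket.items()
        if len(users) >= min_users
    }
```

A simple count like this would also flag busy shared machines, such as those in an Internet café, so in practice it could only be a first filter before the kind of cross-referencing the researchers describe.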