Data Mining for Dodgy Machines

A study highlights efforts to take down ISPs that allow malicious activity.

Brian Krebs

March 17, 2010

In recent years, cyber gangs have been careful to spread their operations across multiple Internet service providers, a tactic that makes it much harder for law enforcement and security administrators to track organized crime activity.

But new research shows that gathering data from various places, including anti-malware and anti-spam companies and phishing blacklists, makes it possible to identify dense clusters of ISPs that appear to be overly tolerant of malicious activity. This pattern was particularly evident in Eastern Europe and the Middle East.

Researchers from Indiana University at Bloomington and the Oak Ridge National Laboratory in Oak Ridge, TN, compared the data from a variety of sources that measure ISP reputation from different perspectives.

Security organizations tend to measure online threats differently depending on their geographic location and focus. The study includes information on phishing websites from Phishtank.com and the Anti-Phishing Working Group; botnet data from the Shadowserver Foundation; spam data from Indiana University, Spamhaus, SURBL, and Support Intelligence; and malware hosting stats from organizations such as CleanMX, eSoft, and Malware Patrol.

Craig Shue, a cyber security research scientist at the Oak Ridge National Lab, said the group agreed not to name the hosts and ISPs they determined were malicious in return for a look at the different data sets. Shue’s employer, as well as several organizations that contributed data, was concerned about being sued for criticizing particular ISPs.

Still, Shue said, it is clear that a large fraction of Internet address ranges at many ISPs engaged in malicious activity. “Overall, a small number of ISPs have a disproportionate fraction of malicious hosts,” the researchers conclude in their report. “These [networks] may harbor malicious activity and should be investigated.”

The researchers classified an ISP as malicious if it harbored at least 2.5 percent of the malicious Internet addresses for a given data set, such as the list of phishing sites or malware-laced sites. They found 58 networks that each had more than 100,000 compromised hosts in their Internet address space ranges, while another 255 networks had between 10,000 and 100,000 systems blacklisted.
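The threshold rule described above can be sketched in a few lines. This is only an illustration, not the researchers' code: the function name `flag_by_share`, the input format (pairs of address and ISP name), and the sample data are all assumptions made for the example. Only the 2.5 percent cutoff comes from the study as reported here.

```python
from collections import Counter

def flag_by_share(entries, threshold=0.025):
    """Return {isp: share} for ISPs holding at least `threshold` of one data set.

    `entries` is a list of (ip_address, isp_name) pairs taken from a single
    blacklist. The input format is an assumption made for this sketch.
    """
    counts = Counter(isp for _, isp in entries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        isp: n / total
        for isp, n in counts.items()
        if n / total >= threshold
    }

# Made-up data: 100 blacklisted addresses spread over three hypothetical ISPs.
sample = (
    [(f"192.0.2.{i}", "isp-a") for i in range(60)]
    + [(f"198.51.100.{i}", "isp-b") for i in range(38)]
    + [(f"203.0.113.{i}", "isp-c") for i in range(2)]
)
flagged = flag_by_share(sample)
```

In this made-up sample, the ISP holding only 2 of the 100 listed addresses falls below the cutoff and is not flagged.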

“What we are seeing is, there aren’t a whole lot [of ISPs] above 1 percent of each data set, but there are more [ISPs] than we thought there were,” Shue said.

The group identified two ISPs from Ukraine, one from Iran, and one from Belarus that had more than 80 percent of their Internet address ranges blacklisted for a combination of spam, phishing, and hosting malicious software. In another data set–which examined the prevalence of servers that criminals use to control botnets (large groupings of hacked PCs)–the researchers found that a large broadband ISP from Turkey accounted for 9.11 percent of all the Internet addresses in that data set.

The researchers tried to avoid penalizing large network providers unfairly by examining the percentage of an ISP’s Internet addresses that showed up in each individual badness data set. Other approaches identify problem networks based on the number of blacklisted addresses for a given ISP, and this method usually points to the world’s largest ISPs, the majority of which are in the United States.
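A small sketch shows why the two approaches point at different networks. The ISP names and address counts below are invented for illustration, and `rank_isps` is a hypothetical helper, not something taken from the study.

```python
def rank_isps(isps):
    """Rank ISPs two ways: by raw blacklisted count and by fraction of address space.

    `isps` maps a name to (blacklisted_addresses, total_addresses).
    """
    by_count = sorted(isps, key=lambda name: isps[name][0], reverse=True)
    by_fraction = sorted(
        isps,
        key=lambda name: isps[name][0] / isps[name][1],
        reverse=True,
    )
    return by_count, by_fraction

# Invented numbers: a very large provider and a small one.
isps = {
    "large-isp": (50_000, 10_000_000),  # 0.5 percent of its addresses blacklisted
    "small-isp": (3_500, 4_096),        # roughly 85 percent blacklisted
}
by_count, by_fraction = rank_isps(isps)
```

Ranking by raw count puts the large provider first, while ranking by the share of its own address space puts the small, heavily blacklisted network first.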

The researchers also sought to identify ISPs and hosting providers that had a disproportionate number of network peers that were malicious. For this measurement, they focused on ISPs with at least three such partner networks. They found 22 networks that had 100 percent of their customers classified as malicious, while some 194 networks had at least 50 percent of their customers fall into that category.
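The customer-based measurement might be computed along these lines. The sketch assumes that the minimum of three applies to a provider's total number of customer networks; the article's wording could also be read as requiring three malicious ones. The function name, the input format, and the network names are hypothetical.

```python
def malicious_customer_fraction(customers, malicious, min_customers=3):
    """Return {provider: fraction of its customer networks classified as malicious}.

    `customers` maps a provider to the set of networks it serves, and
    `malicious` is the set of networks already classified as malicious.
    Providers with fewer than `min_customers` customers are skipped
    (an assumption about how the minimum was applied).
    """
    result = {}
    for provider, networks in customers.items():
        if len(networks) < min_customers:
            continue
        result[provider] = len(networks & malicious) / len(networks)
    return result

# Invented topology for illustration.
customers = {
    "upstream-1": {"net-a", "net-b", "net-c"},
    "upstream-2": {"net-a", "net-d", "net-e", "net-f"},
    "upstream-3": {"net-a", "net-b"},  # too few customers, skipped
}
malicious = {"net-a", "net-b", "net-c"}
fractions = malicious_customer_fraction(customers, malicious)
```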

Last week, a Russian ISP named Troyak was disconnected from the Internet after its upstream providers pulled the plug on it. Researchers found that Troyak served several different hosting providers that collectively were home to command-and-control (C&C) networks for more than 60 “Zeus” botnets–huge groupings of zombie PCs that provide criminals a constant flow of stolen financial data, such as online banking credentials.

On March 9, Troyak was briefly knocked offline before finding another upstream ISP to take it on. This cat-and-mouse game was repeated five times over the next three days.

Roman Hussey, a Swiss information technology expert who maintains a site called Zeustracker, which tracks Zeus botnet C&Cs around the globe, says it’s important to collect and publicly highlight information about malicious ISPs. Not only does that help draw attention to and isolate malicious hosts, he said, it also helps inform the media.

“Troyak has had some troubles getting back online again, and it’s largely because of the media hype,” Hussey says. Because of that hype, “every ISP knows who Troyak is, and now won’t peer with them.”

But Alex Lanstein, a senior security researcher at FireEye, a Milpitas, CA-based security firm that has participated in several botnet and malicious ISP takedown efforts, says many security firms do not want to share information publicly for competitive reasons.

And there are important strategic reasons to keep certain types of threat intelligence close to the vest, Lanstein says. “Some security companies block specific hosts or ISPs for their customers, but don’t tell anyone else, so that the [malicious network owners] don’t know they’re being blocked.”

He pointed to a writeup the company published in September 2009 about an ISP in Ukraine. “When I posted that, literally a day later they completely stopped using the Internet address blocks we wrote about,” Lanstein says. “As soon as they know they’re on a major blacklist, they often will ditch the Internet address blocks that are being blocked,” and move their operations to less tainted address space.
