Mapping the Malicious Web

Analyzing the connections between sites could help spot Web attacks.

Robert Lemos

March 9, 2010

Over the past couple of years, cybercriminals have increasingly focused on finding ways to inject malicious code into legitimate websites. Typically they’ve done this by embedding code in an editable part of a page and using this code to serve up harmful content from another part of the Web. But this activity can be difficult to spot because websites also increasingly pull in legitimate content, such as ads, videos, or snippets of code, from outside sites.

**Wicked web:** FireShark finds potentially malicious servers by determining which ones are serving up content to multiple websites.

Now a researcher at Websense, a security firm based in San Diego, has developed a way to monitor such malicious activity automatically.

Speaking at the RSA Security Conference in San Francisco last week, Stephan Chenette, a principal security researcher at Websense, detailed an experimental system that crawls the Web, identifying the source of content embedded in Web pages and determining whether any code on a site is acting maliciously.

Chenette’s software, called FireShark, creates a map of interconnected websites and highlights potentially malicious content. Every day, the software maps the connections between nearly a million websites and the servers that provide content to those sites.

“When you graph multiple sites, you can see their communities of content,” Chenette says. Some of the content hubs that connect different communities are legitimate, such as the servers that provide ads to many different sites. Other sources of content, however, could indicate that an attacker is serving up malicious code, he says. According to a study published by Websense, online attackers’ use of legitimate sites to spread malicious software increased 225 percent over the past year.
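The hub idea can be illustrated with a small graph computation: invert the "site loads content from server" edges and count how many distinct sites each server supplies. The sketch below is only an illustration, not FireShark's code. It assumes a crawler has already produced a map from each site to the hostnames its pages pull content from. The function name `find_content_hubs` and the `min_sites` threshold are hypothetical, and the hostnames are placeholders.

```python
from collections import defaultdict


def find_content_hubs(page_embeds, min_sites=2):
    """Return content servers that supply content to at least `min_sites` sites.

    `page_embeds` maps each crawled site's hostname to the set of hostnames
    its pages pull content from (scripts, iframes, images, ads).
    This is a hypothetical helper, not part of FireShark.
    """
    # Invert the site -> servers edges into server -> sites.
    sites_per_server = defaultdict(set)
    for site, servers in page_embeds.items():
        for server in servers:
            if server != site:  # content a site serves itself is not a hub edge
                sites_per_server[server].add(site)

    hubs = [
        (server, sorted(sites))
        for server, sites in sites_per_server.items()
        if len(sites) >= min_sites
    ]
    # Largest hubs first; ties broken alphabetically for stable output.
    hubs.sort(key=lambda hub: (-len(hub[1]), hub[0]))
    return hubs


if __name__ == "__main__":
    # Hypothetical crawl results using reserved .example hostnames.
    page_embeds = {
        "site-a.example": {"ads.example", "odd-host.example"},
        "site-b.example": {"ads.example", "odd-host.example"},
        "site-c.example": {"ads.example", "cdn.site-c.example"},
    }
    for server, sites in find_content_hubs(page_embeds):
        print(server, sites)
```

In this toy data both `ads.example` and `odd-host.example` come out as hubs. As the article notes, a hub is only a lead: an ad network shows up here in exactly the same way as a server pushing malicious code.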

Even legitimate hubs can pose a threat, however. In September, for example, the New York Times acknowledged that online criminals, masquerading as legitimate advertisers, had placed content on its site via an advertising network.

Attacking a network of this kind can be far more lucrative than attacking any single site. “Let’s suppose that the site’s security is top-notch. How can a malicious attacker get to the user?” Chenette says. “An ad network would be a fine choice.”

**Remote control:** FireShark discovered that some content on the site howtofindmyIP.com comes from dubious sites hosted in Ukraine.

The researchers at Websense plan to release a plug-in for the Firefox browser that will reveal the content hubs that a site is linked to.

“The interesting thing about all of this is when attackers are using, say, DoubleClick as the vector of attack,” says Tom Pinckney, a cofounder of the Web security firm SiteAdvisor (bought by McAfee in 2006) and now vice president of engineering for the recommendation site Hunch. “For many attacks, someone buys the content on the ad network, but the guy who is actually supplying the content on the page, God knows who that is.”

SiteAdvisor offers a plug-in with a service similar to FireShark’s. To find malicious sites, McAfee used a data center full of virtual PCs to trawl the Web, evaluating links and submitting unique e-mail addresses that were then monitored for spam.
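The unique-address technique works because each address is handed to only one site, so any spam it later receives points back to that site. The article does not say how SiteAdvisor generated its addresses. The sketch below is therefore a hypothetical scheme that derives a repeatable tag from a hash of the site name. The functions `signup_address` and `attribute_spam` and the domain `probe.example` are illustrative assumptions.

```python
import hashlib


def signup_address(site, domain="probe.example"):
    """Derive a unique, repeatable address to submit on one site's sign-up form.

    Hypothetical scheme: the local part is a short SHA-256 tag of the site name.
    """
    tag = hashlib.sha256(site.encode("utf-8")).hexdigest()[:12]
    return f"{tag}@{domain}"


def attribute_spam(sites, spam_recipients, domain="probe.example"):
    """Map each address that received spam back to the site it was given to."""
    by_address = {signup_address(site, domain): site for site in sites}
    # Ignore spam sent to addresses that were never handed out.
    return sorted({by_address[addr] for addr in spam_recipients if addr in by_address})


if __name__ == "__main__":
    sites = ["site-a.example", "site-b.example"]
    # Pretend only the address given to site-b.example later received spam.
    inbox = [signup_address("site-b.example"), "unrelated@other.example"]
    print(attribute_spam(sites, inbox))
```

Because the tag is derived rather than stored, the same site always gets the same address, so a later crawl can resubmit or look it up without keeping a separate table.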

FireShark delves deeper than SiteAdvisor by decoding the HTML, JavaScript, and other code embedded in each Web page it parses, looking for the ultimate source of content even when that content has been redirected multiple times. “FireShark gives a more in-depth view of what is going on,” Chenette says.
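Finding the "ultimate source" amounts to following a chain of redirects until it ends, while guarding against loops and overly long chains. The sketch below shows only that chain-following step. It assumes the redirects have already been extracted from a page's HTML or JavaScript into a simple map from each URL to its next URL, and it makes no network requests. The names `resolve_final_source` and `max_hops` are hypothetical. This is not FireShark's implementation.

```python
def resolve_final_source(start_url, redirects, max_hops=20):
    """Follow a chain of redirects to the URL that ultimately serves content.

    `redirects` maps a URL to the URL it forwards to (assumed to have been
    discovered by decoding a page's HTML or JavaScript). A URL absent from the
    map is treated as the final source. Returns the whole chain, start first.
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    while current in redirects:
        # len(chain) - 1 hops have been taken so far; refuse to take more.
        if len(chain) > max_hops:
            raise ValueError(f"more than {max_hops} redirects from {start_url}")
        current = redirects[current]
        if current in seen:
            raise ValueError(f"redirect loop at {current}")
        seen.add(current)
        chain.append(current)
    return chain


if __name__ == "__main__":
    # Hypothetical chain: an ad script bounces through a tracker to a payload.
    redirects = {
        "http://site-a.example/ad.js": "http://tracker.example/r?id=1",
        "http://tracker.example/r?id=1": "http://odd-host.example/payload.js",
    }
    print(resolve_final_source("http://site-a.example/ad.js", redirects))
```

Returning the full chain instead of only its last element keeps the intermediate hops available. Those hops are often the shared hubs worth mapping.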

Maxim Weinstein, executive director of StopBadware, a nonprofit organization that helps create lists of malicious websites, says FireShark could be an interesting tool for researchers. The caveat, he says, is that anomalous behavior is not always malicious. “The patterns that look bad are often good things, just anomalous,” he says.

Tracking the way sites are connected over time could also help identify malicious changes to sites, Chenette says. He adds that the FireShark browser plug-in may eventually let users feed information about the sites they visit back to Websense.
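Tracking connections over time can be reduced to comparing two crawls of the same sites and reporting which content sources appeared or disappeared. The sketch below assumes each crawl is stored as a map from a site to the servers it embedded content from. The function `diff_snapshots` is hypothetical and is not part of FireShark.

```python
def diff_snapshots(old, new):
    """Compare two crawls' site -> embedded-server maps.

    Returns {site: (added_servers, removed_servers)} for every site whose set
    of content sources changed between the two snapshots. Sites present in
    only one snapshot are compared against an empty set.
    """
    changes = {}
    for site in old.keys() | new.keys():
        before = set(old.get(site, ()))
        after = set(new.get(site, ()))
        if before != after:
            changes[site] = (sorted(after - before), sorted(before - after))
    return changes


if __name__ == "__main__":
    # Hypothetical snapshots taken on two different days.
    monday = {"site-a.example": {"ads.example"}, "site-b.example": {"ads.example"}}
    tuesday = {
        "site-a.example": {"ads.example", "odd-host.example"},
        "site-b.example": {"ads.example"},
    }
    print(diff_snapshots(monday, tuesday))
```

A newly added source is not proof of compromise. Weinstein's caveat applies here too, so a change like this would be a prompt for closer inspection, not a verdict.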
