Machine-Learning Algorithm Combs the Darknet for Zero Day Exploits, and Finds Them

The first machine-based search of online hacker marketplaces identifies over 300 significant cyberthreats every week.

Emerging Technology from the arXiv

August 5, 2016

In February 2015, Microsoft identified a critical vulnerability in its Windows operating system that potentially allowed a malicious attacker to remotely control the targeted computer. The problem affected a wide variety of Windows operating systems, including Vista, 7, 8, and various others designed for servers and mobile computers.

The company immediately issued a fix. But it didn’t take long for details of the vulnerability to spread through the hacker community.

In April, cybersecurity experts found an exploit based on this vulnerability for sale on a darknet marketplace where the seller was asking around $15,000. In July, the first malware appeared that used this vulnerability. This piece of malware, the Dyre Banking Trojan, targeted users all over the world and was designed to steal credit-card numbers from infected computers.

The episode provided a key insight into the way malware evolves. In the space of just a few months, hackers had turned a vulnerability into an exploit, offered it for sale, and then seen it developed into malware that was released into the wild.

In this case, Microsoft became aware of the vulnerability before it could be exploited and so could release a patch. But when malware exploits previously unknown vulnerabilities, the original software owners have to develop a patch immediately, in literally zero days, hence the name “zero day attacks.”

A key goal for cybersecurity experts is to identify zero day exploits before they can be turned into malware. And for Eric Nunes and pals at Arizona State University, the case of the Dyre Banking Trojan has provided important inspiration for an entirely new approach to this kind of cybersecurity.

Today, these guys unveil a cyberthreat intelligence-gathering operation that uses machine learning to study hacking forums and marketplaces in the dark web and deep net. The system hunts for clues about emerging vulnerabilities.

And their new system is off to an impressive start. “Currently, this system collects on average 305 high-quality cyberthreat warnings each week,” say Nunes and co.

First some background. Hackers and other nefarious types tend to hide their forums and marketplaces in one of two ways. The first relies on the widely used Tor software to anonymize traffic as it passes around the Internet and prevent it being tracked. This is known as the “dark net.”

Another option is to use websites hosted on the open portion of the Web but not indexed by search engines. This is the “deep net,” and can be equally hard to navigate.

To monitor hacker activity in these places, Nunes and co developed a crawler to gather information from HTML pages hosted on the deep net and the dark net. Obviously, a key part of this work is to point the crawler at the best starting pages, a task that must be done by humans familiar with these pages. The team then extracts specific information regarding hacking activities while discarding all other information relating to drugs, weapons, and so on.

Finally, they use a machine-learning algorithm to detect relevant products and topics being discussed on these sites. They do this by labeling 25 percent of the data by hand, pointing out what is relevant and what’s not. It takes a human about one minute to label five marketplace products or to label two topics on a forum, but this can be reduced as the machine learns. They then train the algorithm using this labeled data set and test it on the rest.
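As a rough illustration of that label-a-quarter, test-on-the-rest workflow, here is a minimal sketch in Python. It is not the team's system: the listings are invented, the classifier is a simple bag-of-words Naive Bayes chosen only because it fits in a few lines, and every name in it (LISTINGS, split_labeled, train, predict) is hypothetical.

```python
import math
import random
from collections import Counter

# Invented toy listings: (text, label). 1 = hacking-related, 0 = not.
LISTINGS = [
    ("windows remote exploit kit", 1), ("android exploit zero day", 1),
    ("banking trojan source code", 1), ("keylogger malware builder", 1),
    ("internet explorer exploit", 1), ("botnet rental ddos", 1),
    ("ransomware kit with support", 1), ("password stealer malware", 1),
    ("cannabis seeds bulk", 0), ("fake passport scan", 0),
    ("vintage pistol parts", 0), ("prescription pills discount", 0),
    ("counterfeit banknotes pack", 0), ("stolen gift cards", 0),
    ("hashish premium grade", 0), ("rifle scope used", 0),
]

def tokenize(text):
    return text.lower().split()

def split_labeled(examples, fraction=0.25, seed=0):
    """Shuffle, then treat the first `fraction` as the hand-labeled part."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = max(1, int(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]

def train(examples):
    """Count words per class for a multinomial Naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the class with the higher smoothed log-probability."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        # Add-one smoothing on the prior so an unseen class cannot crash.
        score = math.log((class_counts[label] + 1) / (total + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def demo():
    labeled, rest = split_labeled(LISTINGS)
    model = train(labeled)
    correct = sum(predict(model, text) == label for text, label in rest)
    print(f"trained on {len(labeled)}, tested on {len(rest)}, correct: {correct}")

if __name__ == "__main__":
    demo()
```

With only four training examples the toy model will make plenty of mistakes. The sketch shows the shape of the procedure rather than its accuracy.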

The results make for interesting reading. “With the use of machine learning models, we are able to recall 92% of products in marketplaces and 80% of discussions on forums relating to malicious hacking with high precision,” say Nunes and co.
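For readers unfamiliar with the terms: recall is the share of genuinely relevant items that the system catches, and precision is the share of flagged items that really are relevant. A small Python sketch of the arithmetic follows. The function name and the sample labels are made up for illustration and are not drawn from the paper.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for binary labels (1 = relevant)."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == 1 and a == 1)
    false_pos = sum(1 for p, a in pairs if p == 1 and a == 0)
    false_neg = sum(1 for p, a in pairs if p == 0 and a == 1)
    # Guard against division by zero when nothing is flagged or relevant.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Invented example: five items, three of which are truly relevant.
predicted = [1, 1, 0, 1, 0]
actual = [1, 0, 0, 1, 1]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case the system flags three items, two of them correctly, and misses one relevant item, so both figures come out at two-thirds.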

This technique has already revealed a number of nefarious activities. “Over a 4 week period, we detected 16 zero-day exploits from the marketplace data,” say the team. This included one significant Android exploit being offered for around $20,000 and one involving Internet Explorer 11 for around $10,000.

The team also mapped the social networks associated with the way hackers use these forums and marketplaces. They say there are 751 users who are present on more than one marketplace and give the example of one vendor who was active on seven marketplaces and one forum offering over 80 malicious hacking related products.

This was clearly a lucrative business. “The vendor has an average rating of 4.7/5.0, rated by customers on the marketplace with more than 7,000 successful transactions, indicating the reliability of the products and the popularity of the vendor,” say Nunes and co.

That’s a useful step forward in the fight against cybercrime. With the system currently spotting over 300 cyberthreats each week, it has already attracted attention from the commercial world. Indeed, the team says it is currently transitioning the system to a commercial partner.

If the team goes on spotting zero day vulnerabilities before they are developed into malware products, they can help software owners develop patches quickly. And that’s a significant help for security experts.

Of course, this will be part of the cat and mouse game of cybersecurity. It’ll be interesting to see how hackers change their behavior now that they know they are being systematically monitored in this way. And when that happens, there’ll be yet another iteration in the game.

Ref: arxiv.org/abs/1607.08583: Darknet and Deepnet Mining for Proactive Cybersecurity Threat Intelligence
