Startup Aims to Scour the Dark Web for Stolen Data

New technology protects customer privacy while it crawls the Dark Web for data compromised in corporate breaches.

Robert Lemos

June 3, 2015

When online thieves compromised Target in November 2013, they took less than a day to gain a foothold in the retail giant’s network. Because the signs of such an attack are hard to distinguish, Target did not detect the breach until after the data had already started being sold in the darker corners of the Internet, more than three weeks later.

The delay between compromise and detection is a common problem for all companies. Eight out of 10 online breaches happen within a few hours or days, but defenders detect the attack in the same amount of time in only about a quarter of cases, according to the annual Data Breach Investigations Report released by Verizon.

Startup Terbium Labs aims to change that. Founded by two researchers from the Johns Hopkins University Applied Physics Laboratory and announced on Wednesday, the company uses a combination of two technologies to give businesses a private way to detect data leaked to the Web.

“When you can bring that breach detection time down from months to seconds or minutes, then you can really minimize the damage and reduce the risk of the data being out there in the first place,” says Danny Rogers, cofounder and CEO of Terbium Labs.

Data breaches have become a major problem for companies that house consumer data. Retailers, such as Target and Home Depot, and health-care companies, such as Anthem and Community Health Systems, have lost millions of dollars and credibility with their customers after suffering attacks. Online thieves stole more than a billion records containing personally identifiable information last year, costing companies more than $445 billion, according to the Ponemon Institute, a survey research firm.

While many security technologies aim to stop attackers before they steal data or to block data from being removed from a network, Terbium Labs aims to close the gap between compromise and detection.

Terbium Labs’ Matchlight technology continuously searches the Web as well as the hidden and anonymized parts of the Internet, commonly called the “Dark Web,” where criminals often conduct illegal transactions. Johns Hopkins researchers have estimated that Google indexes only 5 to 10 percent of the total Web.

After researchers enter hundreds of seed links, the system crawls through those pages, following any new links, to eventually map a significant portion of the entire Web. When it finds data, the system divvies the information up into 14-byte chunks, a common way to search for patterns in text. Those chunks, or n-grams, are stored in a database for later searching. Clients can then query this database to see if any sensitive data from their systems may have been found.
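The chunking step can be illustrated with a minimal Python sketch. It is not Terbium Labs' code: the function name `byte_ngrams` is hypothetical, and the overlapping sliding window is an assumption, since the description above gives only the 14-byte chunk size.

```python
def byte_ngrams(text: str, n: int = 14) -> list[bytes]:
    """Split text into overlapping n-byte chunks (n-grams).

    Assumption: a sliding window that advances one byte at a time.
    The article states only the chunk size (14 bytes).
    """
    data = text.encode("utf-8")
    # Input shorter than one chunk yields no n-grams at all.
    return [data[i:i + n] for i in range(len(data) - n + 1)]


# A 16-byte string produces three overlapping 14-byte chunks.
for chunk in byte_ngrams("4111111111111111"):
    print(chunk)
```

With an overlapping window, a leaked fragment still lines up with the stored chunks no matter where it begins on a crawled page, which is one reason n-grams suit this kind of pattern search.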

Yet, the system also protects privacy. The data is encrypted and stored as a digital fingerprint. A client can then encrypt its own data, and search for that encrypted text within the database, preventing anyone else, including Terbium Labs, from seeing the information. The company works with its clients to make sure that their selection of sensitive data fingerprints will not result in too many false matches.
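One way to picture this blind matching is the Python sketch below. It is an illustration under stated assumptions, not Matchlight's actual scheme: SHA-256 stands in for whatever one-way fingerprinting the company uses, and the names `fingerprints` and `match_ratio` are hypothetical.

```python
import hashlib


def fingerprints(text: str, n: int = 14) -> set[str]:
    """Hash every overlapping n-byte chunk of text.

    Assumption: SHA-256 as the one-way fingerprint; the article
    does not name the actual function.
    """
    data = text.encode("utf-8")
    return {
        hashlib.sha256(data[i:i + n]).hexdigest()
        for i in range(len(data) - n + 1)
    }


def match_ratio(client_text: str, index: set[str]) -> float:
    """Fraction of the client's fingerprints found in the crawl index."""
    client = fingerprints(client_text)
    if not client:
        return 0.0
    return len(client & index) / len(client)


# The crawler stores only fingerprints of what it finds.
index = fingerprints("dump: card 4111111111111111 exp 01/20")

# The client fingerprints its own data and compares hashes only.
print(match_ratio("4111111111111111", index))       # 1.0
print(match_ratio("unrelated secret text", index))  # 0.0
```

Only hashes are compared, so the party holding the index never sees the client's plaintext. A plain unkeyed hash of low-entropy data such as card numbers can be brute-forced, so a production system would presumably need something stronger than this sketch.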

“It allows us to search for data in a way that we are blind to what we are actually searching for,” says Michael Moore, cofounder and chief technology officer.

Overall, the system allows companies, such as retailers and financial institutions, to detect whether a criminal has published some of their data on the Dark Web without revealing to anyone the exact nature of the sensitive data.

Already the system has helped companies testing it find thousands of credit-card numbers that had been put up for sale on the Internet. While the Matchlight system catches attackers only after they post data following a breach and does not prevent the original compromise, it does reduce the time between compromise and discovery.

And for companies, reducing that gap means reducing damages. The breach of Target cost the company $252 million in gross expenses in 2013 and 2014. Catching the attack as soon as the thieves attempted to sell the data could have given the attackers less time inside the company’s network and the buyers of the data less time to rack up fraudulent charges.
