Cybercrime Surveys Aren't Telling Us What We Need to Know

Methods used for most cybercrime surveys are so poor that the data can’t be trusted, researchers argue.

Erica Naone

June 28, 2011

For years, government officials, news articles, and security companies have warned about the dangers and impact of cybercrime. Patrick Peterson, chief security researcher at Cisco, has estimated that losses totaled $560 million in 2009. Killian Strauss of the Organization for Security and Cooperation in Europe has estimated them at $100 billion annually. And in March 2009, Edward Amoroso, AT&T’s chief security officer, submitted written testimony to the U.S. Senate Committee on Commerce, Science, and Transportation estimating that cybercrime was bringing in illicit revenues of approximately $1 trillion a year.

To some researchers, those wildly different numbers suggest that current methods for calculating cybercrime losses are so poor we actually have no idea how bad the problem is. And without good data, they say, there’s no way to fight it intelligently.

“How can this be?” says Cormac Herley, a principal researcher at Microsoft Research, his voice rising in incredulity. “How can you have estimates of the same problem ranging across three orders of magnitude?”

In fact, Herley says, when he saw these numbers he felt they “just didn’t make sense.” Not only are they all over the map, but some of them also seem impossibly high. For example, he says, cybercrime revenues of $1 trillion would amount to $5,000 for every U.S. adult who spends time online.
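The arithmetic behind Herley's skepticism is easy to check. The short Python sketch below uses only the three estimates quoted above and the $5,000 per-adult figure. The variable names are illustrative, and the implied head count is simply what those two numbers require, not a census figure.

```python
import math

# Annual cybercrime estimates quoted in this article, in U.S. dollars.
estimates = {
    "Cisco (Peterson, 2009)": 560e6,
    "OSCE (Strauss)": 100e9,
    "AT&T (Amoroso)": 1e12,
}

lowest = min(estimates.values())
highest = max(estimates.values())

# Spread between the extremes, expressed in powers of ten.
spread = math.log10(highest / lowest)
print(f"Highest estimate is {highest / lowest:,.0f}x the lowest "
      f"({spread:.2f} orders of magnitude)")

# Herley's sanity check: $1 trillion at $5,000 per person implies
# this many online adults.
per_adult = 5_000
implied_adults = highest / per_adult
print(f"Implied number of online adults: {implied_adults:,.0f}")
```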

Bad data has consequences. “Without numbers, we can’t make good policy or sound investment decisions,” Herley says. Not only that, but we can’t figure out where key threats are coming from. Are the criminals making most of their money from keylogging? Highly targeted phishing attacks (“spear phishing”)? Brute-force attacks on people’s passwords? “It’s distressing,” he says.

Herley embarked on a study of the methods used to calculate these numbers and found them severely wanting. Most of the statistics come from surveys in which respondents are asked to report whether they’ve been victims of a crime and how much they lost. “Surveys are hard,” Herley says. His research revealed a number of reasons why surveys about cybercrime are particularly hard.

Scientists have pretty good methods for surveying, say, voter intentions. In that case, you focus on getting a good representative sample. Individual inaccuracies matter, but a few wrong answers in either direction aren’t going to make much of a difference.

Cybercrime is a whole different story. For one thing, cybercrime surveys are trying to measure a dollar amount rather than a count of people: how much money was lost. In that case, individual responses can make a huge difference. A voting survey isn’t thrown off by much if someone who actually plans to vote Democratic states an intention of voting Republican. But if a survey respondent who has lost $50,000 to cybercrime claims to have lost $500,000, any calculations based on that information will be wildly out of whack.
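To see how far a single answer can move a loss estimate, consider a minimal sketch in Python. Everything in it is hypothetical: the 1,000-person sample, the 200 million population, the handful of $500 losses, and the `extrapolate` helper are assumptions made for illustration, not figures or methods from any real survey. Only the $50,000 versus $500,000 misreport comes from the example above.

```python
def extrapolate(reported_losses, population):
    """Scale the sample's total reported loss up to the whole population."""
    return sum(reported_losses) * population // len(reported_losses)

# Hypothetical survey: 1,000 respondents standing in for 200 million people.
POPULATION = 200_000_000
no_loss = [0] * 990
small_losses = [500] * 9

accurate = no_loss + small_losses + [50_000]
misreported = no_loss + small_losses + [500_000]  # one respondent adds a zero

estimate_accurate = extrapolate(accurate, POPULATION)
estimate_misreported = extrapolate(misreported, POPULATION)

print(f"Accurate answers:     ${estimate_accurate:,}")
print(f"One inflated answer:  ${estimate_misreported:,}")
print(f"Difference:           ${estimate_misreported - estimate_accurate:,}")
```

In this made-up sample each respondent stands in for 200,000 people, so every misreported dollar is multiplied by 200,000 in the extrapolated total.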

There are other problems. Any registered voter has useful information to report. But not everyone has a useful story to tell about cybercrime, which means that a small number of responses can make a huge difference. For example, in a 2006 survey conducted by Gartner Research, 128 out of 4,000 people (about 3 percent) claimed to have been victims. Herley calculates that 59 percent of losses came from the top 1 percent of respondents who had been victimized. With only 128 victims, that top 1 percent was a single person. He believes that such problems make it impossible to trust data coming from most cybercrime surveys.

“Understanding the impact of any crime is problematic,” says Julie Ryan, lead professor in information security management at George Washington University. However, Ryan says, cybercrime presents particular problems because most people aren’t well equipped to answer technical questions about it. For example, individual survey respondents might not be sure whether they’ve been victims of a phishing attack or, indeed, whether anything was taken from them as a result. “So here we have a problem,” Ryan says. “Potential crime that is potentially undetectable, compounded by a target space that is mostly ignorant.”

Compounding the problem, it’s hard to get unbiased information, Ryan says. Big corporations might be reluctant to admit to being cybercrime victims, since such an admission could lose them customers. On the other hand, security firms—which often conduct surveys on cybercrime—benefit when people take cybercrime seriously.

Herley wants companies and organizations that conduct surveys to publish much more information, making it easier for researchers to evaluate their methods. For example, they could publish median figures for cybercrime, which aren’t as likely as mean, or average, figures to be distorted if a few people exaggerate or are mistaken.
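The difference between the two statistics can be sketched in a few lines of Python using the standard library's `statistics` module. The loss figures are hypothetical and chosen only to illustrate the point.

```python
from statistics import mean, median

# Hypothetical reported losses in dollars; not data from any real survey.
honest = [100, 200, 300, 400, 500]
with_exaggeration = [100, 200, 300, 400, 50_000]  # last answer inflated 100x

for label, losses in [("honest", honest), ("one inflated", with_exaggeration)]:
    print(f"{label:>12}: mean=${mean(losses):,.0f}  median=${median(losses):,.0f}")
```

In this toy sample a single inflated answer raises the mean from $300 to $10,200, while the median stays at $300.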

Herley also suggests that claims of big losses receive careful scrutiny. “You risk catastrophic error,” he says.
