How Spammers Use Low-cost Labor to Solve CAPTCHAS

Workers in Russia, Southeast Asia, and China are paid a pittance to solve millions of CAPTCHAS.

Christopher Mims

August 11, 2010

What can only be described as an epic new analysis by a cadre of researchers at UC San Diego has uncovered the seedy underbelly of a sophisticated, highly automated, worldwide network of services that help email, blog and forum spammers get past the CAPTCHAS designed to keep them out.

A CAPTCHA, for those of you not up on your reverse Turing tests, is that little bit of distorted text you have to type back at a webpage when you’re trying to sign up for a new email account or leave a comment on a forum or blog that happens to use them. The original idea was that a CAPTCHA would prevent spammers from being able to flood public forums with their dreck, because CAPTCHAS are by definition easy for humans to solve but challenging or impossible for computers to get right often enough. A bot that keeps guessing wrong will be recognized as a computer after its sixth or seventh failure.

But the inventors of CAPTCHAS probably didn’t anticipate this: hundreds, possibly thousands, of laborers working for less than $50 a month to solve an endless stream of CAPTCHAS. Automated middlemen deliver the CAPTCHAS to these workers and sell the results to spammers in real time, so that spam bots can use the solutions to post to forums and blogs and to set up fraudulent email accounts, according to a paper about to be delivered at the USENIX Security Symposium.

Clever analysis of the location of the workers involved in this scheme revealed that they are based in India, Russia, Southeast Asia and China. The system is so efficient at delivering CAPTCHAS to workers in these remote locales that the average time for delivery of a solution hovers around 20 seconds.

One of the CAPTCHA services the researchers experimented with - ImageToText - was so good that its workers were able to deliver correct results in “a remarkable range of languages,” including Dutch, Korean, Vietnamese, Greek and Arabic.

Even setting the sample CAPTCHAS in Klingon - a language readable by so few people on earth that the scientists thought they could use it as a control in their experiment - wasn’t enough to stop ImageToText, whose workers managed to solve a handful of these CAPTCHAS despite odds of less than one in one thousand of their randomly getting the right answer.

The results of this landmark study show that a number of sites, including those run by Microsoft, AOL and Google, as well as the widely used reCAPTCHA, are regularly compromised by spammers employing these services.

[Screenshot: what workers for these services see when solving a CAPTCHA]

The researchers conclude that their investigation, which included interviews with an anonymous “Mr. E” who actually runs one of these services, proves that for sophisticated spammers, CAPTCHAS aren’t so much a barrier as a cost of doing business.

