Mims's Bits

How Spammers Use Low-cost Labor to Solve CAPTCHAS

Workers in Russia, Southeast Asia, and China are paid a pittance to solve millions of CAPTCHAS.

Christopher Mims 08/11/2010

What can only be described as an epic new analysis by a cadre of researchers at UC San Diego has uncovered the seedy underbelly of a sophisticated, highly automated, world-wide network of services that help email, blog and forum spammers get past the CAPTCHAS that are designed to keep them out.

A CAPTCHA, for those of you not up on your reverse Turing tests, is that little bit of distorted text you have to type back at a webpage when you're trying to sign up for a new email account or leave a comment on a forum or blog that happens to use them. The original idea was that a CAPTCHA would prevent spammers from being able to flood public forums with their dreck, because CAPTCHAS are by definition easy for humans to solve but challenging or impossible for computers to get right often enough. A bot that keeps guessing will be recognized as a computer after its 6th or 7th failure.

But the inventors of CAPTCHAS probably didn't anticipate this: Hundreds, possibly thousands of laborers working for less than $50 a month to solve an endless stream of CAPTCHAS delivered to them by automated middlemen who sell the results to spammers in real time, so that their spam bots can use those solutions to post to forums and blogs as well as set up fraudulent email accounts, says a paper about to be delivered at the USENIX Security Symposium.

Clever analysis of the location of the workers involved in this scheme revealed that they are based in India, Russia, Southeast Asia and China. The system is so efficient at delivering CAPTCHAS to workers in these remote locales that the average time for delivery of a solution hovers around 20 seconds.

One of the CAPTCHA services the researchers experimented with - ImageToText - was so good that its workers were able to deliver correct results in "a remarkable range of languages," including Dutch, Korean, Vietnamese, Greek and Arabic.

Even setting the sample CAPTCHAS in Klingon - a language readable by so few people on earth that the scientists thought they could use it as a control in their experiment - wasn't enough to stop ImageToText, whose workers managed to solve a handful of these CAPTCHAS despite odds of less than one in one thousand of their randomly getting the right answer.

The results of this landmark study show that a number of sites, including those run by Microsoft, AOL, Google and the widely used reCAPTCHA, are regularly compromised by spammers employing these services.

Here's an actual screenshot of what workers for these services see when solving a CAPTCHA:

The researchers conclude that their investigation, which included interviews with an anonymous "Mr. E" who actually runs one of these services, proves that for sophisticated spammers, CAPTCHAS aren't so much a barrier as a cost of doing business.

mattgroom

290 Comments

  • 549 Days Ago
  • 08/12/2010

Solutions

Offer a choice to the user:

1. Make the user enter the answer in under 10 seconds. Make them do this three times.

2. Take a computer-specific ID by allowing a piece of software to grab information from the person's machine in real time.

That computer ID will only work for 5 created accounts.

I prefer number 2.

I have additional things I could say to improve the computer imaging of these, but if I do that they could be picked up by the wrong people. I hate spam.
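
[Editor's illustration] The per-machine cap in option 2 could be sketched roughly as below. This is only a minimal sketch under stated assumptions: the comment does not say how the computer ID would be obtained, so `device_id` is treated as an opaque string that already exists, and `SignupLimiter` and `try_create_account` are hypothetical names invented for the example.

```python
from collections import defaultdict

MAX_ACCOUNTS_PER_DEVICE = 5  # the limit proposed in the comment


class SignupLimiter:
    """Counts account creations per device ID and refuses any past the cap.

    Hypothetical helper: how a trustworthy device ID is collected is
    deliberately left out, because the comment does not specify it.
    """

    def __init__(self, limit=MAX_ACCOUNTS_PER_DEVICE):
        self.limit = limit
        self.created = defaultdict(int)  # device_id -> accounts created so far

    def try_create_account(self, device_id):
        if self.created[device_id] >= self.limit:
            return False  # this device has used up its allowance
        self.created[device_id] += 1
        return True


limiter = SignupLimiter()
results = [limiter.try_create_account("device-abc") for _ in range(6)]
print(results)  # the sixth attempt from the same device is refused
```

The counter only helps if the ID cannot be forged or reset, which is the part the proposal leaves open.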

rsanchez1

213 Comments

  • 549 Days Ago
  • 08/12/2010

Re: Solutions

I think the very best solution is to spread awareness of spam. If people don't click spam links, or, if they are tricked into clicking them, immediately leave the site they were brought to, then the spammers won't have people to spam. EVERYONE has to do this for it to be effective. I read once that as few as 1 in 100,000 people have to fall for spam for spam to be effective. This is a problem with internet culture, and if you tell people that by not responding to spam they're helping poor people in Asia who are being exploited for spam, you'll get people to stop responding to spam.

jsmitty2212

1 Comment

  • 547 Days Ago
  • 08/14/2010

Re: Solutions

Those are both bad ideas.

#1 has all sorts of problems. Most people would have a hard time solving a good CAPTCHA in under 10 seconds, but I bet someone who does it all day would get to be faster than average at it. I have seen people take minutes to figure out what one says. That is what makes the 20-second number so amazing: even with the overhead involved, the service gets the correct solution really fast. And making people solve 3 in a row is going to frustrate legitimate users, while people being paid to solve them will just keep plugging away.

As for #2, you just hand-waved away all the important details. What piece of computer-specific information, and what software should you use to capture it? Flash? Java? Silverlight? Whatever you pick, there are going to be platforms like smartphones that don't run it. And do you really want to give some random website permission to install software that can read your hardware details? Of course, as soon as the software is created, people will start working on cracking it. You will replace a simple image with something that has privacy/security problems and won't work on many devices, and it will just be a speed bump.

Ultimately, the problem is that, barring strong AI, there isn't a technical solution to the problem of determining if you like the intent of the human behind the keyboard.

mwilson1962

35 Comments

  • 549 Days Ago
  • 08/12/2010

The book "Freedom" by Daniel Suarez (but read "Daemon" first) had an interesting solution - tens of thousands of spammers are gunned down almost simultaneously around the world.

angelrr7702

1 Comment

  • 549 Days Ago
  • 08/12/2010

another way to solve CAPTCHAS

Create another service that requires CAPTCHAS, but without the user knowing that he is solving the CAPTCHAS for another site (Google, Yahoo, etc.). That would be free (no low-cost labor).

What I mean is a site that, in real time, requires users to fill in CAPTCHAS for some web service but actually serves up the ones from another site. That way the spammer will be able to acquire accounts on that other service, or sell them to another spammer. I hope this never happens, but maybe somebody will find a solution for this one.

willknight

37 Comments

  • 549 Days Ago
  • 08/12/2010

Re: another way to solve CAPTCHAS

That's a clever suggestion @angelrr7702. In fact, I believe some spammers have used that approach:
http://boingboing.net/2004/01/27/solving-and-creating.html

MATR

91 Comments

  • 549 Days Ago
  • 08/12/2010

Authentication Protocol

I am wondering why TCP/IP does not do authentication. If it did, then I suspect a lot of the spam and porn and other illegal activities on the net would vanish. Without authentication any computer can spoof itself as another, which gives criminals the ability to hide their identities. If there were a strong authentication protocol at the TCP/IP level, wouldn't it largely eliminate this capability? The downside is that there could then be no anonymity on the net. It's a choice. Personally, my feeling is that anonymity is not a requirement, but internet security is, because the lack of security is liable to ruin the Internet as a medium of business transaction. The other issue that I've seen mentioned in regard to this idea is that it would be very expensive to change TCP/IP at this point. Maybe, but what is more expensive: making the change, or allowing the Internet to foster ID theft, porn, spam and other illegal activities? I guess the choice is out there, but so far I don't see anyone mentioning this option. Maybe I'm mistaken.

NOcean

5 Comments

  • 549 Days Ago
  • 08/12/2010

Is this news?

Actually, using teams of humans in third-world countries to solve CAPTCHAs is old news. What's new in this story?

Now, a twist that I heard about last year: Using CAPTCHA-like queries to fix problems in poorly scanned documents. You present an image of captured text that your *own* algorithms have failed on in a CAPTCHA window. The human gives his best attempt at it, and you grant him access regardless of his answer, as you really *don't* know what's correct. You repeat this dozens or hundreds of times, and when you have a consensus of human-supplied answers, you assume that's the correct interpretation of the badly mangled scan. Sneaky, eh?
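
[Editor's illustration] The consensus step described above can be sketched as a simple majority vote over the answers people typed for the same image. This is a minimal sketch, not how any particular service does it: the function name `consensus_transcription` and the thresholds `min_votes` and `min_share` are assumptions made up for the example.

```python
from collections import Counter


def consensus_transcription(answers, min_votes=3, min_share=0.6):
    """Return the agreed-upon reading of a scanned word, or None if undecided.

    answers: transcriptions typed by different people for the same image.
    min_votes and min_share are illustrative thresholds, not values taken
    from any real system.
    """
    # Normalize so trivial differences in case or spacing do not split votes.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough human answers collected yet

    best, count = Counter(normalized).most_common(1)[0]
    # Accept the top answer only if a clear majority of people typed it.
    if count / len(normalized) >= min_share:
        return best
    return None


print(consensus_transcription(["morning", "Morning ", "morning", "mourning"]))  # morning
print(consensus_transcription(["upon", "up0n"]))  # None (too few answers)
```

Returning None keeps the image in circulation until enough people agree, which matches the "repeat this dozens or hundreds of times" part of the description.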

smithsomian

182 Comments

  • 548 Days Ago
  • 08/13/2010

Re: Is this news?

That is reCAPTCHA.

Bio

Christopher Mims is a journalist who covers technology and science for just about everybody.
