A Joined-Up Bot-Fighting Strategy

  • Friday, January 9, 2009
  • By Duncan Graham-Rowe

Publishing their results in the latest issue of the journal Pattern Recognition, the researchers show that some of the best OCR programs can recognize the characters less than 1 percent of the time. "Before a computer can try to recognize a character, it first has to locate it," Oommen Thomas says, so having characters joined together should make this process (known as segmentation) more challenging.

However, Yan worries that such handwriting could also be much harder for humans to read. "My main concern is usability," he says. Currently, the system has a human success rate of 75 percent, meaning that one time in four, a human can't read the text. "That's way too low," says Yan.

Luis von Ahn, a computer scientist at Carnegie Mellon University, in Pittsburgh, and a member of the team that first coined the term CAPTCHA, agrees. Von Ahn's latest system, called reCAPTCHA, has a human success rate of 96 percent. "And still people complain," he admits.

Oommen Thomas concedes this but says that his team is looking at ways to improve the success rate. "There is a region where humans and machines both do badly, but there is also a sweet spot where humans do well and machines do badly," he says, and this is what he and his team are now trying to find. "There's a lot of money to be made circumventing CAPTCHAs to generate spam," he adds, meaning that spambots are likely to get better and better at breaking existing CAPTCHAs.

"It's a worthy thing to look at," says von Ahn, but he is not sure that there's a need for a completely new kind of CAPTCHA. Systems like reCAPTCHA (currently one of the most widely used systems: it's running on more than 100,000 websites) are regularly improved to stay ahead of the curve. One trick is to scan in characters from old books, with all their imperfections. "We only use the ones that computers cannot recognize," von Ahn says. Because of this, reCAPTCHA is extremely good at keeping the bots out, he says, with the best known attacks achieving a success rate of no better than one in 1,000.

"Humans are just not that good at recognizing handwriting," von Ahn adds, noting that, as we use handwriting less and less in modern life, our ability to recognize squiggly text may fade further still.

Nostromo

4 Comments

  • 1130 Days Ago
  • 01/09/2009

CAPTCHA, the lost strategy

The same algorithm breaks all CAPTCHAs. Here it is:

1. Set up an "adult" site with lots of porn.
2. Let anyone into it provided they solve a CAPTCHA copied from the website your bot wants to break into.

Why bother with expensive, complicated pattern-recognition software when human labor is free?

Guest (jfrank)

  • 1130 Days Ago
  • 01/09/2009

re:lost strategy

That is brilliant! Human and social engineering will beat software every time...

I wonder how long it will be until we see that technique in widespread use?

Trondy

1 Comment

  • 1130 Days Ago
  • 01/09/2009

Need a better model

Even this advance in CAPTCHA design can be defeated with improved OCR. I prefer systems where subtle or hidden info must also be conveyed.

jhertzberg

15 Comments

  • 1130 Days Ago
  • 01/09/2009

Re: Need a better model

A two-factor Turing test would defeat more human users. "Hidden" meaning is quite often education- and culture-specific.

And, software that solves for the first factor, the text itself, would then pass the result off to software to solve the second level. Much work is being done to enable software to extract semantic meaning from text (Autonomy, Nomino, etc.).

Two-factor CAPTCHAs also do nothing to defeat the human "will solve CAPTCHAs for porn" crowd.

I'm sorry I can't be more optimistic. It may be that in order to have an anonymous online persona, we must accept systems for centrally creating and tracking the persona's reputation.

CStroliaDavis

6 Comments

  • 1120 Days Ago
  • 01/19/2009

Using mental tricks that would probably fool most bots

I know that spammers would probably eventually find a way around these, too, but what if a CAPTCHA used a sort of CAPTCHA image to ask a user to solve a very simple question?

Like "What are the 3rd and 5th characters in the image below?", or something like that. The question would be in a fairly easy-to-read CAPTCHA, and the image to select the characters from would perhaps be a bit harder to read. This might actually be easier for most humans and harder for bots.
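The positional question described here could be checked on the server roughly as follows. This is only a minimal sketch: the function names `expected_reply` and `check_reply` are made up for illustration, positions are assumed to be 1-based, and the comparison is assumed to ignore case and surrounding whitespace. Rendering the two images is left out entirely.

```python
def expected_reply(image_text, positions):
    """Return the characters at the given 1-based positions of the image text."""
    return "".join(image_text[p - 1] for p in positions)


def check_reply(image_text, positions, reply):
    """Compare a user's reply with the expected characters.

    Assumption: case and surrounding whitespace are ignored, since the
    harder-to-read image may make upper and lower case hard to tell apart.
    """
    return reply.strip().lower() == expected_reply(image_text, positions).lower()


if __name__ == "__main__":
    # "What are the 3rd and 5th characters in the image below?"
    print(check_reply("x7kq2m", [3, 5], "k2"))
```

One possible appeal of this design is that the user types only a couple of characters, so a single unreadable character elsewhere in the image does not cause a failure.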

Another thought is to use the human mind's propensity to assume what a word is based on the first and last characters and the number of characters in between. What I mean is this. Msot popele wlil be albe to raed smeotinhg qitue esiely eevn wehn the carhatecrs are jbmlued up. This could either be used to ask the question, or perhaps the CAPTCHA could make sure the first and last letters of a word are fairly easy to read and really make the inside letters much more difficult to read. Close enough is likely to help most people figure out the word, but it would be more difficult for a bot.
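The letter-jumbling idea can be sketched in a few lines. This is a hedged illustration, not a tested CAPTCHA scheme: the names `jumble_word` and `jumble_text` are hypothetical, words are assumed to be runs of ASCII letters, and words of three letters or fewer are left alone because they have at most one interior letter to move.

```python
import random
import re


def jumble_word(word, rng):
    """Shuffle the interior letters of a word, keeping the first and last fixed."""
    if len(word) <= 3:
        return word  # nothing to rearrange
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def jumble_text(text, seed=None):
    """Jumble every alphabetic word in the text; punctuation and spaces stay put."""
    rng = random.Random(seed)  # a seed makes the output repeatable
    return re.sub(r"[A-Za-z]+", lambda m: jumble_word(m.group(0), rng), text)


if __name__ == "__main__":
    print(jumble_text("Most people will be able to read something quite easily"))
```

Each output word keeps its length, its first and last letters, and the same set of letters overall. Whether that actually slows a bot down is an open question: a program with a dictionary could match the jumbled words back by their sorted letters.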

Siedenburg

1 Comment

  • 1113 Days Ago
  • 01/26/2009

User-friendly CAPTCHA alternative

The tests used to tell humans from bots have to be easy on users (from all walks of life) and effective against computer programs to really address the spam bot problem. The "handwriting-style" CAPTCHA examples in this article make me wince... I'm not sure I could solve ANY of those, and I would leave the site annoyed, without registering for the service, posting a comment, etc. Pardon the plug, but there is a more effective way: http://demo.vidoop.com/captcha/ Feedback is invited.
