Technology Review - Published By MIT

How Spam is Improving AI

Continued from page 1

By Kurt Kleiner

Tuesday, October 14, 2008


Golle trained his program using 8,000 images collected from the same website. Through trial and error, his software gradually learned to tell cats and dogs apart, based on a statistical analysis of color and texture in each photo. The pink of the dogs' tongues and the green of the cats' eyes provided strong clues, Golle says, but it is only by studying color and texture information from so many images that his program could attack the problem. "Machine learning is very good at aggregating information," Golle says.

However, although each individual picture was recognized 83 percent of the time, the full CAPTCHA test requires 12 pictures to be identified simultaneously, so the attack actually works only 10.3 percent of the time.

Golle says that an easy countermeasure would be for Asirra to present more pictures, which would further drive down the success rate of the attack. Microsoft did not respond to our requests for comment.
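
The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustrative calculation, not Golle's code: the function name is hypothetical, and it assumes each picture is classified independently with the same accuracy. With the rounded 83 percent figure the result for 12 pictures comes out near 10.7 percent; the article's 10.3 percent presumably reflects an unrounded per-image accuracy slightly below 83 percent.

```python
def full_test_success(per_image_accuracy: float, num_images: int) -> float:
    """Chance of getting every picture in one challenge right.

    Assumes (for illustration) that each picture is an independent
    trial with the same per-image accuracy.
    """
    return per_image_accuracy ** num_images


if __name__ == "__main__":
    # 12 is the challenge size cited in the article; the larger sizes
    # are hypothetical values for the "present more pictures" countermeasure.
    for n in (12, 16, 20, 24):
        print(f"{n} pictures: {full_test_success(0.83, n):.1%}")
```

Under these assumptions, doubling the challenge from 12 to 24 pictures squares the attacker's success probability, taking it from roughly one in ten to roughly one in a hundred.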

Despite all this progress, it's unclear whether or not real spammers are currently using AI attacks against real CAPTCHAs. Websense Security Labs, in San Diego, has released reports about spammers cracking CAPTCHAs, but often this involves simply having low-paid workers solve CAPTCHAs manually.

Luis von Ahn, a computer scientist at Carnegie Mellon University, who helped coin the term CAPTCHA, says that it's not clear that any common CAPTCHAs have been broken by machine attack in the real world. "I don't know of anybody who's thinking of getting rid of the CAPTCHA because it doesn't work," he says.

However, von Ahn notes that using humans comes at a cost. Even if workers are paid just $3 per 1,000 CAPTCHAs, that is expensive, he says, especially since most of the hacked Web mail accounts will be shut down soon after they begin to send out spam. So a truly automated attack would reduce the cost to spammers and greatly increase the number of successful attacks they could afford, he says.

But until computers start to get much smarter, CAPTCHA creators will always be able to implement a few simple tweaks to make a CAPTCHA much harder. "I do think there will be a day when, essentially, CAPTCHAs are going to be useless," von Ahn says. "But I don't think it's this year, or next."

Comments

  • Turing test
    Very interesting article! It is fascinating that the Turing Test itself has become a practical issue. I'd like to point out a trend I have seen in AI, and use that to project the evolution of CAPTCHAs.

    I took my first AI course in 1962 (a graduate EE course at MIT, taught by Prof James Slagle). At the time, the focus was on heuristics to "prune the choice tree".

    Looking at progress in AI for the following 40 years, this approach was not where the successes were. Pragmatic AI successes stemmed not from pruning the tree, but rather from faster and cheaper computing that could afford to look deeper into the tree -- pruning not necessary. I don't know whether the new CAPTCHA attacks fit that description, but I suspect they do.

    So what can we do to make better CAPTCHAs? Suggestions in the article, like identifying more pictures to reduce the probability of N hits, miss the point of history. That is at best a temporary expedient; as computing gets faster and cheaper, it will be necessary to identify more and more pictures -- a major nuisance to legitimate users.

    Perhaps a better approach would be to look again at the thing that makes a Turing test such a high threshold: the unstructured richness of human intelligence. Suppose the user had no idea what kind of test would appear as the CAPTCHA? That is more in the spirit of the Turing test than the limited scope of a CAPTCHA today. This time you must tell a cat from a dog; the next login you must identify a state or a country from a map; after that, you must name a tune that plays on your speaker.

    Of course, this "unstructured richness" has its problems as well. The article made the point that, as text is distorted more and more, people start to have as much trouble identifying it as computer programs do. By making the test less structured, we run the risk of some humans being unable to solve the puzzle. For instance, the map test that I mentioned assumes some proficiency with geography. I watch quiz shows that leave me appalled at the lack of geographical knowledge of too many contestants. As for the music identification, those same quiz shows impress on me that, while I may be a genius at pre-1960 music, I'm a complete dunce at post-1980 music that many contestants identify immediately.

    So the choice of test is a challenge to find unstructured, unpredictable knowledge that is, at the same time, universal to humans. And the test must have unambiguous responses, so that the CAPTCHA program itself does not have to pass the Turing test.

    DaveT

    dtutelman
    10/14/2008
    Posts:57
    Avg Rating:
    4/5
    • Re: Turing test
      Hi from Greece. Most interesting idea. Shifting the target from a specific domain to a more general one would probably require a "strong AI". Coincidentally, I signed up yesterday on a popular website and it required that I pass two tests: the usual distorted-text test and a second one that looked like this: "nineteen -11 +3"

      aatnet
      10/16/2008
      Posts:2
      Avg Rating:
      3/5
  • Not a great Turing test to begin with
    The problem with current CAPTCHAs is that they weren't a very good Turing test to begin with (using computer fonts to generate an image).

    OCR technology has been around for a long time, and even after mutilating characters, the problem is we're still limited to only 60 or so "images" to choose from (a-z, A-Z, 0-9).
    I posted an idea about the ideal Turing test, one that could be used to make a CAPTCHA that is theoretically unbreakable.
    Read more about it here:

    http://www.yuniti.com/BetterCaptcha

    The idea is to use a hurdle in technology - image recognition - as a hurdle in cracking CAPTCHAs. In essence, it would use Google's image labeling platform and database to have users enter a word that describes the image.

    Because the source of images is effectively unlimited (millions of images), caching of results would not be an option.

    Better yet, words could be limited to the English language for English sites, making it more difficult to outsource to "CAPTCHA typers" in China/Russia/etc.

    marquinhocb
    10/14/2008
    Posts:1
    Avg Rating:
    2/5
  • What If AI Is Solved?
    What will happen to the internet when AI is solved? What if no test could fool our future intelligent machines? How will web sites distinguish between humans and machines? I see this as a future business opportunity. We will need some form of personal authentication service. Maybe, the website's own AI will be used to authenticate users via their webcams and/or computer microphone.

    Mapou
    10/14/2008
    Posts:65
    Avg Rating:
    2/5
    • Re: What If AI Is Solved?
      A most interesting question that raises more. When that happens (in my view it is a matter of 'when' not 'if') will it be unethical to use that AI for spamming?

      aatnet
      10/16/2008
      Posts:2
      Avg Rating:
      3/5
  • AI
    AI as you're thinking about it will never be solved. Yes, machines will become more human, but humans will also get more like machines. We will eventually meet somewhere in the middle, and the entire concept of AI will become ridiculous.

    zig158
    10/15/2008
    Posts:64
    Avg Rating:
    4/5
    • Re: AI
      How about sensors? Some laptops and desktops come with a fingerprint reader for login, which could be used to tell whether it's a human or a machine. But that raises the question: how will a browser access data on that machine (security issues)?

      kiran_342
      10/16/2008
      Posts:1
      Avg Rating:
      1/5

MIT Massachusetts Institute of Technology © 2009 Technology Review. All Rights Reserved.