Technology Review - Published By MIT

Tuesday, October 14, 2008

How Spam is Improving AI

By Kurt Kleiner

Golle trained his program using 8,000 images collected from the same website. Through trial and error, his software gradually learned to tell cats and dogs apart, based on a statistical analysis of color and texture in each photo. The pink of the dogs' tongues and the green of the cats' eyes provided strong clues, Golle says, but it is only by studying color and texture information from so many images that his program could attack the problem. "Machine learning is very good at aggregating information," Golle says.
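
The article does not spell out Golle's features or learning algorithm. As a rough illustration of what "statistical analysis of color" over many labeled photos can mean, the sketch below swaps in a much simpler technique: a coarse RGB color histogram per image and a nearest-centroid classifier. It ignores texture entirely. The function names and toy pixel data are hypothetical, and images are assumed to be lists of (r, g, b) tuples with channel values from 0 to 255.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def color_histogram(pixels, bins=4):
    """Normalized coarse RGB histogram.

    Each channel (0-255) is quantized into `bins` levels, giving bins**3
    buckets; dividing by the pixel count makes images of different sizes
    comparable.
    """
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return [counts[(i, j, k)] / total
            for i in range(bins) for j in range(bins) for k in range(bins)]


def train_centroids(labeled_images, bins=4):
    """Average the histograms of all training images that share a label."""
    sums, counts = {}, Counter()
    for pixels, label in labeled_images:
        hist = color_histogram(pixels, bins)
        previous = sums.get(label, [0.0] * len(hist))
        sums[label] = [a + b for a, b in zip(previous, hist)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total]
            for label, total in sums.items()}


def classify(pixels, centroids, bins=4):
    """Return the label whose average histogram is closest to this image's."""
    hist = color_histogram(pixels, bins)
    return min(centroids, key=lambda label: dist(hist, centroids[label]))


# Toy data (not real photos): "dog" images dominated by a pinkish tone,
# "cat" images by gray fur with a little green.
training = [
    ([(230, 120, 150)] * 90 + [(120, 80, 40)] * 10, "dog"),
    ([(60, 180, 70)] * 10 + [(128, 128, 128)] * 90, "cat"),
]
centroids = train_centroids(training)
print(classify([(235, 125, 155)] * 100, centroids))  # prints dog
```

A real attack would use thousands of training images, as Golle did, and far richer features; the point here is only that averaging statistics over many labeled examples lets simple color cues separate the classes.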

But although the program classified each individual picture correctly about 83 percent of the time, the full CAPTCHA requires all 12 pictures to be identified correctly at once, so the attack succeeds only 10.3 percent of the time.
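
The drop from per-image accuracy to whole-test success is simple multiplication, assuming each picture is classified independently: the attack passes only if every one of the 12 answers is right, so the pass rate is the per-image accuracy raised to the 12th power. With the rounded 83 percent figure this gives about 10.7 percent; the article's 10.3 percent corresponds to a per-image accuracy slightly below 83 percent. The function name is only for illustration.

```python
def captcha_pass_rate(per_image_accuracy: float, num_images: int) -> float:
    """Probability of passing when all `num_images` independent
    classifications must be correct."""
    return per_image_accuracy ** num_images


print(f"{captcha_pass_rate(0.83, 12):.3f}")  # prints 0.107
```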

Golle says that an easy countermeasure would be for Asirra to present more pictures, which would further drive down the success rate of the attack. Microsoft did not respond to our requests for comment.
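
Golle's countermeasure works because the pass rate shrinks geometrically: every extra picture multiplies it by the per-image accuracy once more. A minimal sketch, assuming independent per-image errors and the rounded 83 percent accuracy, finds how many pictures a test would need to push the attack below a chosen pass rate. The function name and target values are illustrative assumptions, not anything Asirra is known to use.

```python
def images_needed(per_image_accuracy: float, target_pass_rate: float) -> int:
    """Smallest number of pictures n such that accuracy ** n <= target."""
    if not (0 < per_image_accuracy < 1 and 0 < target_pass_rate < 1):
        raise ValueError("accuracy and target must both be strictly between 0 and 1")
    n, pass_rate = 1, per_image_accuracy
    # Multiply in one more picture at a time until the attack's pass rate
    # falls to or below the target.
    while pass_rate > target_pass_rate:
        n += 1
        pass_rate *= per_image_accuracy
    return n


for target in (0.05, 0.01, 0.001):
    print(target, images_needed(0.83, target))
```

Under these assumptions the loop reports 17, 25, and 38 pictures for targets of 5, 1, and 0.1 percent, so getting the attack below 1 percent would mean asking legitimate users to label roughly twice as many images as today's 12.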

Despite all this progress, it's unclear whether or not real spammers are currently using AI attacks against real CAPTCHAs. Websense Security Labs, in San Diego, has released reports about spammers cracking CAPTCHAs, but often this involves simply having low-paid workers solve CAPTCHAs manually.

Luis von Ahn, a computer scientist at Carnegie Mellon University, who helped coin the term CAPTCHA, says that it's not clear that any common CAPTCHAs have been broken by machine attack in the real world. "I don't know of anybody who's thinking of getting rid of the CAPTCHA because it doesn't work," he says.

However, von Ahn notes that using humans comes at a cost. Even at just $3 per 1,000 CAPTCHAs, paying workers is expensive, he says, especially since most of the hacked Web mail accounts are shut down soon after they begin to send out spam. So a truly automated attack would reduce spammers' costs and greatly increase the number of successful attacks they could afford, he says.
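
Von Ahn's cost argument can be made concrete with simple arithmetic. The sketch below is illustrative only: it uses the article's $3-per-1,000 price for human solvers and, for comparison, treats an automated attack as a series of independent retries, so the expected number of attempts per success is 1/p, the mean of a geometric distribution. The 10.3 percent rate belongs to Golle's Asirra attack, not to the text CAPTCHAs that paid workers typically handle, and the function names are hypothetical.

```python
def human_solving_cost(num_captchas: int, dollars_per_thousand: float = 3.0) -> float:
    """Dollar cost of paying people to solve `num_captchas` CAPTCHAs."""
    return num_captchas * dollars_per_thousand / 1000


def expected_attempts_per_pass(pass_rate: float) -> float:
    """Mean number of independent tries until the first success
    (geometric distribution): 1 / pass_rate."""
    return 1 / pass_rate


print(human_solving_cost(1_000_000))                # prints 3000.0
print(round(expected_attempts_per_pass(0.103), 1))  # prints 9.7
```

So a million human-solved CAPTCHAs would cost about $3,000, while an automated solver that passes 10.3 percent of the time needs roughly ten tries per success, each costing only computer time.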

But until computers start to get much smarter, CAPTCHA creators will always be able to implement a few simple tweaks to make a CAPTCHA much harder. "I do think there will be a day when, essentially, CAPTCHAs are going to be useless," von Ahn says. "But I don't think it's this year, or next."

Comments

  • Turing test
    dtutelman on 10/14/2008 at 10:03 AM
    Posts: 23, Avg Rating: 4/5
    Very interesting article! It is fascinating that the Turing Test itself has become a practical issue. I'd like to point out a trend I have seen in AI, and use that to project the evolution of CAPTCHAs.

    I took my first AI course in 1962 (a graduate EE course at MIT, taught by Prof James Slagle). At the time, the focus was on heuristics to "prune the choice tree".

    Looking at progress in AI over the following 40 years, I found that this approach was not where the successes were. Pragmatic AI successes stemmed not from pruning the tree, but from faster and cheaper computing that could afford to look deeper into the tree, making pruning unnecessary. I don't know whether the new CAPTCHA attacks fit that description, but I suspect they do.

    So what can we do to make better CAPTCHAs? Suggestions in the article, like identifying more pictures to reduce the probability of getting all N right, miss the lesson of history. That is at best a temporary expedient: as computing gets faster and cheaper, it will be necessary to identify more and more pictures, a major nuisance to legitimate users.

    Perhaps a better approach would be to look again at the thing that makes a Turing test such a high threshold: the unstructured richness of human intelligence. Suppose the user had no idea what kind of test would appear as the CAPTCHA. That is more in the spirit of the Turing test than the limited scope of today's CAPTCHAs. This time you must tell a cat from a dog; the next login you must identify a state or a country from a map; after that, you must name a tune that plays on your speaker.

    Of course, this "unstructured richness" has its problems as well. The article made the point that, as text is distorted more and more, people start to have as much trouble identifying it as computer programs do. By making the test less structured, we run the risk of some humans being unable to solve the puzzle. For instance, the map test that I mentioned assumes some proficiency with geography. I watch quiz shows that leave me appalled at the lack of geographical knowledge of too many contestants. As for the music identification, those same quiz shows impress on me that, while I may be a genius at pre-1960 music, I'm a complete dunce at post-1980 music that many contestants identify immediately.

    So the challenge is to choose tests that draw on unstructured, unpredictable knowledge that is, at the same time, universal to humans. And the test must have unambiguous responses, so that the CAPTCHA program itself does not have to pass the Turing test.

    DaveT

    • Re: Turing test
      aatnet on 10/16/2008 at 6:18 AM
      Posts: 2, Avg Rating: 3/5
      Hi from Greece. Most interesting idea. Shifting the target from a specific domain to a more general one would probably require "strong AI" to defeat. Coincidentally, I signed up on a popular website yesterday, and it required me to pass two tests: the usual distorted-text test and a second one that looked like this: "nineteen -11 +3".
  • Not a great Turing test to begin with
    marquinhocb on 10/14/2008 at 2:16 PM
    Posts: 1, Avg Rating: 2/5
    The problem with current CAPTCHAs is that they weren't very good Turing tests to begin with (they use computer fonts to generate an image).

    OCR technology has been around for a long time, and even when the characters are mutilated, the problem is that we're still limited to only 62 "images" to choose from (a-z, A-Z, 0-9).
    I posted an idea about the ideal Turing test, one that could be used to make a CAPTCHA that is theoretically unbreakable.
    Read more about it here:

    http://www.yuniti.com/BetterCaptcha

    The idea is to turn a hurdle in technology, image recognition, into a hurdle for cracking CAPTCHAs. In essence, it would use Google's image-labeling platform and database to have users enter a word that describes the image.

    Because the pool of images is effectively unlimited (millions of images), caching answers would not be an option.

    Better yet, words could be limited to English for English-language sites, making it more difficult to outsource the work to "CAPTCHA typers" in China, Russia, etc.
  • What If AI Is Solved?
    Mapou on 10/14/2008 at 11:37 PM
    Posts: 18, Avg Rating: 3/5
    What will happen to the internet when AI is solved? What if no test could stump our future intelligent machines? How will websites distinguish between humans and machines? I see this as a future business opportunity. We will need some form of personal authentication service. Maybe the website's own AI will authenticate users via their webcams and/or computer microphones.
    • Re: What If AI Is Solved?
      aatnet on 10/16/2008 at 6:07 AM
      Posts: 2, Avg Rating: 3/5
      A most interesting question, and one that raises more. When that happens (in my view it is a matter of 'when', not 'if'), will it be unethical to use that AI for spamming?
  • AI
    zig158 on 10/15/2008 at 7:41 AM
    Posts: 64, Avg Rating: 4/5
    AI as you're thinking about it will never be solved. Yes, machines will become more human, but humans will also become more like machines. We will eventually meet somewhere in the middle, and the entire concept of AI will become ridiculous.
    • Re: AI
      kiran_342 on 10/16/2008 at 1:06 PM
      Posts: 1, Avg Rating: 1/5
      How about sensors? Some laptops and desktops come with a fingerprint reader for logging in, and it could be used to identify whether it's a human or a machine. But that raises a question: how would a browser access data on that machine (security issues)?
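
DaveT's proposal (a test type the user cannot predict, with unambiguous answers), the arithmetic challenge aatnet encountered, and marquinhocb's image-labeling idea can be combined in one hypothetical sketch. Everything here is an assumption for illustration: the function names, the number ranges, and the tiny IMAGE_LABELS table, which stands in for a real crowd-labeled image database. Challenge types are chosen with Python's random.SystemRandom, which draws from the operating system's randomness source, so earlier challenges do not reveal which kind comes next.

```python
import random

NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
                "eight", "nine", "ten", "eleven", "twelve", "thirteen",
                "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
                "nineteen", "twenty"]

# Hypothetical stand-in for a crowd-sourced image-label database:
# image file name -> words that human labelers used to describe it.
IMAGE_LABELS = {
    "img-0001.jpg": {"cat", "kitten"},
    "img-0002.jpg": {"dog", "puppy"},
}


def arithmetic_challenge(rng):
    """Text challenge in the style of "nineteen -11 +3" (answer: 11)."""
    a, b, c = rng.randint(10, 20), rng.randint(1, 9), rng.randint(1, 9)
    return f"{NUMBER_WORDS[a]} -{b} +{c}", {str(a - b + c)}


def image_label_challenge(rng):
    """Ask for any one word that human labelers attached to the image."""
    image = rng.choice(sorted(IMAGE_LABELS))
    return f"Type one word describing {image}", IMAGE_LABELS[image]


# A fuller system would register map, audio, and other challenge types here.
CHALLENGE_TYPES = [arithmetic_challenge, image_label_challenge]
_system_rng = random.SystemRandom()


def new_challenge(rng=_system_rng):
    """Pick a challenge type at random, then return (prompt, accepted answers)."""
    generator = rng.choice(CHALLENGE_TYPES)
    return generator(rng)


def check_answer(accepted, response):
    """Accept the response only if it matches one of the expected answers."""
    return response.strip().lower() in accepted
```

A site would call new_challenge(), show the prompt, and pass the user's reply to check_answer(). The label challenge accepts several words, which is friendlier to humans but only as unambiguous as the label data behind it.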