Technology Review - Published By MIT

Tuesday, October 14, 2008

How Spam is Improving AI

Anti-spam puzzles are helping researchers develop smarter algorithms.

By Kurt Kleiner

Cats v. Dogs: A project called Asirra uses photographs of cats and dogs to distinguish between humans and computers. Normally, it is a difficult task for computers, but a new algorithm can correctly classify the images 83 percent of the time.
Credit: Microsoft

Those pesky visual puzzles that have to be completed each time you sign up for a Web mail account or post a comment to a blog are under attack. It's not just from spam-spewing computers or hackers, though; it's also from researchers who are using anti-spam puzzles to develop smarter, more humanlike algorithms.

The most common type of puzzle (a series of distorted letters and numbers) is increasingly being cracked by smarter AI software. And a computer scientist has now developed an algorithm that can defeat even the latest photograph-based tests.

Known as CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart), these puzzles were developed in the late '90s as a way to separate real users from machines that create e-mail accounts to send out spam or log in to message boards to post ad links. The Turing Test, named after mathematician Alan Turing, involves measuring intelligence by having a computer try to impersonate a real person.

Textual CAPTCHAs are a good way to tell humans and spam-bots apart, because distorted letters and numbers can easily be read by real people (most of the time) but are fiendishly difficult for computers to decipher. However, computer scientists have long seen CAPTCHAs as an interesting AI challenge. Designers of textual CAPTCHAs have gradually introduced more distortion to prevent machines from solving them. But they have to balance security against usability: as distortion increases, even real human beings begin to find CAPTCHAs difficult to decipher.

Earlier this year, Jeff Yan, a researcher at Newcastle University, U.K., revealed a program capable of completing the textual CAPTCHAs used to protect Microsoft's Hotmail, MSN, and Windows Live services with a success rate of 60 percent. This might not sound like much, but it's significant, since a computer can try its attack thousands of times each minute. Yan withheld the paper until Microsoft had a chance to tweak its CAPTCHAs so that they were more difficult to crack. But at the ACM Conference on Computer and Communications Security in Alexandria, VA, later this month, Yan will present details of another program that he says can crack even more widely used textual CAPTCHAs.
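
To see why a 60 percent success rate matters at machine speed, consider a rough back-of-the-envelope sketch in Python. The figure of 1,000 attempts per minute is an assumed stand-in for "thousands," the function names are hypothetical, and each attempt is treated as independent, which a real attack may not satisfy.

```python
def expected_solves(attempts_per_minute: int, success_rate: float) -> float:
    """Expected number of CAPTCHAs solved per minute, assuming independent attempts."""
    return attempts_per_minute * success_rate


def prob_at_least_one_solve(attempts: int, success_rate: float) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - success_rate) ** attempts


if __name__ == "__main__":
    # Assumed rate: the article says only "thousands of times each minute".
    attempts_per_minute = 1000
    success_rate = 0.60  # success rate reported for Yan's program

    print("Expected solves per minute:",
          expected_solves(attempts_per_minute, success_rate))
    print("Chance of at least one solve in 5 tries:",
          prob_at_least_one_solve(5, success_rate))
```

Under these assumptions, even a much lower per-attempt success rate would still yield a steady stream of solved puzzles, which is why designers cannot treat a partial break as harmless.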

One alternative is to ask users to solve different kinds of puzzles. But another paper to be presented at the same conference describes an algorithm that could spell trouble for even these newer CAPTCHAs.

Philippe Golle of the Palo Alto Research Center has developed a program that can pass an image-based CAPTCHA called Asirra, developed by Microsoft. Asirra asks users to classify images as either cats or dogs, drawing on a database of three million images provided by animal-rescue organizations. This task should be even harder for computers than recognizing squiggly letters, but Golle's program can correctly identify the cats or dogs shown by Asirra 83 percent of the time.
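
A per-image accuracy of 83 percent does not translate directly into an 83 percent chance of passing a test that shows several images at once. The Python sketch below illustrates the arithmetic under two labeled assumptions: that each image is classified independently, and that a challenge shows 12 images, a value chosen for illustration because the number of images per challenge is not stated above. The function name is hypothetical.

```python
def challenge_pass_probability(per_image_accuracy: float,
                               images_per_challenge: int) -> float:
    """Chance of labeling every image in one challenge correctly,
    assuming each image is classified independently."""
    return per_image_accuracy ** images_per_challenge


if __name__ == "__main__":
    images_per_challenge = 12  # assumed value, for illustration only

    classifier = challenge_pass_probability(0.83, images_per_challenge)
    coin_flip = challenge_pass_probability(0.50, images_per_challenge)

    print(f"83% classifier passes a challenge with probability {classifier:.4f}")
    print(f"Random guessing passes with probability {coin_flip:.6f}")
    print(f"Improvement over guessing: about {classifier / coin_flip:.0f}x")
```

The point of the comparison is the gap between the two numbers: a classifier that is far from perfect on single images can still beat random guessing on a whole challenge by orders of magnitude.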


Comments

  • Turing test
    dtutelman on 10/14/2008 at 10:03 AM
    Posts: 22 | Avg Rating: 4/5
    Very interesting article! It is fascinating that the Turing Test itself has become a practical issue. I'd like to point out a trend I have seen in AI, and use that to project the evolution of CAPTCHAs.

    I took my first AI course in 1962 (a graduate EE course at MIT, taught by Prof James Slagle). At the time, the focus was on heuristics to "prune the choice tree".

    Looking at progress in AI for the following 40 years, this approach was not where the successes were. Pragmatic AI successes stemmed not from pruning the tree, but rather from faster and cheaper computing that could afford to look deeper into the tree -- pruning not necessary. I don't know whether the new CAPTCHA attacks fit that description, but I suspect they do.

    So what can we do to make better CAPTCHAs? Suggestions in the article, like identifying more pictures to reduce the probability of N hits, miss the point of history. That is at best a temporary expedient; as computing gets faster and cheaper, it will be necessary to identify more and more pictures -- a major nuisance to legitimate users.

    Perhaps a better approach would be to look again at the thing that makes a Turing test such a high threshold: the unstructured richness of human intelligence. Suppose the user had no idea what kind of test would appear as the CAPTCHA? That is more in the spirit of the Turing test than the limited scope of a CAPTCHA today. This time you must tell a cat from a dog; the next login you must identify a state or a country from a map; after that, you must name a tune that plays on your speaker.

    Of course, this "unstructured richness" has its problems as well. The article made the point that, as text is distorted more and more, people start to have as much trouble identifying it as computer programs do. By making the test less structured, we run the risk of some humans being unable to solve the puzzle. For instance, the map test that I mentioned assumes some proficiency with geography. I watch quiz shows that leave me appalled at the lack of geographical knowledge of too many contestants. As for the music identification, those same quiz shows impress on me that, while I may be a genius at pre-1960 music, I'm a complete dunce at post-1980 music that many contestants identify immediately.

    So the choice of test is a challenge to find unstructured, unpredictable knowledge that is, at the same time, universal to humans. And the test must have unambiguous responses, so that the CAPTCHA program itself does not have to pass the Turing test.

    DaveT

    • Re: Turing test
      aatnet on 10/16/2008 at 6:18 AM
      Posts: 2
      Hi from Greece. Most interesting idea. Shifting the target from a specific domain to a more general one would probably require a "strong AI". Coincidentally, I signed up yesterday on a popular website and it required me to pass two tests: the usual distorted-text test and a second one that looked like this: "nineteen -11 +3"
  • Not a great Turing test to begin with
    marquinhocb on 10/14/2008 at 2:16 PM
    Posts: 1 | Avg Rating: 2/5
    The problem with current CAPTCHAs is that they weren't a very good Turing test to begin with (using computer fonts to generate an image).

    OCR technology has been around for a long time, and even after mutilating characters, the problem is we're still limited to only 60 or so "images" to choose from (a-z, A-Z, 0-9).
    I posted an idea about the ideal Turing test, one that could be used to make a CAPTCHA that is theoretically unbreakable.
    Read more about it here:

    http://www.yuniti.com/BetterCaptcha

    The idea is to use a hurdle in technology - image recognition - as a hurdle in cracking CAPTCHAs. In essence, using Google's image labeling platform and database to have users enter a word that describes the image.

    Because the source of images is extremely large (millions of images), caching of results would not be an option.

    Better yet, words could be limited to the English language for English sites, making it more difficult to outsource to "CAPTCHA typers" in China/Russia/etc.
  • What If AI Is Solved?
    Mapou on 10/14/2008 at 11:37 PM
    Posts: 12 | Avg Rating: 3/5
    What will happen to the internet when AI is solved? What if no test could fool our future intelligent machines? How will web sites distinguish between humans and machines? I see this as a future business opportunity. We will need some form of personal authentication service. Maybe, the website's own AI will be used to authenticate users via their webcams and/or computer microphone.
    • Re: What If AI Is Solved?
      aatnet on 10/16/2008 at 6:07 AM
      Posts: 2
      A most interesting question that raises more. When that happens (in my view it is a matter of 'when' not 'if') will it be unethical to use that AI for spamming?
  • AI
    zig158 on 10/15/2008 at 7:41 AM
    Posts: 64 | Avg Rating: 3/5
    AI as you're thinking about it will never be solved. Yes, machines will become more human, but humans will also get more like machines. We will eventually meet somewhere in the middle, and the entire concept of AI will become ridiculous.
    • Re: AI
      kiran_342 on 10/16/2008 at 1:06 PM
      Posts: 1
      How about sensors? Some laptops and desktops come with a fingerprint reader for logging in; it could be used to identify whether it's a human or a machine. But that raises the question: how will a browser access data on that machine (security issues)?