Technology Review - Published By MIT

How Spam is Improving AI

Anti-spam puzzles are helping researchers develop smarter algorithms.

By Kurt Kleiner

Tuesday, October 14, 2008

Those pesky visual puzzles that have to be completed each time you sign up for a Web mail account or post a comment to a blog are under attack. It's not just from spam-spewing computers or hackers, though; it's also from researchers who are using anti-spam puzzles to develop smarter, more humanlike algorithms.

Cats v. Dogs: A project called Asirra uses photographs of cats and dogs to distinguish between humans and computers. Normally, it is a difficult task for computers, but a new algorithm can correctly classify the images 83 percent of the time.
Credit: Microsoft

The most common type of puzzle (a series of distorted letters and numbers) is increasingly being cracked by smarter AI software. And a computer scientist has now developed an algorithm that can defeat even the latest photograph-based tests.

Known as CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart), these puzzles were developed in the late '90s as a way to separate real users from machines that create e-mail accounts to send out spam or log in to message boards to post ad links. The Turing Test, named after mathematician Alan Turing, involves measuring intelligence by having a computer try to impersonate a real person.

Textual CAPTCHAs are a good way to tell humans and spam-bots apart, because distorted letters and numbers can easily be read by real people (most of the time) but are fiendishly difficult for computers to decipher. However, computer scientists have long seen CAPTCHAs as an interesting AI challenge. Designers of textual CAPTCHAs have gradually introduced more distortion to prevent machines from solving them. But they have to balance security against usability: as distortion increases, even real human beings begin to find CAPTCHAs difficult to decipher.

Earlier this year, Jeff Yan, a researcher at the University of Newcastle, U.K., revealed a program capable of completing the textual CAPTCHAs used to protect Microsoft's Hotmail, MSN, and Windows Live services with a success rate of 60 percent. This might not sound like much, but it's significant, since a computer can try its attack thousands of times each minute. Yan withheld the paper until Microsoft had a chance to tweak its CAPTCHAs so that they were more difficult to crack. But at the ACM Conference on Computer and Communications Security in Alexandria, VA, later this month, Yan will present details of another program that he says can crack even more widely used textual CAPTCHAs.
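
To see why a 60 percent success rate matters at machine speed, here is a minimal Python sketch of the arithmetic. The 60 percent figure is the reported result; the rate of 1,000 attempts per minute is an illustrative assumption standing in for "thousands," the function names are hypothetical, and each attempt is assumed to succeed independently of the others.

```python
def expected_successes(attempts: int, success_rate: float) -> float:
    """Average number of solved CAPTCHAs over a given number of attempts."""
    return attempts * success_rate


def prob_at_least_one(attempts: int, success_rate: float) -> float:
    """Chance that at least one attempt succeeds, assuming independent attempts."""
    return 1.0 - (1.0 - success_rate) ** attempts


SUCCESS_RATE = 0.60          # reported success rate of the attack
ATTEMPTS_PER_MINUTE = 1000   # assumption: an illustrative value for "thousands"

if __name__ == "__main__":
    per_minute = expected_successes(ATTEMPTS_PER_MINUTE, SUCCESS_RATE)
    print(f"Expected solved CAPTCHAs per minute: {per_minute:.0f}")
    print(f"Chance of at least one success in 3 tries: "
          f"{prob_at_least_one(3, SUCCESS_RATE):.3f}")
```

Under these assumptions, even a much lower success rate would still yield a steady stream of fraudulent accounts, because a failed attempt costs the attacker almost nothing.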

So an alternative is to ask users to solve different kinds of puzzles. But another paper to be presented at the same conference describes an algorithm that could spell trouble for even newer CAPTCHAs.

Philippe Golle of the Palo Alto Research Center has developed a program that can correctly pass an image-based CAPTCHA called Asirra, developed by Microsoft. Asirra asks users to correctly classify images of either cats or dogs using a database of three million images provided by animal-rescue organizations. This task should be even harder for computers than recognizing squiggly letters, but Golle's program can correctly identify the cats or dogs shown by Asirra 83 percent of the time.
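
A per-image accuracy is not the same as the chance of passing a whole challenge that shows several images at once. The Python sketch below works through that relationship. The 83 percent figure is the one reported here; the 12 images per challenge, the 1 percent target, and the function names are assumptions made for illustration, and the images are treated as independent, which a real classifier may not satisfy.

```python
import math


def challenge_pass_rate(per_image_accuracy: float, images: int) -> float:
    """Chance of labeling every image in a challenge correctly,
    assuming each image is classified independently."""
    return per_image_accuracy ** images


def images_needed(per_image_accuracy: float, target_pass_rate: float) -> int:
    """Smallest number of images that pushes a bot's pass rate
    down to the target or below, under the same independence assumption."""
    return math.ceil(math.log(target_pass_rate) / math.log(per_image_accuracy))


PER_IMAGE_ACCURACY = 0.83   # reported accuracy of the classifier
IMAGES_PER_CHALLENGE = 12   # assumption: not stated in this article
TARGET_PASS_RATE = 0.01     # assumption: an illustrative security target

if __name__ == "__main__":
    rate = challenge_pass_rate(PER_IMAGE_ACCURACY, IMAGES_PER_CHALLENGE)
    print(f"Bot pass rate on a {IMAGES_PER_CHALLENGE}-image challenge: {rate:.3f}")
    for accuracy in (0.83, 0.95):
        needed = images_needed(accuracy, TARGET_PASS_RATE)
        print(f"Per-image accuracy {accuracy:.2f}: {needed} images needed")
```

The second function shows the trade-off: as classifiers improve, the number of images a site must show to hold a bot's pass rate at a fixed level grows quickly, and legitimate users have to label all of them too.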

Comments

  • Turing test
    Very interesting article! It is fascinating that the Turing Test itself has become a practical issue. I'd like to point out a trend I have seen in AI, and use that to project the evolution of CAPTCHAs.

    I took my first AI course in 1962 (a graduate EE course at MIT, taught by Prof James Slagle). At the time, the focus was on heuristics to "prune the choice tree".

    Looking at progress in AI for the following 40 years, this approach was not where the successes were. Pragmatic AI successes stemmed not from pruning the tree, but rather from faster and cheaper computing that could afford to look deeper into the tree -- pruning not necessary. I don't know whether the new CAPTCHA attacks fit that description, but I suspect they do.

    So what can we do to make better CAPTCHAs? Suggestions in the article, like identifying more pictures to reduce the probability of N hits, miss the point of history. That is at best a temporary expedient; as computing gets faster and cheaper, it will be necessary to identify more and more pictures -- a major nuisance to legitimate users.

    Perhaps a better approach would be to look again at the thing that makes a Turing test such a high threshold: the unstructured richness of human intelligence. Suppose the user had no idea what kind of test would appear as the CAPTCHA? That is more in the spirit of the Turing test than the limited scope of a CAPTCHA today. This time you must tell a cat from a dog; at the next login you must identify a state or a country from a map; after that, you must name a tune that plays on your speaker.

    Of course, this "unstructured richness" has its problems as well. The article made the point that, as text is distorted more and more, people start to have as much trouble identifying it as computer programs do. By making the test less structured, we run the risk of some humans being unable to solve the puzzle. For instance, the map test that I mentioned assumes some proficiency with geography. I watch quiz shows that leave me appalled at the lack of geographical knowledge of too many contestants. As for the music identification, those same quiz shows impress on me that, while I may be a genius at pre-1960 music, I'm a complete dunce at post-1980 music that many contestants identify immediately.

    So the challenge in choosing a test is to find unstructured, unpredictable knowledge that is, at the same time, universal to humans. And the test must have unambiguous responses, so that the CAPTCHA program itself does not have to pass the Turing test.

    DaveT

    dtutelman
    10/14/2008
    Posts:57
    Avg Rating:
    4/5
    • Re: Turing test
      Hi from Greece. Most interesting idea. Shifting the target from a specific domain to a more general one would probably require a "strong AI". Coincidentally, I signed up yesterday on a popular website, and it required that I pass two tests: the usual distorted-text test and a second one that looked like this: "nineteen -11 +3"
      aatnet
      10/16/2008
      Posts:2
      Avg Rating:
      3/5
  • Not a great Turing test to begin with
    The problem with current CAPTCHAs is that they were never a very good Turing test to begin with, since they use computer fonts to generate an image.

    OCR technology has been around for a long time, and even after mutilating characters, the problem is we're still limited to only 60 or so "images" to choose from (a-z, A-Z, 0-9).
    I posted an idea about the ideal Turing test, one that could be used to make a CAPTCHA that is theoretically unbreakable.
    Read more about it here:

    http://www.yuniti.com/BetterCaptcha

    The idea is to use a hurdle in technology - image recognition - as a hurdle in cracking CAPTCHAs. In essence, it would use Google's image labeling platform and database to have users enter a word that describes the image.

    Because the pool of images is so large (millions of images), caching of results would not be an option.

    Better yet, words could be limited to the English language for English sites, making it more difficult to outsource to "CAPTCHA typers" in China/Russia/etc.
    marquinhocb
    10/14/2008
    Posts:1
    Avg Rating:
    2/5
  • What If AI Is Solved?
    What will happen to the internet when AI is solved? What if no test could fool our future intelligent machines? How will web sites distinguish between humans and machines? I see this as a future business opportunity. We will need some form of personal authentication service. Maybe, the website's own AI will be used to authenticate users via their webcams and/or computer microphone.
    Mapou
    10/14/2008
    Posts:65
    Avg Rating:
    2/5
    • Re: What If AI Is Solved?
      A most interesting question that raises more. When that happens (in my view it is a matter of 'when' not 'if') will it be unethical to use that AI for spamming?
      aatnet
      10/16/2008
      Posts:2
      Avg Rating:
      3/5
  • AI
    AI as you're thinking about it will never be solved. Yes, machines will become more human, but humans will also get more like machines. We will eventually meet somewhere in the middle, and the entire concept of AI will become ridiculous.
    zig158
    10/15/2008
    Posts:64
    Avg Rating:
    4/5
    • Re: AI
      How about sensors? Some laptops and desktops come with a fingerprint reader for logging in, which could be used to tell whether the user is a human or a machine. But that raises a question: how would a browser access that data on the machine without creating security issues?
      kiran_342
      10/16/2008
      Posts:1
      Avg Rating:
      1/5

MIT Massachusetts Institute of Technology © 2009 Technology Review. All Rights Reserved.