In the battle to beat the spambots, a new weapon has been developed that exploits the difficulty that computers have with recognizing joined-up handwriting. The hope is that switching from text-based verification systems to systems that use computer-generated handwriting will make many Web services more secure.
Developed by researchers at the State University of New York (SUNY), in Buffalo, the system is a variant of a commonly used challenge-response technique called a CAPTCHA (completely automated public Turing test to tell computers and humans apart). This kind of test is designed to be easy for humans but nearly impossible for machines to pass, which prevents automated programs from generating new accounts for nefarious purposes like sending out spam.
Most CAPTCHAs work by displaying images of randomly generated text that has been distorted to make it difficult for optical character recognition (OCR) programs to read, without making it illegible to humans. To pass the test and gain access, users simply reenter the text that they have read.
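The challenge-response flow described above can be sketched in a few lines of Python. This is only an illustration: the names `ALPHABET`, `new_challenge`, and `verify` are hypothetical, and the step that renders the text as a distorted image is left out entirely.

```python
import secrets
import string

# Hypothetical character set for the challenge text.
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length=6):
    """Return random text that a real system would render as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify(expected, response):
    """Accept the response if it matches the challenge, ignoring case and outer spaces."""
    return response.strip().upper() == expected.upper()


challenge = new_challenge()
# The server keeps `challenge`, shows the user only the distorted image,
# and later checks whatever the user typed back.
print(verify(challenge, challenge.lower()))
```

The security of such a scheme rests on the image, not on this comparison: a program that can read the distorted text passes the test just as a human would.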
The trouble is that OCR software is improving steadily, making it possible for spambots to sometimes pass these tests. “It’s an arms race,” says Achint Oommen Thomas, one of the computer scientists who developed the new system. “Every CAPTCHA that exists today has already been broken.”
Just last year, a character-based CAPTCHA developed by Microsoft and used widely for services like Hotmail, MSN, and Windows Live was broken by Jeff Yan and his colleagues at Newcastle University, in the U.K. Microsoft had previously claimed that the CAPTCHA would let only one in 10,000 machine attempts through, but Yan was able to demonstrate that his attack succeeded 60 percent of the time.
Microsoft has since made improvements that have left the service much more secure. Even so, Oommen Thomas believes that automatically generating joined-up handwriting could further raise the bar. His system, developed with colleagues Amalia Rusu and Venu Govindaraju, generates words by selecting handwritten characters from a public database of 20,000 of them. Algorithms are then applied to identify important control points within the characters (the key loops and arches that make the letters and numbers recognizable) before other algorithms distort the characters and link them so that they appear joined up. “We distort them randomly but make sure that they are within set limits; otherwise, they become illegible to humans,” says Oommen Thomas.
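The idea of distorting randomly while staying within set limits can be illustrated with a small Python sketch. It is not the SUNY researchers' algorithm: it assumes, purely for illustration, that a character's control points are (x, y) pairs, and the function name `distort` and the `max_shift` limit are hypothetical.

```python
import random


def distort(points, max_shift=2.0, rng=None):
    """Move each control point by a random offset of at most max_shift per axis."""
    if rng is None:
        rng = random.Random()
    distorted = []
    for x, y in points:
        # uniform(-max_shift, max_shift) keeps every offset inside the set limit,
        # so the shape changes but stays close to the original.
        dx = rng.uniform(-max_shift, max_shift)
        dy = rng.uniform(-max_shift, max_shift)
        distorted.append((x + dx, y + dy))
    return distorted


# Three made-up control points tracing a simple arch.
arch = [(0.0, 0.0), (5.0, 10.0), (10.0, 0.0)]
print(distort(arch, max_shift=1.5))
```

A larger `max_shift` would make the output harder for software to match against known letter shapes, but also harder for people to read, which is the trade-off Oommen Thomas describes.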