"Emergent" Images to Outwit Spambots

Software creates images that confuse machines but are clear to people.
January 6, 2010

Researchers have developed an automated system for creating still and video images that can be identified by humans but not by computers. Such technology could be useful for Captcha systems, which were designed to keep "spambots" (automated junk e-mail programs) from signing up for free online accounts.

Making sense: The image above was created using software designed to generate pictures with hidden figures. Pattern-recognition software tends to have more trouble spotting the flamingo on the right than humans do.

The new technology uses simple images of a recognizable, moving figure, such as a running man or galloping horse, and converts them into blotches, hidden within a similarly blotchy scene. Computers are usually unable to detect the figure, but the human eye typically can.

Coined in 2000, "Captcha" stands for Completely Automated Public Turing test to tell Computers and Humans Apart. A typical Captcha system generates distorted text, often on a slightly cluttered background. The user must respond with the correct string of characters in order to access an online service, such as the account creation tool for a free e-mail address. But existing Captcha systems don't offer complete security: they are occasionally broken by security researchers and hackers. Captcha systems for Live Mail, Gmail, Yahoo!, Livejournal, and PayPal have all been cracked at one point. While the current systems are considered secure for now, most in the field agree it's only a matter of time before they are broken again. Captcha system designers have to keep improving their methods to stay one step ahead of those who seek to circumvent them.

“The systems we all use today are relatively easy to break,” says Danny Cohen-Or, a researcher on the project and a professor of computer science at Tel Aviv University. “What we have developed is something that, with more effort, could be like the base of a stronger Captcha [system].”

Developed with researchers at the Indian Institute of Technology in Delhi, the National Cheng Kung University in Taiwan, and others from Tel Aviv University, the software was inspired by "gestalt," or the idea of the whole being greater than the sum of the parts. Specifically, the software exploits the human ability to analyze a chaotic, fragmented scene to find a hidden figure.

The key was designing an adjustable system that could generate images that are easy enough for a human to identify, but too difficult for pattern-recognition software.

The software begins with a 3-D subject, such as a running dog. It converts the dog into a series of carefully generated black dots, which the researchers call “splats,” that take into account the dog’s silhouette and shape. To ensure that the subject isn’t too obvious, long, complicated shapes are broken into smaller parts, and the silhouettes are slightly deformed. The software then places the subject in a scene with more shapes, including some made of small pieces of the subject, to create added visual confusion. Videos are created as a series of still images.
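The steps described above can be illustrated with a minimal sketch. The function names, parameters, and clutter strategy below are assumptions for illustration, not the researchers' actual implementation; the sketch simply shows the sequence the article describes: sample "splats" from a subject's silhouette, deform them slightly, then bury them among similar blotches.

```python
import random

def splatter(silhouette, n_splats=400, jitter=2.0, clutter_ratio=1.5):
    """Hypothetical sketch of the 'splat' step: sample points along a
    subject's silhouette, perturb them so the outline is not too obvious,
    then surround them with look-alike background clutter, some of it
    copied from the subject itself."""
    # Sample splat centers from the subject's silhouette pixels.
    subject = random.sample(silhouette, min(n_splats, len(silhouette)))
    # Slightly deform each splat.
    subject = [(x + random.uniform(-jitter, jitter),
                y + random.uniform(-jitter, jitter)) for x, y in subject]
    # Add background clutter, reusing small fragments of the subject
    # for extra visual confusion.
    clutter = []
    for _ in range(int(n_splats * clutter_ratio)):
        x, y = random.choice(subject)
        dx, dy = random.uniform(-60, 60), random.uniform(-60, 60)
        clutter.append((x + dx, y + dy))
    return subject + clutter
```

A video would then be a sequence of such frames, with the subject's silhouette updated for each pose of the moving figure.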

The "emergent" images generated by the system were tested against three kinds of learning-based pattern-recognition software. After training on a set of 30 emergent images, the systems were presented with new ones. The best of the three could distinguish between a horse and a human only 60 percent of the time, while humans given the same task answered correctly nearly 100 percent of the time. The software can also create images that are far more difficult for computers to interpret, but doing so makes them harder for humans, too. "It's still not something that the big mass of users will be able to do," Cohen-Or says.

The big picture: Small sections of the emergent picture (left) look like little more than random splatter to both humans and machines. But when a human sees the full emergent image (center), the animal becomes apparent. The normal picture is shown at right.

He says another key obstacle to using the software for Captcha is the test procedure. It's not clear how the system would determine whether a user has accurately identified the image. Asking users to describe what they see would be far too complicated. One person might write "dog" to describe the subject, while another might write "doggy," "puppy," or "Dalmatian." There are far too many correct answers. "We cannot do multiple choice, either," Cohen-Or says. "Then the computer could guess."
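Cohen-Or's objection to multiple choice comes down to simple arithmetic: a bot answering a k-option question at random succeeds 1 time in k, which is far too often for a security check. Chaining several independent rounds shrinks that rate geometrically. The numbers below are illustrative, not from the article:

```python
def random_guess_rate(k, rounds):
    """Probability that a bot passes by guessing: a k-option question
    answered at random succeeds with probability 1/k, and clearing
    `rounds` independent questions succeeds with (1/k)**rounds."""
    return (1 / k) ** rounds

# One 4-way question: a bot guesses right 25% of the time.
single = random_guess_rate(4, 1)

# Three 10-way rounds in a row: down to 0.1%.
chained = random_guess_rate(10, 3)
```

This is the intuition behind the layered approach described next, in which several selection and annotation steps are applied in sequence.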

Other researchers have found ways around this problem. James Wang, an associate professor of information sciences and technology at the Pennsylvania State University, worked on a still-image Captcha system that asks users to select one image from a collage and then annotate an unrelated image by selecting the correct response from a list. The approach reduces the likelihood of a spambot bluffing the system. "The success rate for a random attack can be controlled to as low as one in 210,312, if these steps are applied twice," says Wang.

Wang admits that the emergence system does have a “cool factor.” Users might enjoy finding the hidden animal in a scene. But he says it will take more development and experimentation to create a practical Captcha system.

“Whereas this work is interesting, and the encoding scheme appears to be novel, only time will tell if image and vision scientists can find a way to break it,” Wang says. “Besides, in order to make a practical Captcha system with a proven low brute-force attack rate, more development and experimentation will have to be done.”

Wang notes that the system would have to incorporate a lot of different animals to work, and he wonders how many could be easily identified by humans. “For example, will I be able to tell a tiger from a leopard when only body silhouettes are shown?”

Luis von Ahn, a professor of computer science at Carnegie Mellon University and one of the people who first started building Captcha systems, agrees that the animal selection might be too limited to make Cohen-Or’s approach practical. And while he finds the research interesting, he’s not sure emergent images are really more secure than the standard distorted text systems currently used. “Nobody has actually tried to break this really hard,” von Ahn says.
