Researchers Crack Audio Security System

Spammers could create new accounts more cheaply.

Christopher Mims

June 2, 2011

A team of computer scientists at Stanford University and Tulane University with expertise in artificial intelligence, audio processing, and computer security has come up with a way to automatically defeat the systems that prevent spammers from creating new accounts on sites like Yahoo, Microsoft’s Hotmail, and Twitter.

Many websites require users to correctly transcribe a string of distorted characters—a puzzle known as a CAPTCHA—to gain access. These tests are relatively easy for people, but very hard for computers. Most sites also make CAPTCHAs available in audio form, for vision-impaired users, and the researchers found that their algorithm could solve many of these audio CAPTCHAs. Researchers at Carnegie Mellon University have demonstrated the vulnerability of audio CAPTCHAs before, in 2008, but the new work targets newer, more secure versions.

The ability to automatically defeat CAPTCHAs could make it cheaper for spammers to churn out spam. Right now, spammers pay people sweatshop wages to solve CAPTCHAs, which can cost up to one cent apiece.

Team leader Elie Bursztein, of Stanford University, says the team’s algorithm, called deCAPTCHA, was able to defeat audio CAPTCHAs from Microsoft and Yahoo in almost half of all cases. Microsoft has since switched to another type of CAPTCHA, which the algorithm is still able to defeat in 1.5 percent of cases.

“[In defeating security measures,] if you cross the 1 percent threshold, you are in a lot of trouble,” says Bursztein. “It’s almost a free pass.”
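To see why a success rate just over 1 percent matters, it helps to run the arithmetic. The sketch below is illustrative only: the 1.5 percent rate and the one-cent ceiling come from this article, while the volume of 100,000 attempts and the function name `expected_solves` are assumptions made up for the example.

```python
def expected_solves(attempts, success_rate):
    """Expected number of CAPTCHAs an automated solver gets right."""
    return attempts * success_rate


# Hypothetical number of automated attempts (not a figure from the article).
attempts = 100_000
# Success rate reported against Microsoft's newer CAPTCHA.
success_rate = 0.015

solved = expected_solves(attempts, success_rate)
# Upper-bound price of a human-solved CAPTCHA, as quoted in the article.
human_cost_dollars = solved * 0.01

print(f"{solved:.0f} expected solves, "
      f"about ${human_cost_dollars:.2f} of paid human solving at most")
```

Under those assumptions, a low success rate still yields roughly 1,500 solved CAPTCHAs, because automated attempts cost the attacker almost nothing to repeat.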

Luis von Ahn, who coined the term CAPTCHA, says that, in reality, companies can control the rate at which audio CAPTCHAs are compromised by limiting the number of them that can be solved per day, or by limiting the number that can be solved by a single IP address. But, says security expert Markus Jakobsson, “it’s very important to understand how we can break things before the bad guys do.”
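The per-address limit von Ahn describes can be sketched in a few lines. This is a minimal illustration, not any company's actual defense: the class name `DailyCaptchaLimiter`, its parameters, and the in-memory counter are all assumptions for the example.

```python
from collections import defaultdict


class DailyCaptchaLimiter:
    """Caps how many CAPTCHA attempts one IP address may make per day."""

    def __init__(self, max_per_ip_per_day):
        self.max_per_ip_per_day = max_per_ip_per_day
        # Maps (ip, day) -> number of attempts recorded so far.
        self._counts = defaultdict(int)

    def allow(self, ip, day):
        """Return True and record the attempt if the IP is under its cap."""
        key = (ip, day)
        if self._counts[key] >= self.max_per_ip_per_day:
            return False
        self._counts[key] += 1
        return True


# Hypothetical usage: at most two attempts per address per day.
limiter = DailyCaptchaLimiter(max_per_ip_per_day=2)
results = [limiter.allow("203.0.113.7", "2011-06-02") for _ in range(3)]
print(results)  # the third attempt from the same address is refused
```

A cap like this does not make an individual CAPTCHA harder to solve. It only bounds how many automated successes a single address can accumulate.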

An audio CAPTCHA reads aloud a string of letters or numbers with added audio distortion. The Stanford team created a learning algorithm to “process the sound in a way that was as close as possible to the way that we think the human ear is made,” says Bursztein. This meant focusing on lower-frequency sounds, which humans are especially good at processing, and eliminating as much of the noise from audio CAPTCHAs as possible.
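The article does not describe deCAPTCHA's internals, so the following is not the team's method. It is a minimal sketch of one generic way to emphasize lower frequencies: a single-pole low-pass filter, which passes slow variations in a signal and damps fast ones. The function names, sample rate, cutoff, and test tones are all assumptions chosen for illustration.

```python
import math


def low_pass(samples, sample_rate_hz, cutoff_hz):
    """Single-pole low-pass filter: keeps slow variations, damps fast ones."""
    # Smoothing factor derived from the RC time constant for the cutoff.
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)

    filtered = []
    previous = 0.0
    for x in samples:
        # Move a fraction alpha of the way toward the new sample.
        previous = previous + alpha * (x - previous)
        filtered.append(previous)
    return filtered


def sine(freq_hz, sample_rate_hz, n):
    """Generate n samples of a unit-amplitude sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]


def peak(samples):
    """Largest absolute value in a list of samples."""
    return max(abs(s) for s in samples)


# Hypothetical test tones: one low (100 Hz) and one high (3,000 Hz).
RATE = 8000
low_tone = low_pass(sine(100, RATE, 800), RATE, cutoff_hz=500)
high_tone = low_pass(sine(3000, RATE, 800), RATE, cutoff_hz=500)

# Compare peaks after the filter has settled (last 400 samples).
print(peak(low_tone[400:]), peak(high_tone[400:]))
```

With a 500 Hz cutoff, the low tone passes through nearly unchanged while the high tone is strongly attenuated, which is the general effect a low-frequency emphasis aims for.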

Bursztein’s team is also working to create several new types of audio CAPTCHA. One type plays two voices reading different strings of letters or words at the same time. Humans are especially good at picking out one voice when surrounded by many competing conversations in a crowded room, but computers are terrible at this task. A second type combines words with music.

Even if many existing CAPTCHAs are vulnerable to attack, says Jakobsson, their failure isn’t as severe as the compromise of a password system. “[CAPTCHA defeat] is a gradual decay of security. You don’t have to keep everybody out to feel like you have security—some failure is tolerable.”
