Securing Your Voice

Researchers turn voiceprints into passwords to avoid storing your actual speech anywhere.

David Talbot

August 27, 2012

Voice authentication is increasingly used by tens of millions of people, including bank and telecom customers: you record a sample upon enrollment, and then speak that passage each time you call in, confirming your identity with a certainty regular passwords can’t match. But if hackers obtain your voiceprint—under scenarios akin to breaches of credit-card and other personal data—they could use it to break into other systems that use voice authentication.

Now researchers at Carnegie Mellon University say they’ve developed voice-verification technology that can transform your voice into a series of password-like data strings, in a process that can be handled on the average smart phone. Your actual voice never leaves your phone, during enrollment or later authentication.

“We are the first to convert a voice recording to something like passwords,” says Bhiksha Raj, the CMU computer scientist who led the research. “With fingerprints, this is exactly what is done, but nobody has figured out how to do it with voice until now.” The work will be presented as a keynote speech at an information security conference in Passau, Germany, next month.

The technology handles the slight differences in the way people speak from day to day by making multiple password-like data strings using different mathematical functions. By comparing how many of those match, it can determine whether the speaker is the person who enrolled. “The key to making it work is that instead of converting it to just one password, we convert it to a large collection of them,” Raj says.
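To make the "many strings, count the matches" idea concrete, here is a minimal Python sketch. It is not the CMU implementation: the article does not say which mathematical functions are used, so this sketch assumes random-hyperplane sign hashing, a common locality-sensitive technique, and assumes the voice has already been reduced to a short list of numbers. All names and parameters (`make_functions`, `voice_strings`, `NUM_KEYS`, `BITS_PER_KEY`) are hypothetical.

```python
import hashlib
import random

NUM_KEYS = 20      # how many password-like strings to derive (assumed value)
BITS_PER_KEY = 8   # sign bits folded into each string (assumed value)


def make_functions(dim, seed=1234):
    """Build NUM_KEYS stand-in 'mathematical functions'.

    Each function is a set of BITS_PER_KEY random hyperplanes. This is an
    assumption for illustration, not the scheme described by the researchers.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(BITS_PER_KEY)]
        for _ in range(NUM_KEYS)
    ]


def voice_strings(features, functions):
    """Turn one feature vector into NUM_KEYS password-like strings."""
    strings = []
    for planes in functions:
        # One bit per hyperplane: which side of the plane the voice falls on.
        bits = "".join(
            "1" if sum(w * x for w, x in zip(plane, features)) >= 0 else "0"
            for plane in planes
        )
        # Hash the bit pattern so only an opaque string is kept.
        strings.append(hashlib.sha256(bits.encode("ascii")).hexdigest())
    return strings


def count_matches(enrolled, attempt):
    """Count positions where the enrolled and attempted strings are equal."""
    return sum(a == b for a, b in zip(enrolled, attempt))


def accept(enrolled, attempt, min_matches):
    """Accept the speaker if enough strings match."""
    return count_matches(enrolled, attempt) >= min_matches


if __name__ == "__main__":
    functions = make_functions(dim=4)
    enrolled = voice_strings([0.90, -0.20, 0.40, 0.10], functions)
    later = voice_strings([0.88, -0.21, 0.41, 0.12], functions)
    print(count_matches(enrolled, later), "of", NUM_KEYS, "strings match")
```

A small change in the input flips only the bits whose hyperplanes lie close to it, so most strings survive day-to-day variation while a very different voice matches few or none. The acceptance threshold (`min_matches`) is a tuning choice the article does not specify.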

The technology also throws in a dash of extra data specific to your phone, so that “nobody else besides you, using your smart phone, can generate the specific strings that you did,” he says. Then it encrypts those data strings for their journey across the network.
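The device-specific step can be illustrated the same way. The article does not say how the extra data is mixed in, so the sketch below assumes a random secret kept on the phone and combined with each voice-derived value using HMAC-SHA256 from Python's standard library. The function names are hypothetical, and the encryption applied for the trip across the network is not shown.

```python
import hashlib
import hmac
import secrets


def new_device_salt():
    """Generate a random secret that stays on one phone (assumed design)."""
    return secrets.token_bytes(32)


def salted_string(raw_key, device_salt):
    """Bind a voice-derived value (bytes) to one device with a keyed hash."""
    return hmac.new(device_salt, raw_key, hashlib.sha256).hexdigest()


def strings_equal(a, b):
    """Compare two strings in constant time."""
    return hmac.compare_digest(a, b)


if __name__ == "__main__":
    my_phone = new_device_salt()
    other_phone = new_device_salt()
    raw = b"10110010"  # placeholder for one voice-derived bit pattern
    print(strings_equal(salted_string(raw, my_phone),
                        salted_string(raw, my_phone)))
    print(strings_equal(salted_string(raw, my_phone),
                        salted_string(raw, other_phone)))
```

Under this assumption, the same voice-derived value produces unrelated strings on two phones, so strings stolen from a server could not be regenerated without the original handset's secret.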

The CMU system is accurate 95 percent of the time on a test dataset. (Errors would simply require a speaker to repeat the authentication process.) That’s not quite as good as commercial systems that use stored voiceprints, but the technology is still being honed, and improvements are expected, says Raj. He adds that the method, though still in the research phase, is computationally efficient enough to work on most smart phones.

Other research efforts to protect voice privacy in voice verification have tried to work with encrypted versions of voice files—without ever decrypting them. (See “Homomorphic Encryption.”) But that method takes so much computational horsepower that it’s “currently impractical,” says Shantanu Rane, principal research scientist at Mitsubishi Electric Research Laboratories in Cambridge, Massachusetts. Raj’s technology “works fast while giving reasonable verification accuracy,” Rane adds.

Other groups are working on different methods to protect voice privacy. For the speech recognition used by Apple’s Siri app, researchers at BBN, of Cambridge, Massachusetts, have proposed only sending certain features of your voice to Apple (see “Wiping Away Your Siri Fingerprint”), rather than the voice itself.

It might take only one bad voice-data breach to shock users and shake the industry, says Prem Natarajan, executive vice president at Raytheon BBN Technologies in Cambridge, Massachusetts. “Privacy-preserving speech processing, including for voice verification, is likely to be of increasing importance” given the surging popularity of voice interfaces, he says (see “Where Speech Recognition is Going”). “I would like nothing more than to be able to carry only one password with me: my voice.”

Currently, companies protect the privacy of voiceprints in part by isolating the stored file from other identifiers such as name and social security number. That’s something industry leader Nuance Communications, of Burlington, Massachusetts, already does with the 23 million voiceprints it stores, says Brett Beranek, the company’s senior solutions manager for voice biometrics. Going a step further, the company also encrypts the stored voiceprints, he says, only decrypting them when the authentication process is under way.
