A technique for saving bandwidth in Internet phone calls could undermine their security, according to research recently presented at the IEEE Symposium on Security and Privacy. Johns Hopkins University researchers showed that, in encrypted phone calls using a certain combination of technologies, preselected phrases can be spotted about 50 percent of the time on average, and up to 90 percent of the time under optimal conditions.
Voice-over-Internet-protocol (VoIP) phone calls, in which a computer converts a voice signal into data packets and sends them over the Internet, are increasingly popular for personal and business communication. Although most VoIP systems don’t yet use encryption, says Jason Ostrom, director of the VoIP-exploitation research lab at Sipera Systems, it’s absolutely necessary, particularly for business users. In many cases, security measures aren’t in place because companies haven’t realized how vulnerable VoIP can be, he says. He cites an assessment that he did for a hotel that uses VoIP phones, in which he showed that an attacker could access and record guests’ calls using a laptop plugged into a standard wall connection. The Johns Hopkins researchers hope that pointing out possible holes in voice encryption systems can help ensure their security when they become more commonplace.
The Johns Hopkins attack takes advantage of a compression technique called variable-bit-rate encoding, which is sometimes used to save bandwidth in VoIP calls, explains Charles Wright, lead author of the paper. (Wright, who recently received his PhD from Johns Hopkins, will join the technical staff at the MIT Lincoln Laboratory in August.) Variable-bit-rate encoding, Wright says, adjusts the size of data packets being sent over the Internet based on how much information they actually contain. For example, when the person on one end of a VoIP call is listening rather than speaking, the packets sent from that person’s computer shrink significantly. Also, packets containing certain sounds, such as “s” or “f,” can take up less space than those containing more-complex sounds, such as vowels.
Encrypting the packets after they’ve been compressed scrambles their contents, making them look like gibberish. But encryption doesn’t change the packets’ sizes, and the pattern of sizes is what gives information away to potential eavesdroppers.
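The size leak can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the frame sizes are invented, and the cipher is a toy length-preserving stream cipher built from SHA-256, not the encryption used by any real VoIP system and not secure. The point is only that a cipher of this kind hides what a packet says but not how long it is.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only, not secure)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, nonce: bytes, payload: bytes) -> bytes:
    """XOR the payload with the keystream; output length equals input length."""
    ks = keystream(key, nonce, len(payload))
    return bytes(p ^ k for p, k in zip(payload, ks))

# Hypothetical compressed frame sizes in bytes:
# small frames for silence or an "s" sound, large frames for vowels.
frame_sizes = [6, 10, 38, 38, 10, 6]

key = b"demo-key"
packets = [
    encrypt(key, i.to_bytes(4, "big"), bytes(size))  # bytes(size) is a dummy payload
    for i, size in enumerate(frame_sizes)
]

# What an eavesdropper can still measure: the length of each encrypted packet.
observed_sizes = [len(p) for p in packets]
print(observed_sizes == frame_sizes)
```

The contents of each packet are scrambled, yet the list of observed sizes is identical to the list of compressed frame sizes, so the rhythm of the speech survives encryption.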
In their tests, the Hopkins researchers simulated the packets that a combination of compression and encryption would produce for particular phrases. While an example of the way that a targeted speaker pronounced a particular phrase would give eavesdroppers a big advantage, they could still simulate the phrase using a pronunciation dictionary and a database of sample sounds from multiple speakers. The researchers can create many versions of the sounds in the phrase, which lets them accommodate different accents and other variations in pronunciation. They then use probabilistic methods to look for likely instances of the phrase. Wright says that the method can identify the phrase, on average, about half the time that it occurs, and that about half of the phrases it flags turn out to be exact matches of the desired phrase. In some circumstances, as when the phrases are longer, or when the speakers are particularly well matched to the simulated versions of the phrase, the accuracy can be as high as 90 percent, Wright says. Because eavesdroppers have to know what phrase they’re listening for, Wright says, “the threat would be more to technical, professional jargon than to an informal call between friends or family members.”
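To give a rough sense of what searching for a phrase in a stream of packet sizes might look like, here is a deliberately simplified Python sketch. It is not the researchers' method, which the article describes only as probabilistic. It assumes a single hypothetical template of packet sizes for the target phrase, scores every window of the observed stream with an independent Gaussian model around that template, and ignores differences in speaking rate. The template, the stream, `sigma`, and the threshold are all invented values.

```python
import math

def window_log_likelihood(window, template, sigma=2.0):
    """Sum of Gaussian log-densities of each observed size around the template size."""
    return sum(
        -0.5 * ((w - t) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        for w, t in zip(window, template)
    )

def find_phrase(sizes, template, threshold, sigma=2.0):
    """Return start indices of windows whose average per-packet score beats the threshold."""
    n = len(template)
    hits = []
    for start in range(len(sizes) - n + 1):
        score = window_log_likelihood(sizes[start:start + n], template, sigma) / n
        if score > threshold:
            hits.append(start)
    return hits

# Hypothetical packet-size template for the target phrase (bytes per packet).
template = [10, 38, 36, 10, 24, 38]

# Hypothetical observed sizes from an encrypted call; the phrase is spoken,
# with slight variation, starting at index 3.
stream = [6, 6, 6, 11, 37, 36, 10, 25, 38, 6, 6, 30, 12, 6]

hits = find_phrase(stream, template, threshold=-2.0)
print(hits)
```

With these made-up numbers the only window flagged is the one starting at index 3. Raising the threshold trades missed phrases for fewer false alarms, which is the same trade-off behind the hit and false-positive rates Wright describes.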