The Voice of Osama bin Laden

Osama’s voice on tape proves that the leader of al Qaeda is still alive. Or does it?

Richard A. Muller

January 23, 2004

On January 4, Al Jazeera broadcast yet another audio tape purported to be from Osama bin Laden, in which he exhorted his followers to “continue the jihad.” The voice referred to the capture of Saddam Hussein, proving the tape was recent. An anonymous CIA official confided to the New York Times: “It is likely the voice of Osama bin Laden.” In an interview with CBS, Homeland Security Secretary Tom Ridge agreed.

I pronounced bin Laden dead in my May 2002 and September 2002 columns. Am I ready to retract this claim, and to pay off several bets I made back then? No, not yet. I think Osama bin Laden is still dead. And I don’t think I’m just being stubborn. To understand my logic, consider the following three issues: the state of the antiterrorism effort, the technology of voice identification, and the most likely alternative hypothesis that could explain the audio tape.

Despite the distractions of Iraq War II, the U.S. antiterrorism effort has remained remarkably strong. The U.S.A. Patriot Act treads on our civil rights, but it also makes it difficult for terrorists to operate in our homeland. Secretive organizations cannot easily regroup when a few of their key links are disrupted, whether through wiretapping, surveillance, or arrest. What do you do when your one and only contact is gone? Al Qaeda may not be defeated, and it can still send suicide bombers against soft targets, but its organization must certainly be in a state of disarray.

Moreover, experts I’ve consulted with tell me that cooperation between the United States and foreign antiterrorist organizations remains strong, even in “old Europe.” Political disagreements about the wisdom of invading Iraq have not interfered with the shared recognition of the dangers of terrorism. That is not surprising; even France and Germany know that Osama didn’t like them much more than he liked the United States.

All that adds up to a tough time for al Qaeda, and it and its sympathizers desperately need encouragement from their charismatic leader. That is, of course, why the tapes were made and broadcast. But why were they audio and not video? The voice sounds right, but video would have been more convincing. Video recorders are cheap and small. Osama could put all doubt to rest by releasing a film of him holding up a recent newspaper of Saddam’s capture. Prior to Tora Bora, videos of him were the norm. What happened?

I can find only two plausible explanations. One is that Osama bin Laden is severely ill or wounded, and does not want the world to know it. The other is that he is dead, and the audio tape is faked. But how could the counterfeiter do a good enough job to fool experts?

Voice recognition is a rapidly developing technology, thanks to the availability of cheap computing power. You’ve probably seen a “voice print,” a plot of frequency density vs. time; music editing software makes them on personal computers. Old voice recognition analysis made matches between sets of such plots. Modern voice identification systems, which seek to have low false-alarm rates even in the presence of noise, tend to depend more heavily on a technique known as “feature analysis.” A feature is a peculiar twist in the voice, often a tell-tale transition between phonemes with different pitches. These are not readily heard by listeners, but they can be picked out in a digital analysis. Patterns of such glitches are unique identifiers, much like the ridge bifurcations and other minutiae of fingerprint patterns are the keys in fingerprint identification.
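As a rough illustration of the “voice print” described above, here is a minimal Python sketch that computes frequency content vs. time for a synthetic signal. It is not the feature analysis used by modern identification systems. The function names, the 8,000-samples-per-second rate (roughly telephone quality), and the two test tones are all assumptions chosen for the example.

```python
import cmath
import math


def spectrogram(samples, frame_size, hop):
    """Split the signal into frames and return the magnitude spectrum of each."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window tapers the frame edges to reduce spectral leakage.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        magnitudes = []
        # Naive discrete Fourier transform over the non-negative frequency bins.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


def dominant_frequencies(samples, sample_rate, frame_size=64, hop=64):
    """Return the strongest frequency (in Hz) in each successive frame."""
    result = []
    for magnitudes in spectrogram(samples, frame_size, hop):
        peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
        result.append(peak_bin * sample_rate / frame_size)
    return result


def tone(freq_hz, count, sample_rate):
    """Generate `count` samples of a pure sine tone (hypothetical test input)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


sample_rate = 8000  # assumed, roughly telephone quality
signal = tone(500, 256, sample_rate) + tone(1500, 256, sample_rate)
print(dominant_frequencies(signal, sample_rate))
# prints [500.0, 500.0, 500.0, 500.0, 1500.0, 1500.0, 1500.0, 1500.0]
```

Each entry in the output corresponds to one time slice. A real voice print shows the full spectrum of every slice as a shaded image rather than a single peak.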

Voice identification systems are already in widespread use around the world. They are employed at the Canadian border to identify and track frequent travelers, and in Britain to verify the compliance of young parolees. U.S. companies, including Chase Manhattan Bank, Charles Schwab, and Prudential Securities, use voice identification to control access to secure areas and records. Visa is hoping to replace credit card verification personal identification numbers with voice recognition; a computer will compare features of your voice with those stored in the credit card chip.

With such a success record, shouldn’t voice recognition software work reliably to identify Osama, or to reject an imitator? Unfortunately, the Al Jazeera tapes are not high quality; they are probably no better than telephone sound. That’s good enough to detect some kinds of deception, but not all. Here are three possibilities:

1. The tape was made by an impressionist trying to imitate bin Laden’s voice. Good impressionists can mimic the tone and pacing of their subject, but they often overemphasize obvious quirks, much as a caricaturist exaggerates dominant physical features. That makes it amusing to hear, but it won’t fool an analyst. Impressionists are not good at catching the more subtle features that even simple voice recognition software uses. This kind of counterfeit can almost certainly be ruled out.

2. The tape was made by cutting and pasting true excerpts from bin Laden’s past speeches. Much of the tape could be unchanged from a prior recording. The tough part for the counterfeiter was adding mention of Saddam’s capture, where words and phrases had to be rearranged. To detect such a forgery, a good analyst would listen for discontinuities in the background noise, or small blips indicating the tape was spliced. Digital processing by the tape maker can remove such artifacts, but it leaves behind its own; low-pass filters, for example, create easily detected changes in the spectrum of the background hiss. (That’s why true audiophiles dislike noise suppression filters. Their effect is readily noticed by a trained ear.) Such cutting and pasting, even with digital filtering, would have been detected by the CIA. Digital processing can be detected in other ways; for example, it sometimes generates false frequencies (called aliases). Such tampering would have raised suspicions. Therefore this scenario can probably be ruled out as well.

3. The tape was a recording of one of Osama bin Laden’s sons, who was deliberately trying to sound like his father. This is, in my mind, the most likely hypothesis.
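The “false frequencies” mentioned in the second scenario can be illustrated with a short Python sketch. It shows only the textbook folding rule for a pure tone sampled without an anti-alias filter; it says nothing about how any agency actually analyzes tapes. The function names and the sample rate and tone values are assumptions chosen for the example.

```python
import math


def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling with no anti-alias filter."""
    folded = tone_hz % sample_rate_hz
    # Frequencies above half the sample rate fold back into the lower half.
    return min(folded, sample_rate_hz - folded)


def cosine_samples(freq_hz, sample_rate_hz, count):
    """Sample a cosine tone at the given rate (hypothetical test input)."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]


sample_rate = 8000  # assumed sample rate in samples per second
print(alias_frequency(5000, sample_rate))  # prints 3000

# The sampled 5,000 Hz tone is indistinguishable from a sampled 3,000 Hz tone.
high = cosine_samples(5000, sample_rate, 16)
low = cosine_samples(3000, sample_rate, 16)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, low)))  # prints True
```

A tone that shows up where no natural sound should be is one hint that a recording has been digitally resampled or processed.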

Saad bin Osama bin Laden is the third of Osama’s 23 to 50 children; he is known to be in his early twenties. He has been active in al Qaeda since his pre-teen years, and was probably being groomed for eventual leadership. He is reported to be fluent in English and adept with computers. The Washington Post reported that Saad was a key organizer of the May 12, 2003, al Qaeda bombing in Riyadh, Saudi Arabia. There have been reports that he is hiding along the Afghanistan-Pakistan border; others say that he is in Iran close to the Afghanistan border, in a region not controlled by the Iranian government. The Arab newspaper Asharq Al Awsat says that Saad is now one of the principal leaders of al Qaeda, but I’m skeptical of that. Al Qaeda is too sophisticated to let such a young and inexperienced person take over. But he likely has an extremely useful talent: sounding like his dad.

I like to consider myself an expert in the voices of my wife and my two daughters. I notice them even in a crowded and noisy room. When one of them telephones me, I instantly recognize her, but often incorrectly. The one I name is the one I expect, not the one who called. (They find this very amusing.) I don’t know if the similarity of their voices is genetic or learned, but I know that others have similar problems. Parents and children tend to sound alike, and that effect is exaggerated when bandwidth is poor, such as in a telephone call or on a cassette recording. In fact, commercial speech recognition software that is “trained” to respond to a particular person’s voice often will have a hard time distinguishing the voice of a family member. The more sophisticated systems that intelligence agencies presumably use may of course be less prone to such confusion, but I suspect that this vulnerability to child and sibling spoofing remains. And I doubt that the U.S. government has a recording of Saad to use for comparison.

Here is my scenario:

Osama bin Laden was killed at Tora Bora, or his dialysis machine was destroyed and he died shortly afterwards. The strongest evidence for this is the absence of new videos. Al Qaeda fears that news of his death will shock and discourage many of its supporters. There is no other leader who can hold together this diverse and contentious organization, so they believe that they need to keep the news secret. The initial tapes they released were old recordings of former speeches. But many supporters were concerned. They, like me, noticed the absence of videos, and of speeches with clear date indicators. Al Qaeda knew a video counterfeit would be detected, but they noticed that Saad sounded a lot like his father. They had him listen to his father’s speeches, and practice enunciating them with a similar style. It took many attempts, but Saad’s voice on the final tape was good enough to deceive not only al Qaeda’s foreign legions, but even some analysts at the CIA.

And if my personal experience is indicative, the tapes may even have fooled one or more of Osama bin Laden’s wives.
