There are 16 possible four-bit messages, and Shannon’s method would assign each of them its own randomly selected eight-bit serial number, its “code word.” The receiver, like the sender, would have a codebook correlating the 16 possible four-bit messages with the 16 random eight-bit code words. Since there are 256 possible sequences of eight bits, there are 240 that don’t appear in the codebook. Someone who receives one of those 240 sequences will know that an error has crept into the data. But as long as the 16 permitted code words are different enough from each other, there’s likely to be only one that comes closest to the corrupted sequence. For instance, if 00000001 and 11111110 are both valid code words but 00000011 is not, then someone who receives the sequence 00000011 can conclude that the intended code word was much more likely to be 00000001 than 11111110.
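The codebook idea in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Shannon's own construction: the function names (`hamming`, `make_codebook`, `decode`) are hypothetical, "closest" is taken to mean the fewest differing bits, and the 16 code words are drawn without repetition so that no two messages share a word.

```python
import random


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two code words differ."""
    return bin(a ^ b).count("1")


def make_codebook(num_messages: int = 16, word_bits: int = 8, seed: int = 0) -> dict:
    """Give each message its own randomly chosen code word.

    Assumption: words are sampled without replacement, so all are distinct.
    """
    rng = random.Random(seed)
    words = rng.sample(range(2 ** word_bits), num_messages)
    return {message: word for message, word in enumerate(words)}


def decode(received: int, codebook: dict) -> int:
    """Return the message whose code word is closest to the received sequence."""
    return min(codebook, key=lambda message: hamming(codebook[message], received))


# The example from the text: two valid code words and one corrupted sequence.
example = {0: 0b00000001, 1: 0b11111110}
received = 0b00000011
print(hamming(example[0], received))  # differs from 00000001 in 1 bit
print(hamming(example[1], received))  # differs from 11111110 in 7 bits
print(decode(received, example))      # picks message 0, i.e. 00000001

# A full random codebook for the 16 four-bit messages.
codebook = make_codebook()
print(len(set(codebook.values())), "distinct code words out of", 2 ** 8)
```

The received sequence differs from 00000001 in one bit and from 11111110 in seven, which is why the first is the far more plausible intended word when each bit is flipped independently and rarely.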
In real life, of course, no one is worried about transmitting messages of only four bits. But by using statistical analysis, Shannon was able to draw conclusions about encoded messages of any length, sent over channels with any amount of noise. In particular, he was able to rigorously quantify both the degree of difference between randomly selected code words and the likelihood that a corrupted sequence would resemble only one of them. While the probability that two eight-bit sequences will be similar is relatively high, Shannon showed that as code words get longer, the chances of similarity decrease exponentially. In fact, one of his most startling results was that for long messages, most randomly assigned code words will be almost as different from one another as it’s possible for them to be. That means that almost any coding scheme (any way of generating those words) would allow error-free transmission across a noisy channel at near the maximum rate.
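The claim that longer random code words are less likely to resemble one another can be checked informally with a small simulation. The sketch below is an illustration rather than Shannon's analysis: the helper `pairwise_fractions` is a hypothetical name, and the number of code words is held at 16 for every length, a simplification of the setting Shannon actually studied. For each length it reports the smallest and largest fraction of bit positions in which any two random words differ.

```python
import random
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two code words differ."""
    return bin(a ^ b).count("1")


def pairwise_fractions(num_words: int, word_bits: int, rng: random.Random) -> list:
    """Draw random code words and return, for every pair, the fraction of
    bit positions in which the two words differ."""
    words = [rng.getrandbits(word_bits) for _ in range(num_words)]
    return [hamming(a, b) / word_bits for a, b in combinations(words, 2)]


rng = random.Random(0)
for bits in (8, 64, 512, 4096):
    fractions = pairwise_fractions(16, bits, rng)
    print(f"{bits:5d} bits: min {min(fractions):.3f}, max {max(fractions):.3f}")
```

With short words some pairs typically land very close together, while with long words every pair tends to differ in close to half of its positions, so a corrupted sequence is much less likely to be mistaken for the wrong word.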
“It took a lot of intuition to think that a perfectly random code might be a pretty good code on average,” says David Forney, SM ’63, ScD ’65, a former vice president of the Codex Corporation and Motorola who returned to MIT in 1996 as an adjunct professor. “It turns out that that drastically simplifies the analysis, because now you can do an average-case analysis.” Forney pauses for a moment, then adds, “Not to say that it was totally simple: he had to invent a few theorems at least, if not branches of mathematics.” But Gallager agrees. Of Shannon’s 1948 paper, he says, “After you study it for two years, it seems very simple. So many people will tell you, ‘It’s really very simple.’ And after you understand it, it is.”
An Irresistible Challenge
Shannon’s mathematical description of information had many ramifications. His 1948 paper also introduced the idea of data compression, or representing the same information with fewer bits; compression is what lets programs like WinZip or StuffIt shrink files down so that they don’t overwhelm e-mail servers, and it’s used to save space in disk drives. Information theory also put the study of cryptography on a more secure mathematical footing; indeed, Gallager believes that it was Shannon’s wartime cryptographic work at Bell Labs that led him to his novel reconception of communication.
By the time Shannon returned to MIT, however, he had begun to feel that the enthusiasm surrounding his theory exceeded even its considerable merits. In a 1956 article called “The Bandwagon,” he cited attempts to apply information theory to fields such as “biology, psychology, linguistics, fundamental physics, economics, the theory of organization, and many others” and undertook to “inject a note of moderation in this situation.”