Claude Shannon: Reluctant Father of the Digital Age

A juggling unicyclist transformed “information” from a vague idea into a precise concept that underlies the digital revolution.

MIT Technology Review Editors

July 1, 2001

Pick up a favorite CD. Now drop it on the floor. Smear it with your fingerprints. Then slide it into the slot on the player-and listen as the music comes out just as crystal clear as the day you first opened the plastic case. Before moving on with the rest of your day, give a moment of thought to the man whose revolutionary ideas made this miracle possible: Claude Elwood Shannon.

Shannon, who died in February after a long illness, was one of the greatest of the giants who created the information age. John von Neumann, Alan Turing and many other visionaries gave us computers that could process information. But it was Claude Shannon who gave us the modern concept of information-an intellectual leap that earns him a place on whatever high-tech equivalent of Mount Rushmore is one day established.

The entire science of information theory grew out of one electrifying paper that Shannon published in 1948, when he was a 32-year-old researcher at Bell Laboratories. Shannon showed how the once-vague notion of information could be defined and quantified with absolute precision. He demonstrated the essential unity of all information media, pointing out that text, telephone signals, radio waves, pictures, film and every other mode of communication could be encoded in the universal language of binary digits, or bits-a term that his article was the first to use in print. Shannon laid forth the idea that once information became digital, it could be transmitted without error. This was a breathtaking conceptual leap that led directly to such familiar and robust objects as CDs. Shannon had written “a blueprint for the digital age,” says MIT information theorist Robert Gallager, who is still awed by the 1948 paper.

And that’s not even counting the master’s dissertation Shannon had written 10 years earlier-the one where he articulated the principles behind all modern computers. “Claude did so much in enabling modern technology that it’s hard to know where to start and end,” says Gallager, who worked with Shannon in the 1960s. “He had this amazing clarity of vision. Einstein had it, too-this ability to take on a complicated problem and find the right way to look at it, so that things become very simple.”

Tinkering toward Tomorrow

For Shannon, it was all just another way to have fun. “Claude loved to laugh, and to dream up things that were offbeat,” says retired Bell Labs mathematician David Slepian, who was a collaborator of Shannon’s in the 1950s. Shannon went at math like a stage magician practicing his sleight of hand: “He would circle around and attack the problem from a direction you never would have thought of,” says Slepian-only to astonish you with an answer that had been right in front of your face all the time. But then, Shannon also had a large repertoire of real card tricks. He taught himself to ride a unicycle and became famous for riding it down the Bell Labs hallways at night-while juggling. (“He had been a gymnast in college, so he was better at it than you might have thought,” says his wife Betty, who gave him the cycle as a Christmas present in 1949.)

At home, Shannon spent his spare time building all manner of bizarre machines. There was the Throbac (THrifty ROman-numerical BAckward-looking Computer), a calculator that did arithmetic with Roman numerals. There was Theseus, a life-sized mechanical mouse that could find its way through a maze. And perhaps most famously, there was the “Ultimate Machine”-a box with a large switch on the side. Turn the switch on, and the lid would slowly rise, revealing a mechanical hand that would reach down, turn the switch off, and withdraw-leaving the box just as it was.

“I was always interested in building things with funny motions,” Shannon explained in a 1987 interview with Omni magazine (one of the few times he spoke about his life publicly). In his northern Michigan hometown of Gaylord, he recalled, he spent his early years putting together model planes, radio circuits, a radio-controlled model boat and even a telegraph system. And when he entered the University of Michigan in 1932, he had no hesitation about majoring in electrical engineering.

After graduating in 1936, Shannon went directly to MIT to take up a work-study position he had seen advertised on a postcard tacked to a campus bulletin board. He was to spend half his time pursuing a master’s degree in electrical engineering and the other half working as a laboratory assistant for computer pioneer Vannevar Bush, MIT’s vice president and dean of engineering. Bush gave Shannon responsibility for the Differential Analyzer, an elaborate system of gears, pulleys and rods that took up most of a large room-and that was arguably the mightiest computing machine on the planet at the time (see “Computing After Silicon,” TR May/June 2000).

Conceived by Bush and his students in the late 1920s, and completed in 1931, the Differential Analyzer was an analog computer. It didn’t represent mathematical variables with ones and zeroes, as digital computers do, but by a continuous range of values: the physical rotation of the rods. Shannon’s job was to help visiting scientists “program” their problems on the analyzer by rearranging the mechanical linkages between the rods so that their motions would correspond to the appropriate mathematical equations.

Shannon couldn’t have asked for a job more suited to his love of funny motions. He was especially drawn to the analyzer’s wonderfully complicated control circuit, which consisted of about a hundred “relays”-switches that could be automatically opened and closed by an electromagnet. But what particularly intrigued him was how closely the relays’ operation resembled the workings of symbolic logic, a subject he had just studied during his senior year at Michigan. Each switch was either closed or open-a choice that corresponded exactly to the binary choice in logic, where a statement was either true or false. Moreover, Shannon quickly realized that switches combined in circuits could carry out standard operations of symbolic logic. The analogy apparently had never been recognized before. So Shannon made it the subject of his master’s thesis, and spent most of 1937 working out the implications. He later told an interviewer that he “had more fun doing that than anything else in my life.”

True or False?

Certainly his dissertation, “A Symbolic Analysis of Relay and Switching Circuits,” makes for a compelling read-especially given what’s happened in the 60-plus years since it was written. As an aside toward the end, for example, Shannon pointed out that the logical values true and false could equally well be denoted by the numerical digits 1 and 0. This realization meant that the relays could perform the then-arcane operations of binary arithmetic. Thus, Shannon wrote, “it is possible to perform complex mathematical operations by means of relay circuits.” As an illustration, Shannon showed the design of a circuit that could add binary numbers.
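To make the idea concrete, here is a minimal Python sketch of a binary adder built from nothing but the logical operations AND, OR and NOT, the kind of operations the article says switching circuits can carry out. It is not a reconstruction of the circuit in Shannon's thesis; the function names (xor, full_adder, add_bits, to_bits, from_bits) and the least-significant-bit-first layout are assumptions made purely for illustration.

```python
def xor(a: bool, b: bool) -> bool:
    # Exclusive OR written with AND, OR and NOT only.
    return (a or b) and not (a and b)


def full_adder(a: bool, b: bool, carry_in: bool):
    # Adds three one-bit values; returns (sum bit, carry-out bit).
    partial = xor(a, b)
    total = xor(partial, carry_in)
    carry_out = (a and b) or (partial and carry_in)
    return total, carry_out


def add_bits(x, y):
    """Add two equal-length bit lists, least significant bit first."""
    result = []
    carry = False
    for a, b in zip(x, y):
        s, carry = full_adder(bool(a), bool(b), carry)
        result.append(int(s))
    result.append(int(carry))  # the final carry becomes the top bit
    return result


def to_bits(n: int, width: int):
    # Hypothetical helper: integer -> bit list, least significant bit first.
    return [(n >> i) & 1 for i in range(width)]


def from_bits(bits) -> int:
    # Hypothetical helper: bit list (least significant first) -> integer.
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    print(from_bits(add_bits(to_bits(5, 4), to_bits(6, 4))))
```

Each call to full_adder stands in for one small cluster of switches; chaining the carry from one position to the next is what lets a handful of true-or-false decisions add numbers of any length.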

Even more importantly, Shannon realized that such a circuit could also make comparisons. He saw the possibility of a device that could take alternative courses of action according to circumstances-as in, “if the number X equals the number Y, then do operation A.” Shannon gave a simple illustration of this possibility in his thesis by showing how relay switches could be arranged to produce a lock that opened if and only if a series of buttons was pressed in the proper order.

The implications were profound: a switching circuit could decide-an ability that had once seemed unique to living beings. In the years to come, the prospect of decision-making machines would inspire the whole field of artificial intelligence, the attempt to model human thought via computer. And perhaps by no coincidence, that field would fascinate Claude Shannon for the rest of his life.

From a more immediate standpoint, though, a switching circuit’s ability to decide was what would make the digital computers that emerged after World War II something fundamentally new. It wasn’t their mathematical abilities per se that contemporaries found so startling (although the machines were certainly very fast); even in the 1940s, the world was full of electromechanical desktop calculators that could do simple additions and subtractions. The astonishing part was the new computers’ ability to operate under the control of an internal program, deciding among various alternatives and executing complex sequences of commands on their own.

All of which is why “A Symbolic Analysis of Relay and Switching Circuits,” published in 1938, has been called the most important master’s thesis of the 20th century. In his early 20s, Claude Shannon had had the insight crucial for organizing the internal operations of a modern computer-almost a decade before such computers even existed. In the intervening years, switching technology has progressed from electromechanical relays to microscopic transistors etched on silicon. But to this day, microchip designers still talk and think in terms of their chips’ internal “logic”-a concept born largely of Shannon’s work.

Perfect Information

With the encouragement of Vannevar Bush, Shannon decided to follow up his master’s degree with a doctorate in mathematics-a task that he completed in a mere year and a half. Not long after receiving this degree in the spring of 1940, he joined Bell Labs. Since U.S. entry into World War II was clearly just a matter of time, Shannon immediately went to work on military projects such as antiaircraft fire control and cryptography (code making and breaking).

Nonetheless, Shannon always found time to work on the fundamental theory of communications, a topic that had piqued his interest several years earlier. “Off and on,” Shannon had written to Bush in February 1939, in a letter now preserved in the Library of Congress archives, “I have been working on an analysis of some of the fundamental properties of general systems for the transmission of intelligence, including telephony, radio, television, telegraphy, etc.” To make progress toward that goal, he needed a way to specify what was being transmitted during the act of communication.

Building on the work of Bell Labs engineer Ralph Hartley, Shannon formulated a rigorous mathematical expression for the concept of information. At least in the simplest cases, Shannon said, the information content of a message was the number of binary ones and zeroes required to encode it. If you knew in advance that a message would convey a simple choice-yes or no, true or false-then one binary digit would suffice: a single one or a single zero told you all you needed to know. The message would thus be defined to have one unit of information. A more complicated message, on the other hand, would require more digits to encode, and would contain that much more information; think of the thousands or millions of ones and zeroes that make up a word-processing file.
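The counting argument above can be sketched in a few lines of Python. The first function covers the simplest case the article describes, a choice among equally likely alternatives; the second uses the entropy formula from Shannon's 1948 paper for alternatives that are not equally likely. The function names are illustrative assumptions, not anything taken from the paper.

```python
import math


def bits_for_choices(n_choices: int) -> float:
    # Information in one choice among n equally likely alternatives.
    return math.log2(n_choices)


def entropy(probabilities) -> float:
    # Average information per message, in bits, for a source whose
    # messages occur with the given probabilities (assumed to sum to 1).
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    print(bits_for_choices(2))      # a yes-or-no answer
    print(bits_for_choices(8))      # one choice out of eight
    print(entropy([0.5, 0.5]))      # a fair coin
    print(entropy([0.9, 0.1]))      # a heavily biased coin
```

A yes-or-no answer comes out at exactly one bit, while a biased source carries less than one bit per message on average, because its outcome is partly predictable in advance.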

As Shannon realized, this definition did have its perverse aspects. A message might carry only one binary unit of information-“Yes”-but a world of meaning-as in, “Yes, I will marry you.” But the engineers’ job was to get the data from here to there with a minimum of distortion, regardless of its content. And for that purpose, the digital definition of information was ideal, because it allowed for a precise mathematical analysis. What are the limits to a communication channel’s capacity? How much of that capacity can you use in practice? What are the most efficient ways to encode information for transmission in the inevitable presence of noise?

Judging by his comments many years later, Shannon had outlined his answers to such questions by 1943. Oddly, however, he seems to have felt no urgency about sharing those insights; some of his closest associates at the time swear they had no clue that he was working on information theory. Nor was he in any hurry to publish and thus secure credit for the work. “I was more motivated by curiosity,” he explained in his 1987 interview, adding that the process of writing for publication was “painful.” Ultimately, however, Shannon overcame his reluctance. The result: the groundbreaking paper “A Mathematical Theory of Communication,” which appeared in the July and October 1948 issues of the Bell System Technical Journal.

Shannon’s ideas exploded with the force of a bomb. “It was like a bolt out of the blue,” recalls John Pierce, who was one of Shannon’s best friends at Bell Labs, and yet as surprised by Shannon’s paper as anyone. “I don’t know of any other theory that came in a complete form like that, with very few antecedents or history.” Indeed, there was something about this notion of quantifying information that fired people’s imaginations. “It was a revelation,” says Oliver Selfridge, who was then a graduate student at MIT. “Around MIT the reaction was, ‘Brilliant! Why didn’t I think of that?’”

Much of the power of Shannon’s idea lay in its unification of what had been a diverse bunch of technologies. “Until then, communication wasn’t a unified science,” says MIT’s Gallager. “There was one medium for voice transmission, another medium for radio, still others for data. Claude showed that all communication was fundamentally the same-and furthermore, that you could take any source and represent it by digital data.”

That insight alone would have made Shannon’s paper one of the great analytical achievements of the 20th century. But there was more. Suppose you were trying to send, say, a birthday greeting down a telegraph line, or through a wireless link, or even in the U.S. mail. Shannon was able to show that any such communication channel had a speed limit, measured in binary digits per second. The bad news was that above that speed limit, perfect fidelity was impossible: no matter how cleverly you encoded your message and compressed it, you simply could not make it go faster without throwing some information away.

The mind-blowing good news, however, was that below this speed limit, the transmission was potentially perfect. Not just very good: perfect. Shannon gave a mathematical proof that there had to exist codes that would get you right up to the limit without losing any information at all. Moreover, he demonstrated, perfect transmission would be possible no matter how much static and distortion there might be in the communication channel, and no matter how faint the signal might be. Of course, you might need to encode each letter or pixel with a huge number of bits to guarantee that enough of them would get through. And you might have to devise all kinds of fancy error-correcting schemes so that corrupted parts of the message could be reconstructed at the other end. And yes, in practice the codes would eventually get so long and the communication so slow that you would have to give up and let the noise win. But in principle, you could make the probability of error as close to zero as you wanted.
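The trade-off between redundancy and reliability can be illustrated with the crudest error-correcting scheme there is: send every bit n times and let the receiver take a majority vote. The Python sketch below computes the exact chance that the vote comes out wrong, assuming a channel that flips each transmitted bit independently with probability p (both the channel model and the function name majority_error are assumptions for illustration). A repetition code is far less efficient than the codes Shannon's theorem promises, since its transmission rate shrinks toward zero as n grows, but it shows the error probability being driven as low as desired.

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Probability that a majority vote over n copies (n odd) is wrong
    when each copy is flipped independently with probability p."""
    if n % 2 == 0:
        raise ValueError("n must be odd so the vote cannot tie")
    first_losing_count = (n + 1) // 2  # this many flips outvote the truth
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(first_losing_count, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 11, 21):
        print(n, majority_error(0.1, n))
```

With a 10 percent chance of corrupting any single bit, each increase in the number of copies lowers the chance of a wrong decision, at the price of sending that many times more bits per message bit.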

This “fundamental theorem” of information theory, as Shannon called it, had surprised even him when he discovered it. The conquest of noise seemed to violate all common sense. But for his contemporaries in 1948, seeing the theorem for the first time, the effect was electrifying. “To make the chance of error as small as you wish? Nobody had ever thought of that,” marvels MIT’s Robert Fano, who became a leading information theorist himself in the 1950s-and who still has a reverential photograph of Shannon hanging in his office. “How he got that insight, how he even came to believe such a thing, I don’t know. But almost all modern communication engineering is based on that work.”

Shannon’s work “hangs over everything we do,” agrees Robert Lucky, corporate vice president of applied research at Telcordia, the Bell Labs spinoff previously known as Bellcore. Indeed, he notes, Shannon’s fundamental theorem has served as an ideal and a challenge for succeeding generations. “For 50 years, people have worked to get to the channel capacity he said was possible. Only recently have we gotten close. His influence was profound.”

And, Lucky adds, Shannon’s work inspired the development of “all our modern error-correcting codes and data-compression algorithms.” In other words: no Shannon, no Napster.

Shannon’s theorem explains how we can casually toss around compact discs in a way that no one would have dared with long-playing vinyl records: those error-correcting codes allow the CD player to practically eliminate noise due to scratches and fingerprints before we ever hear it. Shannon’s theorem likewise explains how computer modems can transmit compressed data at tens of thousands of bits per second over ordinary, noise-ridden telephone lines. It explains how NASA scientists were able to get imagery of the planet Neptune back to Earth across three billion kilometers of interplanetary space. And it goes a long way toward explaining why the word “digital” has become synonymous with the highest possible standard in data quality.

Switching Off

The accolades for Shannon’s work were quick in coming. Warren Weaver, director of the Rockefeller Foundation’s Natural Sciences Division, declared that information theory encompassed “all of the procedures by which one mind may affect another,” including “not only written and oral speech, but also music, the pictorial arts, the theatre, the ballet, and in fact all human behavior.” Fortune magazine could barely contain its enthusiasm, dubbing information theory one of man’s “proudest and rarest creations, a great scientific theory which could profoundly and rapidly alter man’s view of the world.” Shannon himself soon had to set aside an entire room in his home just to hold all his citations, plaques and testimonials.

Within a year or two of his paper’s publication, however, Shannon was horrified to find that information theory was becoming-well, popular. People were saying ridiculous things about the amount of information coming out of the sun, or even the information content of noise. Scientists were submitting grant applications that referred to “information theory” whether their proposals had anything to do with it or not. “Information theory” was becoming a buzzword, much as “artificial intelligence,” “chaos” and “complexity” would in the 1980s and 1990s. And Shannon hated it. In a 1956 paper entitled “The Bandwagon,” in the journal Transactions on Information Theory, he declared that information theory was being greatly oversold. “It has perhaps ballooned to an importance beyond its actual accomplishments,” he wrote.

Rather than continue to fight what he knew was a losing battle, Shannon dropped out. Although he continued, for a time, his research on information theory, he turned down almost all the endless invitations to lecture, or to give newspaper interviews; he didn’t want to be a celebrity. He likewise quit responding to much of his mail. Correspondence from major figures in science and government ended up forgotten and unanswered in a file folder he labeled “Letters I’ve procrastinated too long on.” As the years went by, in fact, Shannon started to withdraw not just from the public eye but from the research community-an attitude that worried his colleagues at MIT, who had hired him away from Bell Labs in 1958. “He wrote beautiful papers-when he wrote,” says MIT’s Fano. “And he gave beautiful talks-when he gave a talk. But he hated to do it.”

From time to time, Shannon did continue to publish. A notable example, before he became too horrified by his celebrity and withdrew more completely, was a seminal 1950 article for Scientific American describing how a computer might be programmed to play chess. But he slowly faded from the academic scene, recalls Peter Elias, another leader of the MIT information theory group. “Claude’s vision of teaching was to give a series of talks on research that no one else knew about. But that pace was very demanding; in effect, he was coming up with a research paper every week.” By the mid-1960s, Elias recalls, Shannon had stopped teaching.

After his official retirement in 1978, at age 62, Shannon happily withdrew to his home in the Boston suburb of Winchester, MA. Money was not a concern; thanks to his knowledge of the high-tech industries springing up around Boston’s Route 128, he had made some canny investments in the stock market. Nor did there seem to be any diminution of his ingenuity. “He still built things!” remembers Betty Shannon with a laugh. “One was a…figure of W. C. Fields that bounced three balls on a drumhead. It made a heck of a noise, let me tell you!”

Nonetheless, there came a time around 1985 when he and Betty began to notice certain lapses. He would go for a drive and forget how to get home. By 1992, when the Institute of Electrical and Electronics Engineers was preparing to publish his collected papers, Shannon was disturbed to realize that he couldn’t remember writing many of them. And by mid-1993, with his condition becoming apparent to everyone, the family confirmed what many had begun to suspect: Claude Shannon had Alzheimer’s disease. Later that year his family reluctantly placed him in a nursing home.

In 1998, when his hometown of Gaylord, MI, commemorated the 50th anniversary of information theory by unveiling a bust of its creator in a city park, Betty Shannon thanked the town in his stead. Physically, she says, he was fine almost until the end, when everything seemed to collapse at once. But on February 24, just two months shy of Shannon’s 85th birthday, the end did come. “The response to his death has been overwhelming,” she says. “I think it would have astounded him.”
