MIT Technology Review
At Carnegie Mellon, the Bakers discovered that their approach to speech recognition was way out of step with the mainstream. At the time, many AI researchers believed a machine could recognize spoken sentences only if it could first understand a great deal of context, including who the speaker was, what the speaker knew and what the speaker might be trying to say, as well as the rules of English grammar. In other words, to recognize speech, a machine would have to be quite intelligent.

The Bakers tried a completely different tack. Building on Jim’s experience with Markov models, they created a program that operated in a purely statistical realm. First, they began to calculate the probability that any two words or three words would appear one after the other in English. Then they created a phonetic dictionary with the sounds of those word groups. The next step was an algorithm to decipher a string of spoken words based not only on a good sound match, but also according to the probability that someone would speak them in that order. The system had no knowledge of English grammar, no knowledge base, no rule-based expert system, no intelligence. Nothing but numbers.
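The idea of weighing a sound match against word-order probability can be illustrated with a toy sketch. It is not the Bakers' program: it uses word pairs only, a three-sentence made-up corpus, invented acoustic scores in place of a phonetic dictionary, and a brute-force search rather than an efficient Markov-model decoder. The names `train_bigrams`, `decode` and `slots` are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def train_bigrams(sentences):
    """Count adjacent word pairs and return a smoothed P(next | prev) function."""
    prev_counts, pair_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
        vocab.update(words)
        for prev, nxt in zip(words, words[1:]):
            prev_counts[prev] += 1
            pair_counts[(prev, nxt)] += 1
    size = len(vocab)

    def prob(prev, nxt):
        # Add-one smoothing keeps unseen pairs from scoring exactly zero.
        return (pair_counts[(prev, nxt)] + 1) / (prev_counts[prev] + size)

    return prob


def decode(slots, prob):
    """Pick the word sequence with the best combined score.

    `slots` holds, for each position, a dict of candidate words and an
    assumed acoustic match probability. Every combination is scored,
    which is only feasible for tiny examples.
    """
    best, best_score = None, -math.inf
    for seq in product(*(slot.keys() for slot in slots)):
        score, prev = 0.0, "<s>"
        for slot, word in zip(slots, seq):
            score += math.log(slot[word]) + math.log(prob(prev, word))
            prev = word
        if score > best_score:
            best, best_score = list(seq), score
    return best


# Made-up training text and acoustic scores, for illustration only.
prob = train_bigrams([
    "the weather is nice",
    "the weather is cold",
    "ask whether it rains",
])
slots = [
    {"the": 1.0},
    {"whether": 0.55, "weather": 0.45},  # the sound alone slightly favors "whether"
    {"is": 1.0},
]
best = decode(slots, prob)
print(best)
```

Here the sound match alone slightly prefers "whether," but the word-pair statistics tip the result to "weather." No grammar rule is involved at any point.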

“It was a very heretical and radical idea,” says Janet. “A lot of people said, ‘That’s not speech or language, that’s mathematics! That’s something else!’”

Although the Bakers’ thinking met with widespread skepticism, says Victor Zue, associate director of MIT’s Laboratory for Computer Science and a fellow speech research pioneer, “time has proved [the Bakers] to be correct in pursuing this kind of approach.” Indeed, the Bakers’ system, which they named “Dragon” after the creature that graced their china set, soon began to consistently outperform competing methods.

When the Bakers received their doctorates from Carnegie Mellon in 1975, their pioneering work soon landed them both jobs at IBM’s Thomas J. Watson Research Center, outside New York City. At the time, IBM was one of the few organizations working in large vocabulary, continuous speech recognition. “We didn’t go to [IBM] and say, ‘You have to hire both of us,’” recalls Jim. “It just worked out that way.” It was, however, a pattern that would repeat itself. Today, with Jim as chairman/CEO and Janet as president of Dragon Systems, the Bakers take pride in having nearly identical resumes.

At IBM, the Bakers designed a program that could recognize continuous speech from a 1,000-word vocabulary. It was far from real time, though. Running on an IBM 370 computer, the program took roughly an hour to decode a single spoken sentence. But what frustrated the Bakers more than waiting for time on the mainframe was IBM’s refusal to test speech recognition under real-world conditions.

“IBM is an excellent research institution and we enjoyed working there,” says Janet. “But we were very eager to get things out into the marketplace and get real users.” Certainly real users couldn’t wait an hour for a computer to transcribe a sentence. But, she notes, “you could have done simpler things using much less [computer] resources.” IBM’s management felt differently, and told the Bakers they were being premature.

It was the heyday of missed opportunities at IBM (count relational databases and RISC microprocessors among the key inventions the company failed to commercialize) and in 1979 the Bakers’ frustration boiled over. The couple jumped to Verbex, a Boston-based subsidiary of Exxon Enterprises that had built a system for collecting data over the telephone via spoken digits. Jim (as newly minted vice president of advanced development) and Janet (as vice president of research) set out to make the program handle continuous speech.

But less than three years later, Exxon got out of the speech recognition business, and the Bakers were looking for work again. This time, their look-alike resumes spelled trouble: there were no jobs for either of them. The duo realized that they faced a choice: divorce themselves from speech recognition by changing fields, or set out on their own.

In 1982, with no venture capital, no business plan, two preschool-aged children and a big mortgage, the Bakers founded Dragon Systems. They ran the company from their living room, and figured their savings could last 18 months, perhaps 24 if they ate little enough.
