MIT Technology Review



Speed is not the only advantage that the Fast-Talk method has over alternative methods. With speech recognition, says Clements, “your goal is to take audio that’s input speech and either recognize what was spoken according to some very constrained grammar, or you can use a natural language approach and try to find the sequence of words that is most probable.” If the term is not in a system’s spoken lexicon, however, or if you are uncertain about the sequence of words in text form you’re looking for, the search may prove fruitless. By contrast, Clements says, the Fast-Talk approach “processes the speech in such a way that you can later go back and search it very efficiently for any set of sounds; they don’t have to have any lexical existence at all.”
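The idea of searching for “any set of sounds” can be illustrated with a toy sketch. It assumes the audio has already been reduced to a list of phoneme labels; the labels, the sample stream, and the function name `find_phoneme_sequence` are hypothetical, and the exact matching shown here is a simplification, not Fast-Talk’s actual algorithm, which the article does not describe.

```python
from typing import List, Sequence


def find_phoneme_sequence(stream: Sequence[str], query: Sequence[str]) -> List[int]:
    """Return every start index where `query` occurs contiguously in `stream`."""
    n, m = len(stream), len(query)
    if m == 0:
        return []
    # Slide a window of the query's length over the stream and compare.
    return [i for i in range(n - m + 1) if list(stream[i:i + m]) == list(query)]


# Hypothetical phoneme stream for a stretch of already-processed audio.
stream = ["DH", "AH", "P", "AE", "N", "D", "AH", "AE", "T", "DH", "AH", "Z", "UW"]

# The query is just a sequence of sounds; no dictionary lookup is involved,
# so it does not matter whether the sounds form a known word.
panda_hits = find_phoneme_sequence(stream, ["P", "AE", "N", "D", "AH"])
zoo_hits = find_phoneme_sequence(stream, ["Z", "UW"])
```

Because the query never passes through a word list, a name or coinage that no lexicon contains can be searched in the same way as a common word.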

Fenn believes this is a much more versatile approach. Fast-Talk, she says, is “focusing on the pure audio aspect” of spoken communications, which “seems to give them a faster algorithm and greater flexibility in dealing with new items that aren’t in the vocabulary.” Fast-Talk also handles accents well. The key is training the software to recognize phonemes in the accent you’ll be dealing with. For instance, a system trained by a speaker from Canada would transcribe the sound “hoos” into the word “house.”

Fast-Talk’s software does not, however, employ certain strategies that natural language queries might, which leads to some shortcomings. Proximity searches, in which a natural language tool would recognize a word like “Georgia” because it usually occurs right after the word “Atlanta,” are not possible. Also, since Fast-Talk recognizes by sound, not spelling, it cannot distinguish homonyms. “We would not be able to tell the difference between the word ‘discreet’ as in cautious versus ‘discrete’ as in individual items,” says Clements. Another disadvantage, cautions Fenn, is that the system is not amenable to text mining. “If you wanted to look for patterns and clusters of related concepts, you couldn’t. You’d have to have a transcript.”
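The homonym limitation follows directly from matching on sound alone. In the minimal sketch below, the pronunciation table and its phoneme labels are assumptions made for illustration, not data from Fast-Talk. Two spellings that share a pronunciation map to the same key, so a sound-based search has nothing left to tell them apart.

```python
# Hypothetical pronunciation table: spelling -> phoneme sequence.
PRONUNCIATIONS = {
    "discreet": ("D", "IH", "S", "K", "R", "IY", "T"),
    "discrete": ("D", "IH", "S", "K", "R", "IY", "T"),
    "panda": ("P", "AE", "N", "D", "AH"),
}


def words_sharing_sound(word: str) -> list:
    """List every spelling in the table pronounced exactly like `word`."""
    target = PRONUNCIATIONS[word]
    return sorted(w for w, phones in PRONUNCIATIONS.items() if phones == target)


# A query for either spelling becomes the same phoneme sequence,
# so both spellings come back together.
same_sound = words_sharing_sound("discreet")
```

Telling the two words apart would take information beyond the audio, such as a transcript or the surrounding context.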

Those qualifications aside, however, the technology appears to offer significant benefit in several applications. Television and radio networks have thousands of hours of programming but no fast way to index and negotiate them. “If you want to find where, say, an NPR news account talked about a panda,” says Clements, “it just takes forever to do that right now.” Another potentially hot use is in call centers. Rasmus says call centers “want to know if anyone had a conversation about X kind of product. Looking at those voice recordings as a means for getting information to somebody at a call center who’s trying to help a client is incredibly time saving.” Office workers might ultimately find the technology useful too, Clements says. “Imagine that you have all your voicemail as audio files interleaved with all of your e-mail,” he says. “Our tool would make it so you could manage it.”
