Cell phones and wireless PDAs have one perennial problem: either no keyboard or a very small one. That makes typing anything more than a phone number a tedious, fumbling task. But a solution is on the way: mobile devices that are adept at recognizing spoken language.
Some cell phones already use speech recognition as an alternative to keypad entry for simple tasks such as dialing a number, but someday soon you may also find yourself dictating a text message into your phone, asking your car for directions, or telling your MP3 player that you want to listen to the Beatles. Indeed, today’s high-end cell phones are capable of running sophisticated speech recognition software that could eventually mean the end of pecking at keyboards. “The fundamental problem of inputting information into mobile devices is the interface, and voice overcomes that,” says Rich Geruson, CEO of VoiceSignal, a speech technology company based in Woburn, MA.
While companies like IBM and Dragon Systems (now part of Peabody, MA-based ScanSoft) have been selling desktop speech recognition software for more than a decade, mobile devices with even limited speech recognition abilities appeared only several years ago. And until now, such devices have largely been “speaker dependent” – meaning they work well only for their principal users and have to be trained to recognize individual words.
Faster processors and more efficient software, however, are enabling new speaker-independent systems that can recognize the speech of any user and require no training. These systems can discern thousands, rather than dozens, of names and are designed to work even when the speaker is in a noisy environment, such as the front seat of a speeding car.
For the engineers at VoiceSignal, the key to this advance was a shift in focus from accuracy to efficiency. The highly accurate speech recognition algorithms designed for desktop computers are too complex to run on mobile devices. Traditional algorithms for mobile devices required less processing power, but because they worked by matching the sound wave of an entire word to a sound wave stored in the device’s memory, every added word meant another stored recording, which limited them to a small vocabulary.
Instead of storing an entire sound wave for each word in its lexicon, VoiceSignal’s new system stores information about phonemes – the smallest units of sound that distinguish one word from another. Every phoneme can be described according to a set of acoustic parameters, such as pitch. The software measures a user’s utterances along these parameters and then looks for words that match. Parameter values take up far less memory than audio files, so the software can handle a larger vocabulary without requiring any additional storage space.
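To make the idea concrete, here is a minimal Python sketch of a phoneme-based lexicon. It is not VoiceSignal’s software, whose internals the company has not described here. The phoneme labels, the two-number parameter sets, the tiny word list, and the function names are all invented for illustration, and the sketch assumes the utterance has already been split into one measurement per phoneme, which a real recognizer would have to work out for itself.

```python
import math

# Hypothetical acoustic parameters for each phoneme, e.g. (pitch-like, energy-like).
# The values are invented for illustration; a real system would use many more.
PHONEMES = {
    "k": (0.1, 0.9),
    "ae": (0.8, 0.6),
    "t": (0.2, 0.3),
    "b": (0.4, 0.1),
    "d": (0.6, 0.2),
}

# Each word is stored as a short list of phoneme labels, not as recorded audio.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "bad": ["b", "ae", "d"],
    "tab": ["t", "ae", "b"],
}


def nearest_phoneme(measured):
    """Return the phoneme whose stored parameters lie closest to the measurement."""
    return min(PHONEMES, key=lambda p: math.dist(PHONEMES[p], measured))


def recognize(segments):
    """Map each measured segment to a phoneme, then look for a word that matches.

    Assumes `segments` holds exactly one parameter tuple per spoken phoneme.
    Returns the matching word, or None if nothing in the lexicon fits.
    """
    heard = [nearest_phoneme(segment) for segment in segments]
    for word, phonemes in LEXICON.items():
        if phonemes == heard:
            return word
    return None


# Slightly noisy measurements still land on the right phonemes.
print(recognize([(0.12, 0.85), (0.75, 0.62), (0.22, 0.28)]))
```

In this toy version, adding a word to the lexicon costs only a few phoneme labels, while the parameter table stays the same size no matter how many words share those phonemes. That is the storage advantage the engineers describe, shown here in its simplest form.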
And that’s opening up applications beyond simple voice dialing. For example, VoiceSignal offers software that lets users jump to any node of a cell phone’s menu with a single utterance. “If you try to send a [text] message on your phone right now, you have to do about ten clicks just to get to the message space,” says Geruson. “With our technology, you just say, ‘Send message to John Smith’s mobile,’ and your cursor is flashing and ready to go.” The first phone with this capability was released in August. Within six months the company also plans to release software for phones that lets users dictate text messages and e-mails, which Geruson anticipates will be particularly useful in Asia. “If you think it’s hard to input into a keyboard in Western or Latin languages, think about the problem in Japan or China, where you have thousands of characters,” he says.