Google is mum on how, if at all, it plans to use its recently granted patent for a voice-enabled search engine, even though it has also hired several speech-recognition researchers.
The patent, titled “Voice interface for a search engine,” was originally filed in February 2001, and its granting likely signals some level of development at Google on voice technology for searching the Web using handhelds. Further fueling speculation is Google’s poaching of several speech-recognition specialists, the kind of hiring that often means a new product is afoot.
“They’ve put together a very strong group of people who are experts in speech-recognition technology,” says Nelson Morgan, director of the International Computer Science Institute, which is affiliated with the University of California at Berkeley. He explains that Google picked up a number of engineers from Nuance, “the big gun in speech-recognition technology,” which recently lost researchers after it was purchased by ScanSoft, a speech-technology company (the combined company is called Nuance).
Investing in voice-search technology is a prescient move by Google, since the mobile Web appears to be at a tipping point. Globally, 28 percent of all mobile-phone users now surf the Web (up three percent over last year), according to Ipsos Insight, a Paris-based market research firm. In the United States, three out of every four homes now have mobile phones, and increasingly those phones are being used for more than just talking. In 2005, over half (52 percent) of all mobile-phone users had either sent or received a text message and 37 percent had sent or received an e-mail message.
Despite this growth, searching the Internet on handhelds can be maddening because of tiny keyboards and unfriendly designs, which require, at best, a stylus or, at worst, a series of clicks through numerous menu screens. Speech-recognition software may provide a better way, and a handful of companies are exploring the technology. San Diego’s V-Enable offers voice-search services to help users find ring-tone listings, and Promptu of Menlo Park, CA, plans to offer similar services this summer.
At their most fundamental, speech-recognition systems perform three major functions. First, spoken words are captured and translated into a digital signal. Then a speech-recognition algorithm compares that signal to words and phrases from a pre-set dictionary of, say, ring-tone options or movie listings. Finally, the software offers the most likely match for the spoken phrase.
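
For readers who want to see those three steps in miniature, here is a toy Python sketch. It is an illustration under heavy assumptions, not how Google or any vendor named here does it. Typed text stands in for recorded audio, the "digital signal" is just a list of character codes, and the comparison uses a generic sequence-similarity score from Python's standard library rather than a real acoustic model. The dictionary entries and the function names (`digitize`, `score`, `best_match`) are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical pre-set dictionary, e.g. a short list of movie listings.
DICTIONARY = ["mission impossible", "star wars", "the pink panther", "ice age"]


def digitize(utterance: str) -> list[int]:
    """Step 1 (stand-in): turn the captured input into a sequence of numbers.

    A real system would sample audio; here each character becomes its code point.
    """
    return [ord(ch) for ch in utterance.lower().strip()]


def score(signal: list[int], phrase: str) -> float:
    """Step 2 (stand-in): compare the signal with one dictionary phrase.

    Returns a similarity between 0.0 and 1.0 (1.0 means identical sequences).
    """
    return SequenceMatcher(None, signal, digitize(phrase)).ratio()


def best_match(utterance: str, dictionary: list[str] = DICTIONARY) -> tuple[float, str]:
    """Step 3: offer the dictionary phrase with the highest similarity score."""
    signal = digitize(utterance)
    return max((score(signal, phrase), phrase) for phrase in dictionary)


if __name__ == "__main__":
    # A slightly garbled input, as if the recognizer misheard the last sound.
    similarity, phrase = best_match("star warz")
    print(f"Best match: {phrase!r} (similarity {similarity:.2f})")
```

Restricting the search to a small, known dictionary is what makes the final step tractable: the software only has to rank a limited set of candidates rather than guess among every possible phrase.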