Krish Prabhu, CEO of AT&T Labs, believes that making speech technology widely available will allow mobile computing to be more capable and grow faster. “In the context of a world where we’ve largely solved connectivity and reach problems—though there are still issues—this effort on speech comes from a conviction that the interface to the network has to get simpler,” he said at a lab demonstration in New York City last week. “We are trying to pave the way so that technology is not the thing that stops us.”
AT&T’s APIs for speech-to-text, to launch in June, consist of seven versions tailored to specific uses, such as dictating text messages, searching for local businesses, responding to questions, turning voice-mails into text, and performing general dictation. In the future, specific APIs for online games and social networks will also be added.
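To make the idea concrete, here is a minimal sketch of what submitting audio to a use-specific speech-to-text endpoint might look like. This is purely illustrative: the URL, header names, and context labels below are assumptions for the example, not AT&T's published interface.

```python
import urllib.request

# Hypothetical endpoint and header names -- assumptions for illustration,
# not part of any documented AT&T API.
API_URL = "https://api.example.com/speech/v1/recognize"

def build_recognition_request(audio_bytes: bytes, context: str = "SMS"):
    """Build (but do not send) an HTTP request that submits audio for
    transcription, tagged with one of the use-specific contexts the
    article describes (SMS dictation, local business search, etc.)."""
    return urllib.request.Request(
        API_URL,
        data=audio_bytes,
        headers={
            "Content-Type": "audio/wav",
            "X-Speech-Context": context,  # assumed header name
        },
        method="POST",
    )

req = build_recognition_request(b"\x00" * 16, context="VoicemailToText")
print(req.get_full_url())
print(req.get_method())
```

The point of the per-use contexts is that a recognizer tuned for, say, business-name search can use a narrower vocabulary than one doing open-ended dictation, which is why the offering is split into seven versions rather than one general model.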
Later, APIs may become available that translate text between English and six other languages: Spanish, French, Italian, German, Chinese, and Japanese. Other languages, including Korean and Arabic, are in the pipeline, but AT&T will be far behind competitors. For example, Google already offers developers tools that can translate between any of over a thousand language pairs.
Gilbert says access to all the APIs will carry a $99 registration fee for 2012, and that pricing beyond 2012 has not been made public. Google charges for its own translation APIs.
Improving the accuracy of speech-recognition or translation software requires gathering more example data to train the underlying algorithms. To help that process, AT&T could eventually solicit feedback from people using products that have its speech and translation technology built in. “Crowdsourcing would enable this to reach much higher levels of accuracy, and this would, in turn, drive broader adoption and much happier users,” says Sam Ramji, a computer scientist who is vice president for strategy at Apigee, which builds API platforms and is working on the AT&T project.
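The feedback loop Ramji describes can be sketched as a simple store of user corrections. This is an illustrative toy only; the `FeedbackStore` class and its methods are invented for the example and reflect nothing about AT&T's actual pipeline.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class FeedbackStore:
    """Toy collector for crowdsourced corrections to recognizer output
    (a hypothetical structure, assumed for illustration)."""
    corrections: list = field(default_factory=list)

    def record(self, recognized: str, corrected: str) -> None:
        # Only store pairs where the user actually changed something;
        # those edits are the useful new training examples.
        if recognized != corrected:
            self.corrections.append((recognized, corrected))

    def most_confused(self, n: int = 3):
        # Tally which recognizer outputs get corrected most often,
        # to prioritize where new training data is needed.
        return Counter(r for r, _ in self.corrections).most_common(n)

store = FeedbackStore()
store.record("recognize speech", "wreck a nice beach")
store.record("call mom", "call mom")  # unchanged, so not stored
store.record("recognize speech", "recognise speech")
print(store.most_confused())
```

The design choice mirrored here is the one in the quote: every correction a user makes is, in effect, a free labeled training example, which is why crowdsourced feedback can raise accuracy without costly manual transcription.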
Ramji believes that making good speech-recognition technology easily available could slowly make traditional menus and text-driven interfaces extinct. “Today’s user interfaces are like trees that we have to navigate to reflect the structure of the program. What should happen is that devices parse the command coming out of our mouths,” he says.