In an effort to make speech the dominant way that people control technology, AT&T is opening up its speech-recognition technology for others to use. Starting in June, software engineers can tap into a cloud service offered by the company to make any device that can connect to the Internet respond to its master’s voice.
AT&T believes the technology could ultimately be used for everything from smart-phone apps and online games to cars and appliances. While the initial offering will only convert speech into text and corresponding commands, the company is considering a broader set of offerings later, including ones that translate English text to and from six other languages and synthesize the translated speech.
“We believe there are a lot of smart people out there who can create applications and services we have never dreamt of before,” says Mazin Gilbert, vice president for intelligent systems research at AT&T Labs in Florham Park, New Jersey. To use the technology, developers write code into their software to take advantage of an API (application programming interface) specified by AT&T. That code causes an application to send speech to AT&T over the Internet, where it is converted to text and returned to the device. The new APIs were announced last week. AT&T claims the technology is 95 percent accurate in taking English speech and rendering it as text. It says its accuracy at converting the meaning of English text to and from other languages ranges from 70 percent to 80 percent.
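The round trip described above, in which an application sends recorded speech over the Internet and gets text back, can be sketched roughly as follows. This is only an illustration under stated assumptions: the endpoint URL, the access token, the bearer-token header, and the JSON reply shape with a `text` field are all hypothetical placeholders, not details of AT&T's actual API.

```python
import json
import urllib.request

# Hypothetical endpoint and token: placeholders, not AT&T's actual API.
SPEECH_URL = "https://api.example.com/speech-to-text"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"


def build_request(audio: bytes, content_type: str = "audio/wav") -> urllib.request.Request:
    """Package recorded audio as an HTTP POST to the recognition service."""
    return urllib.request.Request(
        SPEECH_URL,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Bearer {ACCESS_TOKEN}",  # assumed auth scheme
            "Content-Type": content_type,
            "Accept": "application/json",
        },
    )


def parse_transcript(body: bytes) -> str:
    """Extract the recognized text from an assumed reply like {"text": "..."}."""
    return json.loads(body.decode("utf-8"))["text"]


def transcribe(audio: bytes) -> str:
    """Send audio over the network and return the text that comes back."""
    with urllib.request.urlopen(build_request(audio), timeout=30) as response:
        return parse_transcript(response.read())
```

An application would call `transcribe()` with the bytes captured from a microphone and then act on the returned string, for instance by matching it against a list of commands. The recognition itself happens on the provider's servers, so the device only needs an Internet connection.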
The underlying speech technology now being offered by AT&T is already used in many of its own applications, including the AT&T translator app for Android and iOS phones, and mobile voice directory search provided by Yellow Pages. “I want to be able to have a million apps riding on our platform, not hundreds, as we have today,” Gilbert says. “Whatever your wild idea is—we want to provide those APIs. I’ll be honest: I don’t know what people are going to use it for.”
The AT&T technology builds on decades of innovation at Bell Labs prior to the breakup of AT&T and the subsequent establishment of AT&T’s own service-centric labs. However, the company must compete with more established providers of speech-recognition technology, especially in the realm of smart phones.
For example, Nuance provides speech-recognition capabilities to many companies including, reportedly, Apple for its Siri personal assistant. Google’s speech-recognition technology is offered throughout its Android smart-phone operating system and is available to any app written for an Android device. Microsoft also has speech-recognition technology, which appears in its Windows Phone operating system and in products from partners such as Ford, with its Sync system for in-car entertainment.