
Krish Prabhu, CEO of AT&T Labs, believes that making speech technology widely available will allow mobile computing to be more capable and grow faster. “In the context of a world where we’ve largely solved connectivity and reach problems—though there are still issues—this effort on speech comes from a conviction that the interface to the network has to get simpler,” he said at a lab demonstration in New York City last week. “We are trying to pave the way so that technology is not the thing that stops us.”

AT&T’s speech-to-text APIs, set to launch in June, come in seven versions tailored to specific uses, such as dictating text messages, searching for local businesses, answering questions, turning voicemails into text, and taking general dictation. APIs aimed at online games and social networks are planned for the future.
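
To make the idea concrete, here is a minimal sketch of how a developer might call a context-tuned speech-to-text service over HTTP. The endpoint URL, the context header, and the response field names are illustrative assumptions, not AT&T’s documented interface.

```python
# Hypothetical sketch of a context-tuned speech-to-text request.
# The endpoint, the X-Speech-Context header, and the "transcript" field
# are assumptions for illustration; consult the real API documentation.
import requests

API_URL = "https://api.example.com/speech/v1/recognize"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

def transcribe(audio_path: str, context: str = "SMS") -> str:
    """Send a WAV file to the service, selecting a use-specific model
    (e.g. SMS dictation, local business search, voicemail-to-text)."""
    with open(audio_path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={
                "Authorization": f"Bearer {API_KEY}",
                "Content-Type": "audio/wav",
                "X-Speech-Context": context,  # assumed way to pick one of the tailored versions
            },
            data=audio.read(),
            timeout=30,
        )
    response.raise_for_status()
    return response.json()["transcript"]  # assumed response field

if __name__ == "__main__":
    print(transcribe("voicemail.wav", context="VOICEMAIL"))
```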

Later, APIs may become available that translate text between English and six other languages: Spanish, French, Italian, German, Chinese, and Japanese. Other languages, including Korean and Arabic, are in the pipeline, but AT&T will be far behind competitors: Google already offers developers tools that translate across more than a thousand language pairs.
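
A translation call would likely follow a similar pattern. The sketch below assumes a hypothetical JSON endpoint and field names, pairing English with one of the six listed languages.

```python
# Hypothetical sketch of a text-translation request between English and one
# of the six listed languages; the endpoint and JSON fields are assumptions.
import requests

def translate(text: str, source: str = "en", target: str = "de") -> str:
    response = requests.post(
        "https://api.example.com/translate/v1",  # placeholder endpoint
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"text": text, "source": source, "target": target},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["translation"]  # assumed response field

print(translate("Where is the nearest train station?"))
```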

Gilbert says a $99 registration fee covers use of all the APIs for 2012, and that pricing beyond 2012 has not yet been made public. Google, by comparison, charges for its translation APIs.

Improving the accuracy of speech-recognition or translation software requires more example data to train the underlying algorithms. To help with that, AT&T could eventually solicit feedback from people using products with its speech and translation technology built in. “Crowdsourcing would enable this to reach much higher levels of accuracy, and this would, in turn, drive broader adoption and much happier users,” says Sam Ramji, a computer scientist and vice president for strategy at Apigee, which builds API platforms and is working on the AT&T project.

Ramji believes that making good speech-recognition technology easily available could slowly make traditional menus and text-driven interfaces extinct. “Today’s user interfaces are like trees that we have to navigate to reflect the structure of the program. What should happen is that devices parse the command coming out of our mouths,” he says.


Credit: AT&T Labs

