Early adopters can now get a sneak peek at the future of the Web by downloading the latest prerelease, or “beta,” version of Chrome, Google’s Web browser. One of the most interesting new features is the ability to convert speech to text, entirely via the Web.
The feature is the result of work Google has been doing with the World Wide Web Consortium’s HTML Speech Incubator Group, whose mission is “to determine the feasibility of integrating speech technology in HTML5,” the Web’s emerging standard language.
A Web page employing the new HTML5 feature could have an icon that, when clicked, initiates a recording through the computer’s microphone, via the browser. Speech is captured and sent to Google’s servers for transcription, and the resulting text is sent back to the website.
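For readers who think in code, the round trip described above can be modeled as a small state machine: a click starts a recording, captured audio goes off for transcription, and either text or an error comes back. This is only a sketch of that flow. Every name in it (`SpeechState`, `SpeechEvent`, `step`, `run`) is hypothetical and chosen for illustration; none of it is Chrome's actual API or part of the HTML5 speech proposal.

```typescript
// Hypothetical model of the speech-input round trip.
// These types and functions are illustrative assumptions, not a real browser API.

type SpeechState =
  | { kind: "idle" }
  | { kind: "recording" }
  | { kind: "transcribing" }
  | { kind: "done"; text: string }
  | { kind: "error"; message: string };

type SpeechEvent =
  | { kind: "micClicked" }                    // user clicks the microphone icon
  | { kind: "audioCaptured" }                 // recording ends, audio is sent to the server
  | { kind: "textReceived"; text: string }    // server returns a transcription
  | { kind: "failed"; message: string };      // recognition or connection error

// Advance the model by one event. Events that make no sense in the
// current state are ignored and the state is returned unchanged.
function step(state: SpeechState, event: SpeechEvent): SpeechState {
  switch (event.kind) {
    case "micClicked":
      // A click starts a new recording unless a request is already in flight.
      return state.kind === "recording" || state.kind === "transcribing"
        ? state
        : { kind: "recording" };
    case "audioCaptured":
      return state.kind === "recording" ? { kind: "transcribing" } : state;
    case "textReceived":
      return state.kind === "transcribing"
        ? { kind: "done", text: event.text }
        : state;
    case "failed":
      return state.kind === "recording" || state.kind === "transcribing"
        ? { kind: "error", message: event.message }
        : state;
  }
}

// Replay a sequence of events starting from the idle state.
function run(events: SpeechEvent[]): SpeechState {
  return events.reduce(step, { kind: "idle" } as SpeechState);
}
```

Replaying a click, a capture, and a returned transcription ends in the `done` state with the text attached. A `failed` event partway through ends in `error` instead, which corresponds to messages such as “speech not recognized.”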
To experiment with the voice-to-text feature, download the latest beta version of Chrome, open a Web page that uses the feature, click on the microphone, and start talking. You’ll probably find the results mixed, and sometimes hilarious. Using the finest elocution I could muster, I read the opening passage of Richard Yates’s Revolutionary Road: “The final dying sounds of their dress rehearsal left the Laurel Players with nothing to do but stand there, silent and helpless.” I got error messages several times in a row (“speech not recognized” or “connection to speech servers failed”). Once, I received this transcription: “9 sounds good restaurants on the world there’s nothing to do with fam vans island.”
The new feature derives in large part from experiments Google conducted through its Android operating system for mobile devices. For more than a year, says Vincent Vanhoucke, a member of Google’s voice recognition team, Android app developers have been able to integrate voice recognition into their apps using technology provided by Google. This has provided Google with useful voice data with which to train its voice-recognition algorithms. Today, some 20 percent of searches on Android phones are conducted using voice recognition, says Vanhoucke: people use voice recognition to write texts, send emails, or conduct searches. “It has really opened up interesting new avenues,” says Vanhoucke.
However, unlike desktop voice-to-text software, which is first trained on a user’s voice, Chrome attempts to produce text from speech without any prior training.