Can Google Get Web Users Talking?

Voice-driven search is a futuristic idea, and may take some getting used to.

Tom Simonite

June 29, 2011

The notion of asking a computer for information out loud is familiar to most of us only from science fiction. Google is trying to change that by adding speech recognition to its search engine, and releasing technology that would allow any browser, website, or app to use the feature.

But are you ready to give up your keyboards and talk to Google instead?

Over the last two weeks, speech input for Google has gradually been rolled out to every person using Google’s Chrome browser. A microphone icon appears at the right end of the iconic search box. If you have a microphone built-in or attached to your computer, clicking that icon creates a direct audio connection to Google’s servers, which will convert your spoken words into text.

It has been possible to speak Google search queries using a smartphone for almost three years; since last year, Android handsets have been able to take voice input in any situation where a keyboard would normally be used. “That was transformational, because people stopped worrying about when they could and couldn’t speak to the phone,” says Vincent Vanhoucke, who leads the voice search engineering team at Google. Over the last 12 months, the number of spoken inputs, search or otherwise, via Android devices has grown sixfold, and every day, tens of thousands of hours of speech audio are fed into Google’s servers. “On Android, a large fraction of the use is people dictating e-mail and SMS,” says Vanhoucke.

Vanhoucke’s team now wants using voice on the Web to be as easy as it is on Android. “It’s a big bet,” he says. “Voice search for desktop is the flagship for this, [but] we want to take speech everywhere.”

Voice recognition is more technically challenging on a desktop or laptop computer, says Vanhoucke, because it requires noise suppression algorithms that are not needed for mobile speech recognition. These algorithms filter out sounds such as those of a computer’s fan or air conditioners. “The quality of the audio is paramount for phone manufacturers, and you hold it close to your mouth,” says Vanhoucke. “On a PC, the microphone is an afterthought, and you are further away. You don’t get the best quality.”

Google asked thousands of people to read phrases aloud to their computers to gather data on the conditions its speech recognition technology would have to handle. As people use the service for real, it is trained further, says Vanhoucke, and the resulting gains in accuracy should increase its popularity. Data from users of mobile voice search shows that people are much more likely to use the feature again when it is accurate for them the first time.

A bigger challenge to getting users to embrace voice recognition on the desktop could be the existing tools for entering information, says Keith Vertanen, a lecturer at Princeton University who researches voice-recognition technology. “On the desktop, you’re up against a very fast and efficient means of input in the keyboard,” he says. “On a phone, you don’t have that available, and you are often in hands- or eyes-free situations where voice input really helps.”

Vertanen says people are less tolerant of glitches when using speech recognition on a desktop computer because a tried-and-true way of entering text is close at hand. He says users might find voice recognition more compelling on other Internet-connected devices in the home. “Nonconventional devices like a DVR, television, or game console don’t usually have good text input,” he points out. Google TV devices can already take voice input spoken into a connected Android phone.

Vanhoucke acknowledges that speech recognition fulfills a more immediate need on phones, but argues that users are ready for it on conventional computers, too. “People will use it in ways that surprise us,” he says. “At this point, it’s still an experiment.” Situations in which people have their hands full are one example, says Vanhoucke (although it should be noted that desktop voice search today still involves using the mouse to activate the feature).

Google isn’t performing this experiment alone. The company is pushing the Web standards body W3C to introduce a standard set of HTML markup that allows any website or app to call on voice recognition via the Web browser, and has already enabled a version of this markup in the Chrome browser. For now, Google is the only major company with a browser able to use the prototype feature, but Mozilla, Microsoft, and AT&T are all working with the W3C effort.

“It’s a collaborative effort that other browser makers are part of,” says Vanhoucke. “Any designer can add it to their Web page. It’s something anyone can use.” Extensions for the Chrome browser that make use of voice input have already appeared, and can be used to enter text on any website.

However, those extensions reveal that although Google’s desktop speech recognition is accurate for search queries, it’s not much good for tasks like composing e-mail.

Enabling the system to learn the personal quirks of each person’s pronunciation, a feature already enabled on Android phones, could address that. Vertanen points out that the personalization learned through mobile search could easily be ported over to the desktop for people logged into their Google account. It could also make it possible for the technology to spring up elsewhere. “The advantage of Google’s networked approach is that a [speech] model in the cloud can adapt to your voice in all these different places and follow you around, whether that’s in your living room or in your car.”