Inspired by the success of voice recognition software on mobile phones, Nuance hopes to put its speech interfaces in many more places, most notably the television and the automobile. Both are popular and ripe for innovation.
To find a show on TV, or to schedule a DVR recording, viewers currently have to navigate awkward menus using a remote that was never designed for keying in text queries. Products that were supposed to make finding a show easier, such as Google TV, have proved too complex for people who just want to relax for an evening’s entertainment.
At Nuance’s research labs, Sejnoha demonstrated software called Dragon TV running on a television in a mocked-up living room. When a colleague said, “Dragon TV, find movies starring Meryl Streep,” the interface instantly scanned through channel listings to select several appropriate movies. A version of this technology is already in some televisions sold by Samsung.
Apple is widely rumored to be developing its own television, and it’s speculated that Siri will be its controller. The idea has been fueled by Walter Isaacson’s biography of Steve Jobs, in which the late CEO is said to have claimed that he’d “finally solved” the TV interface.
Meanwhile, the Sync entertainment system in Ford automobiles already uses Nuance’s technology to let drivers pull up directions, weather information, and songs. About four million Ford cars on the road have Sync with voice recognition. Last week, Nuance introduced software called Dragon Drive that will let other car manufacturers add voice-control features to vehicles.
Both of these new contexts are challenging. One reason voice interfaces have become popular on smartphones is that users speak directly into the device’s microphone, so the voice comes through clearly. To ensure that the system works well in televisions and cars, where there is more background noise, the company is experimenting with array microphones and noise-canceling technology.
Nuance makes a number of software development kits available to anyone who wants to include voice recognition technology in an application. Montrue Technologies, a company based in Ashland, Oregon, used Nuance’s mobile medical SDK to develop an iPad app that lets physicians dictate notes.
“It’s astonishingly accurate,” says Brian Phelps, CEO and cofounder of Montrue and himself an ER doctor. “Speech has turned a corner; it’s gotten to a point where we’re getting incredible accuracy right out of the box.”
In turn, the kits shore up Nuance’s position, helping the company improve its voice recognition and language processing algorithms by sending ever more voice data through its servers. As MIT’s Glass says, “there has been a long-time saying in the speech-recognition community: ‘There’s no data like more data.’” Nuance says it stores the data in an anonymous format to protect privacy.
Sejnoha believes that within a few years, mobile voice interfaces will be much more pervasive and powerful. “I should just be able to talk to it without touching it,” he says. “It will constantly be listening for trigger words, and will just do it—pop up a calendar, or ready a text message, or a browser that’s navigated to where you want to go.”
Perhaps people will even speak to computers they wear, like the photo-snapping eyeglasses in development at Google. Sources at Nuance say they are actively planning how speech technology would have to be architected to run on wearable computers.