Voice Recognition for the Internet of Things

With natural-language processing aided by crowdsourced data, Wit.ai aims to make smartphones, wearables, and drones heed your call.

Rachel Metz

October 24, 2014

It’s not unusual to find yourself talking to an uncoöperative appliance or gadget. Soon, though, it could be more common for those devices to actually pay attention.

A startup called Wit.ai plans to make it easy for hardware makers and software developers to add custom voice controls to everything from smartphones and smart watches to Internet-connected thermostats and drones.

While big companies like Apple and Google have their own voice recognition technology, smaller companies and independent developers don’t have the deep pockets required to create voice software that continuously learns from mountains of data.

Wit.ai, based in Palo Alto, California, is taking aim at the swiftly growing number of devices with small displays, or no screen at all, and at activities like driving and cooking, where you may want the aid of technology but don’t want to look at or touch a display.

And to give all kinds of developers access to a simple-to-use, always-learning natural-language service, the company is offering it free to those who agree to share their user data with the Wit.ai community. Collecting this data should help improve the accuracy of the system over time.

“Everyone will benefit from that,” cofounder and CEO Alex Lebrun says.

Lebrun has been thinking about how to make something like Wit.ai work for a while. He previously founded and led VirtuOz, a company that spent months building Siri-like voice-controlled software for clients like eBay and AT&T. The speech recognition company Nuance bought VirtuOz in late 2012, and these days it goes by the name Nina Web.

With Wit.ai, developers type a handful of plain-English commands they want it to recognize, such as “Wake me up tomorrow at 6” or “Wake me up in 20 minutes,” and note what they want to accomplish through each command—in this case, set the alarm on a hypothetical voice-controlled smart watch. Wit.ai uses what it knows about language to figure out the different ways a command might be expressed. Then, when a user wants to set the alarm for a specific time, that person’s utterances are sent to a Wit.ai server, which analyzes the audio and sends structured data back to the gadget—here, the instruction to set the alarm for the proper date and time. A demo on the company’s site gives an idea of how this can work. Already, about 4,600 developers are using Wit.ai with things like mobile apps, robots, home automation, and wearable devices.
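To make the “structured data” step concrete, here is a minimal sketch of what a gadget might do once the server replies. It assumes a hypothetical JSON reply containing an intent name and its details; the field names (`intent`, `entities`, `datetime`) and the `set_alarm` helper are illustrative assumptions, not Wit.ai’s documented format or API.

```python
import json
from datetime import datetime

# Hypothetical server reply for the utterance "Wake me up tomorrow at 6".
# Field names are illustrative assumptions, not Wit.ai's actual format.
EXAMPLE_RESPONSE = """
{
  "text": "Wake me up tomorrow at 6",
  "intent": "set_alarm",
  "entities": {"datetime": "2014-10-25T06:00:00"}
}
"""


def set_alarm(when: datetime) -> str:
    # Stand-in for the smart watch's real alarm routine (hypothetical).
    return f"Alarm set for {when:%Y-%m-%d %H:%M}"


# Map each intent the developer defined to the device action it triggers.
HANDLERS = {
    "set_alarm": lambda entities: set_alarm(
        datetime.fromisoformat(entities["datetime"])
    ),
}


def handle(response_json: str) -> str:
    """Parse the server's structured reply and run the matching action."""
    result = json.loads(response_json)
    handler = HANDLERS.get(result["intent"])
    if handler is None:
        return f"Unrecognized command: {result['text']!r}"
    return handler(result["entities"])


print(handle(EXAMPLE_RESPONSE))  # Alarm set for 2014-10-25 06:00
```

The point of the design is that the device never has to interpret language itself: the hard work of mapping “Wake me up tomorrow at 6” or “Wake me up in 20 minutes” to a single intent happens on the server, and the gadget only looks up which action to run.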

Nick Mostowich, a student at the University of Waterloo in Ontario, is one of them. At a hackathon last month at his school, Mostowich and his team used Wit.ai to add voice control to a toaster and microwave. Mostowich says they quickly put together a set of commands and targets that could be mapped to a list of recipes on a remote server, so a user could say something like “Cook me some bacon” and the microwave would turn itself on, set to the right power level and time.

Voice-powered bacon-nuking aside, there are still plenty of obstacles for Wit.ai to overcome. Like many similar systems that rely on the cloud, such as Siri, it’s not as quick to respond as it could be, and it can’t work if you don’t have an Internet connection. And while Lebrun says Wit.ai can also be used to varying extents in Spanish, French, German, Italian, and Swedish, it’s still far better in English.

Lebrun believes that as more data is added to the system, the non-English languages will improve. He also hopes to let developers build and train voice interactions online with Wit.ai and then download the result so it can run on, say, a smartphone without an Internet connection. Instead of contacting the servers for every command, the device could just check in with Wit.ai occasionally to update what it has learned.
