One day your voice will control all your gadgets, and they will control you

Karen Hao

January 11, 2019

Everything you own in the future will be controlled by your voice. That’s what this year’s CES, the world’s largest annual gadget bonanza, has made abundantly clear.

Google and Amazon have been in fierce competition to put their assistants into your TV, your car, and even your bathroom. It all came to a head this week in Las Vegas, where the full line-up of voice-enabled products underscored the scope of each company’s ambitions.

Maybe it seems like a wasteful side effect of capitalism that you can now ask Alexa to lift your toilet cover (or maybe not—you do you), but there’s more to the ubiquity of voice interfaces than a never-ending series of hardware companies jumping on the bandwagon.

It’s tied to an idea that leading AI expert Kai-Fu Lee calls OMO, online-merge-of-offline. OMO, as he describes it, refers to combining our digital and physical worlds in such a way that every object in our surrounding environment will become an interaction point for the internet—as well as a sensor that collects data about our lives. This will power what he dubs the “third wave” of AI: our algorithms, finally given a comprehensive view of all our behaviors, will be able to hyper-personalize our experiences, whether in the grocery store or the classroom.

But this vision requires everything to be connected. It requires your shopping cart to know what’s in your fridge so it can recommend the optimal shopping list. It requires your front door to know your online purchases and whether you’re waiting for an in-home delivery. That’s where voice interfaces come in: installing Alexa into your fridge, your door, and all your other disparate possessions neatly ties them to one software ecosystem. It’s quite the clever scheme: by selling you the powerful and seamless convenience of voice assistants, Google and Amazon have slowly inched their way into being the central platform for all your data and the core engine for algorithmically streamlining your life.

Whether or not you trust either company with that much control, such a grand undertaking will be limited by what voice assistants can understand. And compared with other subfields of AI, progress in natural-language processing and generation has lagged behind.

But that could be about to change. Last year several research teams used new machine-learning techniques to make impressive breakthroughs in language comprehension. In June, for example, research nonprofit OpenAI developed an unsupervised learning technique to train systems on unstructured, rather than cleaned and labeled, text. It dramatically lowered the cost of acquiring more training data, thereby improving the system’s performance. A few months later, Google released an even better unsupervised algorithm that is as good as humans at completing sentences with multiple-choice answers.

All these advancements are getting us closer to a day when machines that really understand what we mean could render physical and visual interfaces obsolete—and usher in the full potential of an OMO world. For better or worse.

This originally appeared in our AI newsletter The Algorithm.
