Everything you own in the future will be controlled by your voice. That’s what this year’s CES, the world’s largest annual gadget bonanza, has made abundantly clear.
Google and Amazon have been in fierce competition to put their assistants into your TV, your car, and even your bathroom. It all came to a head this week in Las Vegas, where the full line-up of voice-enabled products underscored the scope of each company’s ambitions.
Maybe it seems like a wasteful side effect of capitalism that you can now ask Alexa to lift your toilet cover (or maybe not—you do you), but there’s more to the ubiquity of voice interfaces than a never-ending series of hardware companies jumping on the bandwagon.
It’s tied to an idea that leading AI expert Kai-Fu Lee calls OMO, online-merge-of-offline. OMO, as he describes it, refers to combining our digital and physical worlds in such a way that every object in our surrounding environment will become an interaction point for the internet—as well as a sensor that collects data about our lives. This will power what he dubs the “third wave” of AI: our algorithms, finally given a comprehensive view of all our behaviors, will be able to hyper-personalize our experiences, whether in the grocery store or the classroom.
But this vision requires everything to be connected. It requires your shopping cart to know what’s in your fridge so it can recommend the optimal shopping list. It requires your front door to know your online purchases and whether you’re waiting for an in-home delivery. That’s where voice interfaces come in: installing Alexa into your fridge, your door, and all your other disparate possessions neatly ties them to one software ecosystem. It’s quite the clever scheme: by selling you the powerful and seamless convenience of voice assistants, Google and Amazon have slowly inched their way into being the central platform for all your data and the core engine for algorithmically streamlining your life.
Whether or not you trust either company with that much control, such a grand undertaking will be limited by what voice assistants can understand. And compared with other subfields of AI, progress in natural-language processing and generation has lagged behind.
But that could be about to change. Last year several research teams used new machine-learning techniques to make impressive breakthroughs in language comprehension. In June, for example, research nonprofit OpenAI developed an unsupervised learning technique to train systems on unstructured, rather than cleaned and labeled, text. That dramatically lowered the cost of acquiring training data, thereby boosting the system's performance. A few months later, Google released an even better unsupervised algorithm that matches humans at completing sentences with multiple-choice answers.
All these advancements are getting us closer to a day when machines that really understand what we mean could render physical and visual interfaces obsolete—and usher in the full potential of an OMO world. For better or worse.
This originally appeared in our AI newsletter The Algorithm. To have it directly delivered to your inbox, subscribe here for free.