Everything you own in the future will be controlled by your voice. That’s what this year’s CES, the world’s largest annual gadget bonanza, has made abundantly clear.
Google and Amazon have been in fierce competition to put their assistants into your TV, your car, and even your bathroom. It all came to a head this week in Las Vegas, where the full line-up of voice-enabled products underscored the scope of each company’s ambitions.
Maybe it seems like a wasteful side effect of capitalism that you can now ask Alexa to lift your toilet cover (or maybe not—you do you), but there’s more to the ubiquity of voice interfaces than a never-ending series of hardware companies jumping on the bandwagon.
It’s tied to an idea that leading AI expert Kai-Fu Lee calls OMO, online-merge-of-offline. OMO, as he describes it, refers to combining our digital and physical worlds in such a way that every object in our surrounding environment will become an interaction point for the internet—as well as a sensor that collects data about our lives. This will power what he dubs the “third wave” of AI: our algorithms, finally given a comprehensive view of all our behaviors, will be able to hyper-personalize our experiences, whether in the grocery store or the classroom.
But this vision requires everything to be connected. It requires your shopping cart to know what’s in your fridge so it can recommend the optimal shopping list. It requires your front door to know your online purchases and whether you’re waiting for an in-home delivery. That’s where voice interfaces come in: installing Alexa into your fridge, your door, and all your other disparate possessions neatly ties them to one software ecosystem. It’s quite the clever scheme: by selling you the powerful and seamless convenience of voice assistants, Google and Amazon have slowly inched their way into being the central platform for all your data and the core engine for algorithmically streamlining your life.
Whether or not you trust either company with that much control, such a grand undertaking will be limited by what voice assistants can understand. And compared with other subfields of AI, progress in natural-language processing and generation has lagged behind.
But that could be about to change. Last year several research teams used new machine-learning techniques to make impressive breakthroughs in language comprehension. In June, for example, research nonprofit OpenAI developed an unsupervised learning technique to train systems on unstructured, rather than cleaned and labeled, text. This dramatically lowered the cost of acquiring more training data, which in turn improved the system’s performance. A few months later, Google released an even better unsupervised algorithm that is as good as humans at completing sentences with multiple-choice answers.
All these advancements are getting us closer to a day when machines that really understand what we mean could render physical and visual interfaces obsolete—and usher in the full potential of an OMO world. For better or worse.
This originally appeared in our AI newsletter The Algorithm.