An AI Makes Some Sense of the World by Watching Videos Alone
DeepMind has developed software that forms links between activities and sounds in video through unsupervised learning. New Scientist reports that the firm's new AI uses three neural nets: one for image recognition, another for identifying sounds, and a third that links the outputs of the first two. But unlike many machine-learning systems, which are provided with labeled data sets to help them associate words with what they see or hear, this one was given a pile of raw data and left to fend for itself.
It was left alone with 60 million video stills, each paired with a one-second audio clip taken from the same point in the video where the frame was captured. Without human assistance, the system slowly learned how sounds and image features are related, ultimately becoming able to link, say, a crowd with a cheer and typing hands with that familiar clickety-clack. It can't yet put a word to any of its observations, but it is another step toward AIs that can make sense of the world without constantly being told what they are seeing.
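To make the setup more concrete, here is a minimal, hypothetical sketch in PyTorch of a three-network arrangement like the one described: one net embeds a video still, a second embeds a one-second audio clip, and a third ties the two embeddings together. The article does not specify the training objective, so this sketch assumes a common self-supervised formulation in which the third net simply predicts whether a frame and a clip came from the same moment in the same video; all class names, layer sizes, and shapes are illustrative, not DeepMind's actual architecture.

```python
# Hypothetical sketch of image/audio/correspondence networks.
# Assumption (not stated in the article): the third net is a binary classifier
# over matched vs. mismatched frame-audio pairs, so labels come for free from
# the videos themselves and no human annotation is needed.

import torch
import torch.nn as nn


class ImageNet(nn.Module):
    """Embeds a video still into a fixed-size vector."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, frames):  # frames: (batch, 3, H, W)
        return self.fc(self.conv(frames).flatten(1))


class AudioNet(nn.Module):
    """Embeds a one-second audio clip (as a spectrogram) into a vector."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, spectrograms):  # spectrograms: (batch, 1, freq, time)
        return self.fc(self.conv(spectrograms).flatten(1))


class CorrespondenceNet(nn.Module):
    """Ties the two embeddings together: does this sound belong to this frame?"""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),  # single logit: 1 = matching pair
        )

    def forward(self, img_emb, aud_emb):
        return self.head(torch.cat([img_emb, aud_emb], dim=1)).squeeze(1)


def training_step(image_net, audio_net, corr_net, frames, clips, is_match):
    """One self-supervised step: is_match is 1 when the clip was taken from
    the same point in the same video as the frame, 0 when it was swapped in
    from a different video."""
    logits = corr_net(image_net(frames), audio_net(clips))
    return nn.functional.binary_cross_entropy_with_logits(logits, is_match.float())
```

Under this kind of objective, linking crowds with cheering or typing hands with keyboard clatter falls out of the training signal itself: the only way the correspondence net can tell matched pairs from mismatched ones is for the image and audio nets to learn features that co-occur across the two modalities.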