DeepMind has developed software that forms links between activities and sounds in video through unsupervised learning. New Scientist reports that the firm's new AI uses three neural nets: one for image recognition, another for identifying sounds, and a third that ties results from the two together. But unlike many machine-learning algorithms, which are provided with labeled data sets to help them associate words with what they see or hear, this system was given a pile of raw data and left to fend for itself.
It was left alone with 60 million video stills, each of which came paired with a one-second audio clip taken from the same point in a video from where the frame was captured. Without human assistance, the system then slowly learned how sounds and image features were related—ultimately finding itself able to link, say, crowds with a cheer and typing hands with that familiar clickety-clack. It can’t yet put a word to any of its observations, but it is another step toward AIs being able to make sense of the world without constantly being told about what they see.
Embracing CX in the metaverse
More than just meeting customers where they are, the metaverse offers opportunities to transform customer experience.
Identity protection is key to metaverse innovation
As immersive experiences in the metaverse become more sophisticated, so does the threat landscape.
The modern enterprise imaging and data value chain
For both patients and providers, intelligent, interoperable, and open workflow solutions will make all the difference.
Scientists have created synthetic mouse embryos with developed brains
The stem-cell-derived embryos could shed new light on the earliest stages of human pregnancy.
Get the latest updates from
MIT Technology Review
Discover special offers, top stories, upcoming events, and more.