Cast your mind back to the internet in 2016. Do you have hazy memories of the Mannequin Challenge? Well, the viral YouTube trend has now been used to train a neural network to understand 3D scenes.
The context: We are naturally good at interpreting 2D videos as 3D scenes, but machines need to be taught how to do it. It’s a useful skill to have: the ability to reconstruct the depth and arrangement of freely moving objects can help robots maneuver in unfamiliar surroundings. That’s why the challenge has long captivated computer-vision researchers, especially in the context of self-driving cars.
The data: To approach this problem, a team at Google AI turned to an unexpected data set: thousands of YouTube videos of people performing the Mannequin Challenge. (If it happened to pass you by at the time, this involved standing as still as possible while someone moved around you, filming the pose from all angles.) Because the subjects hold still while the camera moves, these videos also happen to be a novel source of data for understanding the depth of a 2D image.
The method: The researchers converted 2,000 of the videos into 2D images with high-resolution depth data and used them to train a neural network. It was then able to predict the depth of moving objects in a video at much higher accuracy than was possible with previous state-of-the-art methods. Last week, the researchers were awarded a best paper honorable mention at a major computer vision conference.
Unknowing participants: The researchers also released their data set to support future research, meaning that thousands of people who participated in the Mannequin Challenge will unknowingly continue to contribute to the advancement of computer vision and robotics research. While that may come as an uncomfortable surprise to some, this is the rule in AI research rather than the exception.
Many of the field’s most foundational data sets, including Fei-Fei Li’s ImageNet, which kicked off the deep-learning revolution, were compiled from publicly available data scraped from Twitter, Wikipedia, Flickr, and other sources. The practice is motivated by the immense amount of data required to train deep-learning algorithms, and it has only intensified in recent years as researchers produce ever bigger models to achieve breakthrough results.
Data privacy: As we have written before, this data-scraping practice is neither obviously good nor bad but calls into question the norms around consent in the industry. As data becomes increasingly commoditized and monetized, technologists should think about whether the way they’re using someone’s data aligns with the spirit of why it was originally generated and shared.