Artificial intelligence

If you did the Mannequin Challenge, you are now advancing robotics research

June 26, 2019
Nexus | YouTube

Cast your mind back to the internet in 2016. Do you have hazy memories of the Mannequin Challenge? Well, the viral YouTube trend has now been used to train a neural network to understand 3D scenes.

The context: We are naturally good at interpreting 2D videos as 3D scenes, but machines need to be taught how to do it. It’s a useful skill to have: the ability to reconstruct the depth and arrangement of freely moving objects can help robots maneuver in unfamiliar surroundings. That’s why the problem has long captivated computer-vision researchers, especially in the context of self-driving cars.

The data: To approach this problem, a team at Google AI turned to an unexpected data set: thousands of YouTube videos of people performing the Mannequin Challenge. (If it happened to pass you by at the time, this involved standing as still as possible while someone moved around you, filming the pose from all angles.) These videos also happen to be a novel source of data for learning to estimate depth from a 2D image.
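Why would frozen people filmed by a moving camera carry depth information? A rough illustration, not the researchers' actual pipeline: in an idealized two-view setup, a static point shifts between the two views by an amount inversely proportional to its distance from the camera. The sketch below assumes a simple rectified pinhole-camera model; the function name and the focal length, baseline, and disparity values are hypothetical and chosen only for illustration.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in meters of a static point seen from two camera positions.

    Assumes an idealized, rectified pinhole-camera pair (an illustrative
    assumption, not the method described in the article):
        depth = focal_length * baseline / disparity
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical numbers: focal length of 1000 pixels, camera moved 0.5 m.
    # A smaller shift between the two views means the point is farther away.
    for disparity in (100.0, 50.0, 10.0):
        depth = depth_from_disparity(1000.0, 0.5, disparity)
        print(f"shift of {disparity:.0f} px -> depth of {depth:.1f} m")
```

The relationship only holds if the scene does not move between views, which is exactly the condition the Mannequin Challenge happens to satisfy.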

The method: The researchers converted 2,000 of the videos into 2D images with high-resolution depth data and used them to train a neural network. It was then able to predict the depth of moving objects in a video with much higher accuracy than was possible with previous state-of-the-art methods. Last week, the researchers were awarded a best paper honorable mention at a major computer vision conference.

Unknowing participants: The researchers also released their data set to support future research, meaning that thousands of people who participated in the Mannequin Challenge will unknowingly continue to contribute to the advancement of computer vision and robotics research. While that may come as an uncomfortable surprise to some, this is the rule in AI research rather than the exception.

Many of the field’s most foundational data sets, including Fei-Fei Li’s ImageNet, which kicked off the deep-learning revolution, were compiled from publicly available data scraped from Twitter, Wikipedia, Flickr, and other sources. The practice is motivated by the immense amount of data required to train deep-learning algorithms, and it has only intensified in recent years as researchers produce ever bigger models to achieve breakthrough results.

Data privacy: As we have written before, this data-scraping practice is neither obviously good nor bad but calls into question the norms around consent in the industry. As data becomes increasingly commoditized and monetized, technologists should think about whether the way they’re using someone’s data aligns with the spirit of why it was originally generated and shared.
