Artificial intelligence

If you did the Mannequin Challenge, you are now advancing robotics research

June 26, 2019
Nexus | YouTube

Cast your mind back to the internet in 2016. Do you have hazy memories of the Mannequin Challenge? Well, the viral YouTube trend has now been used to train a neural network to understand 3D scenes.

The context: We are naturally good at interpreting 2D videos as 3D scenes, but machines need to be taught how to do it. It’s a useful skill to have: the ability to reconstruct the depth and arrangement of freely moving objects can help robots maneuver in unfamiliar surroundings. That’s why the problem has long captivated computer-vision researchers, especially those working on self-driving cars.

The data: To approach this problem, a team at Google AI turned to an unexpected data set: thousands of YouTube videos of people performing the Mannequin Challenge. (If it happened to pass you by at the time, this involved standing as still as possible while someone moved around you, filming the pose from all angles.) These videos also happen to be a novel source of data for understanding the depth of a 2D image.

The method: The researchers converted 2,000 of the videos into 2D images with high-resolution depth data and used them to train a neural network. The network was then able to predict the depth of moving objects in a video with much higher accuracy than was possible with previous state-of-the-art methods. Last week, the researchers were awarded a best paper honorable mention at a major computer vision conference.
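
Why do frozen people help? When a scene holds still and only the camera moves, the same point appears at shifted positions in different frames, and the size of that shift (its parallax) reveals how far away the point is. The sketch below illustrates that textbook relation for an idealized pair of frames taken a known sideways distance apart. The function name and all the numbers are illustrative assumptions, and this is not the researchers' actual pipeline, which the article does not describe.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Estimate the depth (in meters) of a static point seen in two frames.

    Assumes an idealized setup: the camera moved sideways by `baseline_m`
    between frames, and the point shifted by `disparity_px` pixels in the image.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    # Standard pinhole-camera relation: depth = focal length * baseline / disparity.
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values: a point that shifts a lot between frames is close,
# and a point that barely shifts is far away.
near = depth_from_disparity(focal_length_px=1000.0, baseline_m=0.5, disparity_px=100.0)
far = depth_from_disparity(focal_length_px=1000.0, baseline_m=0.5, disparity_px=20.0)
print(near, far)  # 5.0 25.0
```

The relation only holds if the point did not move between the two frames, which is why footage of people deliberately standing still is so convenient as training data.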

Unknowing participants: The researchers also released their data set to support future research, meaning that thousands of people who participated in the Mannequin Challenge will unknowingly continue to contribute to the advancement of computer vision and robotics research. While that may come as an uncomfortable surprise to some, using publicly posted data without its creators’ knowledge is the rule in AI research rather than the exception.

Many of the field’s most foundational data sets, including Fei-Fei Li’s ImageNet, which kicked off the deep-learning revolution, were compiled from publicly available data scraped from Twitter, Wikipedia, Flickr, and other sources. The practice is motivated by the immense amount of data required to train deep-learning algorithms, and it has only intensified in recent years as researchers build ever bigger models to achieve breakthrough results.

Data privacy: As we have written before, this data-scraping practice is neither obviously good nor bad but calls into question the norms around consent in the industry. As data becomes increasingly commoditized and monetized, technologists should think about whether the way they’re using someone’s data aligns with the spirit of why it was originally generated and shared.
