A new AI creates original video clips from text cues

February 26, 2018

A short, typed description of a scene is enough to get this software making footage.

How it works: Science reports that the AI uses two neural networks: one creates video, and the other judges whether it looks realistic, feedback that is used to improve the first network's output. We named this kind of AI one of our 10 Breakthrough Technologies of 2018.
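The two-network setup can be sketched with a toy example. This is not the researchers' system: it assumes a one-number "generator" and a one-number "discriminator" working on single values rather than video, and every name in it (`train_toy_gan`, the target mean of 4.0, the learning rate) is made up for illustration.

```python
import math
import random


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train_toy_gan(steps=2000, lr=0.01, seed=0):
    """Train a toy generator/discriminator pair on 1-D numbers.

    Hypothetical illustration only: real text-to-video models use deep
    networks and condition on text, which this sketch does not do.
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator: fake = a * z + b
    w, c = 0.1, 0.0   # discriminator: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)   # stand-in for a real training example
        z = rng.gauss(0.0, 1.0)      # random noise fed to the generator
        fake = a * z + b

        # Discriminator step: raise log D(real) + log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w += lr * ((1.0 - d_real) * real - d_fake * fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: raise log D(fake), i.e. try to fool the judge.
        d_fake = sigmoid(w * fake + c)
        grad_fake = (1.0 - d_fake) * w
        a += lr * grad_fake * z
        b += lr * grad_fake

    return (a, b), (w, c)


generator, discriminator = train_toy_gan()
```

The point of the sketch is the alternation: the judge is updated to tell real from fake, then the creator is updated using the judge's verdict. Whether such a loop settles on realistic output depends heavily on the models and settings, so no result is claimed here.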

What it does: First, the system is trained on footage of activities labelled with descriptions like “playing golf on grass.” It can then recreate similar scenes given a snippet of text. Plus, it can make clips combining disparate concepts from training data, such as “sailing on snow.”

Why it matters: Automatic generation of video from text could be incredibly useful, for example in creating huge sets of synthetic training data for autonomous cars. It could also lead to some worrying fake content.

But: The clips are just 32 frames long and 64×64 pixels in size. They’re still not wholly convincing, and if they’re made larger, accuracy plummets. All of that needs fixing before anyone can build a useful text-to-video converter.
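To put those dimensions in perspective, here is a rough count of how many numbers the system must produce per clip. It assumes three color channels per pixel, which the report does not state, and the 256×256 comparison size is hypothetical.

```python
def clip_values(frames, height, width, channels=3):
    # Total numbers in one clip; channels=3 (RGB) is an assumption.
    return frames * height * width * channels


small = clip_values(32, 64, 64)       # the clip size described above
larger = clip_values(32, 256, 256)    # hypothetical larger clip
ratio = larger // small

print(small)   # 393216
print(ratio)   # 16
```

Quadrupling each side of the frame multiplies the output by 16. That growth alone does not explain the drop in accuracy, but it shows how quickly the generation task expands as clips get bigger.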
