A new AI creates original video clips from text cues

Jamie Condliffe

February 26, 2018

A short, typed description of a scene is enough to get this software making footage.

How it works: Science reports that the AI uses two neural networks: one to create video, and another to judge whether that video looks realistic, with the verdict fed back to improve the first's output. We named these kinds of AIs one of our 10 Breakthrough Technologies of 2018.
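To make the two-network idea concrete, here is a minimal sketch of that tug-of-war on a toy problem. It is not the researchers' system: instead of video, the "generator" produces single numbers and tries to imitate samples drawn from around 4.0, while the "discriminator" scores how real each number looks. The function names, the target value, the learning rate, and the one-parameter-pair models are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def mean(values):
    return sum(values) / len(values)


def train_toy_gan(steps=500, batch=32, lr=0.05, seed=0):
    """Train a hypothetical 1-D generator against a 1-D discriminator."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: fake = a * noise + b
    w, c = 0.1, 0.0  # discriminator: score = sigmoid(w * x + c)
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]  # assumed "real" data
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: push scores for real samples toward 1
        # and scores for generated samples toward 0.
        d_real = [sigmoid(w * x + c) for x in real]
        d_fake = [sigmoid(w * x + c) for x in fake]
        grad_w = (mean([(d - 1.0) * x for d, x in zip(d_real, real)])
                  + mean([d * x for d, x in zip(d_fake, fake)]))
        grad_c = mean([d - 1.0 for d in d_real]) + mean(d_fake)
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: adjust a and b so the updated discriminator
        # scores the generated samples as more realistic.
        d_fake = [sigmoid(w * x + c) for x in fake]
        grad_a = mean([(d - 1.0) * w * z for d, z in zip(d_fake, noise)])
        grad_b = mean([(d - 1.0) * w for d in d_fake])
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train_toy_gan()
    print("generator offset b:", round(b, 2))
```

Each round, the discriminator first gets better at telling real from fake, and the generator then shifts its output to fool it. A text-to-video system would swap these tiny formulas for deep networks and feed the text description to both sides, but the back-and-forth training loop follows the same pattern.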

What it does: First, the system is trained on footage of activities labelled with descriptions like "playing golf on grass." It can then recreate similar scenes given a snippet of text. Plus, it can make clips combining disparate concepts from training data, such as "sailing on snow."

Why it matters: Automatic generation of video from text could be incredibly useful, for example for creating huge sets of synthetic training data for autonomous cars. It could also lead to some worrying fake content.

But: The clips are just 32 frames long and 64x64 pixels in size. They're still not wholly convincing, and if they're made larger, accuracy plummets. All that needs fixing to build a useful text-to-video converter.
