Intelligent Machines

When AI Supplies the Sound in Video Clips, Humans Can’t Tell the Difference

Which of these video soundtracks are real and which are generated by a machine?

Machine learning is changing the way we think about images and how they are created. Researchers have trained machines to generate faces, to draw cartoons, and even to transfer the style of paintings to pictures. It is just a short step from these techniques to creating videos in this way, and indeed this is already being done.

All that points to a way of creating virtual environments entirely by machine. That opens all kinds of possibilities for the future of human experience.

But there is a problem. Video is not just a visual experience; generating realistic sound is just as important. So an interesting question is whether machines can convincingly generate the audio component of a video.

Today we get an answer thanks to the work of Yipin Zhou and pals at the University of North Carolina at Chapel Hill and a few buddies at Adobe Research. These guys have trained a machine-learning algorithm to generate realistic soundtracks for short video clips. 

Indeed, the sounds are so realistic that they fool most humans into thinking they are real. You can take a test yourself here to see if you can tell the difference.

The team takes the standard approach to machine learning. Algorithms are only ever as good as the data used to train them, so the first step is to create a large, high-quality annotated data set of video examples.

The team creates this data set by selecting a subset of clips from a Google collection called Audioset, which consists of over two million 10-second clips from YouTube that all include audio events. These videos are divided into human-labeled categories focusing on things like dogs, chainsaws, helicopters, and so on.

To train a machine, the team must have clips in which the sound source is clearly visible. So any video that contains audio from off-screen events is unsuitable. The team filters these out by asking crowdsourced workers from Amazon’s Mechanical Turk service to identify the clips in which the audio source is clearly visible and dominates the soundtrack.
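The filtering step can be pictured as a simple vote count. The sketch below is only an illustration, not the team's actual pipeline: the clip ids, the `filter_clips` helper, and the thresholds (`min_votes`, `min_agreement`) are all assumptions made for the example. Each annotation is a pair of a clip id and a worker's yes-or-no answer to "is the sound source visible and dominant?"

```python
from collections import defaultdict


def filter_clips(annotations, min_votes=3, min_agreement=0.6):
    """Return the ids of clips that enough workers judged to be clean.

    annotations: iterable of (clip_id, is_clean) pairs, one per worker answer.
    min_votes and min_agreement are hypothetical thresholds, not values
    taken from the paper.
    """
    votes = defaultdict(list)
    for clip_id, is_clean in annotations:
        votes[clip_id].append(bool(is_clean))

    kept = []
    for clip_id, clip_votes in votes.items():
        if len(clip_votes) < min_votes:
            continue  # too few answers to trust the result
        if sum(clip_votes) / len(clip_votes) >= min_agreement:
            kept.append(clip_id)
    return sorted(kept)


# Made-up answers for three made-up clips.
annotations = [
    ("dog_001", True), ("dog_001", True), ("dog_001", False),
    ("saw_002", False), ("saw_002", False), ("saw_002", True),
    ("heli_003", True), ("heli_003", True),
]
print(filter_clips(annotations))
```

In this toy run only `dog_001` survives: `saw_002` is voted down, and `heli_003` has too few answers to count.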

That produced a new data set with over 28,000 videos, each about seven seconds in length, covering 10 different categories.

Next, the team used these videos to train a machine to recognize the waveforms associated with each category and to reproduce them from scratch using a neural network called SampleRNN.

Finally, they tested the results by asking human evaluators to rate the quality of the sound accompanying a video and to determine whether it is real or artificially generated.

The results suggest that machines can become pretty good at this task. “Our experiments show that the generated sounds are fairly realistic and have good temporal synchronization with the visual inputs,” say Zhou and co.

And human evaluators seem to agree. “Evaluations show that over 70% of the generated sound from our models can fool humans into thinking that they are real,” say Zhou and co. 
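A figure like this boils down to a ratio. As a rough sketch, assuming each evaluation is recorded as whether the sound was machine-generated and whether the evaluator judged it real, the share of fooled judgments could be computed as follows. The `fool_rate` helper and the sample data are hypothetical, and the paper's exact protocol may differ.

```python
def fool_rate(judgments):
    """Fraction of judgments on generated sounds that called them real.

    judgments: iterable of (is_generated, judged_real) boolean pairs.
    Judgments on real sounds are ignored for this statistic.
    """
    on_generated = [judged_real for is_generated, judged_real in judgments
                    if is_generated]
    if not on_generated:
        raise ValueError("no judgments on generated sounds")
    return sum(on_generated) / len(on_generated)


# Made-up data: ten judgments on generated sounds (seven fooled),
# plus two judgments on real sounds that do not affect the rate.
sample = ([(True, True)] * 7 + [(True, False)] * 3
          + [(False, True), (False, False)])
print(f"{fool_rate(sample):.0%}")
```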

That’s interesting work that paves the way for automated sound editing. A common problem in videos is that extraneous noise from an off-screen source can ruin a clip. So having a way to automatically replace the sound with a realistic machine-generated alternative will be useful.

And with Adobe’s involvement in this research, it may not be long before we see this kind of capability in commercial video editing software.

Ref: arxiv.org/abs/1712.01393 : Visual to Sound: Generating Natural Sound for Videos in the Wild
