Google’s auto-complete for speech can cover up glitches in video calls

aehdeschaine / Flickr

The news: With many of us now relying on video calls for face-to-face interaction, choppy connections are more frustrating than ever. An artificial intelligence that mimics an individual speaker’s way of talking can smooth over the cracks by filling in small gaps with snippets of generated speech. Developed by a team at Google, the technology is now being used in Google’s video-calling app Duo.

What’s the problem? When you’re on an online call, your voice gets chopped up into lots of tiny pieces that are zipped across the internet in data blocks known as packets. Packets often arrive at the other end jumbled up, and software has to reorder them. But sometimes packets don’t arrive at all, which creates glitches and gaps in a conversation. This happens at the best of times. According to Google, 99% of Duo calls have to deal with jumbled-up or lost packets. A tenth of those calls lose more than 8% of their audio.
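To make the reordering step concrete, here is a minimal Python sketch of what receiving software might do with a batch of packets. It assumes each packet carries a sequence number; the `Packet` class and the `reorder` function are hypothetical names used for illustration, not part of Duo or any Google code.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int      # hypothetical sequence number stamped by the sender
    audio: bytes  # a short slice of encoded audio


def reorder(packets, first_seq, last_seq):
    """Return packets in sequence order, plus the sequence numbers that never arrived."""
    by_seq = {p.seq: p for p in packets}  # duplicate packets collapse to one entry
    expected = range(first_seq, last_seq + 1)
    ordered = [by_seq[s] for s in expected if s in by_seq]
    missing = [s for s in expected if s not in by_seq]
    return ordered, missing


# Packets 1, 3 and 4 arrive out of order; packet 2 never shows up.
received = [Packet(3, b"c"), Packet(1, b"a"), Packet(4, b"d")]
ordered, missing = reorder(received, first_seq=1, last_seq=4)
print([p.seq for p in ordered])  # [1, 3, 4]
print(missing)                   # [2]
```

The `missing` list marks the gaps that a listener would otherwise hear as glitches.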

Generating speech: To fix the problem, the team built on a neural network developed by DeepMind that can generate realistic speech from text. Called WaveNetEQ, the new neural network was then trained on a large dataset of 100 recorded human voices speaking 48 different languages until it could auto-complete short sections of speech based on common patterns in the way people talk. Because Duo is end-to-end encrypted, the AI runs on the device, not in the cloud. During a call, WaveNetEQ learns characteristics of a speaker’s voice and generates audio snippets that match both the style and content of what the speaker is saying. When a packet is lost, the AI-generated voice is inserted in its place.
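The fill-in step can be sketched in a few lines of Python. This is not WaveNetEQ: the stand-in generator below simply repeats the last frame that arrived, a much cruder trick than generating new speech. The names `conceal` and `repeat_last`, and the four-sample frames, are assumptions made for illustration.

```python
def conceal(frames, generate):
    """Fill each missing frame (marked None) with output from `generate`.

    `generate` receives the audio produced so far. In the system described
    above, a neural network would play this role.
    """
    output = []
    for frame in frames:
        if frame is None:
            frame = generate(output)
        output.append(frame)
    return output


def repeat_last(history):
    """Stand-in generator: repeat the most recent frame, or return silence."""
    return history[-1] if history else [0.0] * 4


# The middle frame was lost in transit.
stream = [[0.1, 0.2, 0.3, 0.4], None, [0.5, 0.6, 0.7, 0.8]]
patched = conceal(stream, repeat_last)
```

Swapping `repeat_last` for a model that predicts the next snippet from the speaker’s recent audio is, in spirit, the change the article describes.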

For now, the AI can only generate syllables rather than whole words or phrases. But short samples Google posted online show that the results can be pretty lifelike. In one case, the AI replaces the second syllable of the word “trouble” in a voice that mimics the male speaker exactly.
