Artificial intelligence

Google’s AI can now translate your speech while keeping your voice

Researchers trained a neural network to map audio “voiceprints” from one language to another.
May 20, 2019
An image of U.N. Secretary-General Ban Ki-moon listening to a translation device. (Emilio Morenatti/AP)

Listen to this Spanish audio clip.

This is how its English translation might sound when put through a traditional automated translation system.

Now this is how it sounds when put through Google’s new automated translation system.

The results aren’t perfect, but you can hear how Google’s translator retains the voice and tone of the original speaker. It can do this because it converts audio input directly to audio output without any intermediary steps. In contrast, traditional translation systems convert audio into text, translate the text, and then resynthesize the audio, losing the characteristics of the original voice along the way.

The new system, dubbed the Translatotron, has three components, all of which work with the speaker’s audio spectrogram (a visual snapshot of the frequencies present in a sound as it plays, often called a voiceprint). The first component uses a neural network trained to map the audio spectrogram in the input language to the audio spectrogram in the output language. The second converts the output spectrogram into an audio wave that can be played. The third can then layer the original speaker’s vocal characteristics back into the final audio output.
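To make the idea of a spectrogram concrete, here is a minimal Python sketch that slices a signal into short overlapping frames and measures how strong each frequency is in each frame. It is not Google's code: the function names (spectrogram, dominant_frequency), the frame size, the hop length, and the 8,000 Hz sample rate are all assumptions chosen for illustration, and it uses a deliberately simple Fourier transform rather than the optimized routines a real system would rely on.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames, each a list of magnitudes per frequency bin."""
    # A Hann window tapers the edges of each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive discrete Fourier transform; keep only non-negative frequencies.
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


def dominant_frequency(magnitudes, sample_rate, frame_size=64):
    """Convert the strongest bin in one frame back to a frequency in Hz."""
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * sample_rate / frame_size


# Hypothetical input: a pure 1,000 Hz tone standing in for recorded speech.
SAMPLE_RATE = 8000  # Hz, an arbitrary choice for this illustration
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(400)]
spec = spectrogram(tone)
```

Each row of spec is one moment in time and each column is one frequency band, which is the grid a spectrogram image displays. Real speech fills many bands at once, and that pattern is what a spectrogram-to-spectrogram model would have to learn to transform.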

Not only does this approach produce more nuanced translations by retaining important nonverbal cues, but in theory it should also minimize translation error, because it reduces the task to fewer steps.

Translatotron is currently a proof of concept. During testing, the researchers trialed the system only with Spanish-to-English translation, which already required a lot of carefully curated training data. But audio outputs like the clip above demonstrate the potential for a commercial system down the line. You can listen to more of them here.
