A new algorithm can mimic your voice with just snippets of audio
Baidu has a new neural-network-powered system that is amazingly good at cloning voices.
Mic check: To re-create a voice, AI typically needs to listen to hours of recordings of someone talking. But as New Scientist reports, a new process could get that down to one minute. Baidu researchers have unveiled an upgraded version of Deep Voice, their text-to-speech synthesis system, that can now, once trained, clone any voice after listening to a few snippets of audio.
Details: The more samples Deep Voice hears, the better the results, but just 10 samples of less than five seconds each were enough for it to produce a synthetic voice that could fool a voice-recognition system more than 95 percent of the time. Baidu has posted some of the voice-cloning samples online for anyone to listen to.
Of course there’s a downside: Technology like this could seriously undermine biometric security that uses someone’s voice as a security feature. People are already falling for e-mails “from” their friends—so what happens when it sounds like your mom calling and asking to borrow some money?