Baidu has a new neural-network-powered system that is amazingly good at cloning voices.
Mic check: To re-create a voice, AI typically needs to listen to hours of recordings of someone talking. But as New Scientist reports, a new process could get that down to one minute. Baidu researchers have unveiled an upgraded version of Deep Voice, their text-to-speech synthesis system, that can now, once trained, clone any voice after listening to a few snippets of audio.
Details: The more samples Deep Voice hears, the better the results, but just 10 samples of less than five seconds each were enough for it to produce a synthetic voice that could fool a voice-recognition system more than 95 percent of the time. Baidu has posted some of the voice-cloning samples online for anyone to listen to.
Of course there’s a downside: Technology like this could seriously undermine biometric security that uses someone’s voice as a security feature. People are already falling for e-mails “from” their friends—so what happens when it sounds like your mom calling and asking to borrow some money?