Soong says that this approach can convert between any pair of 26 languages, including Mandarin Chinese, Spanish, and Italian.
Preserving a person’s voice when synthesizing speech for them in another language would likely be reassuring to a user, and could make interactions that rely on translation software more meaningful, says Shrikanth Narayanan, a professor at the University of Southern California, in Los Angeles, who leads a research group working on systems to translate speech in situations such as doctor-patient consultations.
“The word is just one part of what a person is saying,” he says, and to truly convey all the information in a person’s speech, translation systems will need to be able to preserve voices and much more. “Preserving voice, preserving intonation, those things matter, and this project clearly knows that,” says Narayanan. “Our systems need to capture the expression a person is trying to convey, who they are, and how they’re saying it.”
His research group is investigating how features such as emphasis, intonation, and the way people use pauses or hesitation affect the effectiveness and perceived quality of a word-for-word translation. “We’re asking if you can build systems that can mediate between people as well as just replacing the words,” he says. “I view this [Microsoft research] as a part of how you make this happen.”