Text and Voice Translation in Real Time

Tom Simonite

April 19, 2011

The translation technology in Google’s smart-phone apps is the closest thing to a universal translator today. Google Goggles, for example, extracts text from an image snapped by a user and converts it with the help of a technique called statistical machine translation, relying on a cloud server for the necessary processing power. So far it can recognize text in just five European languages and translate the text into any of those languages plus another 12. But Google is aggressively expanding into new languages, particularly those used in markets where mobile devices are often the only computers people own.

Augmented reality could integrate such translation services into everyday views of a foreign city. With Word Lens, from Quest Visual of San Francisco (see “A New Reality”), a user can aim the phone at a sign or document and immediately consult the display to see translated text replacing the original. Because the translation takes place inside the device and doesn’t rely on network connections, it’s handy when traveling. So far Word Lens can translate only between Spanish and English. More languages are in the works.

Google recently launched a cloud-based app that can take in spoken phrases in any of 15 languages and synthesize translated speech in any of 23. The next frontier: translating live speech. In “conversation mode” you speak into your Android device; Google Translate will read a translation. At a January conference, Google chairman Eric Schmidt said the system “seems to work like magic.” Yet it works only between speakers of English and Spanish, and accents and noise degrade accuracy. Adding languages and improving accuracy are active areas of research.
