Technology Review - Published By MIT

Not Lost in Translation

Computer programmers use statistics to convert Arabic and Mandarin Chinese texts into English.

By Stephen Ornes

Thursday, November 16, 2006

As computer programmers develop new techniques for translating texts between languages with different alphabets, they are increasingly turning to a science that seems to have little in common with the conventions of grammar: statistics.

Last week, the National Institute of Standards and Technology (NIST) released the results of its yearly evaluation of computer algorithms that translate Arabic and Mandarin Chinese texts into English. Topping the charts was Google, whose translations in both languages received higher marks than 39 other entries. The marks came from an automatic metric called BLEU (BiLingual Evaluation Understudy). BLEU compares each machine translation with reference translations produced by professional human translators, counting how many of its word sequences also appear in those references, and assigns a single, final score between zero and one. The higher the score, the more closely the machine translation approximated a human effort.
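To make the scoring concrete, here is a minimal Python sketch of BLEU as it is usually defined. It computes n-gram precisions for one- to four-word sequences. Each precision is clipped so that a candidate cannot earn credit for repeating a word more often than a reference does. The precisions are combined by geometric mean and multiplied by a brevity penalty for translations shorter than the reference. This is a simplification: NIST computes BLEU over a whole test corpus rather than one sentence. The whitespace tokenization and the function names `ngrams` and `bleu` are illustrative assumptions, not NIST's scoring tool.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU (illustrative sketch, no smoothing)."""
    cand = candidate.split()          # assumption: plain whitespace tokenization
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0                # candidate too short to have any n-grams
        # Clip each candidate n-gram count by the most times it occurs in any one reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0                # any zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: compare against the reference length closest to the candidate's.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

For example, `bleu("the cat sat on a mat", ["the cat sat on the mat"])` scores about 0.54, while an exact match scores 1.0. A candidate that is correct but too short, such as "the cat sat on the", is pulled below 1.0 by the brevity penalty.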

"If you get a good score, you're doing well," says Peter Norvig, Google's head of research. "If you get a bad score, then either you did poorly or you did something so novel that the translator didn't see it."

The Google team, led by Franz Och, designed an algorithm that first isolates short sequences of words in the text to be translated and then searches existing translations to see how those word sequences have been translated before. The program then chooses the most probable interpretation, regardless of syntax.
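The lookup step can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical illustration, not Google's system. The phrase table `PHRASE_TABLE` and its counts are invented toy data, and segmentation is greedy, taking the longest known phrase first. Each phrase then simply takes its most frequent past translation. A real statistical system such as Och's also weighs word reorderings and scores whole candidate sentences with an English language model.

```python
from collections import Counter

# Hypothetical toy phrase table: source phrase (tuple of tokens) -> counts of the
# English renderings seen in previously translated text. All counts are invented.
PHRASE_TABLE = {
    ("我",): Counter({"I": 9, "me": 3}),
    ("喜欢",): Counter({"like": 7, "likes": 2}),
    ("喝", "茶"): Counter({"drinking tea": 5, "drink tea": 4}),
    ("喝",): Counter({"drink": 6}),
    ("茶",): Counter({"tea": 8}),
}

def translate(tokens, table, max_len=3):
    """Greedy left-to-right phrase lookup: at each position take the longest
    phrase found in the table and emit its most frequent past translation."""
    output = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in table:
                best, _count = table[phrase].most_common(1)[0]
                output.append(best)
                i += n
                break
        else:
            output.append(tokens[i])  # unknown word: pass it through untranslated
            i += 1
    return " ".join(output)
```

Here `translate(["我", "喜欢", "喝", "茶"], PHRASE_TABLE)` returns "I like drinking tea". The two-word phrase 喝 茶 is matched as a unit before its single words are tried, so its most common past rendering wins over a word-by-word "drink tea".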

"We look for matches between texts and find several different translations," Norvig says. "You take all these possibilities and ask, What is the most probable in terms of what's been done in the past?"

By comparing the same document (a newspaper article, for example) in two languages, the software builds an active memory that correlates words and phrases. Google's statistical method, Norvig says, reflects an organic approach to language learning. Rather than checking every translated word against the rules and exceptions of the English language, the program begins with a blank slate and accumulates a more accurate view of the language as a whole. It "learns" the language as the language is used, not as the language is prescribed. (Google's program is still in development, but other publicly available webpage translators use a similar method.)
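One classic way to turn paired documents into word correlations is IBM Model 1. It uses expectation-maximization to estimate how likely each English word is as a translation of each foreign word, starting from a blank slate of uniform probabilities. The sketch below shows that textbook technique, not necessarily the model Google used. The three-sentence German-English corpus is a toy example, and the function name `ibm_model1` is hypothetical.

```python
from collections import defaultdict

def ibm_model1(corpus, iterations=10):
    """Estimate word-translation probabilities t[(e, f)] = P(e | f) from sentence
    pairs with expectation-maximization (IBM Model 1, without a NULL word)."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    e_vocab = {e for _, es in corpus for e in es}
    # Blank slate: every English word is equally likely for every foreign word.
    t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of e-f links
        total = defaultdict(float)  # expected number of links involving f
        for fs, es in corpus:
            for e in es:
                # E-step: share e's credit among the foreign words in its sentence.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalize the expected counts into probabilities.
        t = {(e, f): count[(e, f)] / total[f] for (e, f) in t}
    return t

# Toy parallel corpus of (foreign tokens, English tokens) pairs.
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(corpus)
for f in ["das", "Haus", "Buch", "ein"]:
    best = max(["the", "house", "book", "a"], key=lambda e: t[(e, f)])
    print(f, "->", best)
```

After training, the most probable English word for each German word is the intuitive one (das: the, Haus: house, Buch: book, ein: a). No sentence pair states these equivalences directly. Because "das" appears with "the" in two sentences, "the" is attributed to "das", which leaves "house" to explain "Haus".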

"This is a more natural way to approach language," Norvig says. "We're not saying we don't like rules, or there's something wrong with them, but right now we don't have the right data ... We're getting most of the benefit of having grammatical rules without actually formally naming them."

MIT Massachusetts Institute of Technology © 2009 Technology Review. All Rights Reserved.