Not Lost in Translation

Computer programmers use statistics to convert Arabic and Mandarin Chinese texts into English.

Stephen Ornes

November 16, 2006

As computer programmers develop new techniques for translating texts between languages with different writing systems, they are increasingly turning to a science that seems to have little in common with the conventions of grammar: statistics.

Last week, the National Institute of Standards and Technology (NIST) released the results of its yearly evaluation of computer algorithms that translate Arabic and Mandarin Chinese texts into English. Topping the charts was Google, whose translations in both languages received higher marks than 39 other entries. A machine-calculated metric called BLEU (BiLingual Evaluation Understudy) compared each machine translation with reference translations produced by professional human translators and assigned a single, final score between zero and one. The higher the score, the more closely the machine translation approximated a human effort.
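To make the scoring concrete, here is a minimal Python sketch of sentence-level BLEU, assuming whitespace-tokenized text, equal weights for one- to four-word sequences, and no smoothing. It is a simplified illustration, not NIST's official scoring script (which scores a whole test set at once), and the function names are hypothetical. The score counts how many of the candidate's word sequences also appear in a human reference, then applies a penalty if the candidate is shorter than the reference.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count every n-word sequence (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU: uniform weights, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram by the most times it occurs in any one reference,
        # so repeating a correct word does not inflate the score.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the candidate's length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)

candidate = "the cat sat on the mat".split()
reference = "the cat sat on the red mat".split()
print(round(bleu(candidate, [reference]), 3))  # 0.673
```

Because the unsmoothed score multiplies the precisions together, a short sentence with no matching four-word sequence scores zero, which is one reason BLEU is normally computed over an entire collection of documents rather than sentence by sentence.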

“If you get a good score, you’re doing well,” says Peter Norvig, Google’s head of research. “If you get a bad score, then either you did poorly or you did something so novel that the translator didn’t see it.”

The Google team, led by Franz Och, designed an algorithm that first isolates short sequences of words in the text to be translated and then searches existing human translations to see how those word sequences have been translated before. The program looks for the most likely correct interpretation, regardless of syntax.

“We look for matches between texts and find several different translations,” Norvig says. “You take all these possibilities and ask, What is the most probable in terms of what’s been done in the past?”
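The lookup Norvig describes can be illustrated with a toy Python sketch. It assumes a hypothetical phrase table, PHRASE_COUNTS, filled with invented counts of how often each English rendering appeared in past translations; the sketch greedily matches the longest known word sequence and picks its most frequent past translation. Production statistical systems typically also weigh word order and fluency and search over many candidate segmentations; this sketch does neither and is not Google's algorithm.

```python
from collections import Counter

# Hypothetical toy phrase table: for each source phrase (a tuple of words),
# how often each English rendering was seen in past human translations.
# All counts are invented for illustration.
PHRASE_COUNTS = {
    ("la",): Counter({"the": 90, "her": 10}),
    ("maison",): Counter({"house": 80, "home": 20}),
    ("bleue",): Counter({"blue": 100}),
    ("maison", "bleue"): Counter({"blue house": 45, "house blue": 5}),
}
MAX_PHRASE_LEN = max(len(p) for p in PHRASE_COUNTS)

def best_translation(phrase):
    """Return the most frequent past translation and its relative frequency."""
    counts = PHRASE_COUNTS[phrase]
    english, count = counts.most_common(1)[0]
    return english, count / sum(counts.values())

def translate(sentence):
    """Greedy left-to-right translation, trying the longest known phrase first."""
    words = sentence.split()
    output, i = [], 0
    while i < len(words):
        for length in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + length])
            if phrase in PHRASE_COUNTS:
                english, _ = best_translation(phrase)
                output.append(english)
                i += length
                break
        else:
            output.append(words[i])  # unknown word: pass it through untranslated
            i += 1
    return " ".join(output)

print(translate("la maison bleue"))  # the blue house
```

Because "maison bleue" is stored as a two-word phrase, the sketch produces "blue house" rather than the word-by-word "house blue"; that is the practical payoff of matching word sequences instead of single words.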

By comparing the same document (a newspaper article, for example) in two languages, the software builds up a store of correlations between words and phrases. Google’s statistical approach, Norvig says, reflects an organic way of learning a language. Rather than checking every translated word against the rules and exceptions of the English language, the program begins with a blank slate and accumulates a more accurate view of the language as a whole. It “learns” the language as the language is used, not as the language is prescribed. (Google’s program is still in development, but other publicly available webpage translators use a similar method.)
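How software can learn word correlations from paired documents can be sketched with IBM Model 1, a classic textbook algorithm for sentence-aligned parallel text. The article does not say what Google uses, so treat this as a swapped-in illustration: it assumes a tiny invented French-English corpus, omits the "NULL word" of the full model, and uses expectation-maximization to estimate how likely each English word is to translate each foreign word.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(e, f)] = P(English word e | foreign word f) by expectation-maximization.

    Simplified IBM Model 1 (an assumption for illustration): no NULL word, no smoothing.
    """
    english_vocab = {e for _, english in pairs for e in english}
    # Start with every English word equally likely to translate every foreign word.
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times f translates to e
        total = defaultdict(float)  # expected number of times f translates to anything
        for foreign, english in pairs:
            for e in english:
                # E-step: split credit for e across the foreign words in its sentence,
                # in proportion to the current translation probabilities.
                norm = sum(t[(e, f)] for f in foreign)
                for f in foreign:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: turn expected counts back into conditional probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

# Hypothetical toy corpus of sentence-aligned pairs: (foreign words, English words).
pairs = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("maison bleue".split(), "blue house".split()),
]
t = train_ibm_model1(pairs)
print(t[("house", "maison")] > t[("blue", "maison")])  # True
```

No dictionary is supplied: "maison" ends up paired with "house" only because the two keep appearing together across different sentence pairs, which mirrors the blank-slate learning described above.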

“This is a more natural way to approach language,” Norvig says. “We’re not saying we don’t like rules, or there’s something wrong with them, but right now we don’t have the right data … We’re getting most of the benefit of having grammatical rules without actually formally naming them.”

Not every team has Google’s resources. And while most of them do use a similar statistical approach, many reflect the influence of linguistics. Ongoing research at Kansas State University draws not only on computer scientists but also on anthropologists, modern-language scholars, and psychologists to develop new approaches to machine translation. In addition, researchers are using statistical methods to find, summarize, and extract information from existing texts; these applications fall within the broader field of data mining.

Kansas State’s team, under the direction of associate professor William Hsu, submitted a translation algorithm to NIST’s evaluation for the first time this year. Hsu and his team were not the only newcomers: from 2005 to 2006, the number of submissions to the NIST program doubled.

The machine-calculated scoring system BLEU judges only the output, not the algorithms themselves. Under a high-tech “honor system,” NIST sends the source documents to each entrant, who translates them using his or her algorithm and returns the finished translations. After the evaluations, the participants are required to attend a conference where they can share ideas and approaches.

Mark Przybocki, the coordinator of the NIST Machine Translation Evaluations, has worked on the program since it began in 2001; he believes that the past five years have shown tremendous improvement. “If you compare translations from 2001 or 2003, your intuition tells you they are improving,” he says.

The NIST evaluations grew out of a translation project sponsored by the Defense Advanced Research Projects Agency (DARPA), the primary research organization for the Department of Defense. Once the evaluations for DARPA were finished, Przybocki says, NIST officials realized that researchers who worked in machine translation had no touchstone for measuring progress and success. Even though language translation is subjective, the annual NIST evaluation provides scientists in the field with an infrastructure for discussion and research. And whether scientists approach the language barrier through statistics or through linguistics, an ongoing dialogue might inspire unexpected hybrids of the two.

“The technology is interesting and young,” Przybocki says. “It’s a hard call to say any one technology is going to be the dominant force in the future.”
