Not Lost in Translation
Computer programmers use statistics to convert Arabic and Mandarin Chinese texts into English.
As computer programmers develop new
techniques for translating texts between languages with different
alphabets, they are increasingly turning to a science that seems to
have little in common with the conventions of grammar: statistics.
Last week, the National Institute of Standards and Technology (NIST) released the results of
its yearly evaluation of computer algorithms that translate Arabic and
Mandarin Chinese texts into English. Topping the charts was Google,
whose translations in both languages received higher marks than 39
other entries. A machine-calculated metric called BLEU (BiLingual
Evaluation Understudy) compared each entry with reference translations
by professional human translators to assign a single, final score
between zero and one. The higher the score, the more closely the
machine translation approximated a human effort.
“If
you get a good score, you’re doing well,” says Peter Norvig, Google’s
head of research. “If you get a bad score, then either you did poorly
or you did something so novel that the translator didn’t see it.”
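For readers who want to see the idea behind such a score, here is a minimal sketch of a BLEU-style metric. It is a simplified, sentence-level illustration, not NIST's scoring tool: it assumes whitespace tokenization, applies no smoothing, and the function names `ngrams` and `simple_bleu` are made up for this example. It counts how many word sequences of length one to four in the machine output also appear in a human reference, then penalizes output that is too short.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of words) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, references, max_n=4):
    """Simplified BLEU: clipped n-gram precision times a brevity penalty.

    Assumption: whitespace tokenization and no smoothing, so any n-gram
    order with zero matches yields a score of 0.0.
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        if not cand_counts:
            return 0.0  # candidate is shorter than n words
        # Clip each candidate n-gram by its highest count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        matched = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / sum(cand_counts.values())))

    # Brevity penalty, using the reference length closest to the candidate's.
    ref_len = min((len(r) for r in refs),
                  key=lambda length: (abs(length - len(cand)), length))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

An output identical to the reference scores 1.0, an output sharing no words with it scores 0.0, and everything else lands in between. This also hints at Norvig's caveat: a correct but unusual phrasing shares few word sequences with the reference and scores low.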
The
Google team, led by Franz Och, designed an algorithm that first
isolates short sequences of words in the text to be translated and then
searches existing translations to see how those word sequences have been
translated before. The program looks for the most likely correct
interpretation, regardless of syntax.
“We
look for matches between texts and find several different
translations,” Norvig says. “You take all these possibilities and ask,
What is the most probable in terms of what’s been done in the past?”
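The lookup Norvig describes can be illustrated with a toy example. The sketch below is not Google's system: the phrase table, its Spanish entries, and the counts are invented for illustration, and the greedy longest-match strategy is an assumption chosen for brevity. It splits a sentence into short word sequences, checks how each was translated before, and keeps the most frequent rendering.

```python
from collections import Counter

# Hypothetical phrase table: source phrase -> how often each English
# rendering was seen in earlier translations (all counts are invented).
PHRASE_TABLE = {
    ("buenos", "días"): Counter({"good morning": 9, "good day": 3}),
    ("buenos",): Counter({"good": 7}),
    ("días",): Counter({"days": 8}),
    ("a", "todos"): Counter({"everyone": 6, "to all": 4}),
}


def translate(sentence, table=PHRASE_TABLE, max_len=3):
    """Greedy phrase lookup: prefer the longest known phrase, then pick
    the translation that was used most often in the past."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest word sequence first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in table:
                best, _count = table[phrase].most_common(1)[0]
                output.append(best)
                i += n
                break
        else:
            # No phrase starting here is known: pass the word through.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate("Buenos días a todos"))  # good morning everyone
```

Note that the program never consults a grammar rule. "Buenos días" becomes "good morning" rather than "good days" only because that pairing was seen more often.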
By
comparing the same document (a newspaper article, for example) in two
languages, the software builds an active memory that correlates words
and phrases. Google’s statistical approach, Norvig says, reflects an
organic approach to language learning. Rather than checking every
translated word against the rules and exceptions of the English
language, the program begins with a blank slate and accumulates a more
accurate view of the language as a whole. It “learns” the language as
the language is used, not as the language is prescribed. (Google’s
program is still in development, but other publicly available webpage
translators use a similar method.)
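How might a program correlate words from nothing but paired documents? One simple possibility, sketched below, is to count how often a source word and an English word turn up in the same sentence pair. This is an assumption made for illustration, not the method the article's teams use: the three-sentence corpus is invented, the names `learn_pairs` and `best_match` are hypothetical, and the scoring uses the Dice coefficient, a basic association measure, in place of the more elaborate statistical models real systems rely on.

```python
from collections import Counter
from itertools import product

# Invented toy corpus: each entry is the same sentence in two languages.
PARALLEL = [
    ("la casa", "the house"),
    ("la puerta", "the door"),
    ("una casa", "a house"),
]


def learn_pairs(corpus):
    """Score every (source word, English word) pair by the Dice coefficient:
    2 * co-occurrences / (source occurrences + English occurrences)."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        pair_count.update(product(s_words, t_words))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def best_match(word, scores):
    """Return the English word most strongly associated with `word`."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


scores = learn_pairs(PARALLEL)
print(best_match("casa", scores))  # house
print(best_match("la", scores))    # the
```

With only three sentence pairs the program already links "casa" to "house" and "la" to "the", without being told anything about nouns or articles. More text sharpens the counts, which is the sense in which such a system starts from a blank slate.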
“This
is a more natural way to approach language,” Norvig says. “We’re not
saying we don’t like rules, or there’s something wrong with them, but
right now we don’t have the right data … We’re getting most of the
benefit of having grammatical rules without actually formally naming
them.”
Not
every team has Google’s resources. And while most of them do use a
similar statistical approach, many reflect the influence of
linguistics. Ongoing research at Kansas State University draws on not
only computer scientists, but also anthropologists, modern-language
scholars, and psychologists to develop new approaches to machine
translation. In addition, researchers are using the statistical
methods to find, summarize, and extract information from existing
texts, applications that belong to the broader field of data mining.
Kansas
State’s team, under the direction of associate professor William Hsu,
submitted a translation algorithm for NIST’s evaluation for the first
time this year. Hsu and his team were not the only newcomers: from 2005
to 2006, the number of submissions to the NIST program doubled.
The
machine-calculated scoring system BLEU does not look at the algorithms
themselves. Rather, with a high-tech “honor system” in place, NIST
sends original documents to the entrant, who translates the texts using
his or her algorithm and returns the finished translation. After the
evaluations, the participants are required to attend a conference where
they can share ideas and approaches.
Mark Przybocki, the coordinator of the NIST Machine Translation Evaluations,
has worked on the program since it began in 2001; he believes that the
past five years have shown tremendous improvement. “If you compare
translations from 2001 or 2003, your intuition tells you they are
improving,” he says.
The
NIST evaluations grew out of a translation project sponsored by the
Defense Advanced Research Projects Agency (DARPA), the primary research
organization for the Department of Defense. Once the evaluations for
DARPA were finished, Przybocki says, NIST officials realized that
researchers who worked in machine translation had no touchstone for
measuring progress and success. Even though language translation is
subjective, the annual NIST evaluation provides scientists in the field
with an infrastructure for discussion and research. And whether
scientists confront the language barrier from statistics or
linguistics, an ongoing dialogue might inspire unexpected hybrids of
the two approaches.
“The
technology is interesting and young,” Przybocki says. “It’s a hard call
to say any one technology is going to be the dominant force in the
future.”