Not every team has Google’s resources. And while most teams use a similar statistical approach, many also reflect the influence of linguistics. Ongoing research at Kansas State University draws not only on computer scientists but also on anthropologists, modern-language scholars, and psychologists to develop new approaches to machine translation. In addition, researchers are using statistical methods to find, summarize, and extract information from existing texts, applications that belong to the broader field of data mining.
Kansas State’s team, under the direction of associate professor William Hsu, submitted a translation algorithm to NIST’s evaluation for the first time this year. Hsu and his team were not the only newcomers: from 2005 to 2006, the number of submissions to the NIST program doubled.
BLEU, the machine-calculated scoring system, does not examine the algorithms themselves; it scores only their output, by comparing it with human reference translations. Instead of inspecting code, NIST relies on a high-tech “honor system”: it sends source documents to each entrant, who translates the texts using his or her own algorithm and returns the finished translation. After the evaluations, participants are required to attend a conference where they share ideas and approaches.
Mark Przybocki, the coordinator of the NIST Machine Translation Evaluations, has worked on the program since it began in 2001; he believes that the past five years have shown tremendous improvement. “If you compare translations from 2001 or 2003, your intuition tells you they are improving,” he says.
The NIST evaluations grew out of a translation project sponsored by the Defense Advanced Research Projects Agency (DARPA), the primary research organization for the Department of Defense. Once the evaluations for DARPA were finished, Przybocki says, NIST officials realized that machine translation researchers had no touchstone for measuring progress and success. Even though judging translation quality is subjective, the annual NIST evaluation gives scientists in the field an infrastructure for discussion and research. And whether scientists approach the language barrier through statistics or through linguistics, an ongoing dialogue might inspire unexpected hybrids of the two.
“The technology is interesting and young,” Przybocki says. “It’s a hard call to say any one technology is going to be the dominant force in the future.”