
As computer programmers develop new techniques for translating texts between languages with different alphabets, they are increasingly turning to a science that seems to have little in common with the conventions of grammar: statistics.

Last week, the National Institute of Standards and Technology (NIST) released the results of its yearly evaluation of computer algorithms that translate Arabic and Mandarin Chinese texts into English. Topping the charts was Google, whose translations in both languages received higher marks than 39 other entries. A machine-calculated metric called BLEU (BiLingual Evaluation Understudy) compared each system's output against reference translations produced by professional human translators and assigned a single, final score between zero and one. The higher the score, the more the machine translation approximated a human effort.
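
A minimal sketch of how a BLEU-style score can be computed, assuming a single human reference translation and the standard recipe of modified n-gram precision plus a brevity penalty (NIST's actual evaluation uses multiple references and its own tooling):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-word sequences appearing in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Score a machine translation against one human reference (0 to 1)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # avoid log(0) below
    # Geometric mean of the n-gram precisions
    score = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: discount translations shorter than the reference
    if cand and len(cand) < len(ref):
        score *= math.exp(1 - len(ref) / len(cand))
    return score

print(bleu("the cat sat on the mat", "the cat sat on the mat"))      # 1.0
print(bleu("a cat was sitting on a mat", "the cat sat on the mat"))  # much lower
```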

“If you get a good score, you’re doing well,” says Peter Norvig, Google’s head of research. “If you get a bad score, then either you did poorly or you did something so novel that the translator didn’t see it.”

The Google team, led by Franz Och, designed an algorithm that first isolates short sequences of words in the text to be translated and then searches existing translations to see how those word sequences have been translated before. The program looks for the most likely correct interpretation, regardless of syntax.
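
A toy illustration of that lookup, with an invented phrase table whose source phrases and counts are purely hypothetical (shown in French only for readability; the NIST evaluation covered Arabic and Chinese):

```python
from collections import Counter

# Hypothetical counts of how each short source phrase has been translated before
phrase_table = {
    "maison blanche": Counter({"white house": 942, "white home": 31}),
    "il pleut": Counter({"it is raining": 610, "he cries": 12}),
}

def translate_phrase(phrase):
    """Pick the rendering seen most often in past translations."""
    candidates = phrase_table.get(phrase)
    if not candidates:
        return phrase  # unseen phrase: pass it through unchanged
    return candidates.most_common(1)[0][0]

print(translate_phrase("maison blanche"))  # "white house"
```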

“We look for matches between texts and find several different translations,” Norvig says. “You take all these possibilities and ask, What is the most probable in terms of what’s been done in the past?”

By comparing the same document (a newspaper article, for example) in two languages, the software builds an active memory that correlates words and phrases. Google’s statistical method, Norvig says, reflects an organic approach to language learning. Rather than checking every translated word against the rules and exceptions of the English language, the program begins with a blank slate and accumulates an increasingly accurate view of the language as a whole. It “learns” the language as it is used, not as it is prescribed. (Google’s program is still in development, but other publicly available webpage translators use a similar method.)
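
A rough sketch of how such a memory might be accumulated from sentence-aligned documents, here reduced to simple word co-occurrence counting (real systems estimate alignments statistically; the tiny corpus below is invented):

```python
from collections import defaultdict, Counter

def build_memory(parallel_sentences):
    """Count which target-language words appear alongside each source word."""
    memory = defaultdict(Counter)
    for source_sentence, target_sentence in parallel_sentences:
        for s_word in source_sentence.split():
            for t_word in target_sentence.split():
                memory[s_word][t_word] += 1
    return memory

# Hypothetical aligned sentences from the same document in two languages
corpus = [
    ("la maison est blanche", "the house is white"),
    ("la porte est rouge", "the door is red"),
]
memory = build_memory(corpus)
print(memory["maison"].most_common(2))  # target words most often seen with "maison"
```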

“This is a more natural way to approach language,” Norvig says. “We’re not saying we don’t like rules, or there’s something wrong with them, but right now we don’t have the right data … We’re getting most of the benefit of having grammatical rules without actually formally naming them.”
