Repetez, en anglais, s'il vous plait (Repeat, in English, please)

Most commercial language-translation software is pretty bad, but there may be a better way.

  • Wednesday, January 18, 2006
  • By Kate Greene

While the quality of computer-rendered translations has improved greatly over the past 20 years, some results are still just as grammatically goofy as the instructions on a chopstick wrapper. Take, for example, a website for a Japanese apple farm that was converted to English using Google's automatic translation service:

"The Someya apple garden it will pass very! It is planted in 1954, furthermore even now, it exceeds tree's age 50 year old prosperously, large - coming the tree abnormal play alligator apple is fructified. The tasty apple where temperature difference of day and night tightened to be extreme Gunma prefecture Numata city which four seasons are clear large the nature, hard is created."*

Yes, the big picture gets across, but much is lost in Google's Japanese-to-English translation. Google has offered its translation feature for a number of years, as has the Canadian internet company Babel Fish. More recently, though, commercial software developers have begun to take translation beyond static webpages and electronic documents and to apply it to real-time instant-messaging conversations. Earlier this month, AvMedia released an instant-messaging translator designed to make it easier for English speakers to chat with friends who speak German, Spanish, French, Italian, or Portuguese, and vice versa (French can also be translated to German, and German to French).

But all of this software still lacks the accuracy needed for demanding situations, such as business negotiations or military planning. This is probably because most commercial software follows a traditional approach to machine translation, says Kevin Knight, a computer scientist at the University of Southern California’s Information Sciences Institute (ISI) and co-founder of the California-based company Language Weaver.

Traditionally, machine translation software has depended on algorithms that sort through thousands of grammar rules for the two languages involved, Knight says. The problem, he explains, is that so many rules, and the exceptions to those rules, must be written by hand, and inaccuracy creeps in when complex sets of rules contradict each other. “If you write the 5000th rule, sometimes you break things,” Knight says.
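To make Knight's point about hand-written rules concrete, here is a toy sketch of a rule-based English-to-French translator. The lexicon, the part-of-speech tags, and both rules (`keep_big_in_front`, `adjective_after_noun`) are hypothetical and far simpler than any real system; the sketch only shows how a general rule and its exception interact, and how changing nothing but the order of the rules can break a translation that used to be correct.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy lexicon and part-of-speech tags (hand-written, like the rules).
LEXICON = {"the": "le", "red": "rouge", "big": "grand", "book": "livre", "cat": "chat"}
POS = {"the": "DET", "red": "ADJ", "big": "ADJ", "book": "NOUN", "cat": "NOUN"}

# A rule inspects two adjacent English words and returns them in French
# order, or None if the rule does not apply.
Rule = Callable[[str, str], Optional[Tuple[str, str]]]


def keep_big_in_front(a: str, b: str) -> Optional[Tuple[str, str]]:
    # Exception: "grand" stays before the noun ("le grand chat").
    if a == "big" and POS[b] == "NOUN":
        return (a, b)
    return None


def adjective_after_noun(a: str, b: str) -> Optional[Tuple[str, str]]:
    # General rule: most French adjectives follow the noun ("le livre rouge").
    if POS[a] == "ADJ" and POS[b] == "NOUN":
        return (b, a)
    return None


def first_match(rules: List[Rule], a: str, b: str) -> Optional[Tuple[str, str]]:
    # The first rule that applies wins, so the order of the list matters.
    for rule in rules:
        pair = rule(a, b)
        if pair is not None:
            return pair
    return None


def translate(sentence: str, rules: List[Rule]) -> str:
    words = sentence.lower().split()
    out: List[str] = []
    i = 0
    while i < len(words):
        pair = first_match(rules, words[i], words[i + 1]) if i + 1 < len(words) else None
        if pair is not None:
            out.extend(pair)  # a rule consumed two words
            i += 2
        else:
            out.append(words[i])  # no rule applied; keep the word in place
            i += 1
    # Word-for-word dictionary lookup after reordering.
    return " ".join(LEXICON[w] for w in out)


ordered = [keep_big_in_front, adjective_after_noun]
print(translate("the red book", ordered))  # le livre rouge
print(translate("the big cat", ordered))   # le grand chat
# A later edit that puts the general rule first silently shadows the exception.
print(translate("the big cat", [adjective_after_noun, keep_big_in_front]))  # le chat grand
```

With the exception checked first, both phrases come out right; swapping the two rules, as a later edit might, turns "the big cat" into "le chat grand" without any error being raised.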

Through Language Weaver and his research at USC, Knight, like a handful of other researchers around the world, approaches the problem differently. Instead of following rigid grammatical rules, Language Weaver's software matches words and phrases across languages according to the probability that a match is correct in its context.
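A toy sketch of choosing a translation by probability instead of by rule follows. It assumes a hypothetical word-for-word translation table and a hypothetical bigram language model (how likely one English word is to follow another), both with made-up numbers, and it scores every candidate output by multiplying the two and keeps the best. Real statistical systems are far more elaborate (longer phrases, reordering, efficient search instead of trying every combination), so treat this only as an illustration of how context can settle an ambiguous word: here the German "Kiefer," which can mean "pine" or "jaw."

```python
import itertools
import math

# Hypothetical translation table: P(English word | German word). Toy numbers,
# not estimated from any real corpus.
TRANSLATION_PROBS = {
    "gebrochener": {"broken": 1.0},
    "hohe": {"tall": 0.6, "high": 0.4},
    "kiefer": {"pine": 0.5, "jaw": 0.5},  # "Kiefer" means both "pine" and "jaw"
}

# Hypothetical bigram language model: P(next English word | previous word).
BIGRAM_PROBS = {
    ("broken", "jaw"): 0.02, ("broken", "pine"): 0.0005,
    ("tall", "pine"): 0.01, ("tall", "jaw"): 0.0002,
    ("high", "pine"): 0.001, ("high", "jaw"): 0.0001,
}
UNSEEN = 1e-6  # crude floor for word pairs never observed


def score(source, target):
    # Translation score times language-model score for one candidate output.
    tm = math.prod(TRANSLATION_PROBS[s][t] for s, t in zip(source, target))
    lm = math.prod(BIGRAM_PROBS.get(pair, UNSEEN) for pair in zip(target, target[1:]))
    return tm * lm


def translate(sentence):
    source = sentence.lower().split()
    # Enumerate every word-by-word combination (fine for a toy, hopeless at scale).
    options = [TRANSLATION_PROBS[w].keys() for w in source]
    best = max(itertools.product(*options), key=lambda cand: score(source, cand))
    return " ".join(best)


print(translate("gebrochener Kiefer"))  # broken jaw
print(translate("hohe Kiefer"))         # tall pine
```

Because the table gives both senses of "Kiefer" the same probability, the neighboring word, scored by the language model, is what decides between them.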

This statistical approach draws on a large number of examples from previously translated documents, says Michael Collins, a computer engineer at MIT who uses the same method in a software application he's building to perform German-to-English translation. IBM pioneered this approach in the 1990s, he says, in part by taking advantage of a huge database of Canadian parliamentary proceedings published in both French and English.
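The IBM work Collins mentions is usually associated with the "IBM models," which learn word-translation probabilities from paired sentences alone, with no dictionary. Below is a minimal, simplified sketch of the first of them, IBM Model 1, trained with expectation-maximization (EM). It leaves out the empty ("NULL") word of the full model, and the three French-English sentence pairs are invented stand-ins for a parallel corpus such as the Canadian parliamentary record; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (foreign sentence, English sentence)


def train_ibm_model1(pairs: List[Pair], iterations: int = 20) -> Dict[Tuple[str, str], float]:
    """Estimate t[(f, e)], the probability that English word e is translated
    as foreign word f, using EM as in IBM Model 1 (simplified: no NULL word)."""
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    # Start uniform: every foreign word is equally likely for every English word.
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: 1.0 / len(foreign_vocab))
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)  # expected (f, e) links
        total: Dict[str, float] = defaultdict(float)              # expected links per e
        # E-step: split each foreign word's "credit" among the English words in
        # its sentence, in proportion to the current translation probabilities.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: re-estimate t from the expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_foreign_word(t: Dict[Tuple[str, str], float], e: str) -> str:
    # The foreign word with the highest learned probability for English word e.
    candidates = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(candidates, key=candidates.get)


corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("une maison".split(), "a house".split()),
]
t = train_ibm_model1(corpus)
for english in ["the", "house", "flower", "a"]:
    print(english, "->", best_foreign_word(t, english))
```

No sentence pair says which word translates which, yet because "la" keeps appearing with "the" and "maison" with "house," EM gradually shifts probability toward those pairings. More sentence pairs give it more such evidence to work with.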

The statistical variety of machine translation not only produces better results than the traditional method, says Knight, but is also designed to keep improving on its own: the more translated documents the software encounters, the more likely it is to match phrases correctly. “A few years ago, for our Chinese and Arabic languages, all we could get was the basic topic of what an article was about,” Knight says. “Now, the resolution is at the sentence level.”

---

*Correction, January 18, 2006, 10:00 a.m. EST: In the original version of this story, we cited the following translation of the Someya Apple Farm website: “The apple orchard with big trees over 50 years old. The natural environment around Numata, with the huge temperture difference between day and night, creates uniquly delitious apple.” In fact, this was an excerpt from Google's translation of the apple farm's own English version of its website, not from Google's translation of the original Japanese page. Therefore it was not a valid example of the poor quality of some machine translation algorithms. In the story we have now substituted Google's translation of the original Japanese site. Thanks to our readers for pointing out the error. - Eds.

Comments

Guest (Pat)

  • 01/18/2006

Wrong Example from the beginning of the Article

Go to the Japanese Apple Farm Web site, and look carefully. You will find that the example given at the beginning is actually (most likely) a human translation, because it is part of the English version of the site.

Go to the Japanese version and use Google Translate, you will get this:
"The Someya apple garden it will pass very! It is planted in 1954, furthermore even now, .... " Much more unintelligible.

Admittedly, a simple mistake. But if this esteemed magazine gets such basics wrong, how can we trust it with the more complicated subjects it routinely covers?
(A high-profile example is the obviously highly biased coverage of Aubrey de Grey's work. By the way, when are you going to publish the result of the SENS Challenge?)

Guest (Mike Maxwell)

  • 01/18/2006

LDC info wrong

The para about the LDC is simply wrong.  The LDC has been around for 13 years; I worked there until September.  (See their website at http://www.ldc.upenn.edu.)

There are other errors, too, some of them trivial ("Kevin Caballero" is not a surname--only "Caballero" is a surname, "millions of sentences" should be "millions of words", etc.).  But all in all, I would say they need better fact checking.

Guest (Benoit Ozell)

  • 01/18/2006

Répétez...

If the aim is to have a good translation, the title should be: "Répétez, en anglais, s'il vous plaît".

Guest (Wade Roush)

  • 01/18/2006

Error corrected

Pat: Thank you for helping us to catch this mistake. We have corrected it, and appended an explanation of the error at the bottom of the article. -- Wade Roush, Executive Web Editor, TechnologyReview.com.

Guest (Mike Maxwell)

  • 01/18/2006

Error correction still unclear

From the Correction:

"...an excerpt from Google's translation of the apple farm's own English version of its website..."

I don't get it.  Google is translating English into English?  Shouldn't that be "...an excerpt from the apple farm's own English version of its website..."?

Guest (Andrew Mole)

  • 01/19/2006

Mis-spellings

Surely it should have been obvious that it was a human translation, since the spelling was so bad - "temperture", "uniquly" and "delitious", and equally clear that it is a translation by someone who is not a native English-speaker - i.e. not even a Google human translation, but rather home-grown at the Japanese Apple Farm. A correction to the correction is definitely in order as per Mike's comment.

Interesting article though! :)
