The World Wide Translator

Will Web-wide “translation memory” finally make machine translation pay off?
September 21, 2001

“Hour is the moment for all the good men to come to the subsidy of them country.”

Hardly a rousing cry. Despite hundreds of millions of dollars and decades of research, such gibberish typifies the results of language translation software. As a result, the translation business hasn't come very far from its days as a cottage industry: an expensive, time-consuming process dependent on highly specialized human translators.

Globalization companies hope to break through this barrier with software that employs translation memory, a way to use past translations to speed new ones. But building a useful database of translations is a slow and expensive endeavor, and companies guard their translations jealously.

Even worse, globalization software makers have been slower than other high-tech industries to develop standards for interoperability. If, for example, General Motors decides to switch translation software, it can't take its translation memory with it, a potential loss of millions of dollars of intellectual property.

“You might have a huge translation memory, but if your client requires you to use another tool, you can’t use it,” says Kara Warburton, a terminology expert at IBM. Warburton belongs to two industry groups working toward a solution: a technical committee at the International Organization for Standardization, and the Localization Industry Standards Association, a trade group.

Their ultimate goal: when anyone, anywhere, corrects the sentence above, it will forever after translate: “Now is the time for all good men to come to the aid of their country.”

Extremely Complex

“This whole area of language is extremely complex,” says IDC analyst Steve McClure. “It’s probably the most complicated problem in computer science that I’m aware of.”

Computer-assisted translation typically involves two steps. First, a rules engine parses the original sentence, attempting to identify the relationships between the words. The engine then translates each word within the context that it believes to be correct, often with mixed results.

That’s how most machine translation works, including AltaVista’s Babelfish Web site (source of the example above, translated from English to Italian and back) and freetranslation.com.
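To see where that second step goes astray, consider a deliberately tiny sketch in Python. Everything in it, the word senses and the four-entry English-to-Italian lexicon, is invented for illustration; a real rules engine reasons over a full parse of the sentence rather than grabbing the first sense it knows, as this toy does.

    # A toy illustration of the two-step process described above. The senses and
    # the tiny English-to-Italian lexicon are invented for this example; a real
    # rules engine builds a full parse of the sentence instead of guessing
    # first-match, as this toy does.
    TOY_LEXICON = {
        ("time", "clock"): "ora",         # listed first, so the toy engine picks it...
        ("time", "occasion"): "momento",  # ...even when the sentence wants this sense
        ("aid", "grant"): "sussidio",     # translated back to English, this is "subsidy"
        ("aid", "help"): "aiuto",         # the sense "come to the aid of" actually needs
    }

    def guess_sense(word):
        """Step 1, crudely: pick the first sense the engine knows for the word."""
        for known_word, sense in TOY_LEXICON:
            if known_word == word:
                return sense
        return None

    def translate_word(word):
        """Step 2: translate the word within the context the engine believes is correct."""
        return TOY_LEXICON.get((word, guess_sense(word)), word)

    # The toy engine picks the wrong senses: "ora" also means "hour" and
    # "sussidio" means "subsidy," so a round trip back to English yields the
    # kind of gibberish quoted at the top of this article.
    print(translate_word("time"), translate_word("aid"))   # -> ora sussidio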

“Unfortunately,” says Mark Lancaster, CEO of SDL International, a London-based globalization firm, “the way that we speak is very ambiguously. And so it’s very difficult to interpret random input, which is essentially how we speak.” As a result, no matter how good a rules engine is, a human translator still must correct its mistakes (“Hour is the moment”).

This second step remains the most time-consuming and expensive aspect of translation, often requiring expertise in a specific technical field as well as in the source and target languages. Moreover, two human experts may translate the same passage differently in texts where consistency is desired.

To correct this problem, translation memory stores the human-corrected translation along with the original, non-translated text. For each document, the software compares each sentence of the original to its growing translation memory.

When it finds a sentence it has seen before, it uses the remembered translation instead of the rules engine: knowing, instead of guessing. It then flags the new sections, cutting down the time spent by human reviewers. And as it adds each successive document to its translation memory, it knows more and guesses less.

For closely related sentences, fuzzy matching allows the software to produce a partial translation while flagging the differences for a human reviewer.
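The lookup itself can be sketched in a few lines of Python. Nothing here is any vendor’s implementation: the class, the 75 percent fuzzy threshold, and the placeholder target text are assumptions made purely for illustration.

    # An illustrative translation-memory lookup, not any vendor's product.
    # Exact matches reuse a stored, human-approved translation; near matches
    # come back flagged as "fuzzy" for a reviewer; anything else is "new" and
    # falls through to the rules engine. The threshold is an assumption.
    from difflib import SequenceMatcher

    class TranslationMemory:
        def __init__(self, fuzzy_threshold=0.75):
            self.entries = {}             # source sentence -> approved translation
            self.fuzzy_threshold = fuzzy_threshold

        def add(self, source, approved_translation):
            """Store a human-corrected translation alongside the original sentence."""
            self.entries[source] = approved_translation

        def lookup(self, sentence):
            """Return (translation, status); status is 'exact', 'fuzzy', or 'new'."""
            if sentence in self.entries:  # seen before: remember, don't guess
                return self.entries[sentence], "exact"
            best_score, best_source = 0.0, None
            for source in self.entries:   # compare against every stored original
                score = SequenceMatcher(None, sentence, source).ratio()
                if score > best_score:
                    best_score, best_source = score, source
            if best_score >= self.fuzzy_threshold:
                return self.entries[best_source], "fuzzy"  # close match: flag for review
            return None, "new"            # unknown: hand off to the rules engine

    tm = TranslationMemory()
    tm.add("Now is the time for all good men to come to the aid of their country.",
           "<human-approved Italian translation>")   # placeholder target text

    print(tm.lookup("Now is the time for all good men to come to the aid of their country."))
    # -> exact match, reused verbatim
    print(tm.lookup("Now is the time for all good women to come to the aid of their country."))
    # -> fuzzy match, returned with a flag for the human reviewer

In a production system the store would be a database built for fast retrieval rather than a loop over every entry, but the division of labor is the same: remembered sentences skip the rules engine entirely, and near misses go to a reviewer with the differences highlighted.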

While not all computer-aided translation incorporates translation memory, many globalization software providers, including Trados, Mendez, Star AG, Atril, SDL, and Alchemy Software, offer products that do.

Who Wants to Play?

Lancaster is excited about the potential to share translation memories. “We’ve been building translation memories for ten years, so we have pretty big database repositories,” he says.

For now, Lancaster says, SDL uses those databases only for its own translation work but plans to develop a shareable one: customers using SDL’s translation software, SDLX, will gain access to a massive database of past translations. The price of admission? Customers will have to share their results or pay a premium to keep them private.

But the idea remains controversial. Would a company willingly share its intellectual property, potentially with competitors? It might, claims Lancaster, in exchange for a discount.

Such a tradeoff may appeal to small or medium-sized companies, says McClure, but large companies consider their translation memories valuable intellectual property and would be unlikely to share them.

“If Cisco has to go to the trouble of translating the gigabit router instructions to Mandarin Chinese, that’s not going to be easy,” agrees analyst Eric Schmitt of Forrester Research. “It’s going to be expensive. Cisco doesn’t want to go to the trouble and then have Alcatel and Juniper come along and get the same benefit.”

Still, great as these challenges are, they may not be computer translation’s largest stumbling block, says David Parmenter of Basis Technology in Cambridge, MA, a firm that helps companies move their business worldwide.

“The bulk of the translation business is built on foreign translators who do the work piecemeal,” Parmenter says. “It’s hard to beat the economics of that.”
