In the early, post-World War II days of computing, scientists dreamed of creating software so intelligent it could accurately translate one language into another. If computers could crack enemy codes, the thinking went, then why not foreign languages? Five decades later, researchers are still working on the problem. But what was a dream in the 1950s has become an overwhelming demand as business increasingly ignores traditional borders. In fact, by 2007, the already huge translation market-mostly manual today-is expected to reach $13 billion, as advertising, Web pages, and even internal company documents have to be tailored to different countries. “There’s so much more text than anybody could hope to translate by hand,” says Stephen Richardson, a senior researcher at Microsoft Research, “and it’s growing exponentially.”
Creating effective translation software requires solving many of the same problems that natural-language processing faces, and then some. Both systems must determine from context whether “light bulb,” for example, means a source of light or a less-than-heavy plant bulb. But while customer service software at, say, General Electric could be fairly sure of its interpretation, translation software might have to handle treatises on horticulture as well. Translation software must also contend with idioms whose figurative meaning has nothing to do with their literal meaning-and then find parallel idioms in a different language. It’s a problem that makes word-for-word translation impracticable.
Researchers are making progress today using three basic approaches drawn from natural-language processing. Knowledge-based machine translation, for example, relies on human programmers to write lists of rules that describe all possible relationships between verbs, nouns, prepositions, and so on for each language. To generate a sensible translation, the software then scours those rules to find matching words and relationships in the target language. IBM employs knowledge-based software in its WebSphere Translation Server, which the company and its global customers use to rapidly translate e-mail, Web sites, and even real-time chats.
A second approach, example-based systems, relies chiefly on raw computing power. Software algorithms search through millions of words and phrases in documents that already have been accurately translated into multiple languages, compare the translations, and then create an enormous database of vocabulary and word relationships to resolve ambiguity for future translations.
Statistical techniques also depend on computing power to compare reams of previously translated text. However, this strategy selects the most likely translation using sophisticated mathematical models that the software continually upgrades based on how often its interpretations prove accurate. If it correctly translates “light bulb,” then the probability score assigned to that phrase for future translations goes up. Microsoft used a combination of example-based and statistical techniques to automatically translate its entire 60-million-word product-support-services knowledge base from English to Spanish.
With accuracy rates hovering between 70 and 80 percent, no system is foolproof. But, says Michael McCord of IBM’s Language Analysis and Translation group in Yorktown Heights, NY, translation software significantly reduces the time it takes for humans to translate documents, and sometimes an approximate translation will do. Meanwhile, accuracy rates keep inching up. “More work, bigger machines, more people. We’ll get better,” he says.
A Roomba recorded a woman on the toilet. How did screenshots end up on Facebook?
Robot vacuum companies say your images are safe, but a sprawling global supply chain for data from our devices creates risk.
A startup says it’s begun releasing particles into the atmosphere, in an effort to tweak the climate
Make Sunsets is already attempting to earn revenue for geoengineering, a move likely to provoke widespread criticism.
10 Breakthrough Technologies 2023
These exclusive satellite images show that Saudi Arabia’s sci-fi megacity is well underway
Weirdly, any recent work on The Line doesn’t show up on Google Maps. But we got the images anyway.
Get the latest updates from
MIT Technology Review
Discover special offers, top stories, upcoming events, and more.