Found in Translation
September 11 affected millions of people in myriad ways. For Ed Bice, an American ex-architect, it sparked a desire to get ordinary Middle Easterners–and Westerners–talking together. Naturally, being based in the Bay Area, he turned to the Web for help.
The result, six years later, is Meadan, which means “town square” in Arabic. The basic idea is simple: it’s a website that brings English and Arabic speakers together around daily postings of news articles, broadcasts, and events that are of common interest, and it gives users a platform to communicate through dialogues, blogs, and other exchanges. All the while, it allows users to pinpoint their location so that people can share views across continents.
The hard part is creating a system that allows users to express their ideas in their native tongue. Enter IBM, which has committed $1.7 million to this not-for-profit project. The company has one of the most advanced systems for Arabic-English machine translation. It’s 84 percent accurate and can transmute Arabic to English and back again at a blistering 500 words per second.
This is no easy task, says Salim Roukos, a senior manager for multilingual natural-language processing technologies at IBM’s Watson Research Center. Because word order in Arabic sentences differs from word order in English, verbs can get lost–quite literally–in machine translation. Moreover, Arabic words have prefixes, suffixes, and other forms that allow them to agree in gender and number–a rigor that freewheeling English lacks and that makes translation from English to Arabic even trickier.
IBM’s statistically based translation system has been trained on a massive amount of material, called a parallel corpus, in both Modern Standard Arabic and formal English–the language of news reports. That means it has roughly 100 million words and more than 10 million phrases to call upon when presented with new text. But the system struggles with slang and other colloquialisms–all the more difficult in Arabic because street talk varies from country to country.
But this is exactly the sort of language that Meadan’s online community will use. So the alpha test, which was launched last month, also calls on the services of human translators to correct IBM’s machine translations. There is plenty of work to be done. Even a basic English expression like “That’s great!” comes out of the machine as the equivalent of “That’s big!” in Arabic. It’s up to users to point this out and up to designated translators to fix it. The correct pair of translations then becomes another piece of data from which the machine can learn.
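For readers curious how a human correction can become "another piece of data," here is a minimal sketch in Python. It is not IBM's system, whose internals the article does not describe. It is a toy phrase table that counts how often a source phrase has been paired with each candidate translation and returns the most frequent one. The class name, the method names, the weight given to a human correction, and the bracketed English glosses standing in for Arabic text are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class PhraseTable:
    """Toy phrase table (hypothetical): maps a source phrase to counts of target phrases."""

    def __init__(self):
        # counts[source][target] = how many times this pairing has been observed
        self.counts = defaultdict(Counter)

    def add_pair(self, source, target, weight=1):
        """Record one observed (source, target) pairing, optionally weighted."""
        self.counts[source][target] += weight

    def translate(self, source):
        """Return the most frequently observed target, or None if the phrase is unseen."""
        candidates = self.counts.get(source)
        if not candidates:
            return None
        return candidates.most_common(1)[0][0]


table = PhraseTable()

# Pairing learned from formal text: a literal rendering (placeholder gloss, not real Arabic).
table.add_pair("That's great!", "[Arabic for: That's big!]")
before = table.translate("That's great!")

# A user flags the error and a translator supplies a fix. Giving human corrections
# extra weight is an assumption of this sketch.
table.add_pair("That's great!", "[Arabic for: That's wonderful!]", weight=2)
after = table.translate("That's great!")

print(before)
print(after)
```

A production statistical system works with probabilities over millions of phrase pairs rather than raw counts over one, but the principle the sketch shows is the same: each corrected pair shifts which translation the machine prefers next time.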
Meadan hopes to roll out a beta version later this year–provided it raises the $2 million or so it needs to move forward. Bice has high hopes. “A year from now, I hope we are a global social network, talking across languages about events in the world.” Insha’allah, as we say in Arabic.