September 11 affected millions of people in myriad ways. For Ed Bice, an American ex-architect, it sparked a desire to get ordinary Middle Easterners–and Westerners–talking together. Naturally, being based in the Bay Area, he turned to the Web for help.
The result, six years later, is Meadan, which means “town square” in Arabic. The basic idea is simple: it’s a website that brings English and Arabic speakers together around daily postings of news articles, broadcasts, and events that are of common interest, and it gives users a platform to communicate through dialogues, blogs, and other exchanges. All the while, it allows users to pinpoint their location so that people can share views across continents.
The hard part is creating a system that allows users to express their ideas in their native tongue. Enter IBM, which has committed $1.7 million to this not-for-profit project. The company has one of the most advanced systems for Arabic-English machine translation. It's 84 percent accurate and can translate Arabic to English and back again at a blistering 500 words per second.
This is no easy task, says Salim Roukos, a senior manager for multilingual natural-language processing technologies at IBM’s Watson Research Center. Because word order in Arabic sentences differs from word order in English, verbs can get lost–quite literally–in machine translation. Moreover, Arabic words have prefixes, suffixes, and other forms that allow them to agree in gender and number–a rigor that freewheeling English lacks and that makes translation from English to Arabic even trickier.
IBM’s statistically based translation system has been trained on a massive amount of material, called a parallel corpus, in both modern standard Arabic and formal English–the language of news reports. That means it has roughly 100 million words and more than 10 million phrases to call upon when presented with new text. But the system struggles with slang and other colloquialisms–all the more difficult in Arabic because street talk varies from country to country.
But this is exactly the sort of language that Meadan’s online community will use. So the alpha test, which was launched last month, also calls on the services of human translators to correct IBM’s machine translations. There is plenty of work to be done. Even a basic English expression like “That’s great!” comes out of the machine as the equivalent of “That’s big!” in Arabic. It’s up to users to point this out and up to designated translators to fix it. The correct pair of translations then becomes another piece of data from which the machine can learn.
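For readers who want to see the shape of that feedback loop, here is a minimal Python sketch. It is an illustration only, not IBM's or Meadan's code: the class `CorrectionLoop`, its methods, and the lookup-table "engine" are all hypothetical, and English glosses stand in for the Arabic output. A real statistical system would presumably retrain on the corrected pairs rather than simply look them up.

```python
from dataclasses import dataclass, field


@dataclass
class CorrectionLoop:
    """Toy model of flag -> human fix -> reuse (all names are hypothetical)."""

    # Human-approved pairs: source phrase -> corrected translation.
    corrections: dict = field(default_factory=dict)
    # Source phrases that users have flagged and that still await a translator.
    flagged: list = field(default_factory=list)

    def machine_translate(self, text: str) -> str:
        # Stand-in for the real engine: a tiny table of overly literal output.
        literal = {"That's great!": "That's big!"}
        return literal.get(text, text)

    def translate(self, text: str) -> str:
        # Prefer a human-approved pair when one exists.
        if text in self.corrections:
            return self.corrections[text]
        return self.machine_translate(text)

    def flag(self, text: str) -> None:
        # A user points out a bad translation.
        if text not in self.flagged:
            self.flagged.append(text)

    def fix(self, text: str, corrected: str) -> None:
        # A designated translator supplies the correct pair, which is stored
        # as new data and clears the flag.
        self.corrections[text] = corrected
        if text in self.flagged:
            self.flagged.remove(text)


loop = CorrectionLoop()
first = loop.translate("That's great!")    # literal machine output
loop.flag("That's great!")                 # a user reports it
loop.fix("That's great!", "That's wonderful!")  # a translator corrects it
second = loop.translate("That's great!")   # corrected pair is now used
```

In this sketch the first call returns the literal rendering and the second returns the translator's fix, which mirrors the user-flags, translator-fixes, machine-learns cycle described above.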
Meadan hopes to roll out a beta version later this year, provided it raises the $2 million or so it needs to move forward. Bice has high hopes. “A year from now, I hope we are a global social network, talking across languages about events in the world,” he says. Insha’allah, as we say in Arabic.