For tens of millions of people around the world, from West Africa to Southeast Asia to the Middle East, the Internet’s not such a friendly place. That’s because many of the world’s writing systems still aren’t encoded in software, which means millions of people can’t write e-mail, build Web sites, or search databases in their native scripts. A group of linguists at the University of California, Berkeley, is trying to change that by making sure that nearly 100 additional scripts have a place in a crucial international standard that lets computers render, process, and send text data.
The university’s initiative “is an effort to rectify an oft-overlooked aspect of the digital divide: many scripts used by languages of under five million speakers in the world today are not represented in the international standard,” says Deborah Anderson, a linguist at Berkeley who leads the effort. That standard is Unicode, which assigns a unique ID number to every character, symbol, and punctuation mark in a writing system. The ID numbers mean that characters won’t get misinterpreted as data move between software programs or across the Internet, a problem that sometimes shows up as a string of question marks on your screen and can cripple the ability of whole populations to communicate online. For example, Unicode is enabling radical economic transformations in Vietnam. Before this year, computer and software manufacturers had come up with 43 different ways to encode Vietnamese text, which meant computers couldn’t reliably swap data. Then, early this year, the Vietnamese government adopted Unicode as its national standard.
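To make the idea of ID numbers concrete, here is a minimal Python sketch of what goes wrong when two programs disagree about how text is encoded. It is an illustration only, not anything drawn from the Berkeley project: the sample word, the helper name `code_points`, and the choice of Latin-1 and ASCII as the mismatched legacy encodings are all assumptions made for the example.

```python
# Illustrative sketch: every name and sample value here is an assumption.

def code_points(text):
    """Return the Unicode ID number (code point) of each character."""
    return [f"U+{ord(ch):04X}" for ch in text]

# "Viet" written with the Vietnamese letter e-circumflex-dot-below (U+1EC7).
word = "Vi\u1ec7t"

# Each character has exactly one ID number, whatever software handles it.
ids = code_points(word)

# Sender stores the text as UTF-8 bytes, one standard Unicode encoding.
utf8_bytes = word.encode("utf-8")

# A receiver that assumes a different byte-to-character table garbles it.
garbled = utf8_bytes.decode("latin-1")

# A legacy encoding with no slot for the character falls back to "?".
question_marks = word.encode("ascii", errors="replace").decode("ascii")

# When both sides agree on the encoding, the text survives intact.
round_trip = utf8_bytes.decode("utf-8")

print(ids)
print(question_marks)
print(round_trip == word)
```

In this sketch the ASCII fallback is what produces the familiar question mark, while the Latin-1 misreading turns one letter into several unrelated symbols. A script with no Unicode ID numbers at all has no agreed-upon byte sequence in the first place, which is the gap the Berkeley effort aims to close.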
The problem is that the more obscure writing systems are not yet encoded in the Unicode standard. Adding another 100 scripts is a big task; only 52 are encoded today. To do the job, Berkeley is recruiting and funding linguists, as well as users of scripts like N’Ko (used in West Africa), Balinese (used in Indonesia), and Tifinagh (used in parts of Northern Africa), to determine how many characters each script contains, design fonts, and guide proposals through a bureaucratic maze of government agencies and computer standards bodies. The benefit will be visible to Internet users like Mamady Doumbouya, a Philadelphia publisher who would be able to offer an online version of his newspaper in N’Ko for the first time. “Without Unicode, it takes so much to set up your computer to read a newspaper in N’Ko,” Doumbouya says.
Such changes won’t happen overnight. Anderson estimates that the project, launched last year, will take 10 years to complete. Until recently, computer companies sustained the encoding effort, but their interest is dwindling because users of unencoded alphabets represent too small a market. The Berkeley project is part of a larger effort to make the Internet more globally available; already the World Wide Web Consortium has made it possible to register domain names in these new scripts, meaning, among other things, that the URLs of Web sites can reflect the writing systems of the people who own them.
U.S. national security experts are interested, too. Everette Jordan, head of the National Virtual Translation Center, a newly formed U.S. government office that provides foreign-language resources for the intelligence community, points out that “technologically, we’re deaf, dumb, and blind if we can’t read this stuff.” Soon, though, U.S. security agencies and African newspaper publishers alike could rally to a new standard.