Technology Review - Published By MIT

September 2003

Computers Learn New ABCs

Efforts to encode the world's written languages will enable a truly global Internet.

By Michael Erard

For tens of millions of people around the world, from West Africa to Southeast Asia to the Middle East, the Internet's not such a friendly place. That's because many of the world's writing systems still aren't encoded in software, which means millions of people can't write e-mail, build Web sites, or search databases in their native scripts. A group of linguists at the University of California, Berkeley, is trying to change that by making sure that nearly 100 additional scripts have a place in a crucial international standard that lets computers render, process, and send text data.

The university's initiative "is an effort to rectify an oft-overlooked aspect of the digital divide: many scripts used by languages of under five million speakers in the world today are not represented in the international standard," says Deborah Anderson, a linguist at Berkeley who leads the effort. That standard, Unicode, assigns a unique ID number to every written character, symbol, and punctuation mark in a written language. The ID numbers mean that characters won't get misinterpreted as data move between software programs or across the Internet. Such misinterpretation sometimes shows up as a string of question marks on your screen, and it can cripple the ability of whole populations to communicate via the Internet. For example, Unicode is enabling radical economic transformations in Vietnam. Before this year, computer and software manufacturers had come up with 43 different ways to encode Vietnamese text, which meant computers couldn't reliably swap data. Then, early this year, the Vietnamese government adopted Unicode as its national standard.
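To make the idea of a unique ID number concrete, here is a minimal Python sketch. It is an illustration only, not anything the Berkeley project or the article describes: the helper names (`code_points`, `lossy_ascii`, `misread`) are hypothetical, and the Vietnamese word is just a convenient sample. It shows the ID numbers (code points) of each character, how text degrades to question marks when the receiving side has no number for a character, and how text gets garbled when two programs disagree about the encoding.

```python
def code_points(text):
    """Return each character's Unicode ID number in the usual U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def lossy_ascii(text):
    """Simulate software that only knows ASCII: unknown characters become '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


def misread(text, wrong_encoding="latin-1"):
    """Simulate a sender writing UTF-8 bytes and a receiver assuming another encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


# Sample word chosen for illustration; \u1EC7 is the letter e with circumflex and dot below.
word = "Vi\u1EC7t"

print(code_points(word))   # ['U+0056', 'U+0069', 'U+1EC7', 'U+0074']
print(lossy_ascii(word))   # Vi?t
print(misread(word) == word)  # False: the receiver sees garbled characters

# When both sides agree on one encoding of the same ID numbers, the text survives.
round_trip = word.encode("utf-8").decode("utf-8")
print(round_trip == word)  # True
```

The last two lines are the point of a shared standard: once sender and receiver agree on the same ID numbers and the same byte encoding, the round trip is lossless.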

The problem is that the more obscure writing systems are not yet encoded in the Unicode standard. Adding another 100 scripts is a big task; only 52 are encoded today. To do the job, Berkeley is recruiting and funding linguists, as well as users of scripts like N'Ko (used in West Africa), Balinese (used in Indonesia), and Tifinagh (used in parts of Northern Africa), to determine how many characters each script contains, design fonts, and guide proposals through a bureaucratic maze of government agencies and computer standards bodies. The benefit will be visible to Internet users like Mamady Doumbouya, a Philadelphia publisher who would be able to offer an online version of his newspaper in N'Ko for the first time. "Without Unicode, it takes so much to set up your computer to read a newspaper in N'Ko," Doumbouya says.

Such changes won't happen overnight. Anderson estimates that the project, launched last year, will take 10 years to complete. Until recently, computer companies sustained the encoding effort, but their interest is dwindling because users of unencoded alphabets represent too small a market. The Berkeley project is part of a larger effort to make the Internet more globally available; already the Internet Engineering Task Force has published a standard that makes it possible to register domain names in these new scripts, meaning, among other things, that the URLs of Web sites can reflect the writing systems of the people who own them.
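The article does not say how non-Latin domain names are represented, so the following is a sketch under an assumption: that the mechanism is the IDNA scheme (RFC 3490), which converts each Unicode label into an ASCII form beginning with `xn--`. Python's built-in `idna` codec implements that 2003 version of the scheme; the domain `bücher.example` and the helper names are placeholders chosen for illustration.

```python
def to_ascii_domain(name):
    """Convert a Unicode domain name to its ASCII-compatible form (assumes IDNA 2003)."""
    return name.encode("idna").decode("ascii")


def to_unicode_domain(name):
    """Convert an ASCII-compatible domain name back to Unicode (assumes IDNA 2003)."""
    return name.encode("ascii").decode("idna")


# Placeholder domain under the reserved .example top-level domain.
ascii_form = to_ascii_domain("b\u00FCcher.example")
print(ascii_form)                     # xn--bcher-kva.example
print(to_unicode_domain(ascii_form))  # bücher.example
```

A script can only be written into a domain name this way once its characters have Unicode ID numbers, which is why the encoding work described above comes first.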

U.S. national security experts are interested, too. Everette Jordan, head of the National Virtual Translation Center, a newly formed U.S. government office that provides foreign-language resources for the intelligence community, points out that "technologically, we're deaf, dumb, and blind if we can't read this stuff." Soon, though, U.S. security agencies and African newspaper publishers alike could rally to a new standard.

This article is from the September 2003 issue of Technology Review.
