A Deeper Understanding
For all their success, however, in no sense do these systems really “understand” what they hear. They deal only with rules of grammar, probabilities, and stored examples. Indeed, they excel precisely because their makers have turned away from the quest for a system intelligent enough to read and summarize a book or sustain a general conversation.
But other researchers retain a broader view of the possibilities for natural-language processing. Among them is Ron Kaplan, a research fellow at PARC who developed much of the basic grammatical theory behind many of today’s natural-language systems. These researchers are building software that can cope with a far greater variety of inputs, from newspaper stories to the disorganized mass of multimedia information on the Web. Kaplan is critical of what he calls the “shallow methods” used for niche applications like call steering. “Compared to the alternative” (maintaining a costly staff of human customer-service agents) “they are actually not bad,” he says. “But compared to what you would like, they stink.” A more effective natural-language interface, Kaplan says, would eliminate the need to carefully tailor the systems and allow users to speak or write freely.
Two problems hinder that vision, in Kaplan’s view. The databases of language samples upon which simpler systems draw are too small, and the statistical algorithms they use are designed to eliminate the ambiguity in much of what people say, homing in as quickly as possible on the most likely meaning. Kaplan believes that if this ambiguity is eliminated too soon, the correct meaning of an utterance, especially a long or complex sentence, may be lost. So he has spent the last decade working on a grammar-driven system, called the Xerox Linguistic Environment, which actually tries to preserve ambiguity. The system parses an utterance into every possible sentence diagram allowed under a set of 314 rules governing relationships between various parts of speech (PARC researchers assembled the rules manually over three years). A complex sentence with 40 or more words, for instance, might be interpreted in as many as 1,000 different ways.
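How quickly readings multiply can be seen with a toy example. The Python sketch below is not the Xerox Linguistic Environment or its 314 rules. It assumes a hypothetical six-rule context-free grammar and a small made-up lexicon, and it uses a standard CKY chart to count every sentence diagram that grammar allows, discarding none of them.

```python
from collections import defaultdict

# Hypothetical toy grammar (parent, left child, right child); not PARC's rules.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]

# Hypothetical lexicon mapping each word to its possible categories.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "park": ["N"],
    "with": ["P"],
    "in": ["P"],
}


def count_parses(words, root="S"):
    """Count every distinct parse tree the toy grammar allows for the words."""
    n = len(words)
    # chart[(i, j)][label] = number of trees with that label over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[(i, i + 1)][label] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point is kept, none pruned
                for parent, left, right in BINARY_RULES:
                    left_count = chart[(i, k)].get(left, 0)
                    right_count = chart[(k, j)].get(right, 0)
                    if left_count and right_count:
                        chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)].get(root, 0)


print(count_parses("I saw the man with the telescope".split()))              # 2
print(count_parses("I saw the man with the telescope in the park".split()))  # 5
```

Adding a single prepositional phrase raises the count from two readings to five, which hints at how a 40-word sentence can end up with as many as 1,000.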
The system’s grammar analysis is so thorough that it correctly captures, on average, 75 percent of the logical relationships in a sentence, which is “actually very high compared to what most statistical methods do,” says Kaplan. That accuracy rate can be increased to about 80 percent if the software takes advantage of those statistical methods, comparing each possible interpretation to similar diagrams in a “trained” database. In the PARC software’s case, that database is a store of hundreds of thousands of accurate diagrams of sentences drawn from Wall Street Journal articles.
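One simple way to picture that comparison is to score each candidate diagram by how often its building blocks appear in the trusted store, then keep the highest scorer. The Python sketch below is an illustrative stand-in, not the PARC method: the rule counts are invented, the names are hypothetical, and the scoring (a sum of log relative frequencies) is just one common choice.

```python
import math

# Invented counts standing in for a "trained" store of accurate diagrams.
RULE_COUNTS = {
    ("VP", ("V", "NP")): 60,
    ("VP", ("VP", "PP")): 40,
    ("NP", ("Det", "N")): 70,
    ("NP", ("NP", "PP")): 30,
    ("PP", ("P", "NP")): 100,
}


def rule_log_prob(rule):
    """Log of the rule's relative frequency among rules with the same parent."""
    parent = rule[0]
    total = sum(count for (p, _), count in RULE_COUNTS.items() if p == parent)
    return math.log(RULE_COUNTS[rule] / total)


def score(parse_rules):
    """Score a candidate diagram as the sum of its rules' log frequencies."""
    return sum(rule_log_prob(rule) for rule in parse_rules)


def best_parse(candidates):
    """Return the name of the highest-scoring candidate diagram."""
    return max(candidates, key=lambda name: score(candidates[name]))


DET_N = ("NP", ("Det", "N"))
PP = ("PP", ("P", "NP"))

# Two readings of "saw the man with the telescope", listed by the rules used.
CANDIDATES = {
    # The telescope is the instrument of seeing.
    "verb_attachment": [("VP", ("VP", "PP")), ("VP", ("V", "NP")), DET_N, PP, DET_N],
    # The man is the one holding the telescope.
    "noun_attachment": [("VP", ("V", "NP")), ("NP", ("NP", "PP")), DET_N, PP, DET_N],
}

print(best_parse(CANDIDATES))  # verb_attachment
```

Because both readings survive until this step, the statistics only rank them; nothing is thrown away before the full set of diagrams has been built.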
Kaplan plans to first unleash the system on Xerox’s huge digital knowledge base of copier repair techniques, which is constantly consulted and updated by the company’s field technicians. There it will compare thousands of individual entries in order to weed out redundancies and contradictions. “It could be that a lot of technicians have discovered the same solution to a common problem,” such as replacing a copier’s drum, Kaplan explains. “You get a bunch of entries saying the same thing, only in different ways.” Finding and pruning such redundancy automatically, he adds, can help technicians spend less time sorting through options. The software could also eventually become the core of an advanced system for translating documents into different languages, a task particularly plagued by ambiguity (see “The Translation Challenge”).
Before a computer can understand or translate stored information expressed in natural language, however, it has to find it. That’s getting more difficult as the digital universe expands, which is why IBM is pursuing an ambitious project to employ natural-language processing in the management of “unstructured information,” the mass of digital text, images, video, and audio stored on computer networks. Much of IBM’s business rests on its database product, DB2, but a traditional database can only retrieve information that has already been organized and indexed. IBM wants to give business users and consumers immediate access to the unindexed data languishing on millions of hard drives around the world, effectively extending its dominance in structured-data management into the realm of unstructured information. To get there, the company is pursuing an initiative designed to merge different language-processing approaches into powerful software that can intelligently search, organize, and translate all this data. The project, called the Unstructured Information Management Architecture, could fuel the company’s business well into the Internet age. “As research bets go, this is a big one,” says Alfred Spector, the division’s senior vice president.
Translation software and other products that use the new architecture are still in the prototype stage. But ultimately, says David Ferrucci, the project’s lead software architect, the architecture will help IBM build systems that pluck the latest information a user wants from any digital source, in any language, and deliver it in organized form. Already, U.S. companies spend $900 million a year on “enterprise information portals” that help employees find the records they need, according to Giga Information Group in Cambridge, MA, and the opportunities for IBM and other companies developing software for managing unstructured information will only multiply as that information accumulates. “There is now clearly a business rationale to deal with unstructured data,” concludes Spector.
If efforts to cope with ambiguity, unstructured information, and other complexities of language succeed, we might ultimately stop treating computers like toddlers, simplifying everything we say to fit their immature understanding of the world. When that day comes, and it could come soon, consumers can expect to find automated voice interfaces at every turn, allowing them to use plain English (or French or Chinese) to interact with everything from Web archives to appliances and automobiles.
And that would really be something to talk about.
Language Processing’s Babel

COMPANY | TECHNOLOGY | LOCATION
AT&T | Automated speech recognition; natural-sounding speech synthesis | New York, NY
Banter | Automated e-mail classification and response | San Francisco, CA, and Jerusalem, Israel
IBM | Automated speech recognition; translation; standard architectures for managing unstructured information | Armonk, NY
Intel | Audiovisual speech recognition | Santa Clara, CA
Inxight | Software for discovering, exploring, and categorizing text data on corporate networks | Sunnyvale, CA
iPhrase Technologies | Natural-language text searching of corporate Web sites | Cambridge, MA
Microsoft | Grammar checking; query interfaces; translation | Redmond, WA
Nuance Communications | Interactive voice response systems for telephone-based customer service | Menlo Park, CA
Palo Alto Research Center | Improved algorithms for extracting meaning from written text | Palo Alto, CA
SpeechWorks | Interactive voice response systems for telephone-based customer service | Boston, MA
StreamSage | Natural-language search and indexing of video and audio material | Washington, DC