Nokia Phones Go to Natural Language Class

Nokia and MIT researchers are teaching cell phones to take commands in natural language.

Katherine Bourzac

April 27, 2006

As part of a research collaboration with MIT computer scientists, the Nokia Research Center Cambridge, in Cambridge, MA, is developing cell phones that can understand and respond to commands typed in English.

Using the MobileStart system on the phone on the right, you can remind your mother to take her medication. The phone on the left shows a new calendar event created in “mom’s” phone by MobileStart. (Courtesy of Boris Katz and Federico Mora, MIT.)

Robert Iannucci, head of Nokia’s research centers, says the company wants to transform phones from simple calling terminals to “information gateways” – to the Internet, GPS and sensors, MP3s, desktop computers, iPods, and other devices. And, he says, that requires rethinking the entire interface between people and handhelds. For both Nokia and MIT, that means using text interaction.

“Humans are good with language,” says Boris Katz, lead research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory, the principal group working with Nokia. “We want language to be a first-rate citizen” on cell phones, he says.

Natural language navigation systems have been long on promise and short on delivery. But it’s no longer unrealistic to think these systems may be in the hands – and handsets – of consumers in the near future. One caveat: the complex underpinnings of these new applications and the algorithms that parse language will have to be hidden from cell-phone users – lest they get frustrated navigating through layers of menus.

To power Nokia’s natural language technology, MIT’s Katz is using a software system he developed in 1993 called Start, which interprets human questions and finds answers using websites such as the Internet Movie Database (IMDB) and Mapquest. Using the Web version of Start as a base, Katz is currently working with the Nokia center to develop a mobile version of the software for cell phones, called MobileStart.

Here’s how the Web version of Start works: users type a question into a text field. The software interprets the query, decides where to seek the answer (in its database or on another website), and responds with a written explanation, a link to a website, or an image.

“Start extracts answers, not hits,” says Katz, because it interprets human language rather than matching keywords, as Google and other search engines do.

The Start system understands English sentences by breaking them down into a series of relationships between object, property, and value. For instance, if one types, “What is the population of Iraq?”, Start interprets the query: the object is Iraq, the property is population, and the value is what Start seeks.
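The object, property, value breakdown can be illustrated with a minimal Python sketch. This is not Start's actual parser, which the article does not describe; the `Query` class, the single regular-expression pattern, and the toy `FACTS` table with its placeholder value are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    obj: str                      # the thing being asked about, e.g. "Iraq"
    prop: str                     # the attribute requested, e.g. "population"
    value: Optional[str] = None   # unknown until a lookup fills it in


# Hypothetical pattern: handles only "What is the <property> of <object>?"
PATTERN = re.compile(
    r"^what is the (?P<prop>.+?) of (?P<obj>.+?)\??$", re.IGNORECASE
)

# Toy knowledge base. The value is a placeholder, not real data.
FACTS = {("iraq", "population"): "<population figure>"}


def parse(question: str) -> Optional[Query]:
    """Split a question into object and property; the value stays unknown."""
    match = PATTERN.match(question.strip())
    if match is None:
        return None
    return Query(obj=match.group("obj"), prop=match.group("prop").lower())


def answer(question: str) -> Optional[Query]:
    """Parse the question, then fill in the value if the toy table has it."""
    query = parse(question)
    if query is None:
        return None
    query.value = FACTS.get((query.obj.lower(), query.prop))
    return query
```

Calling `parse("What is the population of Iraq?")` yields a query whose object is "Iraq" and whose property is "population", with the value left empty until `answer` looks it up.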

That’s straightforward; however, people tend to ask more complex questions, particularly if they’re looking for specific information. If a person asks a question such as, “How many people live in the capital of the third-largest country in Asia?” Start will break it down into three separate queries to process one at a time: What is the third-largest country in Asia, what is that country’s capital, and what is the population of the capital? (Start decides how to break up questions and how to prioritize its evaluations using an algorithm Katz designed.)
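The one-at-a-time processing of a nested question can be sketched as a chain of lookups, where each answer becomes the object of the next query. The sketch below assumes the question has already been broken into an ordered list of properties; how Start performs that decomposition and prioritization is Katz's algorithm and is not shown here. The function name `resolve_chain` and the placeholder entries `CountryX` and `CityY` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Maps (object, property) to a value, mirroring the object/property/value idea.
KnowledgeBase = Dict[Tuple[str, str], str]


def resolve_chain(
    start: str, properties: List[str], kb: KnowledgeBase
) -> Optional[str]:
    """Answer sub-queries in order, feeding each answer into the next one."""
    current = start
    for prop in properties:
        found = kb.get((current, prop))
        if found is None:
            return None  # one missing link means the whole question fails
        current = found
    return current


# Toy data with placeholder names and values, not real facts.
TOY_KB: KnowledgeBase = {
    ("Asia", "third-largest country"): "CountryX",
    ("CountryX", "capital"): "CityY",
    ("CityY", "population"): "<population of CityY>",
}

# "How many people live in the capital of the third-largest country in Asia?"
STEPS = ["third-largest country", "capital", "population"]
result = resolve_chain("Asia", STEPS, TOY_KB)
```

Here `result` holds the placeholder population of `CityY`, reached by way of `CountryX`, which matches the three-step order described above.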

The most obvious change this kind of interaction would make for users is a simplifying of the complex menu structures that have evolved as cell phones handle more tasks. By using a natural-language navigation system, users can perform functions without digging through layers of menus or sifting through dozens of Google hits on a tiny screen. The mobile version of Start can also glean information from a phone’s GPS device and the Web, or interact with and send commands to applications on the phone, such as an address book and calendar.

For instance, a user who is lost might simply ask the phone, “Where am I?” – and a map of the current location would appear on the screen. Or the user could ask, “How do I get to Brad’s house from here?” and the phone would locate that address in the contacts list, determine the current location using GPS, go to Mapquest, and pull up online directions.

The language commands will also let a person’s phone communicate directly with other people’s devices, removing the need to send a dizzying array of text messages, e-mails, and voice mails. For instance, a user could tell the phone, “Remind my mother to take her medicine at three tomorrow,” and Nokia’s application would set up an alarm in the mother’s phone calendar, provided she has a MobileStart phone.

Of course, with more complex actions – such as interacting with other devices – Start and other natural-language navigation software systems begin to bog down. To perform some of the tasks that cell-phone users will want, MobileStart will have to learn “the way the user sees the world,” says Katz. For example, if someone tells the phone to “call Joe,” but there is more than one Joe in the address book, Katz says MobileStart will need to use past behavior to infer which Joe is meant, a task that requires more complex algorithms that are still being developed. “We want to make sure the phone understands what the user is saying without burdening them with clarification questions.”
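One simple way to picture the “call Joe” problem is a frequency count over past calls. The sketch below is only an illustration under that assumption; the article says the real algorithms are still being developed and does not describe them. The function `pick_contact` and its inputs are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def pick_contact(
    name: str, contacts: List[str], call_log: List[str]
) -> Optional[str]:
    """Guess which contact a first name refers to, using call frequency.

    Returns None when no confident guess exists, which is the point at
    which a phone would have to fall back on a clarification question.
    """
    wanted = name.lower()
    # Candidates are contacts whose first word matches the spoken name.
    candidates = [c for c in contacts if c.lower().split()[:1] == [wanted]]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]

    # Count past calls to each candidate and compare the top two.
    counts = Counter(entry for entry in call_log if entry in candidates)
    ranked = counts.most_common(2)
    if not ranked:
        return None  # no history for any candidate
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most-called candidates
    return ranked[0][0]
```

With two Joes in the address book, the function returns whichever one appears most often in the call log and declines to guess on a tie. A fuller system would presumably weigh recency, time of day, and other context as well.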

MobileStart should also be able to deal with more complicated preference issues. For instance, if a user tells the phone, “Remind my mother to take her medicine at three tomorrow,” it needs to determine how to best deliver that message. If the phone knows – through past experience or a set command – that both people prefer to communicate by text messages, it could send a text message.

What’s more, Katz says there may be a larger challenge on the horizon, one particular to digital, mobile culture: text-messaging shorthand. The Web-based version of Start can currently detect spelling errors and prompt the user to correct them. But if someone types “how do I get 2 Boston from Cambridge” – a common shorthand on mobile devices – the system misreads the query and responds: “Unfortunately, I don’t know how you get Boston from Cambridge.”
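A first step toward handling such shorthand could be a lookup table applied before the query reaches the parser. The sketch below is an assumption for illustration, not something the article attributes to Start or MobileStart; the `SHORTHAND` table and `expand_shorthand` function are hypothetical.

```python
# Hypothetical table of common texting abbreviations.
SHORTHAND = {"2": "to", "4": "for", "u": "you", "r": "are"}


def expand_shorthand(text: str) -> str:
    """Replace whole-word shorthand tokens with their spelled-out forms."""
    words = text.split()
    return " ".join(SHORTHAND.get(word.lower(), word) for word in words)


query = expand_shorthand("how do I get 2 Boston from Cambridge")
```

After expansion, `query` reads “how do I get to Boston from Cambridge”. The approach is deliberately naive: it would also rewrite a genuine numeral, as in “book 2 tickets”, so a real system would need context to tell the two uses apart.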

Currently, the Start system knows only English (although it can access Google’s language tools to translate phrases). Its parsing system – what it uses to divide a query into object, property, and value – could be used with any human language, but Katz will need to teach it a new vocabulary and syntax for each new language.

Nokia is trying to make its mobile devices more user-friendly and reduce the interface problems that keep devices from being as practical as they can be. With this attempt to make natural languages and technology compatible, Nokia is entering the Web 2.0 movement – just as Google recently did with its calendar application [see “Google’s Time Keeper”].

Ideally, says Katz, MobileStart will be combined with voice-to-text software to make using cell phones even easier. Indeed, Nokia’s Iannucci points to an irony: “cell phones are inherently voice devices, but they don’t use voice as a modality.”
