Generally, computers are useless at holding a conversation. They just take things a bit too literally. But Google is teaching computers how to make sense of the vagaries of human speech and text.
Starting today, Google is opening up those algorithms to outside software developers. The tools released will help programmers build language-based apps and services that are less prone to annoying misunderstandings than many of today’s chatbots. And they should help get developers hooked on the powerful machine-learning techniques Google is honing.
Google’s own mastery of grammar and syntax helps the company deliver more accurate search results, and it will be increasingly important as more of its devices and services come to depend on voice control.
Smartphones based on Google’s software can, of course, already be voice controlled, and the company is widely thought to be developing home devices, similar to Amazon’s Echo, that depend more heavily on voice interaction. So releasing a tool that makes language understanding more accessible makes a lot of strategic sense.
“Most of our users interact with us through language,” says Fernando Pereira, who leads the company’s efforts in natural-language understanding and machine learning. “They ask queries, typed or spoken. And so for us to serve the user well, we have to make our systems understand what users want.”
One of the tools released today, called SyntaxNet, can learn to understand the meaning of words and phrases given their context and common usage. SyntaxNet works with TensorFlow, the deep-learning framework Google released previously, and it is the most complex and sophisticated component built using TensorFlow to date.
Google has also released a pre-trained parser for English, called Parsey McParseface (a spokesperson says the company was having trouble coming up with a name when someone suggested this catchy moniker). Text fed into the parser will automatically be broken into syntactic components such as nouns, verbs, subjects, and objects. This makes it easier for a computer to parse ambiguous queries or commands correctly.
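To make the idea of "syntactic components" concrete, here is a minimal sketch of how a parsed sentence might be represented and queried. It does not call SyntaxNet or Parsey McParseface. The `Token` class, the tag names, and the hand-annotated sentence are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One word of a parsed sentence (hypothetical structure, for illustration)."""
    index: int      # 1-based position in the sentence
    text: str       # the word itself
    pos: str        # coarse part-of-speech tag (illustrative tag names)
    head: int       # index of the word this one depends on, 0 for the root
    relation: str   # grammatical relation to the head (illustrative labels)


# Hand-annotated parse of "Google released a parser".
# This is written by hand, not produced by any parser.
PARSE = [
    Token(1, "Google", "NOUN", 2, "nsubj"),
    Token(2, "released", "VERB", 0, "root"),
    Token(3, "a", "DET", 4, "det"),
    Token(4, "parser", "NOUN", 2, "dobj"),
]


def root(parse):
    """Return the token that has no head, usually the main verb."""
    return next(t for t in parse if t.head == 0)


def dependents(parse, head_index, relation):
    """Return the words attached to a given head by a given relation."""
    return [t.text for t in parse
            if t.head == head_index and t.relation == relation]


VERB = root(PARSE)
SUBJECTS = dependents(PARSE, VERB.index, "nsubj")
OBJECTS = dependents(PARSE, VERB.index, "dobj")
print(VERB.text, SUBJECTS, OBJECTS)
```

Once a sentence is in this form, an app can ask who did what to whom by following the links rather than by guessing from word order.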
Google usually relies on data and machine learning, and some other efforts, such as Facebook’s, are trying to train computers to parse language by feeding them large quantities of largely unlabeled data (see “Teaching Machines to Understand Us”). But Google’s language-understanding project, described in a paper online, is instead built around human expertise. For more than eight years, professional linguists have been annotating text for the company, and recent progress has come from feeding those annotations into a large deep-learning neural network.
Understanding language is incredibly difficult for computers because language is often ambiguous. A search query as simple as “Find me cats in hats” may be interpreted as a request for either cats wearing hats or cats sitting in hats. While humans use general knowledge to disambiguate such sentences, Google’s technology uses machine learning. Its deep-learning system, trained on syntactically annotated text, judges which structure is most likely correct for a statement. In the case of cats in hats, it presumes the searcher is interested in fashion-forward felines.
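Choosing the most likely reading can be pictured as scoring each candidate and picking the winner. The sketch below is not Google’s method. The scores are made-up numbers standing in for whatever a trained model would produce, and `softmax` is just one common way to turn scores into probabilities.

```python
from math import exp


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the peak for numerical stability
    exps = {name: exp(score - peak) for name, score in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Hypothetical scores for the two readings of "Find me cats in hats".
# A real system would compute these from a trained model.
SCORES = {
    "cats wearing hats": 2.0,
    "cats sitting in hats": 0.5,
}

PROBS = softmax(SCORES)
BEST = max(PROBS, key=PROBS.get)
print(BEST)
```

With these invented numbers the "wearing" reading wins, which matches the preference the article describes, but the outcome depends entirely on the scores fed in.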
Dave Orr, the product manager at Google responsible for finding commercial applications for the company’s research on language understanding, demonstrated the technology for me. He fed several articles from MIT Technology Review into an internal version of the language parser. It made a couple of trivial errors—for example, confusing the word “will” at the start of a sentence with my first name—but generally seemed to annotate sentences with impressive accuracy, identifying syntactic structures that correctly captured the meaning of the headline or lead. “It’s the best parser anyone has created,” Orr says. “We think it’s close to human level.”
Internally, Google combines its natural-language system with a database of semantic information called the Knowledge Graph. This allows it to recognize particular objects, people, places, and other concepts and respond accordingly. The system is also often able to correctly classify new words by comparing them with other words that appear in a similar context. So far, the technology works for 15 languages. Some languages are more challenging to parse linguistically, making training more difficult, Orr says.
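The idea of classifying a new word by the company it keeps can be sketched in a few lines. This is a toy illustration, not a description of Google’s system. The tiny corpus, the window size, and the function names are all assumptions, and counting neighbouring words is only one simple way to compare contexts.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Count, for every word, the words that appear within `window` positions."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(neighbours)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def most_similar(word, candidates, vectors):
    """Pick the known word whose contexts look most like the new word's."""
    return max(candidates, key=lambda c: cosine(vectors[word], vectors[c]))


# Toy corpus (invented). "ocelot" plays the role of the unfamiliar word.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the ocelot sat on the branch",
    "she will eat the apple",
    "he will eat the pear",
]

VECTORS = context_vectors(CORPUS)
GUESS = most_similar("ocelot", ["cat", "eat"], VECTORS)
print(GUESS)
```

Here "ocelot" shows up in the same surroundings as "cat", so it is grouped with the noun rather than the verb, even though the program has never been told what an ocelot is.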
The technology is, however, far from capable of understanding English perfectly. “Our systems work best on well-structured, well-edited text,” Pereira says. “The irregularity of social-media and search queries is more challenging. We’ve made progress there, but there’s a lot of headroom.”
There are also still many ambiguities that require a human level of common sense—“things that we learn from experience, and from instruction from our peers and our parents,” Pereira says. “That kind of very rich ability to solve problems is where our systems are completely lost.”
Noah Goodman, a professor at Stanford University who researches language understanding, says improved syntactic understanding is just the beginning of what computers need in order to master language. “Syntax is certainly an important part of language,” he says. “But it’s a big step from that to semantics and from shallow semantics to inferred meaning.”