Talking to a machine over the phone or through a chat window can be an infuriating experience. However, several research groups, including some at large technology companies like Facebook and Google, are making steady progress toward improving computers’ language skills by building upon recent advances in machine learning.
The latest advance in this area comes from a startup called MetaMind, which has published details of a system that is more accurate than other techniques at answering questions about several lines of text that tell a story. MetaMind is developing technology designed to be capable of a range of different artificial-intelligence tasks and hopes to sell it to other companies. The startup was founded by Richard Socher, a prominent machine-learning expert who earned a PhD at Stanford.
MetaMind’s approach combines two forms of memory with an advanced neural network fed large quantities of annotated text. The first is a kind of database of concepts and facts; the other is short-term, or “episodic.” When asked a question, the system, which the company calls a dynamic memory network, will search for relevant patterns in the text that it has learned from; after finding associations, it will use its episodic memory to return to the question and look for further, more abstract patterns. This enables it to answer questions that require connecting several pieces of information.
A paper related to the work, posted online today, gives the following example:
Statements fed into the system:
Jane went to the hallway.
Mary walked to the bathroom.
Sandra went to the garden.
Daniel went back to the garden.
Sandra took the milk there.
Question asked of the system: Where is the milk?
Answer: garden
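To see why a question like this takes more than one pass over the text, consider a toy sketch in Python. This is not MetaMind's neural network: it is a hand-written word-matching routine, and every name in it (`STOPWORDS`, `content_words`, `answer_where`) is a hypothetical helper invented for illustration. It only mimics the idea of returning to the text with an updated memory. The first pass finds the sentence that mentions the milk, and the second pass uses what was learned there (that Sandra has it) to find where Sandra went.

```python
import re

# Words ignored when matching; a hand-picked list for this toy example only.
STOPWORDS = {"the", "to", "is", "where", "there", "went", "back", "walked", "took"}


def content_words(sentence):
    """Lowercase a sentence and return its non-stopword tokens as a set."""
    return set(re.findall(r"[a-z]+", sentence.lower())) - STOPWORDS


def answer_where(statements, question, max_passes=3):
    """Toy multi-pass retrieval: each pass widens the memory with a new fact."""
    memory = content_words(question)   # starts as just the question's words
    episodes = []                      # indices of retrieved facts, in order
    upper = len(statements)            # only consider facts before this index
    for _ in range(max_passes):
        # Scan backward so the most recent matching fact wins.
        hit = next((i for i in range(upper - 1, -1, -1)
                    if content_words(statements[i]) & memory), None)
        if hit is None:
            break
        episodes.append(hit)
        memory |= content_words(statements[hit])
        # A fact of the form "... to the <place>" supplies the answer.
        place = re.search(r"\bto the (\w+)", statements[hit])
        if place:
            return place.group(1), [statements[i] for i in episodes]
        upper = hit
    return None, [statements[i] for i in episodes]


statements = [
    "Jane went to the hallway.",
    "Mary walked to the bathroom.",
    "Sandra went to the garden.",
    "Daniel went back to the garden.",
    "Sandra took the milk there.",
]
answer, chain = answer_where(statements, "Where is the milk?")
print(answer)  # garden
print(chain)   # the two facts that were linked to reach the answer
```

The real system learns which facts to attend to from training examples rather than relying on a fixed word list, which is what lets it generalize beyond a single kind of question.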
Because Socher’s system was trained using data sets that covered sentiment and structure, the system can answer questions about the sentiment, or emotional tone, of text, as well as basic questions about its structure. “The cool thing is that it learns that from example,” Socher says. “It wires the episodic memory for itself.”
MetaMind tested its system using a data set released by Facebook for measuring machine performance at question-and-answer tasks. The startup’s software outperformed Facebook’s own algorithms by a narrow margin.
Making computers better at understanding everyday language could have significant implications for companies such as Facebook. It could provide a much easier way for users to find or filter information, allowing them to enter requests written as normal sentences. It could also enable Facebook to glean meaning from the information its users post on their pages and those of their friends. This could offer a powerful way to recommend information, or to place ads alongside content more thoughtfully.
The work is a sign of ongoing progress toward giving machines better language skills. Much of this work now revolves around an approach known as deep learning, which involves feeding vast amounts of data into a system that performs a series of calculations to help identify abstract features in, say, an image or an audio file.
“One thing that’s promising here is that the architecture clearly separates modules for ‘episodic’ and ‘semantic’ memory,” says Noah Smith, an associate professor at Carnegie Mellon University who studies natural language processing. “That’s been a shortcoming of many neural-network-based architectures, and in my view it’s great to see a step in the direction of models that can be inspected to allow engineering of further improvements.”
Yoshua Bengio, a professor at the University of Montreal and a leading figure in the field of deep learning, describes Socher’s system as “a novel variant” of the methods championed at Facebook and Google. Bengio adds that the work is one of many recent advances that are experimental but promising. “The potential for this type of research is very important for language understanding and natural language interfaces, which is crucial and of very high value for many companies,” he says.
Others are less impressed. Robert Berwick, a professor of computational linguistics and computer science at MIT, says Socher’s method uses established techniques and offers only incremental progress. He adds that it bears little resemblance to the way episodic memory works in the human brain, and ignores important progress that has been made in linguistics.
Socher believes the biggest significance of his work is that it makes progress toward more generalizable machine intelligence. “This idea of adding memory components is something that’s in the air right now,” he says. “A lot of people are building different kinds of models, but our goal is to try to find one model that can perform lots of different tasks.”