Science in a nutshell

A neural network developed by physicists can summarize scientific papers in simple terms.

David L. Chandler

June 26, 2019

A conceptual illustration showing a computer, papers and AI. Credit: Minji Moon

A team of scientists at MIT and elsewhere has developed a neural network that can read scientific papers and render a brief plain-English summary. Such a system could help editors, writers, and scientists scan a large number of papers to get a preliminary sense of what they’re about. The approach could also be applied to machine translation and speech recognition.

Physics professor Marin Soljačić, grad students Rumen Dangovski and Li Jing, and colleagues had been developing neural networks to tackle thorny problems in physics when they realized that they could apply some of their physics knowledge to improve certain AI algorithms.

Neural networks mimic one way humans learn: the computer examines many different examples and identifies the key underlying patterns. While widely used for pattern recognition, such systems often have difficulty correlating information from a long string of data, such as a research paper. Other techniques used to improve this capability—including one called long short-term memory (LSTM)—can’t handle natural-language processing tasks that require really long-term memory.

While neural networks are typically based on the multiplication of matrices, Soljačić’s team developed one based on vectors rotating in a multidimensional space. It uses what they call a rotational unit of memory (RUM), which they came up with to help with certain tough physics problems such as the behavior of light in complex engineered materials. They then adapted it to natural-language processing to help with memorization and recall.

Essentially, each word in the text is represented by a vector. Each subsequent word swings this vector in some direction, represented in a theoretical space that can ultimately have thousands of dimensions. At the end of the process, the final vector or set of vectors is translated back into its corresponding string of words.
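To make the idea of a vector being "swung" by each new word concrete, here is a minimal sketch in plain Python. It is not the team's RUM implementation: the function names, the three-dimensional toy word vectors, and the rule used here (rotate the running state by the rotation that carries the previous word's vector onto the current one) are all assumptions chosen for illustration. The one property it is meant to show is that a rotation changes a vector's direction without changing its length.

```python
import math


def dot(x, y):
    """Dot product of two equal-length lists of floats."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(dot(x, x))


def rotate_in_plane(h, a, b, eps=1e-12):
    """Rotate h by the rotation that turns the direction of a into that of b.

    The rotation acts only in the plane spanned by a and b. The part of h
    orthogonal to that plane is untouched, so the length of h is preserved.
    (Hypothetical helper, written for this illustration only.)
    """
    na = norm(a)
    if na < eps:
        return list(h)
    u = [ai / na for ai in a]                    # unit vector along a
    bu = dot(b, u)
    w = [bi - bu * ui for bi, ui in zip(b, u)]   # part of b orthogonal to a
    nw = norm(w)
    if nw < eps:
        return list(h)  # a and b are (anti)parallel: no unique plane, skip
    v = [wi / nw for wi in w]                    # second unit vector of the plane
    theta = math.atan2(nw, bu)                   # angle between a and b
    c, s = math.cos(theta), math.sin(theta)
    hu, hv = dot(h, u), dot(h, v)                # coordinates of h in the plane
    return [
        hi + (hu * c - hv * s - hu) * ui + (hu * s + hv * c - hv) * vi
        for hi, ui, vi in zip(h, u, v)
    ]


def encode(words, vectors, state):
    """Swing the state vector once for every word after the first."""
    prev = None
    for word in words:
        cur = vectors[word]
        if prev is not None:
            state = rotate_in_plane(state, prev, cur)
        prev = cur
    return state


# Toy three-dimensional word vectors (made up for this example).
toy_vectors = {
    "raccoons": [1.0, 0.0, 0.0],
    "carry": [0.0, 1.0, 0.0],
    "roundworm": [0.0, 0.0, 1.0],
}

final_state = encode(["raccoons", "carry", "roundworm"], toy_vectors, [1.0, 0.0, 0.0])
print(final_state, norm(final_state))
```

In this toy run the state starts out pointing along the first axis and ends up pointing along the third, with its length still equal to 1. A real system would work in far more dimensions and would learn its word vectors from data rather than having them written by hand.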

When the team fed the same press release about a research paper through a conventional LSTM-based neural network and through the RUM-based system, the LSTM system yielded this repetitive and fairly technical summary: “Baylisascariasis,” kills mice, has endangered the allegheny woodrat and has caused disease like blindness or severe consequences. This infection, termed “baylisascariasis,” kills mice, has endangered the allegheny woodrat and has caused disease like blindness or severe consequences. This infection, termed “baylisascariasis,” kills mice, has endangered the allegheny woodrat.

The RUM system produced a much more readable summary: Urban raccoons may infect people more than previously assumed. 7 percent of surveyed individuals tested positive for raccoon roundworm antibodies. Over 90 percent of raccoons in Santa Barbara play host to this parasite.

The researchers have since expanded the system so it can summarize entire papers, not just press releases.
