An AI helps you summarize the latest in AI

Semantic Scholar, a scientific literature search engine, is using recent breakthroughs in natural-language processing to give researchers the tl;dr on papers.

Karen Haoarchive page

November 18, 2020

Ms Tech | howadesign/Noun Project

The news: A new AI model for summarizing scientific literature can now assist researchers in wading through and identifying the latest cutting-edge papers they want to read. On November 16, the Allen Institute for Artificial Intelligence (AI2) rolled out the model onto its flagship product, Semantic Scholar, an AI-powered scientific paper search engine. It provides a one-sentence tl;dr (too long; didn’t read) summary under every computer science paper (for now) when users use the search function or go to an author’s page. The work was also accepted to the Empirical Methods for Natural Language Processing conference this week.

A screenshot of the TLDR feature in Semantic Scholar. — A screenshot of the tl;dr feature in Semantic Scholar.

The context: In an era of information overload, using AI to summarize text has been a popular natural-language processing (NLP) problem. There are two general approaches to this task. One is called “extractive,” which seeks to find a sentence or set of sentences from the text verbatim that captures its essence. The other is called “abstractive,” which involves generating new sentences. While extractive techniques used to be more popular due to the limitations of NLP systems, advances in natural language generation in recent years have made the abstractive one a whole lot better.

How they did it: AI2’s abstractive model uses what’s known as a transformer—a type of neural network architecture first invented in 2017 that has since powered all of the major leaps in NLP, including OpenAI’s GPT-3. The researchers first trained the transformer on a generic corpus of text to establish its baseline familiarity with the English language. This process is known as “pre-training” and is part of what makes transformers so powerful. They then fine-tuned the model—in other words, trained it further—on the specific task of summarization.

The fine-tuning data: The researchers first created a dataset called SciTldr, which contains roughly 5,400 pairs of scientific papers and corresponding single-sentence summaries. To find these high-quality summaries, they first went hunting for them on OpenReview, a public conference paper submission platform where researchers will often post their own one-sentence synopsis of their paper. This provided a couple thousand pairs. The researchers then hired annotators to summarize more papers by reading and further condensing the synopses that had already been written by peer reviewers.

To supplement these 5,400 pairs even further, the researchers compiled a second dataset of 20,000 pairs of scientific papers and their titles. The researchers intuited that because titles themselves are a form of summary, they would further help the model improve its results. This was confirmed through experimentation.

Semantic Scholar's TLDR feature on mobile. — The tl;dr feature is particularly useful for skimming papers on mobile.

Extreme summarization: While many other research efforts have tackled the task of summarization, this one stands out for the level of compression it can achieve. The scientific papers included in the SciTldr dataset average 5,000 words. Their one-sentence summaries average 21. This means each paper is compressed on average to 238 times its size. The next best abstractive method is trained to compress scientific papers by an average of only 36.5 times. During testing, human reviewers also judged the model’s summaries to be more informative and accurate than previous methods.

Next steps: There are already a number of ways that AI2 is now working to improve their model in the short term, says Daniel Weld, a professor at the University of Washington and manager of the Semantic Scholar research group. For one, they plan to train the model to handle more than just computer science papers. For another, perhaps in part due to the training process, they’ve found that the tl;dr summaries sometimes overlap too much with the paper title, diminishing their overall utility. They plan to update the model’s training process to penalize such overlap so it learns to avoid repetition over time.

In the long-term, the team will also work summarizing multiple documents at a time, which could be useful for researchers entering a new field or perhaps even for policymakers wanting to get quickly up to speed. “What we're really excited to do is create personalized research briefings,” Weld says, “where we can summarize not just one paper, but a set of six recent advances in a particular sub-area.”

Deep Dive

Artificial intelligence

Large language models can do jaw-dropping things. But nobody knows exactly why.

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.