Academic Search Engine Grasps for Meaning

A new tool for analyzing academic papers uses cutting-edge AI to find meaning in billions of words.
November 2, 2015

If computers can be taught to find meaning in the thousands of research papers published each year, then perhaps they could automatically highlight important new trends or discoveries, and draw conclusions from them. 

The Allen Institute for Artificial Intelligence is working toward this very goal, and has developed a new tool called Semantic Scholar that can search through millions of computer science papers. The tool, launched today, features ways of refining searches based on information extracted from the text of papers. 

It is, for instance, possible to narrow a search by the journal in which a paper was published, the conference at which it was presented, or the data set used. Semantic Scholar will also show key phrases in a paper.

Many academic search engines already exist, among them Google Scholar, Microsoft Academic Search, PubMed, and JSTOR. But these typically only search through papers using keywords and other information that is clearly categorized, such as the publication date.

Oren Etzioni, executive director of the Allen Institute, says a lot of pertinent information found in research papers is presented in different ways. The software behind Semantic Scholar was trained to extract different concepts using a variety of machine-learning techniques. “With millions of papers appearing every year, you just can’t keep up with them,” Etzioni says. “So you need some level of understanding.” 

There is growing interest in using machine learning to train computers to recognize certain concepts in data. Google is building a so-called “knowledge graph” of concepts by training algorithms to crawl the Web and extract useful information. This is why, when you search for “How old is Barack Obama,” Google will not only serve up Web pages that may contain the information, but also tell you directly that he is 54 years old.

Other companies are trying to do something similar with academic papers. A company called Meta also announced a service today that will automatically identify the people and entities mentioned in medical literature. Meta is using technology developed by SRI, through a project called FUSE, to forecast scientific trends using machine learning.

Meta’s CEO, Sam Molyneux, says the service, which goes live later this week, can recommend papers to a user based on the concepts within a paper they have previously read, and can even identify emerging technologies automatically. “Essentially, it allows you to track at the concept level, or the technology level, rather than the article level,” Molyneux says. “Concepts like the CRISPR technology, which is really revolutionizing how genome engineering is happening right now—we picked that up as an emerging concept a number of years ago.”

Etzioni says the goal for Semantic Scholar is to go further by giving computers a much deeper understanding of new scientific publications. His team is developing algorithms that will read graphs or charts in papers and try to extract the values presented therein. “We want ultimately to be able to take an experimental paper and say, ‘Okay, do I have to read this paper, or can the computer tell me that this paper showed that this particular drug was highly efficacious?’”
