If computers can be taught to find meaning in the thousands of research papers published each year, then perhaps they could automatically highlight important new trends or discoveries, and draw conclusions from them.
The Allen Institute for Artificial Intelligence is working toward this very goal, and has developed a new tool called Semantic Scholar that can search through millions of computer science papers. The tool, launched today, features ways of refining searches based on information extracted from the text of papers.
It is, for instance, possible to narrow a search by the journal in which a paper was published, the conference at which it was presented, or the data set it used. Semantic Scholar will also show key phrases in a paper.
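To make the idea of refining a search concrete, here is a minimal sketch of filtering on fields extracted from papers. It is not Semantic Scholar's actual implementation, which the article does not describe. The `Paper` record, the `refine` function, and the sample entries are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record holding fields extracted from a paper's text."""
    title: str
    venue: str
    datasets: set = field(default_factory=set)
    key_phrases: list = field(default_factory=list)


def refine(papers, venue=None, dataset=None):
    """Keep only the papers that match every facet the caller supplied."""
    results = []
    for paper in papers:
        if venue is not None and paper.venue != venue:
            continue
        if dataset is not None and dataset not in paper.datasets:
            continue
        results.append(paper)
    return results


# Made-up sample entries; the titles are placeholders, not real papers.
papers = [
    Paper("Paper A", "ACL", {"Penn Treebank"}, ["dependency parsing"]),
    Paper("Paper B", "CVPR", {"ImageNet"}, ["object recognition"]),
    Paper("Paper C", "ACL", {"ImageNet"}, ["image captioning"]),
]

by_venue = refine(papers, venue="ACL")
by_venue_and_dataset = refine(papers, venue="ACL", dataset="ImageNet")
print([p.title for p in by_venue])
print([p.title for p in by_venue_and_dataset])
```

The hard part in practice is the step this sketch takes for granted: reliably extracting the venue, data set, and key phrases from the text of each paper.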
Many academic search engines already exist, among them Google Scholar, Microsoft Academic Search, PubMed, and JSTOR. But these typically search papers using only keywords and other information that is clearly categorized, such as the publication date.
Oren Etzioni, executive director of the Allen Institute, says a lot of pertinent information found in research papers is presented in different ways. The software behind Semantic Scholar was trained to extract different concepts using a variety of machine-learning techniques. “With millions of papers appearing every year, you just can’t keep up with them,” Etzioni says. “So you need some level of understanding.”
There is growing interest in using machine learning to train computers to recognize certain concepts in data. Google is building a so-called “knowledge graph” of concepts by training algorithms to crawl the Web and extract useful information. This is why, when you search for “How old is Barack Obama,” Google will not only serve up Web pages that may contain the information, but also tell you directly that he is 54 years old.
Other companies are trying to do something similar with academic papers. A company called Meta also announced a service today that will automatically identify the people and entities mentioned in medical literature. Meta is using technology developed by SRI, through a project called FUSE, to forecast scientific trends using machine learning.
Meta’s CEO, Sam Molyneux, says the service, which goes live later this week, can recommend papers to a user based on the concepts within a paper they have previously read, and can even identify emerging technologies automatically. “Essentially, it allows you to track at the concept level, or the technology level, rather than the article level,” Molyneux says. “Concepts like the CRISPR technology, which is really revolutionizing how genome engineering is happening right now—we picked that up as an emerging concept a number of years ago.”
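One simple way to picture recommendation at the concept level is to compare the set of concepts in a paper someone has read with the concepts in other papers. The sketch below uses Jaccard overlap for that comparison. This is an assumption chosen for illustration, not Meta's method, which the article does not detail. The function names, paper titles, and concept sets are hypothetical.

```python
def jaccard(a, b):
    """Share of concepts two sets have in common (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(read_concepts, candidates, top_n=2):
    """Rank candidate papers by concept overlap with a paper already read.

    `candidates` maps a paper title to its set of extracted concepts.
    Papers that share no concepts are dropped; ties are broken by title.
    """
    scored = [
        (jaccard(read_concepts, concepts), title)
        for title, concepts in candidates.items()
    ]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored[:top_n]]


# Made-up example data; titles are placeholders, not real papers.
read = {"CRISPR", "genome engineering"}
candidates = {
    "Paper X": {"CRISPR", "gene therapy"},
    "Paper Y": {"protein folding"},
    "Paper Z": {"CRISPR", "genome engineering", "Cas9"},
}

print(recommend(read, candidates))
```

Here "Paper Z" ranks first because it shares two of three concepts with the paper already read, while "Paper Y" shares none and is left out.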
Etzioni says the goal for Semantic Scholar is to go further by giving computers a much deeper understanding of new scientific publications. His team is developing algorithms that will read graphs or charts in papers and try to extract the values presented therein. “We want ultimately to be able to take an experimental paper and say, ‘Okay, do I have to read this paper, or can the computer tell me that this paper showed that this particular drug was highly efficacious?’”