Over 24,000 coronavirus research papers are now available in one place

The news: Today researchers collaborating across several organizations released the Covid-19 Open Research Dataset (CORD-19), which includes over 24,000 research papers from peer-reviewed journals as well as sources like bioRxiv and medRxiv (websites where scientists can post non-peer-reviewed preprint papers). The research covers SARS-CoV-2 (the scientific name for the coronavirus), Covid-19 (the scientific name for the disease), and the coronavirus group. It represents the most extensive collection of scientific literature related to the ongoing pandemic and will continue to update in real time as more research is released.
How it came together: The database was compiled at the request of the White House Office of Science and Technology Policy (OSTP) through a collaboration between three organizations. The National Library of Medicine (NLM) at the National Institutes of Health provided access to existing scientific publications; Microsoft used its literature curation algorithms to find relevant articles; and the research nonprofit Allen Institute for Artificial Intelligence (AI2) converted them from web pages and PDFs into a structured format that can be processed by algorithms. The database is now available on AI2’s Semantic Scholar website.
What has already been done: As part of its Semantic Scholar service, which allows the scientific community to easily search through academic literature, AI2 has already processed the new corpus using the same information extraction and analysis techniques that it applies to all new research. It’s surfacing key pieces of information such as authors, methods, data, and citations to make it easier for scientists to quickly evaluate how each paper adds to the existing research.
It’s also using state-of-the-art natural-language models like ELMo and BERT to map out the similarities between papers. This map is now powering a new feature on Semantic Scholar that allows researchers to create a personalized research feed based on their interests.
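To give a rough sense of what mapping similarities between papers involves, here is a minimal sketch in Python. It is not AI2's method: where Semantic Scholar uses neural language models, this sketch substitutes simple word counts as a stand-in embedding, and the paper titles and function names are hypothetical, chosen only for illustration. The comparison step (cosine similarity between vectors) is the part that would carry over to a system built on learned embeddings.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: lowercase word counts.

    Assumption: a real system would use vectors from a neural language
    model here; word counts are used only to keep the sketch self-contained.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, papers):
    """Return the papers ordered from most to least similar to the query."""
    q = embed(query)
    return sorted(papers, key=lambda title: cosine(q, embed(title)), reverse=True)


# Hypothetical paper titles, for illustration only.
papers = [
    "Transmission dynamics of a novel coronavirus",
    "Deep learning for protein structure prediction",
    "Clinical features of coronavirus patients",
]
ranked = most_similar("coronavirus transmission", papers)
```

A personalized feed could, in principle, rank new papers this way against the ones a researcher has already saved, though the article does not describe how Semantic Scholar's feed is actually built.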
Why it matters: Scientists are racing against the clock to answer pressing questions about the nature of the virus in hopes of stemming its spread. The database not only consolidates existing research in one place but also makes the body of literature easier to mine for insights with natural-language processing algorithms. The OSTP has launched an open call for AI researchers to develop new techniques for text and data mining that will help the medical community comb through the mass of information faster.