The news: Today, researchers collaborating across several organizations released the Covid-19 Open Research Dataset (CORD-19), which includes over 24,000 research papers from peer-reviewed journals as well as from sources like bioRxiv and medRxiv (websites where scientists can post non-peer-reviewed preprint papers). The research covers SARS-CoV-2 (the scientific name for the coronavirus), Covid-19 (the scientific name for the disease), and the wider coronavirus group. It represents the most extensive collection of scientific literature related to the ongoing pandemic and will continue to be updated in real time as more research is released.
How it came together: The database was compiled at the request of the White House Office of Science and Technology Policy (OSTP) through a collaboration between three organizations. The National Library of Medicine (NLM) at the National Institutes of Health provided access to existing scientific publications; Microsoft used its literature curation algorithms to find relevant articles; and the Allen Institute for Artificial Intelligence (AI2), a research nonprofit, converted the articles from web pages and PDFs into a structured format that can be processed by algorithms. The database is now available on AI2’s Semantic Scholar website.
What has already been done: As part of its Semantic Scholar service, which allows the scientific community to easily search through academic literature, AI2 has already processed the new corpus using the same information extraction and analysis techniques that it applies to all new research. It’s surfacing key pieces of information such as authors, methods, data, and citations to make it easier for scientists to quickly evaluate how each paper adds to the existing research.
It’s also using state-of-the-art natural-language models like ELMo and BERT to map out the similarities between papers. This map is now powering a new feature on Semantic Scholar that allows researchers to create a personalized research feed based on their interests.
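To give a rough sense of what mapping similarities between papers involves, here is a minimal sketch in Python. It is not AI2's method: it swaps the neural language models for a much simpler bag-of-words cosine similarity, and the paper IDs, abstracts, and function names are all hypothetical, invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, papers):
    """Rank every other paper by its similarity to papers[query_id]."""
    vectors = {pid: bag_of_words(text) for pid, text in papers.items()}
    query = vectors[query_id]
    scores = [
        (pid, cosine_similarity(query, vec))
        for pid, vec in vectors.items()
        if pid != query_id
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical abstracts, made up for this example (not taken from CORD-19).
papers = {
    "paper_a": "Transmission dynamics of the novel coronavirus in households",
    "paper_b": "Household transmission of coronavirus infection among close contacts",
    "paper_c": "Protein folding benchmarks for deep learning models",
}

ranking = most_similar("paper_a", papers)
for paper_id, score in ranking:
    print(paper_id, round(score, 3))
```

A model like BERT would replace the word counts with learned embeddings, which can in principle link papers that discuss the same idea in different words, but the ranking step would look much the same.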
Why it matters: Scientists are racing against the clock to answer pressing questions about the nature of the virus in hopes of stemming its spread. The database not only consolidates existing research in one place but also makes the body of literature easier to mine for insights with natural-language processing algorithms. The OSTP has launched an open call for AI researchers to develop new text and data mining techniques that will help the medical community comb through the mass of information faster.