The news: Today researchers collaborating across several organizations released the Covid-19 Open Research Dataset (CORD-19), which includes over 24,000 research papers from peer-reviewed journals as well as sources like bioRxiv and medRxiv (websites where scientists can post non-peer-reviewed preprint papers). The research covers SARS-CoV-2 (the scientific name for the coronavirus), Covid-19 (the scientific name for the disease), and the coronavirus group. It represents the most extensive collection of scientific literature related to the ongoing pandemic and will continue to be updated in real time as more research is released.
How it came together: The database was compiled at the request of the White House Office of Science and Technology Policy (OSTP) through a collaboration between three organizations. The National Library of Medicine (NLM) at the National Institutes of Health provided access to existing scientific publications; Microsoft used its literature curation algorithms to find relevant articles; and the Allen Institute for Artificial Intelligence (AI2), a research nonprofit, converted them from web pages and PDFs into a structured format that can be processed by algorithms. The database is now available on AI2’s Semantic Scholar website.
What has already been done: As part of its Semantic Scholar service, which allows the scientific community to easily search through academic literature, AI2 has already processed the new corpus using the same information extraction and analysis techniques that it applies to all new research. It’s surfacing key pieces of information such as authors, methods, data, and citations to make it easier for scientists to quickly evaluate how each paper adds to the existing research.
It’s also using state-of-the-art natural-language models like ELMo and BERT to map out the similarities between papers. This map is now powering a new feature on Semantic Scholar that allows researchers to create a personalized research feed based on their interests.
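To give a rough sense of what mapping similarities between papers can look like, here is a minimal sketch, not AI2's actual pipeline. It assumes each paper has already been turned into a numeric vector (an embedding) by a language model such as ELMo or BERT, and it uses cosine similarity, a common choice that the article does not confirm. The paper IDs, the toy vectors, and the helper names `cosine_similarity` and `rank_similar` are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def rank_similar(query_id, embeddings):
    """Rank every other paper by similarity to the query paper, best first."""
    query = embeddings[query_id]
    scores = [
        (paper_id, cosine_similarity(query, vector))
        for paper_id, vector in embeddings.items()
        if paper_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-dimensional embeddings; real model outputs have
# hundreds of dimensions.
toy_embeddings = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.8, 0.2, 0.1],
    "paper_c": [0.0, 0.1, 0.9],
}

ranked = rank_similar("paper_a", toy_embeddings)
print(ranked)
```

In this toy data, paper_b points in nearly the same direction as paper_a, so it ranks ahead of paper_c. A feed built this way could, in principle, surface the papers closest to those a researcher has already marked as interesting.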
Why it matters: Scientists are racing against the clock to answer pressing questions about the nature of the virus in hopes of stemming its spread. The database not only consolidates existing research in one place but also makes the body of literature easier to mine for insights with natural-language processing algorithms. The OSTP has launched an open call for AI researchers to develop new techniques for text and data mining that will help the medical community comb through the mass of information faster.