A new paper shows how natural-language processing can accelerate scientific discovery.
The context: Natural-language processing has seen major advancements in recent years, thanks to the development of unsupervised machine-learning techniques that are really good at capturing the relationships between words. They count how often and how closely words are used in relation to one another, and map those relationships in a three-dimensional vector space. The patterns can then be used to predict basic analogies like “man is to king as woman is to queen,” or to construct sentences and power things like autocomplete and other predictive text systems.
New application: A group of researchers have now used this technique to munch through 3.3 million scientific abstracts published between 1922 and 2018 in journals that would likely contain materials science research. The resulting word relationships captured fundamental knowledge within the field, including the structure of the periodic table and the way chemicals’ structures relate to their properties. The paper was published in Nature last week.
Because of the technique’s ability to compute analogies, it also found a number of chemical compounds that demonstrate properties similar to those of thermoelectric materials but have not been studied as such before. The researchers believe this could be a new way to mine existing scientific literature for previously unconsidered correlations and accelerate the advancement of research in a field.
Related work: This isn’t the first time such techniques have discovered fascinating, sometimes surprising relationships in a vast amount of text. In 2017, for example, a paper published in Science found that the same technique used to process a giant corpus of text from the internet successfully reproduced historical human biases against race and gender, and even computed the ratio of men to women in different professions. These papers show how much rich information about our world is implicit in human language. Machine learning is now giving us the tools to unlock that knowledge.
To have more stories like this delivered directly to your inbox, sign up for our Webby-nominated AI newsletter The Algorithm. It's free.
This new data poisoning tool lets artists fight back against generative AI
The tool, called Nightshade, messes up training data in ways that could cause serious damage to image-generating AI models.
Rogue superintelligence and merging with machines: Inside the mind of OpenAI’s chief scientist
An exclusive conversation with Ilya Sutskever on his fears for the future of AI and why they’ve made him change the focus of his life’s work.
Driving companywide efficiencies with AI
Advanced AI and ML capabilities revolutionize how administrative and operations tasks are done.
Generative AI deployment: Strategies for smooth scaling
Our global poll examines key decision points for putting AI to use in the enterprise.
Get the latest updates from
MIT Technology Review
Discover special offers, top stories, upcoming events, and more.