A new paper shows how natural-language processing can accelerate scientific discovery.
The context: Natural-language processing has seen major advancements in recent years, thanks to the development of unsupervised machine-learning techniques that are good at capturing the relationships between words. These techniques count how often and how closely words are used in relation to one another, and map those relationships in a high-dimensional vector space. The patterns can then be used to predict basic analogies like “man is to king as woman is to queen,” or to construct sentences and power things like autocomplete and other predictive text systems.
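To make the analogy idea concrete, here is a minimal sketch in Python. The three-number vectors are invented for illustration (real embeddings are learned from text and have many more dimensions), and the `analogy` helper is a hypothetical name, not something from the paper. It simply shows how "king minus man plus woman" can land nearest to "queen" once words are represented as vectors.

```python
import math

# Toy vectors, made up for this example. Real word embeddings are learned
# from large text corpora and typically have hundreds of dimensions.
TOY_VECTORS = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [0.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
    "apple": [0.5, 0.5, -1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' by vector arithmetic (b - a + c)."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words so the answer is a new word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("man", "king", "woman", TOY_VECTORS))  # prints "queen"
```

With learned embeddings the match is rarely exact, which is why the nearest remaining word by cosine similarity is taken as the answer.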
New application: A group of researchers has now used this technique to process 3.3 million scientific abstracts published between 1922 and 2018 in journals likely to contain materials science research. The resulting word relationships captured fundamental knowledge within the field, including the structure of the periodic table and the way chemicals’ structures relate to their properties. The paper was published in Nature last week.
Because of the technique’s ability to compute analogies, it also found a number of chemical compounds that demonstrate properties similar to those of thermoelectric materials but have not been studied as such before. The researchers believe this could be a new way to mine existing scientific literature for previously unconsidered correlations and accelerate the advancement of research in a field.
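One plausible way to picture this kind of search, sketched under assumptions rather than taken from the paper: rank candidate materials by how close their word vectors sit to the vector for a property word such as "thermoelectric". The vectors and the compound names below are placeholders invented for the example, and `rank_by_similarity` is a hypothetical helper.

```python
import math

# Placeholder vectors invented for illustration; they are not real
# embeddings and the compound names do not refer to real materials.
TOY_VECTORS = {
    "thermoelectric": [0.9, 0.1, 0.3],
    "compound_A":     [0.8, 0.2, 0.4],
    "compound_B":     [0.1, 0.9, 0.0],
    "compound_C":     [0.5, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates, vectors):
    """Sort candidate words from most to least similar to the query word."""
    return sorted(candidates,
                  key=lambda w: cosine(vectors[w], vectors[query]),
                  reverse=True)


candidates = ["compound_A", "compound_B", "compound_C"]
print(rank_by_similarity("thermoelectric", candidates, TOY_VECTORS))
# prints ['compound_A', 'compound_C', 'compound_B']
```

Candidates near the top of such a list that have never been studied for the property in question would be the ones worth a closer look.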
Related work: This isn’t the first time such techniques have discovered fascinating, sometimes surprising relationships in a vast amount of text. In 2017, for example, a paper published in Science found that the same technique, applied to a giant corpus of text from the internet, reproduced historical human biases around race and gender, and even reflected the ratio of men to women in different professions. These papers show how much rich information about our world is implicit in human language. Machine learning is now giving us the tools to unlock that knowledge.