A new paper shows how natural-language processing can accelerate scientific discovery.
The context: Natural-language processing has seen major advancements in recent years, thanks to the development of unsupervised machine-learning techniques that are very good at capturing the relationships between words. They count how often and how closely words are used in relation to one another, and map those relationships in a multidimensional vector space, typically with hundreds of dimensions. The patterns can then be used to predict basic analogies like “man is to king as woman is to queen,” or to construct sentences and power things like autocomplete and other predictive text systems.
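To make the counting idea concrete, here is a minimal sketch in Python. It assumes a tiny made-up corpus and represents each word by raw counts of its neighbors; the names `CORPUS`, `cooccurrence_vectors`, and `cosine` are illustrative only. Real systems learn dense vectors from millions of sentences, so treat this as a simplified stand-in for the general approach rather than the technique any particular paper used.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus, made up for illustration; real systems train on millions of sentences.
CORPUS = [
    "the king rules the kingdom",
    "the queen rules the kingdom",
    "the dog chases the ball",
    "the cat chases the ball",
]

def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = cooccurrence_vectors(CORPUS)
royal = cosine(vectors["king"], vectors["queen"])
mixed = cosine(vectors["king"], vectors["dog"])
print(f"king~queen: {royal:.2f}, king~dog: {mixed:.2f}")
```

In this toy corpus, "king" and "queen" appear in identical contexts, so they score as more similar to each other than "king" and "dog" do. That is the basic signal that learned embeddings refine at scale.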
New application: A group of researchers has now used this technique to munch through 3.3 million scientific abstracts published between 1922 and 2018 in journals that would likely contain materials science research. The resulting word relationships captured fundamental knowledge within the field, including the structure of the periodic table and the way chemicals’ structures relate to their properties. The paper was published in Nature last week.
Because of the technique’s ability to compute analogies, it also found a number of chemical compounds that demonstrate properties similar to those of thermoelectric materials but have not been studied as such before. The researchers believe this could be a new way to mine existing scientific literature for previously unconsidered correlations and accelerate the advancement of research in a field.
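One plausible way to picture such a search is to rank candidate materials by how close their vectors sit to the vector for a property word, skipping materials already studied for that property. The sketch below assumes hand-picked placeholder vectors and placeholder names (`material_a` through `material_d`, `ALREADY_STUDIED`, `rank_candidates`); none of these values come from the paper, and the researchers' actual procedure may differ.

```python
from math import sqrt

# Hypothetical embeddings, hand-picked for illustration only.
# Real embeddings are learned from text, not written by hand.
EMBEDDINGS = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "material_a": [0.8, 0.2, 0.1],
    "material_b": [0.1, 0.9, 0.3],
    "material_c": [0.7, 0.0, 0.4],
    "material_d": [0.2, 0.3, 0.9],
}

# Hypothetical set of materials already studied for the target property.
ALREADY_STUDIED = {"material_a"}

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_candidates(target, embeddings, exclude):
    """Rank every word except the target and the excluded ones by similarity to the target."""
    target_vec = embeddings[target]
    scored = [
        (name, cosine(target_vec, vec))
        for name, vec in embeddings.items()
        if name != target and name not in exclude
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = rank_candidates("thermoelectric", EMBEDDINGS, ALREADY_STUDIED)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

With these placeholder values, `material_c` ranks first because its vector points in nearly the same direction as the target's. In a real setting, the top of such a list would be a set of leads for experts to investigate, not a confirmed result.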
Related work: This isn’t the first time such techniques have discovered fascinating, sometimes surprising relationships in a vast amount of text. In 2017, for example, a paper published in Science found that the same technique, applied to a giant corpus of text from the internet, reproduced historical human biases around race and gender, and even computed the ratio of men to women in different professions. These papers show how much rich information about our world is implicit in human language. Machine learning is now giving us the tools to unlock that knowledge.