
A Translation Algorithm Can Predict the “Language” of a Chemical Reaction


By thinking of organic chemistry as words and sentences instead of atoms and molecules, researchers have found a way for artificial intelligence to predict chemical reactions.

In a paper posted on arXiv and being presented at this week’s Neural Information Processing Systems (NIPS) conference, researchers at IBM demonstrate that treating reaction prediction as a translation problem yields the correct reaction more often than previous models did.

“Intuitively, there is an analogy between a chemist’s understanding of a compound and a language speaker’s understanding of a word,” the researchers write.
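To make the word-and-sentence analogy concrete, here is a minimal sketch of how a reaction could be turned into token sequences for a translation model. It assumes molecules are written as SMILES strings, a common text notation for chemical structures; the article itself does not say which notation the researchers used. The `tokenize` function, its regular expression, and the esterification example are illustrative choices, not the paper's actual preprocessing.

```python
import re

# Hypothetical tokenizer (an assumption, not the paper's method):
# bracketed atoms such as [Na+] stay whole, the two-letter halogens
# Cl and Br stay whole, and every other character is its own token.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into a list of 'words'."""
    return TOKEN_PATTERN.findall(smiles)


# Illustrative example: acetic acid + ethanol -> ethyl acetate.
# The "." separates the two reactants in the source "sentence".
source_sentence = tokenize("CC(=O)O.OCC")
target_sentence = tokenize("CC(=O)OCC")

print(source_sentence)
print(target_sentence)
```

In this framing, the reactants play the role of a sentence in the source language and the product plays the role of its translation.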

Using a type of neural network common in machine translation, the researchers trained the system on a data set of 395,496 reactions. From that data, the neural net had to learn the “syntax” of reactions well enough to make predictions for compounds it had not seen. The algorithm gave researchers a list of the five most likely reactions, and the top prediction was correct 80 percent of the time, six percentage points better than another reaction-prediction model.
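The "correct 80 percent of the time" figure is what is usually called top-1 accuracy: the share of test reactions where the model's first-ranked guess matches the known answer. A minimal sketch of that calculation follows. The function name `top_k_accuracy` and the three toy examples are made up for illustration and are not data from the paper.

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of cases where the true answer is among the first k guesses."""
    hits = sum(
        1
        for guesses, target in zip(ranked_predictions, targets)
        if target in guesses[:k]
    )
    return hits / len(targets)


# Toy data (hypothetical): each inner list is a model's ranked guesses
# for one reaction, best guess first.
ranked_predictions = [
    ["CC(=O)OCC", "CCO"],  # first guess is right
    ["CCN", "CCO"],        # second guess is right
    ["CCC", "CCCl"],       # no guess is right
]
targets = ["CC(=O)OCC", "CCO", "CCBr"]

top1 = top_k_accuracy(ranked_predictions, targets, k=1)
top2 = top_k_accuracy(ranked_predictions, targets, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Widening k can only keep the score the same or raise it, which is why a top-five list is more forgiving than the single best guess.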

There are millions of chemical reactions that have yet to be documented, so this approach could help speed up research in areas like drug discovery. But researchers say that as more data gets added to the models, more double-checking will have to take place. Teodoro Laino, one of the researchers, told IEEE Spectrum that they “didn’t create this tool to replace organic chemists, but to help them.”