An AI that can design new proteins could help unlock new cures and materials 

The machine-learning tool could help researchers discover entirely new proteins not yet known to science.

September 15, 2022
Image: Two protein models made with ProteinMPNN (University of Washington)

A new AI tool could help researchers discover previously unknown proteins and design entirely new ones. Once harnessed, it could help unlock the development of more efficient vaccines, speed up research into cures for cancer, or lead to completely new materials.

Alphabet-owned AI lab DeepMind took the world by surprise in 2020 when it announced AlphaFold, an AI tool that used deep learning to solve one of the “grand challenges” of biology: accurately predicting the shapes of proteins. Proteins are fundamental to life, and understanding their shape is vital to working with them. Earlier this summer DeepMind announced that AlphaFold could now predict the shapes of all proteins known to science. 

The new tool, ProteinMPNN, described by a group of researchers from the University of Washington in two papers published in Science today, offers a powerful complement to that technology.

The papers are the latest example of how deep learning is revolutionizing protein design by giving scientists new research tools. Traditionally, researchers engineer proteins by tweaking those that occur in nature, but ProteinMPNN will open an entirely new universe of possible proteins for researchers to design from scratch.

“In nature, proteins solve basically all the problems of life, ranging from harvesting energy from sunlight to making molecules. Everything in biology happens from proteins,” says David Baker, one of the scientists behind the papers and director of the Institute for Protein Design at the University of Washington.

“They evolved over the course of evolution to solve the problems that organisms faced during evolution. But we face new problems today, like covid. If we could design proteins that were as good at solving new problems as the ones that evolved during evolution are at solving old problems, it would be really, really powerful.” 

Proteins consist of hundreds to thousands of amino acids that are linked up in long chains, which then fold into three-dimensional shapes. AlphaFold helps researchers predict the resulting structure, offering insight into how a protein will behave.

ProteinMPNN will help researchers with the inverse problem. If they already have an exact protein structure in mind, it will help them find an amino acid sequence that folds into that shape. The system uses a neural network trained on a very large number of examples of amino acid sequences and the three-dimensional structures they fold into.

But researchers also need to solve another issue. To design proteins that are useful for real-world applications, such as a new enzyme that digests plastic, they first have to figure out what protein backbone would have that function. 

To do that, researchers in Baker’s lab use two machine-learning methods, detailed in an article in Science last July, that the team calls “constrained hallucination” and “inpainting.”

“Constrained hallucination” lets users do a random search among all possible protein sequences and favor sequences with certain functions. This “hallucination” makes it possible to explore the space of all possible protein structures, thanks to machine learning’s ability to crunch vast data sets. There are 20 amino acids, which can be combined into a massive number of possible sequences. 

“Nature has only sampled … a tiny fraction. So if you limited the search to those sequences that exist in nature, you wouldn’t get anywhere,” Baker says. 
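To give a rough sense of the scale Baker is describing, the short sketch below simply counts how many distinct sequences can be built from 20 amino acids at a given chain length. It is illustrative arithmetic only and is not taken from the researchers' work; the function name and the example lengths are assumptions chosen for this sketch.

```python
# Illustrative arithmetic only; the chain lengths below are arbitrary examples.
NUM_AMINO_ACIDS = 20  # the 20 amino acids mentioned in the article


def count_sequences(length: int) -> int:
    """Return how many distinct chains of `length` amino acids are possible."""
    if length < 0:
        raise ValueError("length must be non-negative")
    # Each position can hold any of the 20 amino acids independently.
    return NUM_AMINO_ACIDS ** length


if __name__ == "__main__":
    for length in (10, 100, 300):
        total = count_sequences(length)
        # Number of decimal digits minus one gives the order of magnitude.
        print(f"length {length}: about 10^{len(str(total)) - 1} sequences")
```

Even a modest chain of 100 amino acids allows on the order of 10^130 possible sequences, which is why an exhaustive search is out of reach and a guided search is needed.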

“Inpainting” works much like autocomplete in a word processor, but for protein structures and sequences. Using these methods, the researchers can create a completely new protein that hasn’t been seen in nature before, such as a giant ring-like structure.

Baker’s team is experimenting with whether those ring-like structures could be used as components of tiny machines that operate at the nanoscale. In the future, these nanomachines could be used to unclog arteries, for example. 

The ability to use machine learning to design proteins in this way is “a very big deal,” says Lynne Regan, professor of biochemistry and biotechnology at the University of Edinburgh.

Machine learning will make the whole process a lot quicker and easier, and will allow researchers to create completely new proteins and structures on a much larger scale. The software is more than 200 times faster than the previous best tool and requires minimal user input, potentially lowering the barriers to entry for protein design. 

“These contributions and others recently are transforming the field of biomolecular structure prediction and design,” says Jeffrey Gray, a professor of chemical and biomolecular engineering at Johns Hopkins University.  

“The implications are dramatic in terms of understanding biology, health, and disease and in designing new molecules to reduce human suffering,” Gray says. 

Gray says his lab will combine deep-learning tools they developed with ones from the Baker lab to better understand the immune system and immune-related diseases, and use AI to design therapeutics. 

“AlphaFold launched biology into a new era by solving the protein structure predicting problem and demonstrating the transformative role that AI and [machine learning] will play in biology,” says Pushmeet Kohli, the head of DeepMind’s AI for Science team. “ProteinMPNN is another proof of this paradigm shift, designing proteins for specific tasks.”

ProteinMPNN, which is now available free on the open-source software repository GitHub, will give researchers the tools to make unlimited new designs. “The challenge, of course …  is what are you going to design?” Baker says. 

Correction: A previous version of this story stated that proteins consist of hundreds of thousands of amino acids, when in fact they consist of hundreds to thousands of amino acids. Sorry.
