An AI that can design new proteins could help unlock new cures and materials

The machine-learning tool could help researchers discover entirely new proteins not yet known to science.

Melissa Heikkilä

September 15, 2022

University of Washington

A new AI tool could help researchers discover previously unknown proteins and design entirely new ones. Once harnessed, it could help unlock the development of more efficient vaccines, speed up research toward a cure for cancer, or lead to completely new materials.

Alphabet-owned AI lab DeepMind took the world by surprise in 2020 when it announced AlphaFold, an AI tool that used deep learning to solve one of the “grand challenges” of biology: accurately predicting the shapes of proteins. Proteins are fundamental to life, and understanding their shape is vital to working with them. Earlier this summer DeepMind announced that AlphaFold could now predict the shapes of all proteins known to science.

The new tool, ProteinMPNN, described by a group of researchers from the University of Washington in two papers published in Science today, offers a powerful complement to that technology.

The papers are the latest example of how deep learning is revolutionizing protein design by giving scientists new research tools. Traditionally, researchers engineer proteins by tweaking those that occur in nature, but ProteinMPNN will open an entirely new universe of possible proteins for researchers to design from scratch.

“In nature, proteins solve basically all the problems of life, ranging from harvesting energy from sunlight to making molecules. Everything in biology happens from proteins,” says David Baker, one of the scientists behind the paper and director of the Institute for Protein Design at the University of Washington.

“They evolved over the course of evolution to solve the problems that organisms faced during evolution. But we face new problems today, like covid. If we could design proteins that were as good at solving new problems as the ones that evolved during evolution are at solving old problems, it would be really, really powerful.”

Proteins consist of hundreds to thousands of amino acids that are linked up in long chains, which then fold into three-dimensional shapes. AlphaFold helps researchers predict the resulting structure, offering insight into how the proteins will behave.

ProteinMPNN will help researchers with the inverse problem. If they already have an exact protein structure in mind, it will help them find the amino acid sequence that folds into that shape. The system uses a neural network trained on a very large number of examples of amino acid sequences that fold into three-dimensional structures.

But researchers also need to solve another issue. To design proteins that are useful for real-world applications, such as a new enzyme that digests plastic, they first have to figure out what protein backbone would have that function.

To do that, researchers in Baker’s lab use two machine-learning methods, detailed in an article in Science last July, that the team calls “constrained hallucination” and “in painting.”

“Constrained hallucination” lets users do a random search among all possible protein sequences and favor sequences with certain functions. This “hallucination” makes it possible to explore the space of all possible protein structures, thanks to machine learning’s ability to crunch vast data sets. There are 20 amino acids, which can be combined into a massive number of possible sequences.

“Nature has only sampled … a tiny fraction. So if you limited the search to those sequences that exist in nature, you wouldn’t get anywhere,” Baker says.
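To give a rough sense of that scale, here is a minimal Python sketch that counts the possible chains of a given length. It assumes only the 20 amino acids mentioned above and treats every position in the chain as a free choice; the function names are illustrative and do not come from ProteinMPNN or the Science papers.

```python
AMINO_ACIDS = 20  # number of amino acids cited in the article


def sequence_space_size(length: int) -> int:
    """Count the distinct chains of `length` residues: 20 ** length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return AMINO_ACIDS ** length


def order_of_magnitude(n: int) -> int:
    """Return the power of ten of a positive integer (digit count minus one)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return len(str(n)) - 1


if __name__ == "__main__":
    # Illustrative lengths only; real proteins vary widely in size.
    for length in (2, 10, 100):
        size = sequence_space_size(length)
        print(f"length {length}: about 10^{order_of_magnitude(size)} sequences")
```

Even a short chain of 100 amino acids allows on the order of 10^130 sequences, which is why a search restricted to what nature has already sampled covers so little of the space.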

“In painting” works much like autocomplete in a word processor, but for protein structures and sequences. Using these methods, the researchers can create a completely new protein that hasn’t been seen in nature before, such as a giant ring-like structure.

Baker’s team is experimenting with whether those ring-like structures could be used as components of tiny machines that operate at the nanoscale. In the future, these nanomachines could be used to unclog arteries, for example.

The ability to use machine learning to design proteins in this way is “a very big deal,” says Lynne Regan, professor of biochemistry and biotechnology at the University of Edinburgh.

Machine learning will make the whole process a lot quicker and easier, and will allow researchers to create completely new proteins and structures on a much larger scale. The software is more than 200 times faster than the previous best tool and requires minimal user input, potentially lowering the barriers to entry for protein design.

“These contributions and others recently are transforming the field of biomolecular structure prediction and design,” says Jeffrey Gray, a professor of chemical and biomolecular engineering at Johns Hopkins University.

“The implications are dramatic in terms of understanding biology, health, and disease and in designing new molecules to reduce human suffering,” Gray says.

Gray says his lab will combine deep-learning tools they developed with ones from the Baker lab to better understand the immune system and immune-related diseases, and use AI to design therapeutics.

“AlphaFold launched biology into a new era by solving the protein structure predicting problem and demonstrating the transformative role that AI and [machine learning] will play in biology,” says Pushmeet Kohli, the head of DeepMind’s AI for Science team. “ProteinMPNN is another proof of this paradigm shift, designing proteins for specific tasks.”

ProteinMPNN, which is now available free on the open-source software repository GitHub, will give researchers the tools to make unlimited new designs. “The challenge, of course … is what are you going to design?” Baker says.

Correction: A previous version of this story stated that proteins consist of hundreds of thousands of amino acids, when in fact they consist of hundreds to thousands of amino acids. Sorry.
