Cracking the Cellular Code

The much-heralded genome is just a parts list. New research is revealing when, and why, genes do what they do in living cells, a big step toward understanding, and ultimately curing, disease.

Erika Jonietz

September 9, 2004

In the past few years, biologists have churned out the entire genetic sequence of dozens of organisms, including humans, dogs, mosquitoes, rats, and bacteria. But these strings of genes amount to the most basic molecular parts list, not much more helpful to deciphering how the genes combine to run a living cell than an array of microchips and wires would be for assembling a computer.

Researchers at MIT and at the MIT-affiliated Whitehead Institute for Biomedical Research have taken a major step toward understanding how those genes are organized to regulate cells. Refining a technique pioneered in geneticist Richard Young’s lab, the team has identified all of the controlling elements in the genome of baker’s yeast, a common laboratory microorganism.

“A parts list is nice, but in moving to an understanding of how a whole cell behaves, this is really the next step,” says Young, who headed the project with Whitehead fellow Ernest Fraenkel and MIT computer scientist David Gifford. “We’ve been able to identify an important part of the genome in a very precise way that is key to regulation of life.” Fraenkel and Gifford published their findings in the September 2 issue of the journal Nature.

Any cell, from yeast to human, uses multiple layers of control to coordinate which genes are switched on and off in response to stimuli such as temperature, nutrient availability, and outside chemical messengers. The central method of gene control, however, relies on proteins known as transcription factors. When these molecules attach to a region of DNA close to a particular gene, that gene is switched on; when the protein detaches, the gene shuts down. Mutations in transcription factors or in their binding sites on the genome are associated with many diseases, including hypertension, cancer, and diabetes.

Finding the binding sites is key to understanding how they influence the cell, but locating these tiny stretches of the genome has been difficult. In the new study, the MIT/Whitehead team used arrays of short DNA sequences on so-called gene chips, along with pattern-finding computer algorithms, to quickly identify the precise binding sites for almost every transcriptional regulator used by baker’s yeast, what Young calls the genome’s regulatory code.

This research builds on work that Young’s group did two years ago, in which the general locations of about half the gene regulators in yeast were mapped using gene chips. In the new study, the team not only mapped the binding sites of all 203 regulatory proteins, but also studied their environment-specific use by subjecting yeast cells to a variety of conditions, such as low temperature and starvation. This reveals both which genes are turned on and off to help the cell cope with different conditions and how various transcription factors come into play in those switches.

The binding sites found with gene chips are a hundred or so DNA letters long. But regulatory proteins actually bind to very short sequences, only six to 10 letters long. So Fraenkel and his students developed novel bioinformatics algorithms to whittle down the estimated binding sites to the precise recognition sequences.
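The article does not describe how those algorithms work. As a rough illustration of the general idea only, and not the team's actual method, the Python sketch below looks for the short word that recurs across the most bound regions. The function name `most_shared_kmer` and the toy sequences are hypothetical, invented for this example.

```python
from collections import Counter


def most_shared_kmer(regions, k=6):
    """Return the k-letter word found in the largest number of regions."""
    counts = Counter()
    for seq in regions:
        # Count each word once per region so a repeat-rich region cannot dominate.
        words = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(words)
    # Break ties alphabetically so the result is deterministic.
    best = min(counts, key=lambda w: (-counts[w], w))
    return best, counts[best]


# Hypothetical toy regions (not real yeast data); real bound regions
# would be a hundred or so letters long.
regions = [
    "AAGCTTGACTCGGTA",
    "CCTGACTCATTGCAA",
    "GGATCCATGACTCTT",
]
motif, hits = most_shared_kmer(regions, k=6)
print(motif, hits)
```

A real analysis would presumably have to allow for inexact matches and compare the counts against what chance alone would produce; this sketch does neither.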

In the mid-1980s, geneticists started learning how proteins such as transcription factors recognize DNA. “Finally, this is the next level,” says Marc Vidal, a geneticist at Harvard Medical School. Understanding how 203 proteins interact with all 6,000 yeast genes is crucial, he believes. Vidal adds that “trying to extract the rules of how you organize such a network is the really interesting question,” one that will lead to an appreciation of how genes actually work with one another to keep a cell functioning.

In fact, Young and Fraenkel have already begun adapting the technique to an even more intricate network: the human genome. “The problem is much more complex,” says Fraenkel, since the human genome is about 100 times larger than the yeast genome, with 10 times as many regulatory proteins. In addition, the experiment to decipher how the human genome regulates proteins will have to be repeated at least 200 times, Young says, once for every cell type in the body. Still, the researchers are enthusiastic about the possibilities: “One of the exciting outcomes of this kind of work,” Young says, “is that we have the potential to understand why mutations in key human regulators cause disease.”
