Reinterpreting the Human Genome

Manolis Kellis helped lead a major effort to map the chemical tags that cells use to get their instructions from DNA.

Amanda Schaffer

June 21, 2016

Manolis Kellis ’99, MEng ’99, PhD ’03, is a voracious observer of life. His website includes hundreds of photos from Budapest, Bali, and Burning Man: images of forests, caves, rice fields, mountains, and more. He takes an average of 800 photos per day on vacation and writes computer programs to organize them.

The drive to document life by processing huge volumes of data also animates Kellis’s scientific work as a computational biologist. He recently led one component of a massive, government-funded project to catalogue what’s known as the epigenome—the chemical tags that affect genes’ activity. “If the genome is the book of life, the epigenome is the complete set of annotations and bookmarks,” he says.

Epigenomic markings attach to specific DNA bases or to the proteins, called histones, that DNA wraps around. Though almost all cells in the body contain the same DNA, different types of cells have different markings. They help to switch genes on or off, thus dictating the cells’ various functions. New markings can be added as cells differentiate into specific types—and all the way into adulthood. These epigenomic changes can be passed along as cells divide, and some evidence suggests that in certain cases they can even pass from one generation to the next as another layer of heritable information.

Researchers hope that exploring the epigenome may reveal how genetics and disease—as well as environmental factors such as pollution, diet, and stress—can change the number and location of these chemical marks, potentially altering the behavior of cells. This could uncover new ways to diagnose and treat diseases, which can be caused in part by epigenomic modifications. Kellis is currently involved in epigenomic studies of Alzheimer’s disease, schizophrenia, cardiovascular disease, bone density, metabolic disorders, and cancer.

Charting the epigenome

A professor of computer science and a member of both the Computer Science and Artificial Intelligence Lab (CSAIL) and the Broad Institute, Kellis got his start comparing the genomes of yeast species as an MIT graduate student working with Eric Lander, founding director of the Broad. As part of this work, which was published in Nature in 2003, Kellis developed computational methods for sifting through vast amounts of data to pinpoint patterns of similarity and difference. His goal was to understand how yeast has evolved in order to develop general methods for understanding genomes. He soon turned from yeast to flies and ultimately to mammals, comparing multiple species to explore genes and their control switches in the human genome.

In 2008, the National Institutes of Health assembled the Roadmap Epigenomics Mapping Consortium. The group set out to determine where along the DNA the on-off switches lie—as well as how those switches function under different circumstances and in different types of cells. Just as the Human Genome Project, which was completed in 2003, provided researchers with a reference for studying genes, the epigenomics project aimed to create a tool that scientists could use to probe gene expression.

As the project got under way, close to 175 researchers at institutions including the Broad, the University of Washington in Seattle, and the University of California campuses in San Diego and San Francisco started churning out data. They began systematically analyzing a wide array of human cell types, noting which epigenomic marks each kind of cell contained. In all, they would capture this information for more than 100 kinds of cells and tissues, including embryonic stem cells and fetal cells as well as cells from adult stomach, fat, muscle, heart, and brain tissues. As if that weren’t enough, they were also looking at cells at different developmental stages and under different disease conditions.

In 2009, Kellis and his group received a grant to coördinate the job of integrating and analyzing the “unprecedented volumes of data” the project was generating, says consortium researcher Bradley Bernstein of Harvard Medical School, Mass. General, and the Broad. Ultimately, the largest effort to date to catalogue the human epigenome produced more than 2,800 genome-wide data sets, encompassing over 150 billion “reads” of DNA. (A “read” is the result of sequencing a small segment of DNA.)

To make sense of all that data, Kellis’s group used computer algorithms whose development was led by Jason Ernst, starting when he was a postdoctoral researcher in Kellis’s lab from 2008 to 2011. (Ernst is now an assistant professor of biological chemistry and computer science at UCLA.) The algorithms homed in on patterns of chemical tagging, which in turn helped pinpoint the locations of regulatory regions that influence whether particular genes are turned on or off. (Part of the challenge in understanding gene regulation lies in identifying regulatory regions that may not be obvious because they’re not right next to the genes they control.) By integrating individual data sets produced by their consortium colleagues, the scientists organized these signatures into a series of “epigenomic maps” that researchers can search to learn which regulatory mechanisms are likely to be present in a particular area of the genome.
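
As a rough illustration of the idea, and not the consortium's actual algorithms, the sketch below labels fixed-size stretches of a genome by which chemical marks are present, then merges neighboring stretches that share a label into candidate regions. The mark names are real histone modifications, but the labeling rules, the 200-base bin size, and the toy data are all assumptions made for this example. The hand-written rules stand in for patterns that the real algorithms had to find in the data.

```python
from itertools import groupby

BIN_SIZE = 200  # bases per bin; an illustrative choice, not taken from the article


def label_bin(marks):
    """Assign a candidate label to one bin from the set of marks seen in it.

    These rules are hypothetical simplifications for illustration only.
    """
    if "H3K27me3" in marks:
        return "repressed"
    if "H3K4me3" in marks:
        return "promoter-like"
    if "H3K4me1" in marks:
        return "enhancer-like"
    return "unmarked"


def segment(bins):
    """Merge consecutive bins with the same label into (start, end, label) regions."""
    labels = [label_bin(marks) for marks in bins]
    regions = []
    position = 0
    for label, run in groupby(labels):
        length = len(list(run)) * BIN_SIZE
        regions.append((position, position + length, label))
        position += length
    return regions


# Toy data: each entry is the set of marks detected in one 200-base bin.
toy_bins = [
    set(),
    {"H3K4me1"},
    {"H3K4me1", "H3K27ac"},
    set(),
    {"H3K4me3", "H3K27ac"},
    {"H3K4me3"},
    {"H3K27me3"},
]

regions = segment(toy_bins)
for start, end, label in regions:
    print(f"{start:>5}-{end:<5} {label}")
```

Run on the toy data, this yields five regions, including an enhancer-like stretch that sits several hundred bases from the promoter-like one. That is a miniature version of the challenge described above, where a regulatory region is not right next to the gene it controls.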

The team held weekly phone calls, led by Kellis, on how to make use of the data trove. He describes the work as “mostly one long race to the finish, many sleepless nights … [and] a massive Google document with comments by teams across multiple time zones working around the clock.”

Putting the map to work

Last year, the consortium published its comprehensive map of the epigenome online; a concurrent special issue of Nature included eight of the consortium’s two dozen papers on the subject. “It’s terrifically useful,” says Marcelo Nóbrega, an expert on gene regulatory networks at the University of Chicago.

In 2014, Nóbrega and his team in Chicago discovered that a genomic region associated with obesity, located in what’s known as the FTO gene, made connections to a faraway gene called IRX3. They also showed that IRX3 mediates body weight. Mice without IRX3 lost weight, even when they ate the same amount as other animals. Meanwhile, Nóbrega’s team found that humans with overactive IRX3 in the brain tend to have an increased risk of becoming obese.

Last year, following the publication of the map, Kellis, visiting professor Melina Claussnitzer, and colleagues used epigenomic data to elaborate on the interaction between FTO, IRX3, and the process of dissipating energy as heat (which is known as thermogenesis). They also discovered a similar link between the IRX5 gene and thermogenesis. Kellis and Claussnitzer showed that this mechanism operates in the fat cells of both humans and mice. They also detailed how changes within the relevant region of FTO cause shifts in the expression of IRX3 and IRX5. “It’s beautiful that there was such a strong concordance between our work,” says Nóbrega, adding that a full understanding of the phenomenon might someday lead to treatments for people whose so-called slow metabolisms cause them to gain excessive weight.

In another paper published last year, Kellis, Li-Huei Tsai, and others at MIT used epigenomic markings in human and mouse brains to study the mechanisms leading to Alzheimer’s disease.

Crucially, they showed that immune cell activation and inflammation, which have long been associated with the condition, are not simply the result of neurodegeneration, as some researchers have argued. Rather, in mice engineered to develop Alzheimer’s-like symptoms, they found that immune cells start to change even before neural changes are observed.

When the researchers compared the changes in these mice with those in humans with Alzheimer’s disease, they found a striking overlap. In both mice and humans, immune system processes had increased activity, and neural ones had decreased activity. Moreover, they found that human genetic differences that led to increased risk of Alzheimer’s disease were concentrated only in genes and regulatory regions associated with immune and inflammatory processes and not in those governing neuronal processes. Their work, Kellis contends, supports the idea that “immune and inflammatory processes play the primary causal role in Alzheimer’s disease.”

Kellis hopes to use epigenomic data to learn more about other diseases, including schizophrenia, cancer, and metabolic disorders. With the map in hand, he says, it’s time for “the fun part”: to go and try to tackle disease.
