Guide Dogs

By sequencing the dog genome, the Broad Institute has accelerated the search for mutations involved in human diseases. Now a push to sequence more mammals may reveal how our own genome works.

Katherine Bourzac

June 18, 2007

The ideal boxer dog is strong, squarely built, smooth. Breeders prize it for its chiseled head. “The beauty of the head depends upon the harmonious proportion of muzzle to skull,” read the American Kennel Club’s standards for the breed. “The blunt muzzle is one third the length of the head from the occiput to the tip of the nose, and two thirds the width of the skull.” Intelligent and alert, a boxer conveys grace with every movement.

The structure of the genome underlying the boxer’s graceful build is also “gorgeous,” says Kerstin Lindblad-Toh, the molecular biologist who led the effort to sequence the dog genome. As codirector of the program in genome sequencing and analysis at the Broad Institute for genomic medicine, Lindblad-Toh has also overseen projects involving the mouse and the opossum, so she’s well qualified to assess the boxer’s genomic beauty. The source of that beauty is a relative lack of genetic diversity, the result of a hundred years of tightly controlled breeding for traits like harmoniously proportioned skulls. The genetic homogeneity within dog breeds, and the narrowness of the differences between them, mean the dog genome holds valuable clues to the causes of common diseases in both dogs and humans.

The dog genome is about 2.4 billion bases long, but Lindblad-Toh’s group was able to sequence it in just six months. The Broad Institute, jointly run by MIT and Harvard University, is a genetic powerhouse, able to sequence 60 billion bases (the letters in the genomic alphabet) with 99 percent accuracy each year–the equivalent of several Human Genome Projects. It has a greater sequencing capacity than nearly any other public or academic facility in the world.

Today, much of that capacity is allocated to the Broad’s mammalian-genome project. The Broad is one of three research centers funded by the National Institutes of Health in a major effort to bring the number of mammals whose genomes have been sequenced up to around 30 within the next few years. (The other two centers are at the Baylor College of Medicine in Houston and the Washington University School of Medicine in St. Louis.) Lindblad-Toh is overseeing sequencing and analysis projects for more than 20 animals at the Broad, providing resources for researchers who rely on animal models when studying human diseases. Her team’s efforts will shed more light on how our own genome is regulated and fill in gaps in our understanding of human evolutionary history. Ultimately, this work could help answer some compelling questions: What makes a mammal a mammal, a primate a primate–and what makes us human?

Unnatural Selection
In 2002, when dog researchers asked Lindblad-Toh about sequencing the animal’s genome, she thought, “Oh, wow, this must be the ideal model.” Human-directed breeding has produced wrinkly shar-peis, mohawked Rhodesian ridgebacks, and slender borzois with beautiful lines. “But when you enrich for specific desirable traits, you often unfortunately capture disease traits with them,” says Lindblad-Toh. Along with its distinctive traits, each breed has distinctive genetic vulnerabilities: developmental defects, heart disease, hip problems, cancer. Dogs suffer from many of the same diseases as humans, so researchers can use their genome to identify genetic causes of the diseases in both species.

Lindblad-Toh and her collaborators at the National Human Genome Research Institute (part of NIH) first had to determine which breed to sequence. Humans have dramatically reduced the genetic diversity of some dogs by breeding, for example, only the most bowlegged bulldogs and the squarest-jawed boxers. Sequencing a breed with little genetic diversity is easier, because the two copies of each chromosome–one from the mother, one from the father–are similar to each other. So when a preliminary analysis suggested that boxers are among the least genetically diverse of all dog breeds, the researchers had their subject.

Even when genetic diversity is limited, drafting a genome is like putting together a very large, very difficult jigsaw puzzle whose pieces are scattered all over the house–wedged between sofa cushions or under a jar of mustard at the back of the fridge. First, researchers have to locate and identify all the pieces; then they face the formidable chore of assembling the puzzle. Lindblad-Toh and her collaborators spent six months sequencing segments of a female boxer’s genome, and three or four times as long putting them together and analyzing them. They published the dog genome, along with an extensive analysis comparing it to the mouse and human genomes, in the journal Nature in December 2005.

Lindblad-Toh and her team also compared the boxer genome with the existing survey sequence of the poodle genome and partial sequences of nine other dog breeds’ genomes that they had prepared for the purpose. As they expected, though each breed has its own distinctive traits and mutations, all breeds are still very similar to one another. (The domestic breeds haven’t been around long enough for much diversity to creep in.) These similarities within and between dog breeds should make disease-related mutations easier to spot. Researchers at the Broad and elsewhere are currently uncovering disease genes in dogs with the help of the American Kennel Club/Canine Health Foundation and the Morris Animal Foundation. The researchers go to dog shows to collect blood samples and pedigrees, or they get samples from veterinarians. Using DNA microarrays, they then look for genetic differences between healthy dogs and those with diseases.
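
To make the idea of comparing healthy and sick dogs concrete, here is a minimal sketch in Python. It is not the Broad's actual pipeline: the marker names, genotypes, and the helper functions `allele_frequency` and `rank_markers` are all invented for illustration, and a real study would apply proper statistical tests to far more dogs and markers. The sketch simply asks, for each marker, how much more common a given DNA letter is in dogs with the disease than in healthy dogs.

```python
def allele_frequency(genotypes, allele):
    """Fraction of chromosome copies carrying `allele`.

    Each genotype is a two-letter string such as "AG", one letter per
    copy of the chromosome (one from the mother, one from the father).
    """
    total_copies = 2 * len(genotypes)
    return sum(g.count(allele) for g in genotypes) / total_copies


def rank_markers(cases, controls, allele_of_interest):
    """Sort markers by the absolute case-minus-control frequency gap."""
    diffs = {}
    for marker, allele in allele_of_interest.items():
        diffs[marker] = (allele_frequency(cases[marker], allele)
                         - allele_frequency(controls[marker], allele))
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: four sick dogs and four healthy dogs, two markers.
cases = {"m1": ["AA", "AG", "AA", "AA"], "m2": ["CT", "CC", "TT", "CT"]}
controls = {"m1": ["GG", "AG", "GG", "GG"], "m2": ["CT", "CC", "TT", "CT"]}
alleles = {"m1": "A", "m2": "C"}

ranked = rank_markers(cases, controls, alleles)
for marker, diff in ranked:
    print(marker, diff)
```

In this toy data, marker `m1` stands out because the letter A is much more common in the sick dogs, while `m2` looks the same in both groups. Within a breed, where dogs are closely related, such a signal can emerge from relatively few animals.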

Lindblad-Toh says she and other Broad researchers have identified about 10 genes for simple traits like coat color and complex diseases like cancer; they’re also looking into genes associated with cardiomyopathy and diabetes. “Of course, when you find a dog disease gene you look immediately in people with the same disease,” she says. Dogs and humans are quite closely related and share versions of most of the same genes.

It’s much easier to uncover disease mutations in dogs, though. Two random rottweilers are far more closely related than two random humans. If the two rottweilers both develop bone cancer, which is common in their breed, the disease will probably be caused by several mutations carried by both dogs. But two humans with bone cancer are less likely to have disease-causing mutations in common. “For the cancer studies, we’ve had about 50 to 100 sick dogs,” says Lindblad-Toh. You’d need a much larger sample of humans–thousands of patients and healthy people–to see similar patterns. Understanding which mutations cause a disease in dogs helps researchers figure out where to look for disease-causing mutations in humans.

Nearly two years after the publication of the dog genome, its promise as a tool for studying human disease is beginning to be borne out. Lindblad-Toh says that her group hopes to identify mutations involved in osteosarcoma, a rare but deadly cancer in adolescence. “It’s going to be very exciting within the next year to see whether applying [insights from studying dog-gene mutations] back to human patients with the same diseases will also show you mutations in the same genes,” she says. “My prediction is yes. I think if you can find strong risk factors in people by using dogs, that would be a great benefit.”

Beyond Genes
“When we analyzed the mouse and human genomes, we found that they are 5 percent functional,” says Lindblad-Toh. That is, 5 percent of each creature’s genome is very similar to 5 percent of the other’s, suggesting that the related sequences must be serving some purpose. The dog genome turned out to have the same 5 percent that overlaps in humans and mice, which confirms that it’s not just a coincidence. Most of that 5 percent, however, is not genes.

Our genome is made up of 46 chromosomes, which are distinct, long chains of organic compounds known as A, T, C, and G–the DNA letters. Sequencing the genome means figuring out which letter, or base, is at every point along each chain. Genes are stretches where the bases spell out a code that can be translated into proteins; they make up 1.5 percent of the genome. But an astonishing 95 percent of our genome is mutation-prone gobbledygook, the genetic equivalent of the sentences you can make by banging your head on a keyboard in frustration.

Now biologists are trying to characterize the functional parts of the genome that are not genes–regions they believe play an important role in regulating genes. They want to identify these regions and find out what elements they contain, how the regions are organized, and how they work.

The best way to find the regulatory elements amid the gobbledygook is to look at what’s conserved–what stays the same–across multiple species. Comparing genomes is “almost like the Rosetta stone,” says Lindblad-Toh. With the same message carved in the Greek alphabet, hieroglyphics, and demotic script (a sort of cursive hieroglyphics), the Rosetta stone made it possible to decipher the last two writing systems. In genomes, similarly, “you have a string of letters; anything that’s important will stay the same.”

Most of the genome is merely dead space between genes, where the sequence of bases is not important for the functions of life. Mutations that occur in such stretches occasionally produce functional sequences, but they’re usually neither helpful nor harmful. That means they accumulate relatively quickly: exerting no influence on an individual’s chances of reproducing, they are not subject to natural selection, so they are passed on at a much greater rate than changes in functional areas, which are often detrimental.

“The more evolutionary time there is between two mammalian [species], the more that unimportant things will change,” says Lindblad-Toh. But a string of DNA whose sequence is conserved unchanged across species probably has an important function. Once researchers pinpoint those sequences in the areas between genes, they can test them individually to determine their functions.
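
The logic of looking for what stays the same can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the three short sequences are made up, they are assumed to be already lined up position by position, and the function name `conserved_runs` and its `min_len` parameter are hypothetical. Real comparative genomics relies on alignment software and statistical models of how fast neutral DNA changes.

```python
def conserved_runs(aligned, min_len=4):
    """Return (start, end) half-open spans where every species agrees.

    `aligned` maps a species name to a sequence; all sequences are
    assumed to be pre-aligned and therefore the same length.
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")

    runs, start = [], None
    # Iterate one step past the end so a run touching the end is closed.
    for i in range(length + 1):
        same = i < length and all(s[i] == seqs[0][i] for s in seqs)
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Made-up aligned fragments, for illustration only.
toy = {
    "human": "ACGTTGCAAGGCTA",
    "mouse": "ACGTTGCATGACTT",
    "dog":   "ACGTTGCACGTCTA",
}

print(conserved_runs(toy))             # spans of at least 4 matching bases
print(conserved_runs(toy, min_len=2))  # a looser threshold
```

Here the first eight letters are identical in all three toy sequences, so that stretch is flagged as a candidate functional element, while the positions that differ are treated as the kind of change that accumulates where sequence does not matter.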

Comparing the human genome with that of a distantly related organism such as yeast, however, is difficult: to extend the Rosetta stone metaphor, the two texts carry too many different messages. Comparing genomes that carry many of the same messages, like those of the human and the dog, is more productive. “To really understand the human genome, we’re focusing on mammals,” says Lindblad-Toh. Broad scientists, some of whom participated in the Human Genome Project before the institute was founded in 2003, have been involved in sequencing the chimpanzee, mouse, dog, and horse, among other animals (see “The Broad’s Menagerie,” p. M17). Work on the guinea pig, elephant, rabbit, little brown bat, bush baby, and ground squirrel is also under way at the Broad.

Researchers initially teased out the functional 5 percent of the human genome by comparing it with that of only one other animal, the mouse. Adding the dog genome made for a powerful triad. Now researchers are focusing on what the functional non-gene elements do. Evidence is growing that they regulate the genome. Without regulatory elements, a gene would be like an unread book gathering dust in the corner of a library storage room: just an inert string of letters.

Knowing what each gene does is very important, but it is not enough. “Some diseases are clearly due to one protein not being there,” says Lindblad-Toh; enzyme deficiencies are an example. “But with common diseases like cancer or diabetes, it could just be a function of how much or how little protein you make, and if you make it at the right time or not.” It seems likely that those factors are controlled by regulatory elements.

“My strong belief is that a lot of common diseases are caused by regulatory mutations,” says Lindblad-Toh. Early Broad Institute research on the dog genome appears to support her hypothesis. Broad researchers are looking for mutations associated with several traits and diseases, including white coat color, thyroid problems, bone cancer, heart problems, and shar-pei fever. For several of these, there is preliminary evidence that the responsible mutations lie outside genes–in regulatory elements.

Zeroing in on where these elements are in the genome is much easier than figuring out what they do and how. One type of regulator, called an enhancer, starts the process that enables genes to pass along their information. Enhancers can be within, near, or very far from the genes they regulate. Researchers hypothesize that proteins bind to an enhancer and to an area immediately preceding a gene; they also bind to each other, pulling the DNA around in a loop and allowing transcription of the gene to begin. But such mechanisms are just beginning to be understood.

What Makes Us Mammals
Comparing the dog genome with the mouse and human genomes helped researchers identify a host of regulatory elements; further analysis showed that in all three mammals, half of those elements are clustered around only about 240 genes. In other words, a whopping 50 percent of our most conserved elements appear to regulate not much more than 1 percent of our genes.
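
The "not much more than 1 percent" figure is simple arithmetic, sketched below. The article gives the 240 genes but not the total gene count; the figure of roughly 20,000 genes used here is an assumption, a commonly cited estimate for protein-coding genes in mammals.

```python
clustered_genes = 240          # from the analysis described above
assumed_total_genes = 20_000   # assumption: a commonly cited rough estimate

share = clustered_genes / assumed_total_genes
print(f"{share:.1%}")  # share of genes surrounded by half the conserved elements
```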

So what do those genes do to deserve such special treatment? It turns out that they play pivotal roles during development. They control our body plan, the biological equivalent of a blueprint. By dictating the configuration of our spines, the structure of our brains, and the position of our opposable thumbs, they make us vertebrates, mammals, primates.

“It’s of course critically important that you produce a foot where your foot goes and not on your head sometimes by accident,” says Lindblad-Toh. “What we think we’ve found is that around these 200 or so genes sit a large proportion of these regulatory elements. They are basically the master regulators of the master genes that make mammals mammals.” Some of these clustered regulatory elements are so important that they are better conserved between species than genes are.

An analysis of the opossum genome published by the Broad this past May further supports Lindblad-Toh’s hypothesis that there are mammalian master regulators. Opossums are on a different branch of the mammalian family tree: they are marsupials, while humans, dogs, and mice are eutherians–mammals with placentas. “For every genome that we add on, we keep building on the hypothesis that the functional 5 percent from human, mouse, and dog genomes is mammalian- and eutherian-specific,” says Lindblad-Toh. “Looking at the opossum and other major branches on the vertebrate tree, we see that there’s innovation that’s added on.” That is, opossums don’t share the entirety of the 5 percent that placental mammals do. Much of the variation in the regulatory elements in marsupials is around the body-plan genes. These variations may be what make opossums marsupials, and what does not vary may be an important part of what constitutes mammalness.

Lindblad-Toh says it’s too early to ask what makes a primate a primate, or what makes humans human. It seems likely, though, that development in neural pathways is regulated by primate-specific genes. Each order, each species of animal probably has its own innovations, controlled by master regulators in non-gene areas of the genome.

Right now, Lindblad-Toh says, she and other Broad researchers are just trying to assemble a basic catalogue of these master regulators. Once more species have been sequenced, it will be possible to start answering the bigger questions. As that menagerie of genomes continues to grow, dogs–long man’s best friends in the field and on the farm–are proving to be valuable companions in the lab as well.
