Your Genetic Destiny for Sale

To find disease-causing genes, researchers want access to millions of personal medical records, maybe even yours. Is this necessary science or dubious profiteering?

Gary Taubes

April 1, 2001

Large extended families have traditionally been the mother lode of genetic research. From them came a precious commodity: links between the presence of a disease and the errant genes responsible for it. When medical researcher Nancy Wexler, for instance, went looking for the genetic cause of Huntington’s disease in 1979, it was a 9,000-member Venezuelan family that enabled her to trace the telltale patterns of disease inheritance.

Wayne Gulliver’s family is not nearly so large, but it is impressive nonetheless. Until two years ago, when his great-great-aunt passed away, six generations of Gullivers were alive in Newfoundland. His grandmother, who died last October, had some hundred descendants, while his parents, only in their 60s, already have 26 grandchildren to go with their 10 children. All of this would be professionally irrelevant if Gulliver’s family were not typical of Newfoundland, and if Gulliver himself, a dermatologist who studies the genetics of psoriasis, were not involved in a rapidly emerging discipline called population genomics, the goal of which is to identify the underlying genes responsible for common chronic diseases, such as cancer and heart disease.

Two years ago Gulliver met Paul Kelly, CEO of the British company Gemini Genomics, which had already assembled a huge international network of twins to use in searching for gene-disease associations. Gulliver pitched Kelly the idea of supplementing Gemini’s database with population statistics from Newfoundland and Labrador. His selling points were simple: a population of 550,000, of which almost 90 percent are descended from the original Irish, Scottish and English immigrants who arrived before the mid-19th century. It is, Gulliver says, a population in which the locals often know their family lineages back to the original immigrants. “Not like the States,” he says, “where you have three kids, send them off to college, and you might be lucky if you see each other every fifth Thanksgiving.”

And many of those families, like Gulliver’s own, are large. In such a tightly knit population consisting of large extended families, common diseases might run in recognizable patterns: shared by siblings, for instance, or passing through paternal or maternal lines, or linked to other distinctive physical characteristics. All it would take to mine this rich vein of medical history for valuable clues to disease-causing genes would be a sufficient effort, some very advanced biotechnology tools and some startup capital.

Gulliver’s pitch prompted Gemini to launch Newfound Genomics in February 2000. In the near term, Newfound Genomics aims to concentrate on diseases endemic to the local population-psoriasis, diabetes, obesity, inflammatory bowel disease, osteoporosis and rheumatoid arthritis-with the hope, considering the Irish/English/Scottish ancestry, that any relevant genes or gene variants that might be uncovered would play significant roles in other populations. The expectations behind the company are anything but modest, at least judging by the inaugural press release. “We have the potential here to develop a major international powerhouse of clinical genetics,” said Kelly, “that will provide benefit not only for the Newfoundland and Labrador community but also patients suffering from these diseases worldwide.”

Newfound Genomics is just one of a host of such ventures formed over the last few years (see “A Database Sampler” below). The specifics vary from project to project, but the strategies are similar: sift through the DNA of large populations, if not entire nations, in hope of identifying the underlying genetic causes of those diseases most likely to kill us. The researchers, pharmaceutical company executives and venture capitalists involved are all betting that recent advances in biotech and computing have made it possible to take a few hundred or thousand victims of a disease, analyze their DNA, compare it to the DNA of healthy individuals and identify the salient differences: those genetic variations that result in illness on the one hand and health on the other.

A Database Sampler

Company	Population
Newfound Genomics (Newfoundland, Canada)	550,000 Newfoundlanders
Autogen (Melbourne, Australia)	180,000 Tongans
deCODE genetics (Reykjavik, Iceland)	280,000 Icelanders
UmanGenomics (Umeå, Sweden)	260,000 Swedes
DNA Sciences (Fremont, CA)	100,000 Internet users
Wellcome Trust/Medical Research Council	500,000 U.K. volunteers

If these efforts succeed, they could revolutionize the nature of drug discovery and medical treatment. In the ultimate manifestation of this technological dream, no hypothesis of disease causation is necessary. Medical researchers need not speculate first about what biochemical pathways are involved or what proteins are at fault, which is the laborious way that medicine now makes progress. Instead, they would simply compare databases of genetic samples and disease records, employing computerized data-mining operations to find the causative genes and gene variations at work. Pinpoint the genes that predispose individuals to disease and you have a clue to what disease mechanisms are at work and how to prevent or repair them.

The same kind of research would also provide clues to what your own medical future has in store for you: what afflictions are more or less likely to do you in; what treatments, pharmaceuticals or preventive measures will most likely ward off disease or cure it; and perhaps even how you personally should lead your life to maximize the chance of surviving to a ripe old age. Are you genetically predestined, for instance, to fall dead of a heart attack in your 50s or fade slowly away with heart failure in your 90s? Will breast cancer or Alzheimer’s be your fate? Schizophrenia or depression? Diabetes?

This endeavor is what Stanford University geneticist Neil Risch, for one, calls “the endgame of human genetics.” Certainly it is geneticists’ best shot at identifying the genes at play in the common ills of mankind. Should it work, it “will herald a new era of information-based targeted care, in which genetic profiling will identify the disease predisposition risks faced by individuals and, if disease occurs, will make it possible to tailor therapy based on individual patient needs,” wrote George Poste, former chief scientist for SmithKline Beecham (now GlaxoSmithKline), in the journal Nature. And even if it doesn’t achieve such lofty goals, it may still provide new understanding of the nature of common chronic disease.

In the jargon of genetics, the search for disease-causing genes is a search for the genotype that explains the phenotype. Genotype is the individual variations in the three billion base pairs of DNA and the tens of thousands of genes we all share; it’s our actual genetic makeup. Phenotype is how that DNA physically manifests itself-in this case, as the susceptibility to disease, or the progression of disease, or the susceptibility of the disease itself to treatment, all of which likely have a genetic component. Genotype goes into a black box of human biology and phenotype comes out. Occasionally this connection is excruciatingly deterministic, as it is, for instance, with Huntington’s disease or cystic fibrosis, in which a single mutation in a single gene means you have the disease or will get the disease. In the vast majority of human ailments, however, the connection is excruciatingly vague-as it is with personality or intelligence or athletic excellence or any other complex trait. When geneticists use the word “complex,” they mean that more than one gene is responsible for an individual’s condition, and probably quite a few.

The challenge for the geneticist is, depending on how you look at it, a signal-to-noise problem or a needle-in-a-haystack problem. With tens of thousands of genes in the haystack of the human genome, how do you identify those one or two or 10 that play a role in any particular disease?

This is where large families come in handy. Because all the members share a common genetic inheritance, it’s highly likely that any disease that runs in the family is caused by the very same genes and the very same mutations slipped into the family gene pool by a distant ancestor. If you can find a few hundred family members with the disease and a few hundred without, you can be pretty confident that eventually you’ll find the mutation that is present in the DNA of the afflicted members and absent from the DNA of the healthy ones. For researchers, this is much simpler than the situation where the afflicted are unrelated, since in that case the genetic causes may also be unrelated, and the overall variation in the DNA so bewildering that the signal from any disease-causing genes is overwhelmed by the background noise of genetic variation.
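The comparison described above, counting how often a variant turns up in affected versus unaffected people, can be illustrated with a simple 2x2 association test. What follows is a minimal sketch in Python, not the method used by any of the researchers quoted here: the counts are invented for illustration, and the function names chi_square_2x2 and p_value_1df are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Rows: affected, unaffected. Columns: variant present, variant absent.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Invented counts: 300 affected relatives, 300 unaffected relatives.
# The variant is seen in 180 of the affected and 120 of the unaffected.
statistic = chi_square_2x2(180, 120, 120, 180)
print("chi-square:", statistic)
print("p-value:", p_value_1df(statistic))

# A variant that is equally common in both groups carries no signal.
print("no association:", chi_square_2x2(150, 150, 150, 150))
```

With these made-up counts the statistic is 24 and the p-value is far below conventional thresholds, which is the "signal" the paragraph describes. In unrelated people the same variant might differ only slightly between groups, and the statistic would sink toward zero.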

In the early 1980s, researchers turned to large extended families and a technology known as linkage analysis to begin systematically searching for disease-carrying genes. By following the pattern of disease inheritance in large families and linking the presence of the disease to known genetic markers-long regions, for instance, in which the DNA letters A and C alternate repeatedly-geneticists could first localize the disease-causing gene to a specific chromosome or specific chromosomal region. They would then employ a technique called positional cloning to scour the nearby DNA for the genes and finally identify a particular misspelling that led to disease. The techniques were developed in “a spectacular series of discoveries,” says MIT geneticist David Altshuler.

But success didn’t come easy. Nancy Wexler, for instance, started her search for Huntington’s in 1979. By 1983, using blood samples sent from Venezuela, her colleague James Gusella of the Harvard Medical School had narrowed the position of the Huntington’s gene to a short tip of chromosome four that was only a million base pairs in length. It took another 10 years to identify the gene at work and nail down the critical mutation.

Since then, geneticists have identified hundreds of disease-causing genes, using ever faster methods of testing DNA samples, ever faster computers and a new generation of software to compare and contrast DNA variations. The sole caveat in this remarkable accomplishment is that virtually every gene identified, with a few exceptions, has been for a disease caused by a single gene and a single mutation. These are rare diseases-like Huntington’s or cystic fibrosis-because evolution strongly selects against them. When geneticists used the same techniques to look for the genetic causes of common chronic diseases like heart disease and cancer, success was considerably harder to come by.

That these common diseases have a degree of “heritability” is undeniable. But the last decade of mostly negative studies is compelling evidence that the underlying genetics is indeed complex. It may be the interaction of two or three genes and gene variants that predisposes an individual to a specific chronic disease. It may be considerably more, with each having a minor effect on the likelihood of contracting the disease or the eventual outcome.

This complexity makes the search for chronic-disease genes extremely difficult. If the impact of any one gene is so small-say five percent as opposed to the 100 percent of the Huntington’s gene-then following the connection through the black box becomes that much more difficult amidst the noise of environmental factors and other genes. “You might be looking for a combination of three or 10 or 100 genes,” explains Altshuler, “each of which might have multiple mutations in it that might affect the disease, and all of them collaborating with the environment and perhaps randomness or fate. So the correlations will be much, much weaker. It means you need different tools to augment the search. In particular, it means you have to look at lots of people. Imagine if one gene causes the disease; you might look at as few as five or 10 families, each with lots of people, and be able to pick out the correlation. The numbers don’t have to be that large to make a compelling case. If no single gene or mutation is going to explain more than five or 10 percent of the disease, you need hundreds or thousands of people.”
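Altshuler’s point about numbers can be made concrete with a standard textbook approximation for comparing two proportions. The sketch below is an illustration under stated assumptions, not a calculation from the article: the variant frequencies are invented, the function name people_per_group is hypothetical, and the usual defaults of a 5 percent two-sided significance level and 80 percent power are assumed.

```python
import math
from statistics import NormalDist  # available in Python 3.8 and later


def people_per_group(freq_affected, freq_healthy, alpha=0.05, power=0.80):
    """Approximate number of people needed in each group to detect a
    difference in how often a variant appears among affected versus
    healthy individuals (normal approximation for two proportions)."""
    if freq_affected == freq_healthy:
        raise ValueError("the two frequencies must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    variance = (freq_affected * (1 - freq_affected)
                + freq_healthy * (1 - freq_healthy))
    difference = freq_affected - freq_healthy
    return math.ceil((z_alpha + z_power) ** 2 * variance / difference ** 2)


# Invented frequencies for a single gene of large effect:
# the variant is in 90% of the affected and 10% of the healthy.
print("strong effect:", people_per_group(0.90, 0.10))

# Invented frequencies for one of many genes of small effect:
# the variant is in 25% of the affected and 20% of the healthy.
print("weak effect:", people_per_group(0.25, 0.20))
```

Under these assumptions the strong-effect case needs only a handful of people per group, while the weak-effect case needs on the order of a thousand, which is consistent with the "hundreds or thousands of people" in the quote.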

Indeed, solving the puzzle would probably be impossible if not for the recent advances in the computer and lab technologies used to determine the genotypes of individuals. In addition, the Human Genome Project now provides a map of the entire three billion base pairs that constitute the human genome. “A necessary step,” says Altshuler, “is to know what the genes are and have very fast and efficient tools for finding variations and asking, does this variation correlate with a disease? Now the Human Genome Project provides a list of all the genes, and that is fundamentally empowering. Even in the previous paradigm, where the disease, like Huntington’s, was caused by a single gene of big impact, you had to find all the genes in the local region, characterize them and figure out which one has the variation. That would take an army of people. The Human Genome Project has come along and done a lot of that labor up front.”

For pharmaceutical companies, geneticists and venture capitalists, the immediate challenge is to find a population that will provide sufficient numbers of disease victims, the clinical data necessary to accurately identify the disease, and the opportunity to take DNA samples from everybody involved. Picking the right population is a choice of trade-offs. The bigger the population, the greater the sample size and the better the statistics, but the more difficult and expensive it becomes to get accurate clinical data. Large extended families will likely share very similar disease-causing genotypes, which makes it easier to identify the relevant genes, but those mutations or gene variants might be specific to the family and rare in other populations.

The effort to achieve the right balance among these trade-offs has led groups to various strategies for setting up and exploiting the information contained in large medical databases. In contrast to Newfound Genomics, some have shied away from looking at closely related populations. Cambridge, MA-based Genomics Collaborative is, for instance, recruiting physicians and letting them enter patients on a disease-by-disease basis. This network is growing at the rate of 7,000 new patients every month, says CEO Michael Pellini. Eventually, he says, the company hopes to have genotype and phenotype data on a half-million patients, representing “large heterogeneous populations.”

This kind of population, Pellini argues, will offer up gene-disease associations-and the diagnostics and pharmaceuticals that might come out of them-with an applicability to large, diverse populations. Pellini cites BRCA1, one of two genes associated with familial breast cancer. When the gene was first identified, he says, “People thought it would be implicated in a very significant number of women with breast cancer. When follow-up studies were conducted to validate that association, it was realized that BRCA1 is actually implicated in less than 10 percent of women with breast cancer. One of the reasons that occurred is the researchers started with small studies and with homogenous populations. Think about it. If you develop a diagnostic that is based on one population, one thing you know is that it’s representative of that one population. You have no idea if it’s representative of any other populations. Our goal is to come out with the diagnostics that are actually representative of a very broad population, and ultimately, to develop therapeutics with the exact same rationale.”

At GlaxoSmithKline, the working philosophy of Allen Roses, who heads the genetics program, is to pick the diseases, recruit the world experts on those diseases, and then let the experts recruit the patients, first from families with a history of the disease, and then from what are known in the lingo as “sporadic” cases-those isolated cases without a family history. GlaxoSmithKline is building eight “clinical genetic networks,” each for a different chronic disease, and Roses estimates that each network will cost $8 million for the first three years. “It ain’t cheap,” he says. “It is not high throughput to work up the data patient by patient, family by family, control by control. It is the part of the study which no technology can circumvent. It is the slowest part. But what you get out of it, if you put in the effort, is the polymorphisms [specific gene variations] of specific genes that are-not ‘could be,’ not ‘might be,’ not ‘we believe’-but are clinically associated with the disease.”

Despite the optimism, the population genomics boom has the potential to become mired in two distinct controversies: one ethical, the other scientific. The ethical debate was ignited three years ago, when former Harvard University neurology professor Kari Stefansson collected $12 million in venture capital and returned to his native Iceland to launch deCODE genetics, with the dream of mining Icelandic DNA for disease-causing genes.

To Stefansson, the Icelandic population represents an incomparable genetic resource. Virtually all the 280,000 inhabitants are descended from the Vikings who landed in the late ninth century. And this inbred population comes with excellent medical records beginning in 1915. As a result, Iceland represents a population in which finding underlying disease-causing genes should be as easy as it gets. Indeed, the potential in terms of new drugs is so great that in February 1998 deCODE signed a deal with the Swiss pharmaceutical giant Roche that could potentially be worth $200 million over five years.

Controversy, however, erupted in the spring of 1998 when the millennium-old Icelandic parliament took up consideration of a bill that would grant deCODE the right to construct a national health-records database of the entire Icelandic population. The bill gave deCODE a 12-year exclusive license to run the database and sell access to third parties, which would include any other scientists who might want to use the records.

The bill was sprung on the Icelandic community “as lightning from the clear sky,” says Einar Arnason, a population geneticist and evolutionary biologist at the University of Iceland. Arnason is also vice chairman of Mannvernd, the Association of Icelanders for Ethics in Science and Medicine, which was formed to oppose the bill. While the act was passed by the Icelandic parliament in December 1998, Mannvernd is challenging its constitutionality and counting among its allies the Icelandic Medical Association and a substantial fraction of the nation’s physicians.

DeCODE’s critics have attacked it on several ethical fronts, charging it with misleading the Icelandic public; playing on Icelandic patriotism and national self-interest, when the company is incorporated in Delaware and backed almost exclusively by U.S. investors; and, as Harvard University geneticist Richard Lewontin wrote in the New York Times, converting “the health and genetic status of the entire population into a tool for the profit of a single enterprise.”

The specific criticisms are threefold: first, that deCODE will have exclusive rights to the data in the health-records database, while other scientists, even Icelandic ones, will have to buy their way in; second, that the company may not be able to adequately protect the privacy of individuals whose records go into the database; and third, and most controversial, that the deCODE database works on the basis of “presumed consent” rather than “informed consent.” In other words, rather than asking individuals beforehand whether they would like to participate, deCODE has a right to the records of anyone who doesn’t “opt out” by filling out a form and sending it in to the proper authorities.

The deCODE imbroglio has almost single-handedly rendered the ethics issue a primary focus in the emerging field of population genomics. As medical ethicists like Stanford law professor Henry Greely point out, genetics research grew up with family studies, in which the families involved have an obvious incentive to participate. “They want to find something to help themselves, their kids, their grandkids,” he says. “They’re not worried about who makes money, and they end up with really close relationships with the geneticists.” Now that the research is moving into entire populations, says Greely, “Researchers don’t have any contact with anything but ones and zeroes or perhaps a little bit of extracted DNA.” Issues such as whether or not participants should be told about findings that relate directly to their own health, and even whether they should benefit financially, have to be worked out carefully in advance.

If nothing else, the ongoing deCODE controversy has the other players in population genomics trumpeting their ethics policies, and how they differ from deCODE’s. UmanGenomics, for instance, was founded in 1999 to market the genetic information from a 15-year-old bank of biological samples taken from the bulk of the population of the county of Västerbotten in northern Sweden. Sune Rosell, a former Karolinska Institute pharmacologist and now president of UmanGenomics, explains that three levels of informed consent are involved in the endeavor, from the individual level (an informed consent form is signed before anyone donates blood samples) to a societal level (all projects have to be approved by a regional ethics council) and a community level (representatives from the county are on the company board). In addition, says Rosell, while the company is privately owned, 51 percent of the shares are held by the county and the local University of Umeå.

In Britain, the Wellcome Trust and the governmental Medical Research Council are planning to create the U.K. Population Biomedical Collection to link DNA samples, medical records and lifestyle details from 500,000 volunteers. The project has been approved for funding, but the only significant work so far, says project leader Tom Meade, who also directs the Medical Research Council’s epidemiology and medical-care unit, has been “a great deal of public consultation” on “all the issues of reassuring people about confidentiality and getting informed consent. And making sure people understand what this is all about.”

In the United States, public opposition has already scuttled at least one venture. Boston University, which runs the famous Framingham Heart Study, recently founded Framingham Genomic Medicine, a privately owned company (see “Medical Records, Inc.,” TR July/August 2000); the plan was to generate a database linking 52 years’ worth of meticulously detailed medical records from the Framingham Heart Study to genotype data, which the Framingham researchers began collecting in the late 1980s. The company then hoped to market access to the database as a resource to pharmaceutical companies and other academics. In December 2000, however, the university decided to kill the company when it couldn’t make headway on the ethical issues, in particular the fact that the study was funded over the decades by the federal government, and academics had always had access to the information free of charge.

Next to the tangled ethical issues, the scientific controversy is straightforward: a debate between optimists and pessimists, with the bulk of the community falling somewhere in between. The pessimists argue that the genetic nature of chronic diseases (whether asthma, heart disease or diabetes) is unlikely to be simple enough for association studies to elucidate. And if the number of genes is large, and the effect of each gene is small, or if multiple genes confer similar traits (say, resistance to heart disease or cancer), then the studies are likely to turn up little or nothing real, or, at least, nothing real and useful.

The optimists, on the other hand, are betting that two or three or a handful of genes play a large enough role in many chronic diseases that the research will find them. Even optimists, however, recognize that for population studies to find disease-causing genes, the number of genes responsible for susceptibility to any particular common disease must be relatively low. (Though where there’s only one, the family studies to date should have unearthed it.) How many more than one is the question. “There is a lot of room between one and infinity,” says Stanford’s Risch.

And the farther from one the answer lies, the less successful these association studies will be. As Risch points out, the only way to find out which side is right is to do the scientific research on which genomics companies are betting. “I don’t know what else we can do in terms of human genetics to try to find genes for common diseases,” says Risch. “Most people believe there’s a genetic component to these diseases. If it turns out to be too many genes, and the effects are too modest, that will kill it. But there’s no way to know right now, and I don’t see any reason not to be optimistic. This has not played out at all, not by any stretch of the imagination.”
