A New Map for Health

Handled with care, the new “HapMap” of genetic variation could reveal the genetic roots of many diseases.

Erika Jonietz

November 7, 2005

An international consortium of researchers has assembled a database of human genetic variations, creating a tool that could revolutionize the search for genes that cause many common diseases. But without careful self-regulation, the geneticists say, the information could also result in a flood of misleading or inconclusive results.

Called the HapMap, the database catalogs more than three million points of genetic variation based on samples from 269 people in Nigeria, China, Japan, and Utah. More than 200 scientists in Canada, China, Japan, Nigeria, the United Kingdom, and the United States participated in the project. The first phase of the project, reporting more than one million differences, was published in the October 27 issue of Nature, based on data analysis led by Peter Donnelly of the University of Oxford in England and David Altshuler, director of the program in Medical and Population Genetics of the Broad Institute of Harvard and MIT in Cambridge, MA.

“We need this background information on variation in the human genome just to begin to address the questions that we want to ask – like what are the genes involved in breast and prostate cancer and diabetes,” says Brian E. Henderson, dean of the Keck School of Medicine at the University of Southern California.

“It’s a very powerful tool,” agrees Charles Langley, a population geneticist at the University of California, Davis. “Human medical genetics is finally addressing a much bigger public health issue, which is the genetic basis of common diseases.”

Approximately six billion chemical building units, called nucleotides, make up the human genome. Although roughly 99.9 percent of the sequence of those nucleotides is identical between any two humans, that still leaves millions of differences at individual points in the DNA, called single nucleotide polymorphisms, or SNPs. It is these variations that account for many of the genetically determined differences between humans.

Researchers could find which of these changes relate to a particular disease by sequencing and comparing entire genomes (and every SNP) among thousands of affected and unaffected people. In practice, however, this would be expensive and time-consuming.

In 2001, Mark J. Daly, then at the Whitehead Institute, now an associate member at the nearby Broad Institute, found that such genetic differences are inherited in large blocks, called haplotypes (hence the term “HapMap”). While there may be hundreds of SNPs within a region of DNA, all of them are linked, so that everyone who has an “A” nucleotide rather than a “G” at a particular location in a chromosome will have the same genetic variants at other SNPs in that region. And for many haplotypes, only three or four patterns of variation exist.

With a catalog of these blocks, geneticists could more effectively identify gene variants involved in common diseases such as diabetes, cancer, heart disease, and psychiatric illnesses.

In 2002, the International HapMap consortium set out to inventory millions of SNPs and to identify the patterns that distinguish each haplotype. The database now contains more than 3.5 million SNPs. With this information, researchers can select “tag SNPs” that depict the genetic variation in each block. In other words, by identifying only a few SNPs that are characteristic of each pattern and testing at those locations, researchers can “fill in the blanks” for every other SNP in the haplotype. This allows them to compare the genetic patterns of people who have a disease with those of unaffected people far more efficiently than has been possible before.

Indeed, the consortium estimates that, with proper tag selection, geneticists could gather information about possible gene associations from across the whole genome by testing as few as one-tenth of the approximately 10 million common SNP sites.
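
To make the "fill in the blanks" idea concrete, here is a minimal sketch in Python. It is not the consortium's tag-selection method: the three six-letter patterns are invented for illustration, the function names `find_tag_snps` and `fill_in` are hypothetical, and the sketch assumes each person carries exactly one known pattern, ignoring the fact that real genotype data comes from two chromosome copies. It searches for the smallest set of sites that tells the patterns apart, then infers the rest of the block from those sites alone.

```python
from itertools import combinations

# Hypothetical haplotype block: three common patterns across six SNP sites.
# The letters are invented for illustration and are not HapMap data.
PATTERNS = ["AGTCCA", "GGTTCA", "ACATGG"]


def find_tag_snps(patterns):
    """Return the smallest set of site indices whose alleles tell all patterns apart."""
    n_sites = len(patterns[0])
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            # Collect the alleles each pattern shows at the candidate sites.
            signatures = {tuple(p[i] for i in sites) for p in patterns}
            if len(signatures) == len(patterns):
                return sites
    raise ValueError("patterns are not all distinct")


def fill_in(patterns, tag_sites, observed):
    """Infer the full pattern from the alleles observed at the tag sites only."""
    for p in patterns:
        if tuple(p[i] for i in tag_sites) == tuple(observed):
            return p
    return None  # observed alleles match no known pattern


tags = find_tag_snps(PATTERNS)
print("tag sites:", tags)
print("inferred block:", fill_in(PATTERNS, tags, ("G", "G")))
```

In this toy block, testing two of the six sites is enough to recover the other four, which is the kind of saving the tag-SNP approach relies on.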

The data has already been used to identify a gene associated with age-related macular degeneration, the leading cause of blindness in the elderly, and several other studies are underway seeking genes that may be involved in obesity and heart disease.

Along with the data generated, the HapMap project sparked advances in technology. At the project’s outset, determining which SNP a patient carried at one site cost almost a dollar, and researchers could test hundreds a day. Today, the price has dropped to less than a cent per SNP, and millions can be tested in a day. The accuracy of the testing has improved as well, Daly says.

The combination of these new technologies and the HapMap data makes it much easier for geneticists to do studies that examine the entire human genome for genes associated with particular characteristics – be they diseases, as the consortium members hope, or other traits believed to have genetic components, such as intelligence or sexual preference.

But there’s a potentially serious catch: the statistical likelihood of turning up a gene that appears to be linked to a particular trait but ends up having no role in actually causing the trait will be quite high, says Daly.

“If you deal out one hand of cards, you’re unlikely to get a full house,” Daly says. “But if you deal out 100,000 hands of poker, you’re bound to get some really good-looking hands, statistically.”

The same thing can happen in such genome-wide association studies: one or more genes can turn up that “look good.” Daly cautions: “These things will happen by chance and have nothing to do with causation.”
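
Daly's poker analogy can be put in rough numbers. The sketch below is an illustration only, not an analysis from the Nature paper. It assumes one independent statistical test per tag SNP, takes one million tests (one-tenth of the roughly 10 million common SNP sites mentioned above), and uses the conventional 0.05 significance cutoff, which the article does not specify. The function names are hypothetical, and the Bonferroni correction is one standard remedy, not necessarily the one the consortium recommends.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of chance 'hits' when no SNP has any real effect."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Chance of seeing at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test cutoff that keeps the overall false-positive rate near alpha."""
    return alpha / n_tests


N_TESTS = 1_000_000  # assumed: one test per tag SNP
ALPHA = 0.05         # assumed: conventional significance cutoff

print("expected chance hits:", expected_false_positives(N_TESTS, ALPHA))
print("P(at least one):", prob_at_least_one(N_TESTS, ALPHA))
print("Bonferroni cutoff:", bonferroni_threshold(N_TESTS, ALPHA))
```

Under these assumptions, tens of thousands of SNPs would "look good" purely by chance, which is why a genome-wide study needs a far stricter per-test cutoff than a study of a single gene.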

As a result, the members of the HapMap consortium have expressed their hope that the data will be used mainly to research medical conditions, as opposed to non-medical traits. In fact, they included a particular caution in the Nature paper, writing: “we urge conservatism and restraint in the public dissemination and interpretation of such studies,” especially if non-medical traits are being explored.

The fact that the HapMap data was derived from the DNA of people in Nigeria, China, Japan, and the United States brings an added hazard: that associations between gene variations and particular traits might (falsely) appear stronger in some populations than in others.*

“This is a huge data set that could be mined for lots of myopic, culturally prejudiced things,” says Langley. “Everybody’s nervous about that. It’ll probably happen, and it’s just up to the scientific community to deal with each case as rigorously as possible.”

Editor’s Note: Please come back on Wednesday, November 9, for Part 2 of Erika Jonietz’s story on the HapMap, which will focus on the project’s international aspects.

* [Clarification, Nov. 15, 2005: This sentence may give the incorrect impression that associations between particular genes and traits never vary among populations. While most genetic variation is shared among all populations, there are occasional differences. As a result, both true and false associations may be made with gene variants that appear at different frequencies in different populations. In either case, the association could be used in a way prejudiced against the group carrying the variant at a higher (or lower) rate. – Editors.]
