In 2002, the International HapMap consortium set out to inventory millions of SNPs and to identify the patterns that distinguish each haplotype. The database now contains more than 3.5 million SNPs. With this information, researchers can select “tag SNPs” that capture the genetic variation in each block. In other words, by identifying a few SNPs that are characteristic of each pattern and testing only at those locations, researchers can “fill in the blanks” for every other SNP in the haplotype. This lets them compare the genetic patterns of people who have a disease with those of unaffected people far more efficiently than was previously possible.
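The article doesn't say exactly how tag SNPs are chosen. One widely used idea, sketched here purely as an illustration, is greedy selection: keep picking the SNP that stands in for the most not-yet-covered SNPs, where "stands in for" means the two SNPs' alleles are strongly correlated across haplotypes (a squared correlation, r², at or above a cutoff). The function names, the 0.8 cutoff, and the toy haplotypes are hypothetical assumptions, not the consortium's own pipeline.

```python
# Hypothetical sketch of greedy tag-SNP selection; not the HapMap consortium's method.


def r_squared(a, b):
    """Squared correlation (r^2) between two 0/1 allele columns across haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    return 0.0 if denom == 0 else d * d / denom


def select_tags(haplotypes, threshold=0.8):
    """Greedily pick SNPs until every SNP has r^2 >= threshold with some chosen tag."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    # covers[j]: the SNPs whose alleles SNP j predicts well (always including itself)
    covers = [
        {k for k in range(n_snps)
         if k == j or r_squared(columns[j], columns[k]) >= threshold}
        for j in range(n_snps)
    ]
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Choose the SNP covering the most untagged SNPs; ties go to the lowest index.
        best = max(range(n_snps), key=lambda j: (len(covers[j] & untagged), -j))
        tags.append(best)
        untagged -= covers[best]
    return sorted(tags)


# Toy data (hypothetical): 4 haplotypes x 6 SNPs. SNPs 0-2 always travel together,
# SNPs 3-4 travel together, and SNP 5 is independent of both groups.
HAPLOTYPES = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 0],
]

print(select_tags(HAPLOTYPES))  # → [0, 3, 5]
```

Here three tags stand in for six SNPs: genotyping SNP 0 "fills in" SNPs 1 and 2, and SNP 3 fills in SNP 4. Real haplotype data is far noisier, which is why a cutoff below perfect correlation is typically used.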
Indeed, the consortium estimates that, with proper tag selection, geneticists could gather information about possible gene associations from across the whole genome by testing as few as one-tenth of the approximately 10 million common SNP sites.
The data has already been used to identify a gene associated with age-related macular degeneration, the leading cause of blindness in the elderly. Several other studies now under way are seeking genes that may be involved in obesity and heart disease.
Along with the data it generated, the HapMap project sparked advances in technology. At the project’s outset, determining which variant a patient carried at a single SNP site cost almost a dollar, and researchers could test hundreds of sites a day. Today, the price has dropped to less than a cent per SNP, and millions can be tested in a day. The accuracy of the testing has improved as well, Daly says.
The combination of these new technologies and the HapMap data makes it much easier for geneticists to conduct studies that scan the entire human genome for genes associated with particular characteristics, whether diseases, as the consortium members hope, or other traits believed to have genetic components, such as intelligence or sexual preference.
But there’s a potentially serious catch: the chance of turning up a gene that appears to be linked to a particular trait but turns out to play no role in causing it will be quite high, says Daly.
“If you deal out one hand of cards, you’re unlikely to get a full house,” Daly says. “But if you deal out 100,000 hands of poker, you’re bound to get some really good-looking hands, statistically.”
The same thing can happen in such genome-wide association studies: one or more genes can turn up that “look good.” Daly cautions: “These things will happen by chance and have nothing to do with causation.”
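Daly's poker analogy can be made concrete with a small simulation. This minimal Python sketch assumes that a test of a SNP with no real effect on the trait behaves like a well-calibrated statistical test, so its p-value is uniformly distributed between 0 and 1; the function name, the fixed random seed, and the conventional 0.05 significance threshold are illustrative assumptions, not details from any HapMap study.

```python
import random


def count_false_positives(n_tests, alpha=0.05, seed=0):
    """Count tests of truly unrelated SNPs that look 'significant' purely by chance."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # Under the null hypothesis, a well-calibrated test's p-value is uniform on [0, 1).
    p_values = [rng.random() for _ in range(n_tests)]
    return sum(p < alpha for p in p_values)


# Compare a single test with a genome-wide-scale batch, none of which has a real effect.
for n in (1, 100, 100_000):
    hits = count_false_positives(n)
    print(f"{n:>7} null tests -> {hits} 'significant' at p < 0.05 "
          f"(about {0.05 * n:g} expected by chance)")
```

With one test, a chance "hit" is unlikely; with 100,000 tests, roughly 5,000 SNPs with no causal role at all clear the same bar, which is the statistical version of being dealt some really good-looking hands.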
As a result, the members of the HapMap consortium have expressed the hope that the data will be used mainly to study medical conditions rather than non-medical traits. In fact, they included an explicit caution in their Nature paper, writing: “we urge conservatism and restraint in the public dissemination and interpretation of such studies,” especially if non-medical traits are being explored.
The fact that the HapMap data was derived from the DNA of people in Nigeria, China, Japan, and the United States brings an added hazard: that associations between gene variations and particular traits might (falsely) appear stronger in some populations than in others.*
“This is a huge data set that could be mined for lots of myopic, culturally prejudiced things,” says Langley. “Everybody’s nervous about that. It’ll probably happen, and it’s just up to the scientific community to deal with each case as rigorously as possible.”
Editor’s Note: Please come back on Wednesday, November 9, for Part 2 of Erika Jonietz’s story on the HapMap, which will focus on the project’s international aspects.
* [Clarification, Nov. 15, 2005: This sentence may give the incorrect impression that associations between particular genes and traits never vary among populations. While most genetic variation is shared among all populations, there are occasional differences. As a result, both true and false associations may be made with gene variants that appear at different frequencies in different populations. In either case, the association could be used in a way prejudiced against the group carrying the variant at a higher (or lower) rate. – Editors.]