Putting a Rush on Identifying Genes

Laura van Dam

January 1, 1997

After 10 difficult years of work, researchers announced in 1993 that they had located the gene causing Huntington’s disease. Since then, the pace at which other disease-causing genes have been found has greatly accelerated, so that genes are typically located in no more than two years. Now a new tool that can reduce the time needed for a critical step involved in gene identification, from perhaps as much as 10 months to a day, should allow gene discoveries to be made much faster still.

The tool is a map showing the approximate positions of thousands of genes, most of still unidentified function, along the genome, the complete set of our 3 billion chemical “base pairs” of DNA. The map, which is being developed by an international consortium of 104 scientists, including researchers at the Whitehead/MIT Center for Genome Research (CGR), was made possible by a concerted previous effort that identified the chemical makeup of more than 450,000 short sections of DNA that lead to the manufacture of protein fragments. Researchers recognize that since genes direct protein production, the short sections (known as complementary DNAs, or cDNAs) are portions of genes and thus of potentially great value. But no widespread group of scientists has previously determined just what genes these DNA sections belong to.

The consortium workers began by comparing copies of the cDNAs for similarities and grouping them into clusters, with each cluster representing an individual gene. From each cluster the researchers then identified a representative cDNA and looked for its presence in DNA sections whose locations on the genome are known from previous research. The section in which the cDNA shows up, as determined by a test that makes millions of copies of the representative material, suggests the proper map position for the cDNA and hence the probable location of that gene. To gain confidence in their placement decisions, the researchers have repeated this procedure. Because the task entails an enormous amount of repetitive work, Whitehead workers have relied on robots they earlier developed to compare stretches of DNA.
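The grouping step described above can be illustrated with a toy sketch. It is not the consortium's actual method or software: the similarity measure (shared short subsequences), the thresholds `k` and `min_shared`, the function names, and the rule of taking the longest sequence as the representative are all assumptions made for illustration.

```python
from itertools import combinations


def kmers(seq, k=8):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_cdnas(seqs, k=8, min_shared=3):
    """Group named sequences that share at least `min_shared` k-mers.

    `seqs` maps a name to a DNA string. Sequences are linked
    transitively, so each cluster stands in for one putative gene.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the cluster root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    for a, b in combinations(seqs, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


def representative(cluster, seqs):
    """Pick the longest sequence in a cluster (assumed rule; ties by name)."""
    return max(cluster, key=lambda name: (len(seqs[name]), name))
```

Real cDNA data would call for a far more careful comparison than this, since sequencing errors and shared repeats can link unrelated fragments; the sketch only shows the shape of the cluster-then-pick-a-representative idea.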

Rush to Publish

So far, the consortium members have positioned on the new “gene map” representative cDNAs corresponding to some 16,000 genes, of a total of perhaps 80,000 human genes, says Thomas J. Hudson, assistant director of CGR and recently also appointed assistant professor of medicine and human genetics at McGill University. These results, published October 25 in the journal Science, mean that the gene map today can help researchers rapidly locate genes of interest maybe one of every five times. But Hudson says he anticipates that the gene map could include perhaps 55,000 gene sites, corresponding to possibly two-thirds of all genes, within two years, with all information being added immediately to a new site on the World Wide Web. He notes that the consortium, which also includes researchers at the National Center for Biotechnology Information at the National Institutes of Health (NIH), Stanford and Oxford universities, and the nonprofit institutes Généthon in France and the Sanger Centre in England, decided to publish the results after only one and a half years of work because the information could prove so valuable to genetic research.

A gene hunter can use the new map after first carefully studying families with a certain disease or trait and finding a large region of DNA inherited along with the condition. The map then immediately indicates to the researcher at least some of the genes lying in that region. In the past, finding candidate genes meant searching through amounts ranging from tens of thousands to perhaps a couple of million DNA base pairs, a step that could take months, for series of chemical units indicating various portions of genes. Removing any part of that process reduces the research period. The final step in identifying the correct gene remains the same: the scientist has to figure out which gene mutates to result in the condition of interest.
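The lookup a gene hunter performs against the map can be pictured as a simple interval query. The sketch below is purely illustrative: the `MappedGene` record, the `genes_in_region` function, and the use of arbitrary numeric map units are assumptions, not the data format of the consortium's Web site.

```python
from typing import NamedTuple


class MappedGene(NamedTuple):
    name: str          # hypothetical label for a mapped cDNA cluster
    chromosome: str
    position: float    # approximate map position, arbitrary units


def genes_in_region(gene_map, chromosome, start, end):
    """Return mapped genes on `chromosome` between `start` and `end`, in map order."""
    hits = [
        g for g in gene_map
        if g.chromosome == chromosome and start <= g.position <= end
    ]
    return sorted(hits, key=lambda g: g.position)
```

Given a region linked to a condition through family studies, a query of this kind would return the candidate genes already placed there, leaving the researcher to determine which one carries the relevant mutation.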

Consortium members have also included on the Web site some additional information on the mapped genes. By comparing the chemical makeup of those genes with that of other species’ genes whose functions are already known, the researchers have been able to make educated guesses about the function of one-fifth of the genes listed on the site. Such details could make gene hunters’ tasks easier still.

Francis S. Collins, the director of NIH’s National Center for Human Genome Research, points out that the evolving gene map will not only speed up efforts to find single genes associated with certain diseases and traits but will be essential for locating the suites of genes associated with common conditions such as diabetes and obesity in which more than one gene plays a role. Using “brute force” alone, the technique that had to be employed before the development of the gene map, to genetically identify the causes of such complex diseases would be “extremely difficult,” he says, because many genes are involved. Without the new tool, he points out, researchers would be stuck with “a whale of a lot of DNA” to pick through.
