1,000 Genomes

Gene-sequencing projects keep getting bigger.

Emily Singer

January 22, 2008

In a testament to the steady plummet in sequencing costs, today the National Human Genome Research Institute (NHGRI) announced a massive international collaboration to sequence the genomes of 1,000 people from around the world.

According to the NHGRI statement,

“The 1000 Genomes Project will examine the human genome at a level of detail that no one has done before,” said Richard Durbin, Ph.D., of the Wellcome Trust Sanger Institute, who is co-chair of the consortium. “Such a project would have been unthinkable only two years ago. Today, thanks to amazing strides in sequencing technology, bioinformatics and population genomics, it is now within our grasp. So we are moving forward to build a tool that will greatly expand and further accelerate efforts to find more of the genetic factors involved in human health and disease.”

The findings should give added power to the recent wave of studies identifying specific genetic risk factors for common health problems such as diabetes, heart disease, and lupus. (See “Genes for Several Common Diseases Found.”)

According to NHGRI director Francis Collins,

“This new project will increase the sensitivity of disease discovery efforts across the genome five-fold and within gene regions at least 10-fold. Our existing databases do a reasonably good job of cataloging variations found in at least 10 percent of a population. By harnessing the power of new sequencing technologies and novel computational methods, we hope to give biomedical researchers a genome-wide map of variation down to the 1 percent level. This will change the way we carry out studies of genetic disease.”

As with previous international sequencing projects, the data will be made available for analysis in free public databases. Once scientists identify a part of the genome associated with a particular disease, they will be able to look up that region in the database to find a list of the gene variants it contains.

The project will be a huge technological feat; to date, only three human genomes have been sequenced.

From NHGRI:

The project depends on large-scale implementation of several new sequencing platforms. Using standard DNA sequencing technologies, the effort would likely cost more than $500 million. However, leaders of the 1000 Genomes Project expect the costs to be far lower–in the range of $30 million to $50 million–because of the project’s pioneering efforts to use new sequencing technologies in the most efficient and cost-effective manner.

In the first phase of the 1000 Genomes Project, lasting about a year, researchers will conduct three pilots. The results of the pilots will be used to decide how to most efficiently and cost effectively produce the project’s detailed map of human genetic variation.

The first pilot will involve sequencing the genomes of two nuclear families (both parents and an adult child) at deep coverage that averages 20 passes of each genome. This will provide a comprehensive dataset from six people that will help the project figure out how to identify variants using the new sequencing platforms, and serve as a basis for comparison for other parts of the effort.

The second pilot will involve sequencing the genomes of 180 people at low coverage that averages two passes of each genome. This will test the ability to use low-coverage data from new sequencing platforms to identify sequence variants and to put them in their genomic context.

The third pilot will involve sequencing the coding regions, called exons, of about 1,000 genes in about 1,000 people. This is aimed at exploring how best to obtain an even more detailed catalog in the approximately 2 percent of the genome that is comprised of protein-coding genes.

During its two-year production phase, the 1000 Genomes Project will deliver sequence data at an average rate of about 8.2 billion bases per day, the equivalent of more than two human genomes every 24 hours. The volume of data–and the interpretation of those data–will pose a major challenge for leading experts in the fields of bioinformatics and statistical genetics.
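As a rough sanity check on the figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not anything from NHGRI: the genome size of about 3 billion bases is an assumption added here, and the function names are hypothetical. The third pilot is left out because the statement does not give the size of the targeted exons.

```python
# Back-of-the-envelope check of the NHGRI figures quoted in the article.
# ASSUMPTION (not stated in the article): one human genome is ~3 billion bases.
GENOME_SIZE_BASES = 3.0e9

BASES_PER_DAY = 8.2e9        # production-phase rate quoted by NHGRI
PRODUCTION_DAYS = 2 * 365    # "two-year production phase"


def genomes_per_day(bases_per_day, genome_size=GENOME_SIZE_BASES):
    """Daily output expressed as single-pass genome equivalents."""
    return bases_per_day / genome_size


def raw_bases(people, coverage, genome_size=GENOME_SIZE_BASES):
    """Total bases read when each person's genome is covered `coverage` times."""
    return people * coverage * genome_size


if __name__ == "__main__":
    print(f"Genome equivalents per day: {genomes_per_day(BASES_PER_DAY):.1f}")
    print(f"Bases over two years: {BASES_PER_DAY * PRODUCTION_DAYS:.3e}")
    # Pilot 1: two trios (six people) at an average of 20 passes each.
    print(f"Pilot 1 raw bases: {raw_bases(6, 20):.2e}")
    # Pilot 2: 180 people at an average of two passes each.
    print(f"Pilot 2 raw bases: {raw_bases(180, 2):.2e}")
```

Under that assumption, 8.2 billion bases per day works out to about 2.7 genome equivalents, consistent with the "more than two human genomes every 24 hours" in the statement, and the low-coverage pilot involves roughly three times as much raw sequence as the deep-coverage one.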

The 1,000 volunteers will be selected from those who participated in the HapMap project, a map of common genetic variation (see “A New Map for Health”), and will include:

Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States.
