A male Yoruba from Nigeria and a Han Chinese man joined genetics luminaries James Watson and Craig Venter on Wednesday as the only people to have their genomes sequenced and made publicly available. The two anonymous genomes serve as proof that new sequencing technologies, which are orders of magnitude cheaper than standard methods, are capable of accurately reading the sequence of a complete human genome. That means that scientists will be able to sequence thousands of people, which they hope will finally enable a coherent understanding of the genomic basis of disease.
“This brings the time it takes to sequence a human genome from years to months,” says Samuel Levy, director of human genetics at the Craig Venter Institute, in Rockville, MD, who was not involved in the research. “That’s a huge technological advance. It gives us the ability to do the kinds of studies we want to do to associate genetic variations with human traits.”
Over the past decade, the cost of sequencing has dropped dramatically. While the reference sequence generated during the Human Genome Project cost $300 million, Watson’s genome, released last year and sequenced using a technology developed by 454 Life Sciences, in Branford, CT, cost about $1 million to $2 million. The Yoruba genome cost an estimated $250,000 and took only two months to complete, using technology from Illumina, a genetics technology company headquartered in San Diego.
New sequencing technologies boost speed and reduce cost by simultaneously sequencing hundreds of thousands of pieces of DNA. For technical reasons, this massive parallelism reduces the number of base pairs (the DNA “letters”) that can be read from each piece. Standard sequencing methods can read 400 to 800 base pairs per piece, but Illumina’s technology can read only 35 to 50. That makes it harder to assemble a complete sequence, which requires computationally sewing the overlapping pieces together.
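To make the idea of "sewing the overlapping pieces together" concrete, here is a minimal toy sketch in Python. It is not the software used in either study. It assumes error-free reads and a made-up 12-letter sequence with no repeats, and the function names `overlap` and `greedy_assemble` are hypothetical. It repeatedly merges the two pieces that share the longest suffix-to-prefix overlap.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` letters exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: merge the best-overlapping pair until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Join the pair, keeping the shared letters only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping 6-letter reads from the made-up sequence "ATGCGTACCTGA".
reads = ["CGTACC", "ACCTGA", "ATGCGT"]
print(greedy_assemble(reads))  # ['ATGCGTACCTGA']
```

With shorter reads, each overlap carries fewer letters, so in a real genome full of repeated stretches many more pairs of pieces look like plausible neighbors. That is the difficulty the paragraph above describes.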
Because of these short read lengths, it has been unclear how accurately technology from Illumina and other companies could sequence a human genome. In the new studies, published today in Nature, researchers from Illumina and from the Beijing Genomics Institute, in China, show that by sequencing their subjects’ genomes roughly 40 times each, they were able to read 99.9 percent of the sequence in the reference genome. The greater number of sequencing passes (standard sequencing requires only about 6 to 10) is necessary to compensate for shorter read lengths. But even with the extra passes, the new technology is much cheaper.
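A rough back-of-the-envelope calculation shows what those passes imply in terms of the number of pieces that must be read. The sketch below uses the usual definition of average coverage (number of reads times read length, divided by genome length). The genome size of about 3 billion base pairs is an assumption that does not come from the studies, and the 600-base-pair and 8-pass figures are simply midpoints of the ranges quoted above. The function names are hypothetical.

```python
def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is read (the number of 'passes')."""
    return num_reads * read_length / genome_size


def reads_needed(genome_size: int, read_length: int, passes: int) -> int:
    """Smallest number of reads that reaches the requested average coverage."""
    total_bases = genome_size * passes
    return -(-total_bases // read_length)  # integer ceiling division


GENOME_SIZE = 3_000_000_000  # assumption: roughly 3 billion base pairs

# Short reads: 35 base pairs at about 40 passes.
short_reads = reads_needed(GENOME_SIZE, read_length=35, passes=40)

# Standard reads: 600 base pairs (midpoint of 400 to 800) at 8 passes
# (midpoint of 6 to 10).
standard_reads = reads_needed(GENOME_SIZE, read_length=600, passes=8)

print(f"short reads needed:    {short_reads:,}")
print(f"standard reads needed: {standard_reads:,}")
```

Under these assumptions the short-read approach needs billions of reads rather than tens of millions, which is why it depends on reading so many pieces in parallel. The calculation ignores uneven coverage and unreadable regions, so it is only an order-of-magnitude guide.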
The scientists were able to verify the accuracy of their sequences by comparing them with previous genetic analyses of the same genomes. The Yoruba DNA sequenced by David Bentley and his colleagues at Illumina had been used in previous studies that looked for single-nucleotide polymorphisms (SNPs), or genetic variations of a single letter at a time, spread out through the whole genome. Jun Wang and colleagues at the Beijing Genomics Institute, who sequenced the Chinese genome, checked their results against those from a microarray, which is designed to detect thousands of common SNPs.
The two new sequences don’t reveal any genomic surprises. Researchers found approximately four million SNPs in the Yoruba genome, about 26 percent of which had not been previously identified. The Yoruba genome displayed a higher level of genetic diversity than previously sequenced single genomes, but earlier analysis of African DNA had predicted as much. In the Chinese genome, by contrast, about 13.6 percent of the SNPs had not been previously identified.