Five years ago, Craig Venter let out a big secret. As president of Celera Genomics, Venter had led the race between his company and a government-funded project to decode the human genome. After leaving Celera in 2002, Venter announced that much of the genome that had been sequenced there was his own. Now Venter and colleagues at the J. Craig Venter Institute have finished the job, filling in the gaps from the initial sequence to publish the first personal genome.
His newly released genome, published today in the journal PLoS Biology, differs from both of the previous versions of the human genome (one from Celera, the other from the Human Genome Project) in that it details all of the DNA inherited from both mother and father. A sequence of this kind, known as a diploid genome, allows scientists to better estimate the variability in the genetic code. (In a genome sequence generated from a conglomerate of different individuals, some variations are lost in the averaging.) Within the genome of 2.810 billion base pairs, scientists found 4.1 million variations among the chromosomes; 1.2 million of these were previously unknown. Of the variations, 3.2 million were single nucleotide polymorphisms, or SNPs, the best-characterized type of variation, while nearly one million were other kinds of variants, including insertions, deletions, and duplications.
Venter’s genome will join that of another genomic pioneer, James Watson, codiscoverer of the structure of DNA. (See “The $2 Million Genome.”) Announced in June, Watson’s genome was sequenced by 454, a company based in Branford, CT, that’s developing next-generation sequencing technologies. (For more on 454’s technology, see “Sequencing in a Flash.”)
Venter’s and Watson’s genomes are likely just the first in an upcoming wave of personal genomes, a crucial step in the advent of personalized medicine: the ability to tailor medical treatments to an individual’s genetic profile. (See “The X Prize’s New Frontier: Genomics.”) Venter has already explored some of his genome, discovering that he carries genetic variations that put him at increased risk for Alzheimer’s disease, heart disease, and macular degeneration. He says that he’s been religiously taking statins, cholesterol-lowering drugs, ever since.
Venter talks with Technology Review about what lies ahead for his genome.
Technology Review: Why did you decide to embark on this project?
Craig Venter: The genome we published at Celera was a composite of five people. To put it together, it became clear that we had to make some informatics compromises–we had to leave out some of the genetic variation. We knew the only way to truly understand the genome would be to have the genome of one individual. Rather than starting from scratch, we decided to take what we had from the Celera genome and add more sequence. The goal was to get an accurate reference sequence from a single individual.
TR: How does your genome sequence add to what we know from the Human Genome Project?
CV: The government labs sequenced and assembled a composite haploid genome from several individuals [meaning it included a DNA sequence from only one of each chromosome pair]. There was the assumption back then that having half of the genome was all that was needed to understand human complexity. But it’s become clear that we need to see the composite of the sets of chromosomes from both the mother and father to see the variation in the genome.
This genome has all the insertions and deletions and copy-number differences. That gives us a very different view.
TR: What’s the most exciting finding so far?
CV: For me, the most exciting finding is that human-to-human variation is substantially higher than was anticipated from versions of the human genome done in 2001. In fact, it might be as much as tenfold higher: rather than being 99.9 percent identical, it’s more like 99 percent identical. It’s comforting to know we are not near-identical clones, as many people thought seven years ago.
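[The "tenfold" figure follows from comparing the share of the genome that differs, not the share that is identical. The short Python sketch below works through that arithmetic using only the two percentages quoted in the answer; the function name and its use of exact fractions are illustrative assumptions, not anything taken from the published analysis.]

```python
from fractions import Fraction


def fold_change_in_difference(old_identity_pct: str, new_identity_pct: str) -> Fraction:
    """Return how many times larger the differing share is under the new estimate.

    Percentages are passed as strings so they convert to exact fractions
    (avoiding floating-point rounding on values such as 99.9).
    """
    old_diff = 100 - Fraction(old_identity_pct)  # percent differing, 2001-era estimate
    new_diff = 100 - Fraction(new_identity_pct)  # percent differing, newer estimate
    return new_diff / old_diff


# Figures quoted in the interview: 99.9 percent vs. 99 percent identical.
fold = fold_change_in_difference("99.9", "99")
print(fold)  # 10
```

[Going from 99.9 to 99 percent identity looks like a small change, but the differing share rises from 0.1 percent to 1 percent, which is the tenfold increase described.]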
TR: How will scientists use your genome sequence?
CV: It will serve as a reference genome. This is probably the first and last time anyone will spend the time, money, and energy to sequence a diploid genome using highly accurate Sanger sequencing. Future genomes, like those from 454 or George Church’s Personal Genome Project, will be layered onto [existing] data, adding to the completeness of this genome. (See “The Personal Genome Project.”) [The traditional Sanger sequencing method, used for the Human Genome Project and to generate Venter’s sequence, generates longer pieces of DNA than do newer methods, such as that used by 454, making it easier to assemble the overlapping pieces.]
TR: James Watson released a version of his own genome earlier this summer. How is yours different?
CV: There has been nothing published yet on his genome, so we have no idea. But as I understand it, in contrast to really assembling a genome, they sequenced short fragments that are layered onto the sequence assembled at the NIH. So there are a lot of technical differences, but until it’s published, we won’t really know.
TR: You’ve had sections of your genome in the public domain for several years now. Any second thoughts about putting the entire high-quality sequence out there?
CV: No. And I applaud Watson for doing this as well. A key part of the message here is that people should not be afraid of their genetic codes or afraid to have other people see them. That’s in contrast to the notion that this is dangerous information that should be kept under lock and key. We’re not just our genetic code. There is very little from the code that will be 100 percent interpretable or applied.
TR: Have you searched your genome for disease-related mutations?
CV: Yes. I have a book coming out in October called A Life Decoded where I look at many variants and try to put them in the context of my life. For example, I have a high statistical probability of having blue eyes, but you can’t be 100 percent sure from my genome that I have them. The message is that everything in our genomes will be a statistical uncertainty. We’re really just in the first stages of learning that.
Previously published genomes don’t represent anyone, so we can’t interpret human biology based on these. But now we can start to make human-genome inferences. We’ll need tens of thousands to millions of genomes to put together a database that would allow interpretation of multiple rare variants and what they mean. That will take decades.
TR: How much did the project cost?
CV: The goal was not to see how cheaply we could sequence a genome; it was to see how accurately we could do it. It was clearly a multimillion-dollar project over the years.