Sequencing in a Flash

A new generation of DNA-sequencing machines is opening up whole new areas of genomic research. Already, researchers are unraveling how modern humans differ from Neanderthals and devising more precise tests for cancer.

Jon Cohen

May 1, 2007

On February 6, 2007, executives from 454 Life Sciences showed 78-year-old James Watson a first draft of his own genome. There was something downright poetic about this. Watson, of course, had won a Nobel Prize 45 years earlier for his role in discovering the double-helical structure of DNA; he was also a prime mover behind the Human Genome Project, which by its completion in 2003 had spent nearly $3 billion over 13 years extracting the blueprint that those helices encode. Now 454 had moved a step beyond that megaproject, which pooled many people’s DNA to determine the genetic sequence of what amounts to a model human. The company and its so-called next-generation sequencing machine had single-handedly read the genetic code of an individual–one whose work had done so much to make the achievement possible.

454’s Jonathan Rothberg believes his machines will make sequencing so cheap and fast that it will become practical to read the genomes of individuals.

But Jonathan Rothberg, who founded 454 in Branford, CT, with the dream of producing a sequencing machine more efficient than those available to the Human Genome Project, does not mention poetry when he recounts his meeting with Watson. Rather, he talks about money, speed, and a future in which ordinary people carry around their personal genomes on discs–an increasingly plausible scenario. “It cost us $200,000 to do Jim Watson,” points out Rothberg. “And we did it mostly in December and January.”

Rothberg, who now chairs 454’s board of directors, emphasizes that “Project Jim” remains a work in progress and will require more time and money. As of February, the company had sequenced Watson’s DNA only three times (each run increasing accuracy and filling gaps); nine passes were required to produce the Human Genome Project’s final draft sequence. But still.

Rothberg’s company is just one of several, including Illumina of San Diego and Applied Biosystems of Foster City, CA, developing machines that can decode DNA faster than ever before. And just as the cost of computer power has plummeted with the steadily increasing density of transistors on chips, the price of sequencing DNA has fallen rapidly with the advent of these machines. Today, the price tag on a human genome decoded with sequencers of the type used in the Human Genome Project would be $25 million to $50 million. It drops to around $1 million with next-generation machines available today and could be as low as $100,000 by 2008.

As the history of computers has shown, more processing power for less money can lead to unanticipated applications. In the wake of the Human Genome Project, researchers faced difficult financial decisions about which genomes to sequence next: chimpanzee or macaque, cow or dolphin, rice or cassava. The new machines make it possible to sequence nearly everything of interest. And as ever more sequence data flows into databases, whole new areas of research are opening up. Scientists now have an unprecedented ability to make comparisons between species, shedding light on everything from evolutionary questions to genetic reasons for individual differences in disease resistance and susceptibility. Research done with 454’s machines and published in top journals includes the partial sequencing of a Neanderthal genome and the development of new tests for cancer-causing genetic mutations–technology that may help doctors tailor treatments to their patients.

“The last year has been the most exciting period in genomics since the days of the Human Genome Project,” says Eric Lander, first author on the project’s first published draft of the human genome and now head of the Broad Institute for genomic medicine in Cambridge, MA. “Sequencing is becoming cheap enough and powerful enough that it can be applied to about any problem. It’s standing the field on its head.” Francis Collins, who led the Human Genome Project for the National Institutes of Health, predicts that the new sequencing technologies “will have profound consequences for the future of biomedical research and, ultimately, for the practice of medicine.”

A Unique Solution
Jonathan Rothberg’s office has a diner theme, with a red-and-black checkerboard tile floor, red Naugahyde-covered chrome chairs, and a sofa with arms that imitate the rear of a 1959 Cadillac, complete with monster tail fins and bullet-shaped taillights. Instead of a desk, he has a diner bar with bar stools. Wine bottles from a Connecticut vineyard he owns line some of the shelves. Beyond the windows lies the Long Island Sound. The place screams I am unique. And so Rothberg is. In 1991, while completing his PhD in biology at Yale University, he started CuraGen, one of the first companies to develop drugs based on genomics. In addition to CuraGen and 454 Life Sciences, he has founded an institute for the study of childhood diseases and yet another biotech company, RainDance Technologies, which has developed what it calls “liquid circuit boards” that are designed to make experiments more efficient by manipulating tiny quantities of fluid. And all that by the age of 43.

Indeed, it was an interest in the uniqueness of each person that ultimately led him to try to design a sequencer that he hopes will one day make genome checks as routine as blood tests are now. Rothberg holds up the guts of the 454 machine, a glass slide with 1.6 million miniature wells, each approximately 50 micrometers wide (about half the width of a human hair) and 55 micrometers deep. It is this chip that allows the machine to sequence DNA so quickly, because a separate chemical reaction can be carried out in each well.

Gene sequencing takes advantage of the fact that the two strands of a DNA helix are complementary: of the four chemical “bases” adenine, guanine, thymine, and cytosine, which are strung together in various orders on each strand, adenine pairs only with thymine, and guanine only with cytosine. In the most commonly used sequencing technique, which builds on a scheme developed 30 years ago by the University of Cambridge’s Frederick Sanger, fragments of DNA are separated into single strands and exposed to free nucleotides, which bind to the original As, Cs, Ts, and Gs to generate new complementary strands. These strands vary in length because some of the free nucleotides have been modified to prevent the reaction from continuing; when one of these bases binds to its target, the chain stops growing. And each of these four types of chain terminators has a different fluorophore attached that fluoresces when struck by a laser beam. An electric current separates the strands by size, and the laser reads the colors to determine which was the last base added to each chain, spelling out the sequence. The vast majority of labs that do sequencing today use a machine made by Applied Biosystems that spits out about two million bases a day.
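The pairing rule is simple enough to write down in a few lines. The following is only an illustrative sketch in Python; the function names are invented for this example and do not come from any sequencing software.

```python
# Base-pairing rule described in the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written in its own reading direction."""
    return complement(strand)[::-1]


print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```

Because one strand fully determines the other, reading a newly built complementary strand is equivalent to reading the original.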

The latest sequencer from 454 can read 300 million a day.
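To put the two figures side by side, here is a rough back-of-the-envelope calculation. It assumes a single machine, a single pass over a genome of about three billion bases, and no allowance for overlapping reads or repeat passes, so it understates the real effort in both cases.

```python
GENOME_BASES = 3_000_000_000    # roughly three billion bases in a human genome
SANGER_PER_DAY = 2_000_000      # conventional machine, figure quoted above
NEXT_GEN_PER_DAY = 300_000_000  # latest 454 sequencer, figure quoted above


def days_for_one_pass(bases_per_day: int, genome: int = GENOME_BASES) -> float:
    """Days one machine would need to read every base of the genome once."""
    return genome / bases_per_day


print(days_for_one_pass(SANGER_PER_DAY))    # 1500.0
print(days_for_one_pass(NEXT_GEN_PER_DAY))  # 10.0
```

On these assumptions, one pass drops from about four years of machine time to about 10 days.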

The 454 method avoids several of the more time-consuming steps of conventional sequencing, such as the separation of strands by size. Unlike Sanger sequencing, it doesn’t terminate chains: it records bases as they’re added to a growing strand. First, a DNA molecule is randomly chopped into different lengths. Then each fragment is stripped into single strands, and each strand is attached to a separate tiny bead. A biochemical process copies the single strands, so that 10 million clones jut out from each bead. Each bead is then packed into one of the 1.6 million wells. As, Cs, Ts, and Gs wash over the wells sequentially to synthesize new complementary strands.

Here’s the truly clever part: using a method first described by Pål Nyrén and coworkers at Sweden’s Royal Institute of Technology, 454’s sequencer instantly records when a base is added to each strand by exploiting the fact that the binding reaction releases a chemical called a pyrophosphate. In the wells of the 454 machine, the pyrophosphate is captured by a chemical cascade that ends up flicking on the enzyme luciferase (which occurs naturally in fireflies)–emitting a burst of light. A standard charge-coupled device of the kind used in digital cameras and telescopes detects each flash, reading off the sequence of As, Cs, Ts, and Gs in each fragment. The process can read about 200 to 300 bases in a row. As in conventional sequencing, computers then look for matching sequences at the end of one fragment and the start of another, piecing the fragments back together in the correct order.
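The flash-reading step can be mimicked with a toy simulation. This is a simplified sketch, not 454's software: the flow order `TACG`, the function names, and the idea of an exact integer "brightness" are assumptions made for illustration, and real light signals are noisy.

```python
FLOW_ORDER = "TACG"  # assumed order in which the four bases wash over the wells


def flowgram(new_strand: str, flow_order: str = FLOW_ORDER, max_flows: int = 400):
    """Simulate the flashes recorded while `new_strand` is being built.

    Returns one (base, count) pair per nucleotide flow. A count of 0 means
    no flash; a count of n models a flash about n times brighter, because
    n identical bases were added during the same flow.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(new_strand) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        count = 0
        # Every consecutive matching base is incorporated in this one flow.
        while pos < len(new_strand) and new_strand[pos] == base:
            count += 1
            pos += 1
        signals.append((base, count))
        flow += 1
    return signals


def call_bases(signals) -> str:
    """Turn the recorded flashes back into a sequence of bases."""
    return "".join(base * count for base, count in signals)


signals = flowgram("TTAGC")
print(signals)
# [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
print(call_bases(signals))  # TTAGC
```

Note that a run of identical bases appears as one brighter flash rather than several separate ones, so the reader has to judge the length of the run from the brightness alone.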

The sequencer that 454 brought to market in October 2005 had a few serious limitations. It could read only 100 bases in a row (the longer the stretch of bases in each sequenced fragment, the easier it is to assemble a complete genome), and it also had trouble accurately mapping repetitive stretches–say, six As back to back. But Rothberg says 454’s philosophy was “Get it out early; get it accepted.” The company first targeted “early adopters” like Broad’s Lander, hoping they would soon publish findings that relied on the sequencer. “You’ve got to get early guys first, but the rest of the guys, the followers, are where the market is,” says Rothberg. “And they read peer-reviewed papers.”
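Why read length matters becomes clearer with a small sketch of the overlap matching used to stitch fragments together. The code below is a minimal illustration with hypothetical function names and an arbitrary minimum overlap; real assemblers must also cope with read errors, repeats, and millions of fragments.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest end of `left` that matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge(left: str, right: str, min_len: int = 3):
    """Join two fragments on their overlap, or return None if they do not overlap."""
    size = overlap(left, right, min_len)
    return left + right[size:] if size else None


print(overlap("ACGTTGCA", "TGCAGGTC"))  # 4
print(merge("ACGTTGCA", "TGCAGGTC"))    # ACGTTGCAGGTC
print(merge("ACGTTGCA", "GGGGCCCC"))    # None
```

Short reads leave room for only short overlaps, and a short overlap is more likely to match the wrong place in a large genome by chance.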

Neanderthals
One paper by an early adopter that received widespread attention from scientists and the public alike was a study of Neanderthal DNA led by Svante Pääbo of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany. Neanderthals, the closest species to modern humans, disappeared some 30,000 years ago, and more speculation than fact surrounds their genetic relationship to us. Though Pääbo had done some previous studies with Neanderthal DNA, anything beyond rudimentary analysis had proved too difficult and costly. The problem is that over thousands of years, the few known samples of Neanderthal DNA from fossils have been degraded to short fragments of around 50 to 75 base pairs. In addition, the DNA is often contaminated with genetic material from microörganisms and the modern humans who have handled the fossils. But Rothberg believed that the 454 machine could analyze many short sequences at little cost, generating enough information to let scientists sift ancient treasures from junk. Rothberg cold-called Pääbo, who agreed to collaborate.

After sequencing genes from 70 Neanderthal bone and tooth samples, Pääbo’s team and researchers from 454 found one sample, estimated to be 38,000 years old, that had mostly clean DNA. As they reported in a paper published last fall in Nature, they then sequenced one million base pairs from less than 200 milligrams of material, an achievement that has yielded clues about whether modern humans and Neanderthals interbred and when the two species diverged from each other. More important, the paper shows that sequencing all three billion bases in the Neanderthal genome is feasible. Doing so could help solve such mysteries as whether Neanderthals had the genetic ability to speak.

Sorting out whether humans and Neanderthals interbred or even had the capacity to talk to each other may get a lot of press and public attention, but other applications for ultrarapid DNA sequencing could have a far greater impact on medicine and on our lives. The traditional sequencing method looks at DNA from many different cells. But if one of those cells is, say, a tumor cell, its sequence can differ slightly from those of the healthy cells. In such cases, the computers select the sequence that’s most commonly found and discard the others. Next-generation sequencers like the ones marketed by 454 instead clone and sequence single molecules of DNA, allowing “ultradeep” probing that can unearth rare variants. (Traditional sequencers can also analyze single molecules, but it’s prohibitively expensive.) The implications of single-molecule sequencing are enormous for medicine. While it is not practical to use conventional sequencing to sniff out the DNA differences between healthy and diseased cells, the new machines can perform such experiments easily.
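The difference between keeping only the most common sequence and probing "ultradeep" can be sketched in a few lines. This toy example assumes the reads are already lined up against one another and are error-free; the function names and the 1 percent threshold are hypothetical choices for illustration.

```python
from collections import Counter


def column_counts(reads, position):
    """Count how often each base appears at one position across all reads."""
    return Counter(read[position] for read in reads if position < len(read))


def consensus_base(reads, position):
    """Keep only the most common base, discarding everything else."""
    return column_counts(reads, position).most_common(1)[0][0]


def minor_variants(reads, position, min_fraction=0.01):
    """Report the less common bases and the share of reads carrying each."""
    counts = column_counts(reads, position)
    total = sum(counts.values())
    major = counts.most_common(1)[0][0]
    return {
        base: n / total
        for base, n in counts.items()
        if base != major and n / total >= min_fraction
    }


# 95 reads from healthy cells and 5 reads carrying a single changed base.
reads = ["ACGT"] * 95 + ["ACTT"] * 5
print(consensus_base(reads, 2))  # G
print(minor_variants(reads, 2))  # {'T': 0.05}
```

The consensus view reports only the G, while counting every read separately exposes the variant present in 5 percent of the molecules.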

Matthew Meyerson, a clinical pathologist at the Dana-Farber Cancer Institute in Boston, has published a study showing how the 454 machine can help uncover mutations linked to lung cancer. Lung-cancer drugs now available target the gene that Meyerson is sequencing, and he hopes that physicians will ultimately gain a better handle on who will respond to which drugs by learning whether the patient has a particular mutation. “I imagine in a few years all cancer patients will have their tumors characterized by single-molecule sequencing if the technology continues to decrease in cost,” he says.

In a variation on this theme, Michael Kozal, an AIDS clinician at Yale, has joined with 454 to do ultradeep sequencing of HIV to determine the presence of minor populations of drug-resistant virus. Early tests of the technique in patients detected about twice as much resistant HIV as Sanger sequencing did. This information, too, could help physicians individualize treatment regimens, which would increase cost-effectiveness. “It’s practical to do in our system,” says 454 chief scientist Michael Egholm, who is collaborating with Kozal. “Before, it simply wasn’t affordable.”

MyGenome
George Church, a sequencing pioneer at Harvard Medical School, says cost is the key. As their prices fall in the next few years, he says, these machines will become a democratizing force that will make traditional sequencers all but obsolete, much the way personal computers displaced mainframes. And this will lead to applications that no one can yet fathom. “If we were still working with mainframes, a lot of cool stuff wouldn’t be happening,” he says.

Church, who was among the dozen researchers to propose the Human Genome Project in the mid-1980s, is one of the few biologists whose lab equipment includes a table-mounted vise grip and a drill press. He uses equipment like this to build his own next-generation sequencers, of which his lab currently has eight (see TR35, September/October 2006). Convinced that companies are overcharging for their machines, he makes a point of freely sharing his know-how with any interested colleagues. He compares his philosophy to the “wiki and Linux mentality,” saying, “If a bunch of ants get together, they can move a rubber-tree plant.”

Church’s grand vision is to channel the cheap flood of As, Cs, Ts, and Gs into what he calls the Personal Genome Project. In the Human Genome Project, researchers obtained DNA from several people, each of whom, for privacy reasons, remains anonymous. So the final sequence represents a composite person with a conglomerate of different genetic backgrounds and medical histories. Church wants his Personal Genome Project to decode the DNA of individuals, who will also volunteer their medical records. He will post all the resulting data on the Internet. Ultimately, he imagines, millions of people will join the project, posting their sequences, medical records, and, if they choose, even facial photographs online. The entire world will then have access to all the data it needs to freely test hypotheses.

Although Church has received substantial funding from the National Institutes of Health to develop sequencing technology, the ethical, legal, and social questions raised by the Personal Genome Project have kept NIH from supporting it, despite a positive review of a grant application in August 2005. “As soon as I got approval, NIH got all excited, and not necessarily in a good way,” he says. He’s attempted to address the privacy and confidentiality issues, noting that no one’s identity needs to be made public and that NIH already funds human genetics projects that have fewer safeguards in place.

Church recognizes that intimate knowledge of their own DNA might be too much for many people. “You don’t let your kids browse to Internet pornography sites,” he says, “and to some extent you don’t allow yourself to browse the scariest, grossest sites.” He expects that rather than accessing their raw genomes, people will have professionals help them interpret the information.

Despite the lack of federal funding and the ethical objections, Church is proceeding, confident that advances in sequencing technology will drive the idea of a Personal Genome Project forward–just as advances in information technology have led strangers to share data in ways that no one dreamed of when the dual-floppy-drive Apple II debuted 30 years ago. As sequencers become more efficient, he believes, and costs continue to drop, personal genomics will take off on a scale that few people have yet imagined.

Winning the Lottery
Last October, the X Prize Foundation announced a $10 million award for producing highly accurate sequences of 100 human genomes in 10 days or less without spending more than $10,000 per genome. One of the first entrants was 454, which plans to develop even smaller beads that it hopes will allow its machines to read even more DNA per run at roughly the same cost. “We don’t need any new physics or math to get to the $1,000 genome,” says Rothberg.

Leaving aside the question of when–or if–anyone will claim the X Prize, DNA sequencing will surely continue to plummet in price and increase in accuracy. “Until last year, sequencing was really struggling to have the impact on the next era of genomics that it needed to have,” says David Bentley, Illumina’s chief scientist. Basically, the price of traditional sequencing was just not dropping quickly enough. “Now the field is far more optimistic than it was,” he says. Next-generation sequencing “has a huge role to play.”

Hearing scientists tick off the possibilities is like listening to lottery winners. And personalized medicine like the type of cancer testing and treatment that Dana-Farber’s Meyerson hopes to help usher in is just a starting point. Bentley says the new sequencers will open windows on the vast “noncoding” regions of the genome that turn genes on and off. Egholm of 454 notes that the Human Genome Project did not actually sequence every last bit of human DNA; there may still be undiscovered genes that additional sequencing can find. Broad’s Lander imagines a torrent of new information about what leads a cell to differentiate into one type or another (a central mystery in developmental biology) and what controls different cellular states. “I realize that’s harder to explain than curing cancer,” he says, “but it’s ultimately more important, because it will affect all diseases.”

Within the next year, Lander predicts, scientists will be able to begin studies that generate “terabases” of information–one trillion As, Cs, Ts, and Gs. “I never even spoke the word terabase before last year,” he says. “And if all those data are on the Web and freely available, it’s going to drive a completely different kind of biology.”

Jon Cohen, a San Diego-based freelance writer and correspondent for Science, is working on a book that looks at the genetic differences separating chimpanzees from humans.
