Today, four months and many late nights after the anxiety-ridden planning meeting, the gene factory is complete. The only sign that the glass building contains what is arguably the world’s most prolific molecular biology lab is a pair of massive air conditioning units crouching in the grass. The chillers, too heavy to sit on top of the building, cool 1,600 cubic meters of air per minute and pipe it into the heart of the facility, where 257 new sequencing machines hum in orderly rows.
The gray, waist-high 3700-model machines were developed over two years in near-secrecy by Perkin-Elmer’s West Coast subsidiary, Applied Biosystems. Just one of those machines, says Venter, has more sequencing capacity than many big academic labs, most of which rely on an earlier model called the 377. Altogether, Venter calculates, Celera can decode nearly as much DNA in one day as all the major labs funded by the Human Genome Project produced last year.
It’s what’s inside these new machines that makes them so fast. Each contains 104 glass capillaries: hollow, hair-thin tubes that the machine can automatically fill with a syrupy polymer and later clean out with a dilute solution of nitric acid. The sequencer’s job is to sort DNA fragments by size. Pulled along by an electric field, small fragments move through the tubes faster than large ones. The capillaries replace cumbersome cafeteria-tray-sized slabs of toxic gel used in previous models, which had to be changed by a skilled technician every few hours. Stocked with chemicals and more than 1,000 DNA samples, the automated 3700 can run for nearly two days without human intervention, says Mark Adams, the young scientist who supervises Celera’s sequencing operation. At full capacity, Celera expects to read 100 million letters of DNA sequence each day.
More than half of Celera’s personnel (backed by eight 6-foot, 64-bit computer servers located in an adjacent building) will be devoted to unscrambling the avalanche of data streaming from the sequencing facilities. Leading the analysis is Gene Myers, an expert on pattern analysis on leave from the University of Arizona’s computer science department.
The challenge Myers’ staff will face is something like reassembling a complete Bible from 10 copies that have been torn into tiny pieces. Since the sequencing machines can read only short stretches of DNA, the genome must first be broken into smaller pieces. Celera scientists began by taking DNA from a number of human cells and chemically shredding it into millions of random, overlapping fragments a few thousand letters long. To keep a library of these fragments, the scientists grafted them into colonies of E. coli bacteria. Following the shotgun strategy, Celera will then sequence 500 letters from each end of a fragment; repeating the process across the entire library yields 70 million separate sequences.
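The numbers in this plan can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation, not anything Celera published: the read count, read length and daily output come from the figures quoted here, while the genome size of roughly 3 billion letters is an outside assumption, and the function names are made up for illustration.

```python
# Figures quoted in the article.
READS = 70_000_000             # separate sequences read from the fragment library
LETTERS_PER_READ = 500         # letters read from each fragment end
LETTERS_PER_DAY = 100_000_000  # Celera's expected output at full capacity

# Assumption (not from the article): the human genome is about 3 billion letters.
GENOME_LETTERS = 3_000_000_000


def total_letters(reads, letters_per_read):
    """Total letters of raw sequence produced by all reads."""
    return reads * letters_per_read


def coverage(reads, letters_per_read, genome_letters):
    """How many times over, on average, each letter of the genome is read."""
    return total_letters(reads, letters_per_read) / genome_letters


def days_at_full_capacity(reads, letters_per_read, letters_per_day):
    """Days of sequencing needed if the machines always ran at full capacity."""
    return total_letters(reads, letters_per_read) / letters_per_day


if __name__ == "__main__":
    print("raw letters:", total_letters(READS, LETTERS_PER_READ))
    print("coverage:", round(coverage(READS, LETTERS_PER_READ, GENOME_LETTERS), 1))
    print("days:", days_at_full_capacity(READS, LETTERS_PER_READ, LETTERS_PER_DAY))
```

Under those assumptions the library amounts to 35 billion letters of raw sequence, between 11 and 12 times the length of the genome, which is in line with the picture of 10 shredded Bibles. At 100 million letters a day, that is about 350 days of uninterrupted sequencing.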
Myers’ task is to develop algorithms that can assemble these elements once their code has been read. Although it sounds like a straightforward job (just line up overlapping letters and start pasting), it is anything but. Take the ripped-up Bible. Common phrases such as “Thou shalt not…” or “Blessed are they…” would make reassembling the good book much harder because some fragments appear to overlap when, in fact, they don’t. The genome is similarly crammed with repeated sequences, some short, some long, some present in a million copies, others repeated only twice.
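The trouble with repeats shows up even in a toy example. The sketch below is not Myers' algorithm; it is a naive overlap check of the "line up and paste" kind, with a hypothetical function name and made-up scraps of text, meant only to show how a common phrase makes two different continuations look equally good.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` characters exists.
    """
    # Try the longest possible overlap first, then shrink.
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


# Three scraps from a shredded page. The last two come from different
# verses, but both begin with the same common phrase.
scrap = "THOU SHALT NOT"
candidate_1 = "SHALT NOT KILL"
candidate_2 = "SHALT NOT STEAL"

if __name__ == "__main__":
    print(overlap(scrap, candidate_1))
    print(overlap(scrap, candidate_2))
```

Both candidates overlap the first scrap by the same nine characters ("SHALT NOT"), so overlap alone cannot say which one belongs next. A repeated stretch of DNA creates the same ambiguity, only across millions of fragments.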
For that reason, scientists working on the publicly funded Human Genome Project have laboriously mapped out the genome before starting to sequence. Roughly like figuring out where the Bible’s chapters go before tearing up the pages, it means they will then have to reassemble many small piles, rather than one huge one. Elbert Branscomb, director of the Department of Energy’s Joint Genome Institute, thinks Celera’s 70-million-piece puzzle may be unsolvable. “How much of a problem this will be no one even has a moderately good guess,” says Branscomb.
Myers contends that the key to the solution is that Celera’s puzzle pieces come in pairs lifted from the ends of a single fragment, the total length of which they know. The pairs, he believes, will constrain the problem enough to arrive at a unique solution. Outside scientists say Celera’s strategy would be impossible without the sequences already developed at publicly funded labs, but Myers maintains the puzzle could be solved anyway. “Outside information is just an expedient,” he says. “If we were going to do a genome that we have no data about, say Bermuda grass, we could do a self-contained operation.”
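Myers' argument about paired ends can be illustrated in the same toy fashion. The sketch below is an assumption-laden illustration, not Celera's software: the function name, the coordinates and the tolerance are all invented. It supposes one read of a pair has a single possible position, while its partner falls in a repeat and could sit in several places; the known fragment length then rules out all but one.

```python
def consistent_pairs(left_starts, right_starts, read_len, fragment_len, tolerance):
    """Keep only the placements of a read pair that match the fragment length.

    Each position is the start coordinate of a read along the genome. The
    fragment runs from the start of the left read to the end of the right
    read, so its implied length is right + read_len - left.
    """
    kept = []
    for left in left_starts:
        for right in right_starts:
            span = right + read_len - left
            if abs(span - fragment_len) <= tolerance:
                kept.append((left, right))
    return kept


# Hypothetical example: the left read fits in one place only, but the right
# read lies in a repeat and matches three places.
left_candidates = [1_000]
right_candidates = [2_500, 40_000, 91_000]

placements = consistent_pairs(
    left_candidates,
    right_candidates,
    read_len=500,        # letters read from each end, as in the article
    fragment_len=2_000,  # assumed fragment length, "a few thousand letters"
    tolerance=200,       # assumed slack in the measured fragment length
)

if __name__ == "__main__":
    print(placements)
```

Only the placement at 2,500 gives a fragment of the right length, so the repeat no longer causes confusion for this pair. Whether such constraints are enough to pin down an entire genome is exactly what Branscomb and Myers disagree about.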
Whether or not Celera’s operation represents top-notch science is still a matter of some debate in the genome community. Without a doubt, Celera’s version of the genome will have many, many small gaps. A photocopy, if you will, that gives the big picture and most of the detail but may fall short of the high-fidelity standard envisioned by the Human Genome Project.