Single-Molecule Sequencing Gets a Crucial Fix

The technique was once considered too error-prone to be useful, but a computational fix could bring it into the world of medical research.

Karen Weintraub

July 5, 2012

A computational fix for single-molecule DNA sequencing technologies could bring the latest crop of gene readers into more widespread use.

New DNA sequencing technologies such as those offered by Oxford Nanopore and Pacific Biosciences can directly read a single molecule of DNA and can provide a clearer view of the organization of a genome and its genetic content. But the technology suffers from a high error rate. That is, too often it sees the wrong nucleotide—an A, T, G, or C—in a DNA strand.

In a new Nature Biotechnology paper, researchers show how to vastly improve single-molecule sequencing, using the method to improve the results from Pacific Biosciences’ technology. The technique could be broadly applicable to other single-molecule sequencing, such as Oxford Nanopore’s approach.

Most DNA sequencers, including those driving the boom of medical genome research, read genetic information from a complicated ensemble of many pieces of DNA. They require a genome to be copied many times over and chopped into tiny strands of DNA, which will then be read by the sequencer and put back together, usually with the aid of an existing genome sequence for comparison.

But the new generation of readers can examine longer strands of DNA, providing more information about gene inheritance patterns and overall genome structure. The longer pieces of DNA also make it easier for researchers to assemble a genome when there is no existing reference sequence. While the single-molecule approach is costlier for large, complex genomes, it may be a better way of deriving reference genomes for various less-studied species, including some important crops. And though it’s not there yet, such sequencing may eventually also be better for unique genomes like those of cancers, which have lots of mutations.

Single-molecule analysis in general has been around for years, but its error rate was so high, roughly 15 percent, that it wasn’t considered a reliable sequencing method. In the new paper, researchers at Cold Spring Harbor Laboratory in New York figured out how to dramatically reduce that error rate. One of the paper’s authors, Michael Schatz, an assistant professor at Cold Spring Harbor, essentially used more conventional technology made by San Diego-based Illumina to help correct the mistakes in the single-molecule method. The result is “substantially better” than using Pacific Biosciences’ technology alone, he says. “The data are basically perfect.”

His mathematical adjustments, published as open-source software for other scientists, are likely to have a major impact on Pacific Biosciences’ commercial viability, says Schatz, adding that he has no financial interest in the company. “PacBio has been struggling recently. There was uncertainty if the technology was going to pan out,” he says. “Now that we have a computerized solution that’s almost totally automated, I think that will enable more widespread adoption of the technology.”

Such single-molecule analysis won’t replace more conventional technology anytime soon, though, because it isn’t as cost-effective for large, complex genomes such as the human genome, according to David Jaffe and Chad Nusbaum, genomics experts at the Broad Institute in Cambridge, Massachusetts. But it may be useful for solving specific problems. “Clearly you’re getting some benefit here. It has to be balanced against the cost,” Nusbaum says.

If PacBio technology can eventually be used to sequence plant genomes like corn, however, that would be a huge achievement, Nusbaum says. “For years, people have said the corn genome is impossible. If this can give you better expressed gene sequences, then that’s hugely important to agribusiness,” he says—both for food and biofuel development.
