
MIT Technology Review


The Austrian-born biochemist Erwin Chargaff is famous for the two rules he discovered that now bear his name. At the time of this discovery, in 1950, the biggest problem in biology was understanding the structure of DNA. Chargaff's rules turned out to be an important clue in this puzzle.

Biologists had long known that DNA was built from four nucleotide bases: adenine, guanine, thymine and cytosine. They assumed that these bases occurred in equal quantities and dismissed any measurements that hinted otherwise as experimental error.

Chargaff showed through careful measurement that this assumption was wrong. He found that the amount of adenine equalled that of thymine and the amount of guanine equalled that of cytosine, but that the two pairs were not present in equal amounts. In human DNA, for example, the rough figures are A = T = 30% and G = C = 20%.

Chargaff's first parity rule, as this is now called, was an important clue that James Watson and Francis Crick used to develop their base-pairing model of the double helix. Biologists now know that because A pairs with T and G pairs with C across the two strands of the helix, this rule holds for all double-stranded DNA.
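
A short Python sketch makes the double-stranded case concrete: counting bases over a strand plus its complementary partner always gives A = T and G = C, whatever the first strand contains. The function names and the toy sequence are illustrative only, and the code assumes the input contains nothing but A, C, G and T.

```python
# Watson-Crick pairing: each base and the base it pairs with.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_counts(seq: str) -> dict:
    """Count A, C, G and T in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ACGT"}


def double_strand_counts(strand: str) -> dict:
    """Count bases across a strand plus its complementary strand.

    The real partner strand runs in the opposite direction, but order
    does not affect the counts, so the reversal is skipped here.
    Assumes only A, C, G, T (any other character raises KeyError).
    """
    partner = "".join(COMPLEMENT[b] for b in strand.upper())
    return base_counts(strand + partner)


# The single strand "AAAAGCGT" has A=4, T=1, G=2, C=1 on its own...
print(base_counts("AAAAGCGT"))           # {'A': 4, 'C': 1, 'G': 2, 'T': 1}
# ...but once its partner is included, A = T and G = C exactly.
print(double_strand_counts("AAAAGCGT"))  # {'A': 5, 'C': 3, 'G': 3, 'T': 5}
```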

Chargaff went on to discover that an approximate version of his rule also holds within most (but not all) single strands of DNA, a result now known as Chargaff's second parity rule. That's much more of a puzzle, and biologists still aren't quite sure why it is true.
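
The second rule can be checked on a single strand with the standard skew measures, (A − T)/(A + T) and (G − C)/(G + C); values close to zero mean the strand approximately obeys the rule. This is a minimal sketch with a hypothetical helper name. The rule describes long natural sequences such as whole chromosomes, so the short toy strand below is not expected to obey it.

```python
def parity_skews(strand: str) -> tuple:
    """Return (AT skew, GC skew) for a single strand.

    AT skew = (A - T) / (A + T), GC skew = (G - C) / (G + C).
    Values near zero mean the strand roughly satisfies Chargaff's
    second parity rule; a skew is reported as 0.0 if both bases are absent.
    """
    s = strand.upper()
    a, t, g, c = (s.count(b) for b in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


# A short toy strand is far from parity; a real genome would be fed in
# as one long string (for example, read from a FASTA file).
print(parity_skews("AAAAGCGT"))  # (0.6, 0.3333333333333333)
```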

Chargaff’s rules are important because they point to a kind of “grammar of biology”, a set of hidden rules that govern the structure of DNA. This grammar ought to reveal itself as patterns in DNA that are invariant across all species.

But in the 60 years since Chargaff discovered his invariant patterns, no others have emerged. Until now.

Today, Michel Yamagishi at the Applied Bioinformatics Laboratory in Brazil and Roberto Herai at Unicamp in São Paulo say they've discovered several new patterns that significantly broaden the grammar of DNA.

Their approach is straightforward. These guys use set theory to show that Chargaff's existing rules imply the existence of other, higher-order patterns.

Here's how. One way to think about the patterns in DNA is to divide a DNA sequence into words of a specific length, k. Chargaff's rules apply to words where k = 1, that is, to single nucleotides.

But what about words with k = 2 (e.g. AA, AC, AG, AT and so on) or k = 3 (AAA, AAC, AAG, AAT and so on)? Biochemists call these short words oligonucleotides. Set theory implies that the entire sets of these k-words must also obey certain fractal-like patterns.
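
Counting k-words is easy to sketch in Python. The version below assumes overlapping words read with a sliding window of step 1, a common convention but not necessarily the one Yamagishi and Herai use. With k = 1 it reduces to the base counts behind Chargaff's rules, and for any k there are 4^k possible words. The function names are illustrative.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k (sliding window, step 1)."""
    s = seq.upper()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def all_words(k: int) -> list:
    """All 4**k possible DNA words of length k, in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


counts = kmer_counts("AAAAGCGT", 2)
print(len(all_words(2)))  # 16 possible 2-words
# Only the words that actually occur, listed in lexicographic order:
print({w: counts[w] for w in all_words(2) if counts[w]})
# {'AA': 3, 'AG': 1, 'CG': 1, 'GC': 1, 'GT': 1}
```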

Yamagishi and Herai distil them into four equations.

Of course, these patterns can only be seen in huge DNA datasets, so Yamagishi and Herai have number-crunched the DNA sequences of 32 species looking for them. And they've found them.

They say the patterns show up with great precision in 30 of these species, including humans, E. coli and the plant Arabidopsis. Only human immunodeficiency virus (HIV) and Xylella fastidiosa 9a5c, a bacterium that attacks citrus trees, do not conform.

“These new rules show for the first time that oligonucleotide frequencies do have invariant properties across a large set of genomes,” they say.

That could turn out to be extremely useful for assessing the performance of new technologies for sequencing entire genomes at high speed.

One problem with these techniques is knowing how accurately they work. Yamagishi and Herai suggest that a simple test would be to check whether the newly sequenced genomes contain these invariant patterns. If not, then that’s a sign the technology may be introducing some kind of bias.

This is a bit like using a checksum to spot accidental errors in blocks of data, and it's a neat piece of science to boot.
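
The article does not spell out the four equations, so the sketch below cannot reproduce the test Yamagishi and Herai propose. As a stand-in it uses a different, well-established invariant: the extension of Chargaff's second parity rule to k-words, under which a word and its reverse complement occur at nearly the same frequency in long genomic sequences. The function names, the choice of k = 3 and the 0.1 threshold are all hypothetical; a real quality check would need thresholds calibrated on trusted genomes.

```python
from collections import Counter
from itertools import product

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Reverse complement of an uppercase DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping words of length k."""
    s = seq.upper()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def symmetry_deviation(seq: str, k: int) -> float:
    """Mean relative gap between each k-word's count and its reverse complement's.

    0.0 means perfect symmetry; 1.0 means no word that occurs has its
    reverse complement present. Words absent with their partner are skipped.
    """
    counts = kmer_counts(seq, k)
    gaps = []
    for word in ("".join(p) for p in product("ACGT", repeat=k)):
        n, m = counts[word], counts[reverse_complement(word)]
        if n + m:
            gaps.append(abs(n - m) / (n + m))
    return sum(gaps) / len(gaps) if gaps else 0.0


def looks_biased(seq: str, k: int = 3, threshold: float = 0.1) -> bool:
    """Hypothetical QC flag: True if the symmetry deviation exceeds threshold."""
    return symmetry_deviation(seq, k) > threshold


toy = "AAAAGCGT"
palindromic = toy + reverse_complement(toy)  # equals its own reverse complement
print(symmetry_deviation(palindromic, 3))    # 0.0
print(symmetry_deviation(toy * 50, 3))       # 1.0
print(looks_biased(toy * 50))                # True
```

A sequence that is its own reverse complement scores exactly 0.0, while the toy strand repeated 50 times scores 1.0 because none of its 3-words has a reverse-complement partner in the sequence.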

Ref: arxiv.org/abs/1112.1528: Chargaff’s “Grammar of Biology”: New Fractal-like Rules
