Earlier this year, molecular biologists announced that 20 per cent of nonhuman genome databases are contaminated with human DNA, probably from the researchers who sequenced the samples.
Now, the human genome itself has become contaminated. Bill Langdon at University College London and Matthew Arno at King’s College London say they’ve found sequences from mycoplasma bacteria in the human genome database.
This contamination has far-reaching consequences. Biotech companies use the human genome database to create DNA chips that measure levels of human gene expression. Langdon and Arno say they’ve found mycoplasma DNA in two commercially available human DNA chips.
Anybody using these chips to measure human gene expression is unknowingly measuring mycoplasma gene expression too.
In some ways, this is hardly a surprise. “It is well known that mycoplasma contamination is rife in molecular biology laboratories,” say Langdon and Arno. With any luck, the discovery of this stuff in the human genome database will focus minds on the problem.
A key question is the nature of this kind of information transmission. These mycoplasma genes are clearly successful in reproducing themselves in silico. One possibility is that we’re seeing the beginnings of an entirely new kind of landscape of infection.
Here, genes that can masquerade as human (or indeed as other organisms) can successfully transmit themselves from one database to another. And if we think of this as virtual infection, a sure bet is that we’ll be worrying about virtual evolution in the near future.
But what to do? The level of contamination and the way in which it is spreading suggest that researchers are losing the battle to eliminate it. “We ... fear current tools will be inadequate to catch genes which have jumped the silicon barrier,” they say.
Most frightening of all is the possibility that Langdon and Arno may have only scratched the surface. “Having found two suspect DNA sequences, it seems likely the published ‘human genome’ sequence contains more,” they say.
If virtual infection is really as big a problem as Langdon and Arno suggest, we may well need to protect databases with the genomic version of antivirus software, a kind of virtual immune system.
But this in itself is likely to trigger an evolutionary arms race that selects genes most capable of beating the safeguards.
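To make the idea of a genomic "antivirus" a little more concrete, here is a minimal Python sketch of one way such a scan might work: it flags database entries that share a large fraction of their short subsequences (k-mers) with a known contaminant. This is an illustration only, not the method used by Langdon and Arno. The function names, the k-mer length of 8 and the 0.5 threshold are all assumptions chosen for the example, and a real screen would more likely rely on an established alignment tool.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def contamination_score(query, contaminant, k=8):
    """Fraction of the query's distinct k-mers that also occur in the contaminant.

    k=8 is an arbitrary choice for this toy example (assumption).
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        # Query shorter than k: nothing to compare.
        return 0.0
    shared = query_kmers & kmers(contaminant, k)
    return len(shared) / len(query_kmers)


def flag_suspects(entries, contaminant, k=8, threshold=0.5):
    """Return the names of entries whose score meets the threshold.

    `entries` maps a hypothetical entry name to its sequence string;
    the 0.5 threshold is an assumption, not a recommended value.
    """
    return [name for name, seq in entries.items()
            if contamination_score(seq, contaminant, k) >= threshold]
```

Calling `flag_suspects(entries, contaminant)` with a dictionary of named sequences returns the names worth a closer look. A fixed signature list like this is also exactly the kind of safeguard that an arms race would select against: a contaminant that differs at enough positions would share few k-mers and slip through.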
Clearly, this is a nettle that needs to be grasped quickly. That’s if it’s not too late already.
Ref: arxiv.org/abs/1106.4192: More Mouldy Data: Virtual Infection of the Human Genome