Microsoft Reports a Big Leap Forward for DNA Data Storage
Microsoft says DNA could be a better way to store data for the long term than the magnetic tape companies rely on today.
It looks like a test tube with dried salt at the bottom, but Microsoft says it could be the future of data storage. The company reported today that it had written roughly 200 megabytes of data, including War and Peace and 99 other literary classics, into DNA.
Researchers have previously demonstrated that digital data can be stored in DNA, but Microsoft says no one has written so much of it into DNA at once.
DNA is a good storage medium because data can be written into molecules more densely than the basic elements of conventional storage technologies can pack it in, says Karin Strauss, Microsoft's lead researcher on the project, which also involves researchers from the University of Washington. Right now the technique is expensive and finicky, but the company hopes to piggyback on the biotech industry, which is driving down the cost of tools for creating and reading out DNA. DNA is seen as a potential replacement for magnetic tape, which is the standard medium for long-term data storage today.
“The company is interested in learning whether we can create an end-to-end system that can store information, that’s automated, and can be used for enterprise storage, based on DNA,” says Strauss.
Strauss says the project is motivated by the fact that electronic storage devices are not improving as quickly as the amount of data we use grows. “If you look at current projections, we can’t store all the information we want with devices at the cost that they are,” she says.
IDC predicts that the worldwide total of stored digital data will hit 16 trillion gigabytes next year, most of it housed in huge data centers. Strauss estimates that a shoebox's worth of DNA could hold the equivalent of roughly 100 giant data centers.
DNA can also be remarkably durable, particularly when kept cool and dry. In March, researchers announced that they had partially reconstructed the genomes of ancient humans whose bones had been in a Spanish cave for more than 400,000 years. In contrast, the magnetic tape that is the best long-term data storage option today lasts only a few decades before starting to degrade.
Storing data in DNA requires translating the 1s and 0s of binary digital files into long strings of the four different nucleotides, or bases, that make up DNA strands and write out the genetic code. In 2012, Harvard molecular biologist George Church wrote a 50,000-word book totaling less than a megabyte of data into DNA and printed it onto a glass chip smaller than a pollen grain. This year he reported having encoded 22 megabytes of digital data.
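As a rough illustration of that translation step, the sketch below maps every two bits to one base. The mapping table and the function names `encode` and `decode` are assumptions made for this example, not the scheme Microsoft or Church used; practical encodings also add addressing and error correction and avoid long runs of the same base.

```python
# Hypothetical two-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"War and Peace")
    print(strand)
    print(decode(strand))
```

Under this naive mapping each byte becomes four bases, so a 150-base strand would carry at most 37 whole bytes before any overhead.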
Microsoft says it has now written almost 10 times as much digital data into a collection of millions of pieces of DNA, each 150 bases long.
Reinhard Heckel, a postdoctoral researcher at the University of California, Berkeley, who has worked on how to store data in DNA, calls that “impressive.” But he says that the largest obstacle to making DNA data storage useful is the cost, because making custom DNA molecules is expensive. “For people to really pick it up, you need to store something cheaper than on tape, and that’s going to be hard,” says Heckel.
Microsoft won’t disclose details of what it spent to make its 200-megabyte DNA data store, which required about 1.5 billion bases. But Twist Bioscience, which synthesized the DNA, typically charges 10 cents for each base. Commercially available synthesis can cost as little as 0.04 cents per base. Reading out a million bases costs roughly a penny.
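To put those per-base prices in perspective, here is a back-of-the-envelope calculation using only the figures quoted above. It is a sketch, not an estimate of what Microsoft paid, which the company has not disclosed; the function names are made up for this example, and a megabyte is assumed to be one million bytes.

```python
BASES = 1_500_000_000      # bases in the 200-megabyte store, per the article
DATA_BYTES = 200 * 10**6   # assumption: decimal megabytes


def synthesis_cost_usd(bases: int, cents_per_base: float) -> float:
    """Cost in dollars of writing the given number of bases."""
    return bases * cents_per_base / 100


def readout_cost_usd(bases: int, cents_per_million: float = 1.0) -> float:
    """Cost in dollars of reading bases at roughly a penny per million."""
    return bases / 1_000_000 * cents_per_million / 100


if __name__ == "__main__":
    print(f"At 10 cents per base:   ${synthesis_cost_usd(BASES, 10):,.0f}")
    print(f"At 0.04 cents per base: ${synthesis_cost_usd(BASES, 0.04):,.0f}")
    print(f"Reading every base once: ${readout_cost_usd(BASES):,.2f}")
    print(f"Bits stored per base:   {DATA_BYTES * 8 / BASES:.2f}")
```

At the quoted prices, writing 1.5 billion bases would come to about $150 million at 10 cents per base and about $600,000 at 0.04 cents per base, while reading every base once would cost about $15. The store works out to roughly 1.07 bits per base, about half the two bits that four nucleotides could carry in theory.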
Strauss is confident that the costs of reading and writing DNA will plunge significantly in coming years. She says there is already evidence that they are falling faster than the cost of fabricating transistors did over the past 50 years, a trend that has been the engine of much innovation in computing.
Sequencing a human genome would have cost about $10 million in 2007, but only about $1,000 in 2015.
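One way to gauge how steep that decline is: assuming the cost fell at a steady exponential rate between the two years (a simplification, since real prices dropped unevenly), the halving time can be computed as below. The function name `halving_time_years` is made up for this example.

```python
import math


def halving_time_years(start_cost: float, end_cost: float, years: float) -> float:
    """Years needed for the cost to halve, assuming steady exponential decline."""
    return years / math.log2(start_cost / end_cost)


if __name__ == "__main__":
    t = halving_time_years(10_000_000, 1_000, 2015 - 2007)
    print(f"Sequencing cost halved roughly every {t:.2f} years")
```

With the article's figures, the cost of sequencing halved roughly every 0.6 years, or about every seven months.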