Microsoft Reports a Big Leap Forward for DNA Data Storage

Microsoft says DNA could be a better way to store data for the long term than the magnetic tape that companies rely on today.

It looks like a test tube with dried salt at the bottom, but Microsoft says it could be the future of data storage. The company reported today that it had written roughly 200 megabytes of data, including War and Peace and 99 other literary classics, into DNA.

Researchers have previously demonstrated that digital data can be stored in DNA, but Microsoft says none have written so much of it into DNA at once.

DNA is a good storage medium because data can be written into its molecules more densely than the basic elements of conventional storage technologies can hold it, says Karin Strauss, Microsoft's lead researcher on the project, which also involves researchers from the University of Washington. Right now the technique is expensive and finicky, but the company hopes to piggyback on the biotech industry, which is driving down the cost of tools for creating and reading out DNA. DNA is seen as a potential replacement for magnetic tape, which is the standard medium for long-term data storage today.

“The company is interested in learning whether we can create an end-to-end system that can store information, that’s automated, and can be used for enterprise storage, based on DNA,” says Strauss.

The pink smear in this test tube is DNA that has been synthesized to hold digital data for long-term storage. Microsoft used the same technique to store roughly 200 megabytes of data.

Strauss says the project is motivated by the fact that electronic storage devices are not improving as quickly as the amount of data we use grows. “If you look at current projections, we can’t store all the information we want with devices at the cost that they are,” she says.

IDC predicts that the worldwide total of stored digital data will hit 16 trillion gigabytes next year, most of it housed in huge data centers. Strauss estimates that a shoebox's worth of DNA could hold the equivalent of roughly 100 giant data centers.

DNA can also be remarkably durable, particularly when kept cool and dry. In March, researchers announced that they had partially reconstructed the genomes of ancient humans whose bones had been in a Spanish cave for more than 400,000 years. In contrast, the magnetic tape that is the best long-term data storage option today lasts only a few decades before starting to degrade.

Storing data in DNA requires translating the 1s and 0s of binary digital files into long strings of the four different nucleotides, or bases, that make up DNA strands and write out the genetic code. In 2012, Harvard molecular biologist George Church wrote a 50,000-word book totaling less than a megabyte of data into DNA and printed it onto a glass chip smaller than a pollen grain. This year he reported having encoded 22 megabytes of digital data.
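To make the translation step concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping, two bits per base, which is an illustration only and not the scheme Microsoft or Church used; practical systems add addressing and error correction, and the function names here are hypothetical.

```python
# Illustrative mapping only: each pair of bits becomes one of the four bases.
# This is NOT the encoding used by Microsoft or Church.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Reverse encode(): turn a string of bases back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def split_into_strands(sequence: str, length: int = 150) -> list[str]:
    """Cut one long sequence into short pieces; the last may be shorter."""
    return [sequence[i:i + length] for i in range(0, len(sequence), length)]


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

Under this mapping every byte becomes four bases, so a file of any real size has to be cut into many short pieces before it can be synthesized.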

Microsoft says it has now written almost 10 times as much digital data into a collection of millions of pieces of DNA, each 150 bases long.

Reinhard Heckel, a postdoctoral researcher at the University of California, Berkeley, who has worked on how to store data in DNA, calls that "impressive." But he says that the largest obstacle to making DNA data storage useful is the cost, because making custom DNA molecules is expensive. "For people to really pick it up, you need to store something cheaper than on tape, and that’s going to be hard," says Heckel.

Microsoft won’t disclose details of what it spent to make its 200-megabyte DNA data store, which required about 1.5 billion bases. But Twist Bioscience, which synthesized the DNA, typically charges 10 cents for each base. Commercially available synthesis can cost as little as 0.04 cents per base. Reading out a million bases costs roughly a penny.
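For a sense of scale, the quoted prices can be multiplied out. The sketch below only does arithmetic on the figures in this article; Microsoft has not said what it paid, so the totals show what the quoted per-base prices would imply, not the project's actual cost. The variable names are hypothetical, and "200 megabytes" is assumed to mean 200 million bytes.

```python
from fractions import Fraction

# Figures quoted in the article (all approximate).
TOTAL_BASES = 1_500_000_000   # "about 1.5 billion bases"
STRAND_LENGTH = 150           # "each 150 bases long"
DATA_BYTES = 200_000_000      # assumption: 200 megabytes = 200 million bytes


def cost_in_dollars(bases: int, cents_per_base: Fraction) -> float:
    """Total cost in dollars for a given per-base price in cents."""
    return float(bases * cents_per_base / 100)


strand_count = TOTAL_BASES // STRAND_LENGTH
synthesis_typical = cost_in_dollars(TOTAL_BASES, Fraction("10"))    # 10 cents per base
synthesis_cheapest = cost_in_dollars(TOTAL_BASES, Fraction("0.04")) # 0.04 cents per base
reading = cost_in_dollars(TOTAL_BASES, Fraction(1, 1_000_000))      # a penny per million bases
bits_per_base = DATA_BYTES * 8 / TOTAL_BASES

if __name__ == "__main__":
    print(f"strands: {strand_count:,}")                              # 10,000,000
    print(f"synthesis at 10 cents/base: ${synthesis_typical:,.0f}")  # $150,000,000
    print(f"synthesis at 0.04 cents/base: ${synthesis_cheapest:,.0f}")  # $600,000
    print(f"reading everything once: ${reading:,.2f}")               # $15.00
    print(f"bits stored per base: {bits_per_base:.2f}")              # 1.07
```

On these numbers, writing the DNA costs orders of magnitude more than reading it back. The roughly 1.07 bits per base is well under the two-bit maximum, which would be consistent with some of the bases carrying redundancy or addressing rather than data, though the article does not say how the encoding works.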

Strauss is confident that the costs of reading and writing DNA will plunge significantly in coming years. She says there is already evidence that they are falling faster than the cost of fabricating transistors did over the past 50 years, a trend that has been the engine of much innovation in computing.

It would have cost about $10 million to sequence a human genome in 2007, but only about $1,000 in 2015.
