
Microsoft Reports a Big Leap Forward for DNA Data Storage

Microsoft says DNA could be a better way to store data for the long term than the magnetic tape companies rely on today.

It looks like a test tube with dried salt at the bottom, but Microsoft says it could be the future of data storage. The company reported today that it had written roughly 200 megabytes of data, including War and Peace and 99 other literary classics, into DNA.

Researchers have demonstrated that digital data can be stored in DNA before, but Microsoft says none have written so much of it into DNA at once.

DNA is a good storage medium because molecules can hold data far more densely than the basic elements of conventional storage technologies can, says Karin Strauss, Microsoft's lead researcher on the project, which also involves researchers from the University of Washington. Right now the technique is expensive and finicky, but the company hopes to piggyback on the plunging costs, driven by the biotech industry, of tools for creating and reading out DNA. DNA is seen as a potential replacement for magnetic tape, which is the standard medium for long-term data storage today.

“The company is interested in learning whether we can create an end-to-end system that can store information, that’s automated, and can be used for enterprise storage, based on DNA,” says Strauss.

The pink smear in this test tube is DNA synthesized to hold digital data for long-term storage. Microsoft used the same technique to store roughly 200 megabytes of data.

Strauss says the project is motivated by the fact that electronic storage devices are not improving as quickly as the amount of data we use grows. “If you look at current projections, we can’t store all the information we want with devices at the cost that they are,” she says.

IDC predicts that the worldwide total of stored digital data will hit 16 trillion gigabytes next year, most of it housed in huge data centers. Strauss estimates that a shoebox worth of DNA could hold the equivalent of roughly 100 giant data centers.

DNA can also be remarkably durable, particularly when kept cool and dry. In March, researchers announced that they had partially reconstructed the genomes of ancient humans whose bones had been in a Spanish cave for more than 400,000 years. In contrast, the magnetic tape that is the best long-term data storage option today lasts only a few decades before starting to degrade.

Storing data in DNA requires translating the 1s and 0s of binary digital files into long strings of the four different nucleotides, or bases, that make up DNA strands and write out the genetic code. In 2012, Harvard molecular biologist George Church wrote a 50,000-word book totaling less than a megabyte of data into DNA and printed it onto a glass chip smaller than a pollen grain. This year he reported having encoded 22 megabytes of digital data.
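As a rough illustration of that translation step, the sketch below uses the simplest possible mapping, two bits per base (00→A, 01→C, 10→G, 11→T). This mapping is an assumption for illustration only; it is not the encoding Microsoft or Church used, and practical schemes add error correction and avoid sequences that are hard to synthesize or read, such as long runs of a single base. The function names `encode` and `decode` are hypothetical.

```python
# Naive illustrative mapping: every pair of bits becomes one of the four bases.
# Not Microsoft's or Church's scheme; real encodings add redundancy and constraints.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a base sequence, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): map bases back to bits, then regroup bits into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    print(encode(b"Hi"))  # prints CAGACGGC
```

Under this mapping each byte becomes four bases, so the density is two bits per base at best; any redundancy a real system adds lowers it.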

Microsoft says it has now written almost 10 times as much digital data into a collection of millions of pieces of DNA, each 150 bases long.
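Because strands in a tube float around in no particular order, a store spread over many short pieces needs some way to put them back in sequence. At the article's figures, roughly 1.5 billion bases cut into 150-base strands works out to about 10 million strands. A minimal sketch, assuming each strand starts with a hypothetical 12-base address (4^12 = 16,777,216 possible values, enough for 10 million strands) followed by 138 bases of payload; the address length, layout, and function names are illustrative assumptions, not Microsoft's format.

```python
STRAND_LENGTH = 150   # bases per strand, as reported in the article
INDEX_LENGTH = 12     # hypothetical address field: 4**12 = 16,777,216 possible strands
PAYLOAD_LENGTH = STRAND_LENGTH - INDEX_LENGTH  # 138 bases of data per strand
ALPHABET = "ACGT"     # base-4 digits used to write the address


def index_to_bases(i: int) -> str:
    """Write strand number i as a fixed-width base-4 string, most significant digit first."""
    if not 0 <= i < 4 ** INDEX_LENGTH:
        raise ValueError("strand index does not fit in the address field")
    digits = []
    for _ in range(INDEX_LENGTH):
        digits.append(ALPHABET[i % 4])
        i //= 4
    return "".join(reversed(digits))


def bases_to_index(address: str) -> int:
    """Inverse of index_to_bases: read a base-4 address back into an integer."""
    n = 0
    for base in address:
        n = n * 4 + ALPHABET.index(base)
    return n


def split_into_strands(sequence: str) -> list[str]:
    """Cut a long base sequence into addressed strands of at most STRAND_LENGTH bases.

    The final strand may be shorter; a real system would pad it.
    """
    return [
        index_to_bases(k) + sequence[start:start + PAYLOAD_LENGTH]
        for k, start in enumerate(range(0, len(sequence), PAYLOAD_LENGTH))
    ]


def reassemble(strands) -> str:
    """Sort strands by their address prefix and concatenate the payloads."""
    ordered = sorted(strands, key=lambda s: bases_to_index(s[:INDEX_LENGTH]))
    return "".join(s[INDEX_LENGTH:] for s in ordered)
```

Spending bases on an address is a trade-off: every base used for bookkeeping is one not used for data, which is one reason the raw base count of a store exceeds what its file size alone would require.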

Reinhard Heckel, a postdoctoral researcher at the University of California, Berkeley, who has worked on how to store data in DNA, calls that "impressive." But he says that the largest obstacle to making DNA data storage useful is the cost, because making custom DNA molecules is expensive. "For people to really pick it up, you need to store something cheaper than on tape, and that’s going to be hard," says Heckel.

Microsoft won’t disclose what it spent to make its 200-megabyte DNA data store, which required about 1.5 billion bases. But Twist Bioscience, which synthesized the DNA, typically charges 10 cents for each base. Commercially available synthesis can cost as little as 0.04 cents per base. Reading out a million bases costs roughly a penny.
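Microsoft's actual spend is undisclosed, but the quoted prices allow a back-of-the-envelope range. The sketch below simply multiplies the article's figures: about 1.5 billion bases at 10 cents or 0.04 cents per base to write, and about one cent per million bases to read. It ignores everything else (redundancy, failed syntheses, lab work, volume discounts), so treat the results as list-price illustrations, not Microsoft's cost. The function names are hypothetical.

```python
from decimal import Decimal

TOTAL_BASES = 1_500_000_000  # approximate size of the 200 MB store, per the article

# Prices quoted in the article, expressed in US cents.
TWIST_TYPICAL_CENTS_PER_BASE = Decimal("10")
LOWEST_CENTS_PER_BASE = Decimal("0.04")
READ_CENTS_PER_MILLION_BASES = Decimal("1")


def synthesis_cost_dollars(bases: int, cents_per_base: Decimal) -> Decimal:
    """List-price cost of writing (synthesizing) `bases` bases, in dollars."""
    return bases * cents_per_base / 100


def sequencing_cost_dollars(bases: int) -> Decimal:
    """Cost of reading `bases` bases once at about one cent per million, in dollars."""
    return Decimal(bases) / 1_000_000 * READ_CENTS_PER_MILLION_BASES / 100


if __name__ == "__main__":
    high = synthesis_cost_dollars(TOTAL_BASES, TWIST_TYPICAL_CENTS_PER_BASE)
    low = synthesis_cost_dollars(TOTAL_BASES, LOWEST_CENTS_PER_BASE)
    read = sequencing_cost_dollars(TOTAL_BASES)
    print(f"Write at 10 cents/base:   ${int(high):,}")  # $150,000,000
    print(f"Write at 0.04 cents/base: ${int(low):,}")   # $600,000
    print(f"Read once:                ${int(read):,}")  # $15
```

At these prices, writing the store would run about $150 million at 10 cents per base or about $600,000 at 0.04 cents, while reading it back once would cost about $15. Synthesis, not sequencing, dominates, which is the cost barrier Heckel describes.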

Strauss is confident that the costs of reading and writing DNA will plunge significantly in coming years. She says there is already evidence that they are falling faster than the cost of fabricating transistors did over the past 50 years, a trend that has been the engine of much innovation in computing.

Sequencing a human genome would have cost about $10 million in 2007 but only about $1,000 in 2015.
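One way to put those sequencing figures on the same footing as Strauss's comparison with transistors is to convert them into a halving time. The sketch below assumes a steady exponential decline between the two data points, which real prices did not follow exactly; the function name is hypothetical. A 10,000-fold drop over eight years corresponds to the cost halving roughly every seven months, compared with the roughly two-year doubling cadence commonly attributed to Moore's law (used here only as a loose stand-in for transistor cost trends, itself an assumption).

```python
import math


def halving_time_years(cost_start: float, cost_end: float, years: float) -> float:
    """Years for cost to halve, assuming a constant exponential decline from cost_start to cost_end."""
    if not (cost_start > cost_end > 0 and years > 0):
        raise ValueError("expects a positive cost that fell over a positive time span")
    # cost_end = cost_start * 2 ** (-years / t)  =>  t = years * ln 2 / ln(cost_start / cost_end)
    return years * math.log(2) / math.log(cost_start / cost_end)


if __name__ == "__main__":
    t = halving_time_years(10_000_000, 1_000, 2015 - 2007)
    print(f"Genome sequencing cost halved about every {t * 12:.1f} months")  # 7.2 months
```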
