Microsoft Has a Plan to Add DNA Data Storage to Its Cloud
Based on early research involving the storage of movies and documents in DNA, Microsoft is developing an apparatus that uses biology to replace tape drives, researchers at the company say.
Computer architects at Microsoft Research say the company has formalized a goal of having an operational storage system based on DNA working inside a data center toward the end of this decade. The aim is a “proto-commercial system in three years storing some amount of data on DNA in one of our data centers, for at least a boutique application,” says Doug Carmean, a partner architect at Microsoft Research. He describes the eventual device as the size of a large, 1970s-era Xerox copier.
Internally, Microsoft harbors the even more ambitious goal of replacing tape drives, a common format used for archiving information. “We hope to get it branded as ‘Your Storage with DNA,’” says Carmean.
The plans signal how seriously some tech companies are taking the seemingly strange idea of saving videos, photos, or valuable documents in the same molecule our genes are made of. The reason, says Victor Zhirnov, chief scientist of the Semiconductor Research Corporation, is that efforts to shrink computer memory are hitting physical limits, but DNA can store data at incredible densities.
Formatted in DNA, every movie ever made would fit inside a volume smaller than a sugar cube.
“DNA is the densest known storage medium in the universe, just based on the laws of physics. That is the reason why people are looking into this,” says Zhirnov. “And the problem we are solving is the exponential growth of stored information.”
Last July, Microsoft publicly announced it had stored 200 megabytes of data in DNA strands, including a music video, setting a record. The work, described in a paper published in March on the preprint server bioRxiv, was led by Karin Strauss of Microsoft Research and the University of Washington laboratory of computer scientist Luis Ceze.
Major obstacles to a practical storage system remain. Converting digital bits into DNA code (made up of chains of nucleotides labeled A, G, C, and T) remains laborious and expensive because of the chemical process used to manufacture DNA strands. In its demonstration project, Microsoft used 13,448,372 unique pieces of DNA. Experts say buying that much material on the open market would cost $800,000.
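To make the idea of turning bits into nucleotides concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping, two bits per base (00 to A, 01 to C, 10 to G, 11 to T). That mapping and the function names `bytes_to_dna` and `dna_to_bytes` are illustrative assumptions, not the scheme Microsoft used; practical encodings also add addressing and error correction and avoid sequences that are hard to synthesize or read.

```python
# Illustrative only: a naive 2-bits-per-base mapping (an assumption,
# not the encoding described in Microsoft's paper).
BASES = "ACGT"  # index 0..3 corresponds to bit pairs 00, 01, 10, 11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # str.index raises ValueError on anything other than A, C, G, T
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = bytes_to_dna(b"DNA")
    print(strand)
    print(dna_to_bytes(strand))
```

Under this mapping one byte costs four nucleotides, so the price of synthesis scales directly with the amount of data written.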
“The main issue with DNA storage is the cost,” says Yaniv Erlich, a professor at Columbia University who earlier this year reported a novel approach to DNA data storage. “So the main question is whether Microsoft solved this problem.” Based on their publication, Erlich says, “I did not see any progress towards this goal, but maybe they have something in their pipeline.”
According to Microsoft, the cost of DNA storage needs to fall by a factor of 10,000 before it becomes widely adopted. While many experts say that’s unlikely, Microsoft believes such advances could occur if the computer industry demands them.
Automating the process of writing digital data into DNA will also be critical. Based on the several weeks it took to carry out their experiment, Carmean estimates that the rate of moving data into DNA was only 400 bytes per second. Microsoft says that needs to increase to 100 megabytes per second.
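The gaps Microsoft describes can be put side by side with some back-of-the-envelope arithmetic. The sketch below uses only figures quoted in this article (200 megabytes, $800,000, a 10,000-fold cost reduction, 400 bytes per second, 100 megabytes per second). It assumes decimal megabytes and treats the $800,000 open-market estimate as the cost of the whole demonstration; the variable and function names are illustrative.

```python
# Back-of-the-envelope arithmetic from figures quoted in the article.
# Assumption: 1 megabyte = 10**6 bytes.
MB = 10**6

demo_bytes = 200 * MB            # data stored in the demonstration
demo_cost_usd = 800_000          # experts' open-market estimate for the DNA
cost_reduction_needed = 10_000   # factor Microsoft says is required
current_write_rate = 400         # bytes per second (Carmean's estimate)
target_write_rate = 100 * MB     # bytes per second (Microsoft's target)


def cost_per_mb(total_cost_usd: float, total_bytes: int) -> float:
    """Dollars per megabyte for a given total cost and data size."""
    return total_cost_usd / (total_bytes / MB)


def speedup_needed(current: float, target: float) -> float:
    """How many times faster the target rate is than the current rate."""
    return target / current


current_cost = cost_per_mb(demo_cost_usd, demo_bytes)
target_cost = current_cost / cost_reduction_needed
write_speedup = speedup_needed(current_write_rate, target_write_rate)

if __name__ == "__main__":
    print(f"Current cost: ${current_cost:,.0f} per MB")
    print(f"Cost after a {cost_reduction_needed:,}x reduction: ${target_cost:.2f} per MB")
    print(f"Write speedup needed: {write_speedup:,.0f}x")
```

On these assumptions, the demonstration works out to about $4,000 per megabyte, the hoped-for reduction would bring that to about 40 cents per megabyte, and the write rate would need to rise roughly 250,000-fold.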
Reading the data out is easier. Microsoft did that with a high-speed sequencing machine, and it was able to retrieve specific parts of the files, much like random access in a computer's memory. Even a two-fold improvement in DNA reading would make that aspect of the system efficient enough for commercial use, Microsoft thinks.
Because writing data into DNA and retrieving it are both slow, any early use of the technology will be restricted to special situations. That could include data that needs to be archived for legal or regulatory reasons, such as police body-cam video or medical records.
Microsoft currently works with Twist Bioscience, a DNA manufacturer located in San Francisco. Twist is one of a number of newly formed companies trying to improve DNA production, a list that now includes startups DNAScript, Nuclera Nucleics, Evonetix, Molecular Assemblies, Catalog DNA, Helixworks, and a spin-off of Oxford Nanopore called Genome Foundry.
One exciting possibility being pursued by some of the startups is to replace the 40-year-old chemical process used to make DNA with one that employs enzymes, as our own bodies do. Jean Bolot, scientific director of Technicolor Research, in Los Altos, says it is funding such work at Harvard University, in the laboratory of George Church, the genomics expert.
“I am confident we will have results to talk about this year,” says Bolot, who adds that his company has been in discussions with movie studios about how they might employ DNA storage. He says half of all films made before 1951 are already lost because they were stored on celluloid. Now new formats, like high-definition video and virtual reality, are stretching studios’ ability to preserve their work, he says.
Zhirnov says computer chip makers are taking DNA seriously because there are physical limits to how much data can be stored in conventional media, like tapes or hard drives. His organization, which is funded by Microsoft, Intel, and others to perform applied research, began taking a closer look at DNA in 2013. He says semiconductor experts who believed DNA was too “soft” were surprised to learn that it lasts a hundred to a thousand times longer than a silicon device. The molecule is so stable that it is frequently recovered from mammoth bones and ancient human remains.
But its most important feature is density. DNA can hold 1,000,000,000,000,000,000 (aka a quintillion) bytes of information in a cubic millimeter. “Density is driving everything,” says Zhirnov.
A spokesperson for Microsoft Research said the company could not confirm “specifics on a product plan” at this time. Inside the company, the DNA storage idea is apparently gaining adherents but is not yet universally accepted. “Our internal people believe us, but not the tape storage people,” says Carmean, formerly a top chip designer at Intel.
In addition to being dense and durable, DNA has a further advantage that’s not often mentioned—its extreme relevance to the human species. Think of those old floppy disks you can’t read anymore or clay tablets with indecipherable hieroglyphs. Unlike such media, DNA probably won’t ever go out of style.
“We’ll always be reading DNA as long as we are human,” says Carmean.