DNA data storage may sound futuristic, but it’s on the immediate horizon

As the amount of data generated around the world continues to grow at an aggressive rate, researchers are looking for ultra-dense, ultra-durable storage technologies that can house it all.

For example, Microsoft is exploring the possibility of using lasers to etch data into quartz glass or storing information as holograms inside crystals. New developments in tape storage, the current top choice for archival use cases, are also promising.

However, one new storage medium in particular seems to have all the necessary attributes: deoxyribonucleic acid, or DNA. Researchers have found that a single gram of DNA is capable of storing 215 PB (roughly 220,000 TB) of data.
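The two figures quoted for a gram of DNA differ slightly depending on how a petabyte is counted. A quick check, assuming the parenthetical uses binary multiples (the source does not say which convention it follows):

```python
PB_PER_GRAM = 215  # figure quoted in the article

# Assumption: the article does not state which unit convention it uses.
tb_decimal = PB_PER_GRAM * 1000  # SI prefixes: 1 PB = 1,000 TB
tb_binary = PB_PER_GRAM * 1024   # binary prefixes: 1 PiB = 1,024 TiB

print(tb_decimal)  # 215000
print(tb_binary)   # 220160
```

Under the binary reading, 215 PB rounds to about 220,000 TB, which matches the parenthetical.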

To learn more about the work being done to make DNA storage a commercial reality, TechRadar Pro spoke to the DNA Data Storage Alliance, founded last year by Microsoft, Western Digital, Twist Bioscience and Illumina.

The Alliance was launched with the aim of raising awareness of the emerging storage technology and establishing a set of standards and specifications that the industry can rely on.

What is DNA storage and what challenges should it solve?

DNA data storage is the process of encoding binary data onto synthesized strands of DNA (deoxyribonucleic acid) and decoding it again. DNA has several unique properties as a storage medium: it is extremely dense, it is essentially free to copy, the code will always be readable, and its longevity lowers the cost of ownership over time. In addition, it saves significantly on energy costs compared to current digital storage.

Legacy storage solutions have evolved significantly over the years, but growth in the areal density of magnetic media (hard disk and tape), which underpin today’s common archive storage solutions, is slowing down and the size of libraries is becoming unwieldy. In short, data growth is outpacing the scalability of current storage solutions. The industry needs a new storage medium that is denser, more durable, longer-lasting and more cost-effective in order to accommodate the expected future growth of archival data.

How is it possible for digital information to be translated into a biological format (and vice versa)? What kind of complications can arise here?

To store data in DNA, the original digital data (binary) is encoded (mapped from 1s and 0s to sequences of the DNA bases A, C, G and T), then written (synthesized using chemical or biological processes) and stored. When the stored data is needed again, the DNA molecules are read (sequenced to reveal each individual A, C, G or T in order) and decoded (re-mapped from DNA bases to 1s and 0s).
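The encode and decode steps can be sketched in a few lines of Python. This is a minimal illustration only: the two-bits-per-base table below is a hypothetical mapping chosen for simplicity, not the scheme used by the Alliance or any of its members, and practical encodings add constraints and redundancy that are omitted here.

```python
# Hypothetical 2-bits-per-base mapping, chosen for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map each byte to four DNA bases (two bits per base)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Map a base sequence back to the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

In this sketch the "write" and "read" steps (synthesis and sequencing) are simply skipped: the string returned by `encode` stands in for the physical molecule.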

There are some concerns about data accuracy, since errors can be introduced during the synthesis of oligonucleotides (short pieces of DNA) and during sequencing. However, unlike oligo synthesis for healthcare, which must be perfect, DNA storage can tolerate errors thanks to the error-correction algorithms typically used in storage today. DNA data storage pioneers are already working on improving the encoding and error-correction algorithms that will mitigate this risk and recover data accurately. Additionally, cost, speed, logistics, and other challenges remain barriers to data centers adopting this technology.
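To give a feel for how redundancy lets a storage system tolerate read errors, here is a deliberately simple sketch: several reads of the same strand are combined by majority vote at each position. This is a stand-in technique, not the error-correction algorithm the interviewees refer to. It assumes equal-length reads and substitution errors only, whereas real systems rely on stronger codes (Reed-Solomon codes are common in storage generally) and must also cope with inserted or missing bases. The `consensus` helper is hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Pick the most common base at each position across equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    if len({len(read) for read in reads}) != 1:
        raise ValueError("this sketch assumes reads of equal length")
    # zip(*reads) yields one column of bases per strand position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


reads = [
    "CAGACGGC",  # clean read
    "CAGTCGGC",  # substitution error at index 3
    "CAGACGGA",  # substitution error at index 7
]
print(consensus(reads))  # CAGACGGC
```

Each error appears in only one of the three reads, so the vote at every position still recovers the original strand.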

The DNA Data Storage Alliance was formed by Illumina, Microsoft Research, Twist Bioscience and Western Digital. Our mission is to create and promote an interoperable storage ecosystem based on engineered DNA as the data storage medium. Our initial goal is to educate the public and raise awareness of this emerging technology. In addition, as commercially viable DNA data storage methods and tools become better understood and more widely available, the Alliance will consider the creation of specifications and standards (e.g., coding, physical interfaces, curation, files) to foster the emergence of interoperable solutions based on DNA data storage that complement existing storage hierarchies.

How could DNA storage impact the data center industry?

DNA is an inherently environmentally friendly medium in terms of power, space, and durability, and it greatly reduces the need to migrate data every few years. Used as the primary archival storage medium in a data center, it has the potential to change the size of the data center and its total cost of ownership and, ultimately, to place a much lighter load on the Earth’s resources than legacy archival storage technologies.

What are the main obstacles that DNA storage will have to overcome?

The costs of DNA synthesis and sequencing are still relatively high compared to currently used archival storage media such as hard disks or tape, and significant cost reduction is required for DNA data storage to be widely adopted. Education and confidence building to prepare the market for this new storage medium will also be essential, which is why the DNA Data Storage Alliance was formed.

What are the latest R&D innovations bringing DNA storage closer to reality?

Costs continue to drop thanks to the miniaturization of the DNA synthesis process by Twist Bioscience. Other companies are pursuing alternative methods of DNA synthesis, with both approaches allowing massively parallelized synthesis and cost reductions. The cost and throughput of next-generation sequencing (NGS) are also continuously improving, making DNA data recovery more promising. In addition, the encoding and decoding algorithms developed so far have proven themselves.

What kind of timeline are we looking at?

DNA data storage will be available in the medium term. There is still work to be done, but there is a lot of momentum behind making this a reality. The earliest adopters of DNA data storage are likely to be applications with Write Once, Read Never (WORN) or Write Once, Read Seldom if Ever (WORSE) data. As the technology evolves and gains acceptance within the community, the market will expand and evolve.

Which existing storage technologies is DNA most likely to compete with?

The demand for long-term data storage in the cloud is reaching unprecedented levels. Existing storage technologies do not offer a cost-effective solution for storing long-lived data. Operating at such scales in the cloud requires fundamentally rethinking how we build large-scale storage systems, and the underlying storage technologies that power them.

Are there other emerging storage technologies being developed that could be just as promising?

Researchers are exploring various technologies to support this evolution, including storing data in synthetic DNA, quartz glass, and other scalable optical systems. DNA data storage is unique in its characteristics and properties – it is safe to say that it will enable a new level of storage.