An extended molecular alphabet for storing DNA data
(News from Nanowerk) DNA data storage systems have the potential to hold orders of magnitude more information than existing systems of comparable size. Compared with existing data storage technologies, DNA is potentially cheaper, far more physically compact, more energy efficient and more durable: it survives for hundreds of years and is maintenance free. Files stored in DNA can also be copied very easily at negligible cost.
The storage density of DNA is staggering. Consider this: humanity will generate about 33 zettabytes of data by 2025, or 33 followed by 21 zeros when counted in bytes. All of this information would fit in a ping-pong ball, with room to spare. The United States Library of Congress holds about 74 terabytes, or 74 million million bytes, of information; 6,000 such libraries would fit in a DNA archive the size of a poppy seed.
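As a rough check on the figures above, the unit conversions can be sketched in a few lines of Python. This assumes decimal (SI) prefixes; the variable names are illustrative and not taken from the study.

```python
# Decimal (SI) prefixes, assumed here: 1 TB = 10**12 bytes, 1 PB = 10**15, 1 ZB = 10**21.
TB = 10**12
PB = 10**15
ZB = 10**21

global_data_bytes = 33 * ZB               # data projected for 2025
library_bytes = 74 * TB                   # Library of Congress estimate
poppy_seed_bytes = 6_000 * library_bytes  # capacity quoted for a poppy-seed-sized archive

print(f"33 ZB = {global_data_bytes:.1e} bytes")
print(f"6,000 libraries = {poppy_seed_bytes // PB} PB")
```

On these assumptions, the poppy-seed archive corresponds to a few hundred petabytes.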
Information stored in DNA can be copied in a massively parallel fashion and selectively retrieved via the polymerase chain reaction (PCR). However, existing DNA storage systems suffer from high latency caused by the inherently sequential write process. Despite recent advances, the cycle time of typical solid-phase DNA synthesis is on the order of minutes, which limits the practical applications of this molecular storage platform.
To overcome these challenges, new synthesis methods and new information encoding approaches are needed to speed up the writing of large-volume datasets.
Extending the alphabet of a DNA storage medium with chemically modified nucleotides can increase both storage density and write speed, because each synthesis cycle then records more than the two bits that a four-letter alphabet allows.
Advancing research in this regard, scientists have now reported an expanded molecular alphabet for storing DNA data comprising four natural nucleotides and seven chemically modified nucleotides that are easily detected and distinguished using nanopore sequencers.
The findings are published in Nano Letters (“Expanding the molecular alphabet of DNA-based data storage systems with neural network nanopore readout processing”).
The authors’ results show that Mycobacterium smegmatis porin A (MspA) nanopores, which are widely used for ssDNA detection and single-molecule chemistry studies, can accurately distinguish 77 combinations and orderings of chemically diverse monomers in homo- and heterotetrameric sequences.
They further demonstrate that highly accurate classification (greater than 60% on average) of combinatorial patterns of natural and chemically modified nucleotides is possible using deep learning architectures.
According to the scientists, the extended molecular alphabet could nearly double storage density and potentially reduce recording latency by a similar factor, offering a promising avenue for the development of novel molecular recorders.
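The density claim can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' encoding scheme: it assumes an idealized code in which every synthesis cycle writes one symbol, all symbols are equally likely, and there is no error-correction overhead. The function names are hypothetical.

```python
import math

def bits_per_symbol(alphabet_size: int) -> float:
    """Upper bound on information per synthesized symbol (idealized, no coding overhead)."""
    return math.log2(alphabet_size)

def symbols_needed(num_bytes: int, alphabet_size: int) -> int:
    """Minimum number of symbols (synthesis cycles) needed to represent num_bytes."""
    return math.ceil(8 * num_bytes / bits_per_symbol(alphabet_size))

natural = bits_per_symbol(4)    # A, C, G, T
extended = bits_per_symbol(11)  # four natural plus seven modified nucleotides
gain = extended / natural       # idealized density gain of the extended alphabet

print(f"natural alphabet:  {natural:.2f} bits per cycle")
print(f"extended alphabet: {extended:.2f} bits per cycle")
print(f"idealized gain:    {gain:.2f}x")
print(f"cycles for 1 kB:   {symbols_needed(1000, 4)} vs {symbols_needed(1000, 11)}")
```

Under these assumptions the gain is about 1.7-fold, in line with the "nearly doubled" figure; a practical system would fall below this bound once error correction and readout accuracy are taken into account.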