Data storage

Transforming the double helix into a robust and durable data storage platform

DNA is emerging as a robust data storage medium that offers ultra-high storage densities. However, existing DNA storage systems suffer from high latency caused by the inherently sequential write process.

Scientists have solved this problem by expanding the molecular makeup of DNA and developing a new, accurate sequencing method. They transformed the double helix into a robust and durable data storage platform.

Their prototype DNA data storage system uses an extended molecular alphabet combining natural and chemically modified nucleotides.

Kasra Tabatabaei, researcher at the Beckman Institute for Advanced Science and Technology and co-author of this study, said: “Every day, several petabytes of data are generated on the Internet. A single gram of DNA would be enough to store this data. This is how dense DNA is as a storage medium.

In imagining the future of data storage, scientists in this new study examined DNA’s millennial MO. They then added a 21st century twist.

Each strand of DNA contains four chemicals: adenine (A), guanine (G), cytosine (C) and thymine (T). These chemicals arrange and rearrange along the double helix in combinations that scientists can decode, or sequence, to make sense of.

Scientists added seven synthetic nucleobases to the existing four-letter array to increase the data storage capacity of DNA.

Tabatabaei said, “Imagine the English alphabet. If you only had four letters to use, you could only create so many words. If you had the full alphabet, you could produce unlimited word combinations. It’s the same with DNA. Instead of converting zeros and ones to A, G, C, and T, we can convert zeros and ones to A, G, C, T, and the seven new letters of the storage alphabet.

This is the first time that scientists have chemically modified nucleotides to store information in DNA. They could do this by combining machine learning and artificial intelligence to develop a unique DNA sequence reading processing method.

Chao Pan, a graduate student at the University of Illinois at Urbana-Champaign, said: “We tried 77 different combinations of the 11 nucleotides, and our method was able to differentiate each of them perfectly. The deep learning framework as part of our method to identify different nucleotides is universal, allowing the generalization of our approach to many other applications.

This perfect translation of letters is courtesy of nanopores: proteins with an opening in the middle through which a strand of DNA can undoubtedly pass. Scientists have found that nanopores can detect and distinguish each monomeric unit along the DNA strand, whether the units are of natural or chemical origin.

Charles Schroeder, James Professor of Economics in Materials Science and Engineering, said: “This work provides an exciting proof-of-principle demonstration of the extension of macromolecular data storage to unnatural chemicals, which have the potential to dramatically increase storage density in non-traditional storage media.”

Journal reference:

  1. S. Kasra Tabatabaei et al. Expanding the molecular alphabet of DNA-based data storage systems with neural network nanopore readout processing. DOI: 10.1021/acs.nanolett.1c04203