Data storage

Extended alphabet and precise sequencing make DNA the next data storage solution

March 04, 2022

(Nanowerk News) Imagine Bach’s “Cello Suite No. 1” played on a strand of DNA.

This scenario is not as impossible as it seems. Though too small to withstand rhythmic strumming or a sliding bowstring, DNA is a powerhouse for storing audio files and all sorts of other media.

“DNA is nature’s original data storage system. We can use it to store any type of data: images, video, music – anything,” said Kasra Tabatabaei, researcher at the Beckman Institute for Advanced Science and Technology and co-author of this study.

Expanding DNA’s molecular makeup and developing a precise new sequencing method enabled a multi-institutional team to transform the double helix into a robust and durable data storage platform.

The team’s article appeared in Nano Letters (“Expanding the Molecular Alphabet of DNA-Based Data Storage Systems with Neural Network Nanopore Readout Processing”).

In the age of digital information, anyone brave enough to navigate daily news finds the world’s archives growing heavier by the day. Increasingly, paper records are being digitized to save space and protect information from natural disasters.

From scientists to social media influencers, anyone with information to store can benefit from a secure and durable data lock box – and the double helix does the trick.

“DNA is one of the best options, if not the best option, for storing archival data in particular,” said Chao Pan, a graduate student at the University of Illinois at Urbana-Champaign and co-author of this study.

Its longevity matched only by its durability, DNA is designed to withstand Earth’s harshest conditions – sometimes for tens of thousands of years – and remain a viable source of data. Scientists can sequence fossilized strands to uncover genetic stories and bring long-lost landscapes to life.

Despite its small size, DNA is a bit like the famous police box from Doctor Who: bigger on the inside than it looks.

“Every day, several petabytes of data are generated on the Internet. A single gram of DNA would be enough to store this data. That’s how dense DNA is as a storage medium,” said Tabatabaei, who is also a fifth-year doctoral student.

Another important aspect of DNA is its natural abundance and near-infinite renewability, a trait not shared by the most advanced data storage system on the market today: silicon microchips, which often circulate for just a few decades before an unceremonious burial in a heap of dumped e-waste.

“At a time when we are facing unprecedented climate challenges, the importance of sustainable storage technologies cannot be overstated. New green technologies for DNA recording are emerging that will make molecular storage even more important in the future,” said Olgica Milenkovic, Franklin W. Woeltge Professor of Electrical and Computer Engineering and co-PI on the study.

Envisioning the future of data storage, the interdisciplinary team examined DNA’s millennia-old MO. Then the researchers added their own 21st-century twist.

In nature, each strand of DNA contains four chemicals – adenine, guanine, cytosine and thymine – often referred to by the initials A, G, C and T. They arrange and rearrange along the double helix into combinations that scientists can decode, or sequence, to make sense of them.

The researchers expanded DNA’s already broad capacity for information storage by adding seven synthetic nucleobases to the existing four-letter lineup.

“Imagine the English alphabet. If you only had four letters to use, you could only create so many words. If you had the full alphabet, you could produce unlimited word combinations. It’s the same with DNA. Instead of converting zeros and ones to A, G, C and T, we can convert zeros and ones to A, G, C, T and the seven new letters of the storage alphabet,” Tabatabaei said.
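
To make the alphabet analogy concrete, here is a minimal Python sketch of why more letters mean denser storage. It is an illustration only, not the team's encoding scheme: the labels X1 through X7 are hypothetical placeholders for the seven synthetic letters, and the simple base conversion ignores the error correction and sequence constraints a real DNA storage system would need. Under these assumptions, a four-letter alphabet carries log2(4) = 2 bits per letter, while an 11-letter alphabet carries log2(11), or roughly 3.46 bits per letter.

```python
import math

NATURAL = ["A", "C", "G", "T"]
# Hypothetical placeholder names for the seven synthetic letters;
# the article does not name them.
SYNTHETIC = ["X1", "X2", "X3", "X4", "X5", "X6", "X7"]
EXTENDED = NATURAL + SYNTHETIC  # 11 letters in total


def bits_per_letter(alphabet_size):
    """Upper bound on the information one letter can carry."""
    return math.log2(alphabet_size)


def encode(data, alphabet):
    """Write the bytes as one large number in base len(alphabet)."""
    base = len(alphabet)
    # A leading 0x01 sentinel byte keeps leading zero bytes from being lost.
    value = int.from_bytes(b"\x01" + data, "big")
    letters = []
    while value:
        value, digit = divmod(value, base)
        letters.append(alphabet[digit])
    return letters[::-1]


def decode(letters, alphabet):
    """Invert encode(): rebuild the number, then drop the sentinel byte."""
    base = len(alphabet)
    index = {letter: i for i, letter in enumerate(alphabet)}
    value = 0
    for letter in letters:
        value = value * base + index[letter]
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return raw[1:]


if __name__ == "__main__":
    message = b"Cello Suite No. 1"
    short = encode(message, EXTENDED)
    long = encode(message, NATURAL)
    print(bits_per_letter(len(NATURAL)), bits_per_letter(len(EXTENDED)))
    print(len(long), "letters with 4 symbols;", len(short), "with 11")
    assert decode(short, EXTENDED) == message
```

The same message needs fewer letters in the 11-symbol alphabet than in the four-symbol one, which is the density gain the quote describes.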

Because this team is the first to use chemically modified nucleotides for storing information in DNA, the members had to innovate around a unique challenge: not all current technologies can interpret strands of chemically modified DNA. To solve this problem, they combined machine learning and artificial intelligence to develop a unique method for processing DNA sequence readouts.

Their solution can distinguish modified chemicals from natural ones and differentiate each of the seven new molecules from each other.

“We tried 77 different combinations of the 11 nucleotides, and our method was able to differentiate all of them perfectly,” Pan said. “The deep learning framework within our method for identifying different nucleotides is universal, allowing our approach to be generalized to many other applications.”

This perfect translation of letters is courtesy of nanopores: proteins with an opening in the middle through which a strand of DNA can easily pass. Remarkably, the team discovered that the nanopores can detect and distinguish each individual monomer unit along the DNA strand, whether the units have natural or chemical origins.

“This work provides an exciting proof-of-principle demonstration of the extension of macromolecular data storage to unnatural chemistries, which have the potential to dramatically increase storage density in non-traditional storage media,” said Charles Schroeder, James Economy Professor of Materials Science and Engineering and a co-PI on this study.

DNA literally made history by storing genetic information. According to this study, the future of data storage is just as double-helical.