Data storage

Researchers test microchip for high-density synthesis of archival data storage DNA

01 December 2021

(News from Nanowerk) Researchers have made significant progress toward a new microchip capable of growing DNA strands that could provide high-density 3D archival data storage at ultra-low cost and preserve that information for hundreds of years. To enable the technology, the researchers also developed an error-correction system that compensates for errors in reading data stored in DNA.

DNA data storage uses the four bases that make up biological DNA – adenine (A), thymine (T), guanine (G) and cytosine (C) – to store data in a way analogous to the zeros and ones of traditional computing. Current DNA storage is mostly limited to boutique applications such as time capsules, but there is wide interest in DNA as the next major storage medium for big data archives.
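To make the analogy concrete, a minimal Python sketch might pack two bits into each base. This toy mapping is purely illustrative: real DNA storage encodings add constraints, such as avoiding long runs of the same base and balancing GC content, that this version ignores.

```python
# Illustrative 2-bits-per-base packing; real DNA storage encodings also
# avoid long homopolymer runs and balance GC content, which this ignores.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}

def encode(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

assert decode(encode(b"hi")) == b"hi"
print(encode(b"hi"))  # CGGACGGC
```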

Georgia Tech Research Institute researchers have developed a microchip for growing DNA strands that could provide high-density 3D archival data storage at ultra-low cost. The microwells from which DNA grows are a few hundred nanometers deep and reflect specific colors of light in the photo. (Image: Sean McNeil, Georgia Tech Research Institute)

The microchip work is part of the Scalable Molecular Archival Software and Hardware (SMASH) project, a collaboration led by the Georgia Tech Research Institute (GTRI) to develop scalable DNA-based read/write storage techniques. The project, supported by the Intelligence Advanced Research Projects Activity (IARPA) Molecular Information Storage (MIST) program, could help meet the growing demand for archival storage by providing a cost-effective alternative to current tape and hard-drive systems.

Proof-of-concept nanofabricated microchips comprise tiny microwell structures a few hundred nanometers deep from which DNA strands grow in a massively parallel process. The chips will eventually include a second layer of electronic controls – fabricated in conventional CMOS – that will manage the chemical process as a single molecule of DNA is grown in each of the wells, one base at a time. Once the base sequence that stores the data is complete, the DNA strands will be removed from the surface and dried for long-term storage.

Since each base that stores information is made up of a small number of atoms, the technique will allow hundreds of terabytes of information – which would now require many conventional hard drives – to be stored in a single speck of DNA. GTRI is working with California biotech companies Twist Bioscience and Roswell Biotechnologies to demonstrate this new type of commercially viable data storage, which could eventually scale to the exabyte regime.
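A rough back-of-envelope estimate suggests why such densities are plausible. Assuming about 2 bits per base and an average single-stranded nucleotide mass of roughly 330 daltons – round figures, not numbers from the project – hundreds of terabytes would indeed fit in a microscopic quantity of DNA:

```python
# Back-of-envelope density estimate; every figure here is a rough
# assumption, not a number from the GTRI project.
AVOGADRO = 6.022e23        # molecules per mole
NT_MASS_DALTONS = 330.0    # approx. mass of one single-stranded nucleotide
BITS_PER_BASE = 2          # 4 bases -> 2 bits each, before coding overhead

terabytes = 500            # "hundreds of terabytes"
bits = terabytes * 1e12 * 8
bases = bits / BITS_PER_BASE
mass_grams = bases * NT_MASS_DALTONS / AVOGADRO
print(f"{mass_grams * 1e6:.1f} micrograms of DNA")  # ~1.1 micrograms
```

Real systems would store somewhat less per gram because of coding overhead and physical redundancy, but the orders of magnitude hold.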

“We were able to show that it’s possible to grow DNA to the kind of length we want, and at roughly the feature size we’re interested in, using these chips,” said Nicholas Guise, a GTRI principal researcher who directs the SMASH project. “The goal is to grow millions of unique, independent sequences on-chip from these microwells, each serving as a tiny electrochemical bioreactor.”

The current prototype chip is about one inch square and includes 10 banks of microwells where DNA is grown. “Working with our colleagues at Twist and the Institute for Electronics and Nanotechnology at Georgia Tech, we optimized the geometry of the microwells to fit more and more of them onto a chip,” Guise explained.

DNA chips will be used for long-term storage of archival data that is rarely accessed but needs to be kept for a long time. Such data is currently stored on magnetic tape, which must be periodically rewritten onto new tapes as the medium ages. Storing and retrieving data in DNA will take longer, but the medium will last virtually forever, and the data can be read back using the standard DNA sequencing techniques already used for medical diagnostics.

“As long as you keep the temperature low enough, the data will survive for thousands of years, so the cost of ownership drops to almost zero,” Guise said. “The only major expense is writing the DNA once at the beginning and reading it at the end. If we can make the cost of this technology competitive with the cost of writing data magnetically, the cost of storing and maintaining information in DNA for many years should be lower.”

One of the downsides of storing data in DNA is a higher error rate, considerably higher than what computer engineers would tolerate in conventional hard-disk storage. In collaboration with the University of Washington, GTRI researchers designed an encoding of information in DNA (a “codec”) that identifies and corrects errors to protect the stored data.

“We’re working with a bunch of new technologies, and these new technologies have higher error rates than storage technologies in the past,” said Adam Meier, a GTRI principal investigator working on the SMASH project. “We targeted this codec to be extremely robust against errors, able to work with devices that read up to 10% of the bases in error.”
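The article doesn’t describe the codec itself, but a simple stand-in illustrates the principle: adding redundancy lets a decoder recover the data even when a tenth of the bases are read incorrectly. The sketch below uses a naive 5x repetition code with majority voting; the actual GTRI/University of Washington codec would be far more storage-efficient.

```python
import random
from collections import Counter

# Naive 5x repetition code with majority voting, standing in for the real
# codec only to show how redundancy survives a ~10% per-base error rate.
BASES = "ACGT"
REPEAT = 5

def protect(strand: str) -> str:
    return "".join(base * REPEAT for base in strand)

def recover(noisy: str) -> str:
    chunks = (noisy[i:i + REPEAT] for i in range(0, len(noisy), REPEAT))
    return "".join(Counter(chunk).most_common(1)[0][0] for chunk in chunks)

def corrupt(strand: str, rate: float) -> str:
    # Substitute each base with a random different base at the given rate.
    return "".join(
        random.choice([b for b in BASES if b != base]) if random.random() < rate else base
        for base in strand
    )

original = "ACGTACGTACGT"
noisy = corrupt(protect(original), rate=0.10)  # 10% raw read error rate
print(recover(noisy) == original)              # True almost every run
```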

Error correction lightens the load on the hardware side of the project, and the error correction scheme is adjustable to allow the team to experiment with different chemical approaches and DNA lengths. In testing their work, the team received support from Georgia Tech’s Molecular Evolution Core and GTRI’s Advanced Concepts Laboratory to sequence the data stored in the DNA.

“What this does operationally is allow us to potentially increase the speed and throughput of the synthesizer and sequencer,” Guise said. “If you can tolerate some of the error through a resilient codec, you can write a lot more data and read a lot more data faster.”

The researchers demonstrated writing image files into DNA and reading them back, with the help of corporate partner Twist Bioscience. Meier expects the error rate to decrease as the technology advances, although he says error correction will always be part of data read operations.

“What we expect is that eventually the error-correcting code will be lighter,” he said. “It will ultimately have less impact on the final design, and as error rates improve, the codec will become less important. This is part of our research for future phases of the program.”