08 April 2016

DNA Archive

Memory on artificial DNA

Alexey Ponyatov, "Science and Life", based on materials from the University of Washington: UW team stores digital images in DNA – and retrieves them perfectly

At the 21st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), which ended on April 6 in Atlanta (USA), researchers from the University of Washington and Microsoft presented a paper describing an information storage system based on synthesized DNA. They managed not only to store various types of data (text, images, sound) in this way, but also to read them back accurately.

dna_info1.jpg
Three images that were encoded in DNA and then successfully extracted back.
(Photos here and below: Tara Brown Photography/University of Washington.)

DNA molecules, which nature created to store the genetic information of living organisms, can store information many millions of times more densely than any existing digital storage technology: hard disks, optical disks, flash drives and so on. In addition, DNA can reliably hold data for several centuries, compared with a few years to two or three decades for other media. The limit of DNA recording density is estimated at 1 exabyte per mm³ (10¹⁸ bytes/mm³), with a half-life of more than 500 years. However, access to information recorded in this way is still very slow (from tens of seconds to hours), so for now such a system is suitable only for archival storage.

dna_info2.jpg
All the data from hundreds of ordinary smartphones (10,000 gigabytes) can be stored in the faint pink smear of DNA at the end of this test tube.

Encoding uses the four basic building blocks of DNA: adenine (A), guanine (G), cytosine (C) and thymine (T). These blocks serve as the digits of the code. Since there are four of them, binary numbers are converted to a different base before encoding. In the simplest case a base-4 system can be used, with A, C, G, T mapped to the digits 0, 1, 2, 3. Encoding the binary sequence 01110001, for example, then consists of replacing it with its base-4 representation, 1301, and synthesizing the DNA strand CTAC. However, such a simple encoding offers no protection against the numerous errors that occur during DNA synthesis, so the researchers had to develop a special encoding method that reduces the likelihood of errors and, in addition, to adapt the error-correction schemes used in computer memory to biotechnology.
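The simplest base-4 mapping described above can be sketched in a few lines of Python. This is only an illustration of that naive mapping, not the error-resistant encoding the researchers actually developed; the function names bits_to_dna and dna_to_bits are made up for this example.

```python
# Mapping from the article: A, C, G, T stand for the base-4 digits 0, 1, 2, 3.
NUCLEOTIDES = "ACGT"


def bits_to_dna(bits: str) -> str:
    """Encode a binary string as nucleotides, two bits per base (naive scheme)."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    if set(bits) - {"0", "1"}:
        raise ValueError("bit string may contain only '0' and '1'")
    # Each pair of bits is one base-4 digit, which selects one nucleotide.
    return "".join(
        NUCLEOTIDES[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2)
    )


def dna_to_bits(strand: str) -> str:
    """Decode a nucleotide string back into the original binary string."""
    # str.index raises ValueError for any character that is not A, C, G or T.
    return "".join(format(NUCLEOTIDES.index(base), "02b") for base in strand)


print(bits_to_dna("01110001"))  # CTAC
print(dna_to_bits("CTAC"))      # 01110001
```

For the article's example, 01110001 splits into the bit pairs 01, 11, 00, 01, that is, the base-4 digits 1301, which map to the strand CTAC.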

The researchers also solved the problem of random access to information recorded across a large number of different DNA molecules. To do this, they learned to encode service data ("indexes") in the molecules, which makes it possible to find the required information. Using the polymerase chain reaction (PCR), a standard technique in molecular biology, they picked out the molecules carrying the required indexes, and then read the data by DNA sequencing (determining the order of the building blocks).
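The idea of index-based access can be illustrated with a toy software model. Everything here is an assumption made for the example: the pool is a plain Python list, each strand is a four-letter index followed by a payload, selection is a simple prefix match standing in for PCR, and decoding uses the naive base-4 mapping (A, C, G, T as 0, 1, 2, 3). The real laboratory procedure is far more involved.

```python
NUCLEOTIDES = "ACGT"  # naive mapping: A, C, G, T are the base-4 digits 0, 1, 2, 3


def decode(strand: str) -> str:
    """Turn a nucleotide string into bits, two bits per base."""
    return "".join(format(NUCLEOTIDES.index(base), "02b") for base in strand)


def select_by_index(pool: list[str], index: str) -> list[str]:
    """Toy stand-in for PCR selection: keep only strands starting with the index."""
    return [strand for strand in pool if strand.startswith(index)]


def read_payloads(pool: list[str], index: str) -> list[str]:
    """Toy stand-in for sequencing: strip the index and decode each payload."""
    return [decode(strand[len(index):]) for strand in select_by_index(pool, index)]


# A hypothetical pool: two strands filed under index ACGT, one under TTGA.
pool = ["ACGT" + "CTAC", "TTGA" + "GGAT", "ACGT" + "AATT"]

print(read_payloads(pool, "ACGT"))  # ['01110001', '00001111']
```

Only the strands tagged with the requested index are decoded; the rest of the pool is never read, which is the point of random access.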

This work is of great interest given the avalanche-like growth of information around the world. According to forecasts, in 2017 its volume will exceed 16 zettabytes (1 ZB = 10²¹ bytes). Even if not all of it needs to be kept, this is a huge amount. Yet the highest density of commercially available storage, on tape cartridges, is only about 10 gigabytes per mm³ (1 GB = 10⁹ bytes), and even the most recent research promises optical disks with a density of only about 100 GB/mm³. A stack of such disks large enough to hold all this data would be taller than the distance to the Moon. Of course, the cost and efficiency of DNA storage still leave much to be desired, but the researchers consider these problems quite solvable.
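The densities quoted in the article can be turned into a rough volume comparison. The sketch below simply divides 16 zettabytes by each density. It assumes ideal packing with no casings, redundancy or other overhead, and the DNA figure is the estimated theoretical limit rather than anything achieved in practice. The dictionary keys and the function name volume_mm3 are made up for this example.

```python
ZB = 10**21  # bytes in a zettabyte
EB = 10**18  # bytes in an exabyte
GB = 10**9   # bytes in a gigabyte

DATA_BYTES = 16 * ZB  # forecast volume of data quoted in the article

# Storage densities quoted in the article, in bytes per cubic millimetre.
DENSITY = {
    "tape": 10 * GB,
    "optical (research)": 100 * GB,
    "DNA (estimated limit)": 1 * EB,
}


def volume_mm3(data_bytes: int, bytes_per_mm3: int) -> int:
    """Volume in mm^3 needed to hold data_bytes at the given density."""
    return data_bytes // bytes_per_mm3


for medium, density in DENSITY.items():
    mm3 = volume_mm3(DATA_BYTES, density)
    # 1 m^3 = 10^9 mm^3
    print(f"{medium}: {mm3:,} mm^3 = {mm3 / 10**9:g} m^3")
```

Under these assumptions the data would occupy about 1,600 m³ of tape or 160 m³ of optical media, but only 16,000 mm³ (16 cm³) of DNA at the theoretical limit.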

Portal "Eternal youth" http://vechnayamolodost.ru  08.04.2016
