28 June 2018

DNA storage

How the DNA data storage industry will develop

Veronika Elkina, Rusbase, based on WIRED: The Rise of DNA Data Storage

The famous poem "The Road Not Taken" by the American poet Robert Frost consists of 144 words and fits on one page. It also fits in a 1 kilobyte file. It could even fit in a few drops of water in a test tube, if scientists set their minds to it. More precisely, not in the water itself but in what floats inside it: invisible strands of DNA.

Scientists have long said that DNA could become an ideal repository of information: it is dense, stable, and easy to copy. Over the past few years, researchers have written all sorts of data into DNA, for example "War and Peace" by Leo Tolstoy, the song "Smoke on the Water" by Deep Purple, and a GIF of a galloping horse. But for DNA to replace existing silicon and magnetic drives, it needs to become cheaper and more straightforward to write, read and store.

Hyunjun Park and his colleagues at the startup Catalog, who first managed to encode a poem in DNA a year and a half ago, are working on this problem. They are now developing a machine that can write terabytes of data per day using 500 trillion DNA molecules. (This is how it is written both in this text and in the original article. Most likely it should read not "DNA molecules" but "nucleotides" – VM.) In a few years, the researchers plan to launch commercial DNA data recording services for IT companies, the entertainment industry and the government. This project, which came out of the Massachusetts Institute of Technology, is not the only one working in this field. Large companies such as Microsoft, Intel and Micron are funding their own projects for storing data in DNA.

If everything works out, this type of storage could solve a problem unique to the 21st century: an overabundance of information. Five years ago, humanity produced 4.4 zettabytes of data; by 2025 that number is expected to grow to 160 zettabytes. Modern infrastructure can cope with only a small part of the future data volume, which by 2040 could take up all the chips in the world.

Most digital archives, which contain everything from music to images from space and research data, are stored on magnetic tape. It is a cheap medium, but it has to be replaced approximately every 10 years.

"Modern technologies have already approached the physical limits of scaling," says Victor Zhirnov, a leading researcher at Semiconductor Research Corporation. "The density of data storage in DNA is several times higher than that of any other known storage technology."

Imagine that you want to save all the films in the world in DNA. The resulting store would be the size of a sugar cube, and it could last for 10 thousand years.

The main concern is the price. Over the past few years, the cost of sequencing (i.e. reading DNA) has fallen, but writing is still expensive. Recording one minute of high-quality stereo sound costs about $100 thousand.

Catalog's researchers believe they can bring these prices down. The traditional method breaks the data down into bits (ones and zeros), which are then mapped onto the four DNA bases. In 2016, when Microsoft recorded 200 megabytes of data in DNA, the company used 13,448,372 unique pieces of DNA. Catalog instead wants to produce large quantities of identical DNA molecules, none longer than 30 base pairs. The scientists then use billions of enzymatic reactions to encode the information in the recombination patterns of these prefabricated pieces of DNA. Instead of mapping one bit onto one base pair, the bits are arranged in multidimensional matrices, and each set of molecules represents a position in each matrix.
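To make the traditional approach concrete, here is a minimal Python sketch of mapping bits onto the four DNA bases (A, C, G, T). It is an illustration only: the two-bits-per-base table and the function names `encode` and `decode` are assumptions made for this example, not the scheme used by Microsoft or Catalog, and real systems add error correction and avoid problematic sequences.

```python
# Illustrative only: a naive 2-bits-per-base mapping (an assumption for this
# sketch). Real DNA storage schemes add error correction and other safeguards.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): every four bases become one byte again."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    text = b"Two roads diverged"
    strand = encode(text)
    print(strand)
    print(decode(strand) == text)
```

Under this naive mapping every byte becomes four bases, so a 1 kilobyte file would need roughly four thousand bases.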

"If you think of the data as a book, you could record the information by copying it out by hand," Park said. Instead of transcribing each letter individually, Catalog will build a printing press in which each letter is represented by a DNA molecule. "By rearranging these prepared molecules in different ways, we can put all the words in the same order as they were in the book."

If successful, this technology will make it possible to keep data that must be preserved for legal reasons, for example rare surveillance camera recordings, medical data and historical government documents. Early next year, Catalog plans a pilot commercial launch of its project. The startup is focusing first on intelligence services and space research agencies, as well as the IT sector and Hollywood.

The Defense Advanced Research Projects Agency (DARPA) of the US Department of Defense is also working on molecular data storage. Last year, it allocated $15.3 million for research into biochemical ways of storing binary code. Large technology companies have started projects in this area as well. By 2020, Microsoft plans to have a working prototype of DNA storage running in one of its data centers.

According to Doug Carmean of Microsoft's research division, such storage will at first be available to "elite" customers who need to store anywhere from several gigabytes to petabytes of data. The long-term plans are even more ambitious. "We are going to completely replace magnetic drives," he said. Carmean believes this transition will happen fairly soon because of the growing interest in consumer genetics and synthetic biology. "It's becoming easier for people to access their DNA, so why not give them the ability to read any data recorded in it?"

Portal "Eternal youth" http://vechnayamolodost.ru

