01 June 2015

How DNA was decoded

Maxim Frank-Kamenetsky, Forbes

In the first two decades after Watson and Crick discovered the DNA double helix, humanity learned a great deal about the molecular nature of life. The famous "central dogma" was formulated, according to which genetic information in a cell flows in only one direction: from DNA to protein. The genetic code was completely deciphered, that is, the rules by which the cell translates the texts of nucleic acids into the texts of proteins, turning the sequence of nucleotides in DNA and RNA into the sequence of amino acid residues in proteins. These were all huge achievements.

The problem, however, was that although molecular biologists talked about these texts all the time, no one knew the texts themselves. There was no way to decode the genes. Or, in other words, there was no method for determining the sequence of nucleotides in DNA.

By that time, this could already be done for proteins: a method for reading their sequences was developed in the early 1950s, even before the discovery of the double helix. Scientists could also, to a limited extent, read short RNA sequences. But DNA sequences could not be read at all. This left a huge gap in the real understanding of the molecular foundations of life and held back both the development of biotechnology, which strictly speaking did not yet exist, and the medical application of this knowledge.

It even began to seem that the task was too difficult and could not be solved: all attempts had failed. But in the mid-1970s there was a breakthrough. A method for determining DNA sequences was developed by the British chemist Frederick Sanger.

Sanger is a great man. He is the only person in the history of science to have received two Nobel Prizes in chemistry. Nobel forbade giving the same person a prize twice in the same field. By that time Sanger had already received a prize, precisely for developing a method for reading amino acid sequences in proteins. So when he developed a method for reading DNA sequences, the Nobel Committee found itself in a very difficult position: it had either to withhold a prize for an outstanding discovery or to violate Nobel's will. In the end they decided to break the will. This is the only such case in chemistry.

How are DNA sequences read today? Huge progress has been made since then, all of it building on Sanger's breakthrough. A DNA sequence is a colossal text written with just four "letters", four chemical compounds: adenine (A), thymine (T), guanine (G) and cytosine (C). The genome in each of our cells consists of three billion nucleotides, three billion such "letters".

How can this text be read? First of all, the DNA is cut into fragments using special enzymes called restriction enzymes (restrictases). A restriction enzyme recognizes a short DNA sequence of approximately 6 to 8 nucleotides and cuts the double helix, in a specific way, only at that site. The discovery of such "scissors" was another breakthrough of the early 1970s.
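The cutting step can be illustrated with a minimal sketch. It assumes a single strand written as a string and uses the well-known enzyme EcoRI, which recognizes GAATTC and cuts between G and A. The function name `digest`, the `cut_offset` parameter and the example sequence are made up for illustration, and the staggered cut on the second strand is ignored.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut seq at every occurrence of site, cut_offset bases into the site.

    Simplified single-strand model: a real enzyme cuts both strands of the helix.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset          # position of the cut inside the site
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)   # look for the next recognition site
    fragments.append(seq[start:])       # whatever remains after the last cut
    return fragments


dna = "ATGCGAATTCTTAGGCGAATTCAAT"       # illustrative sequence with two sites
print(digest(dna))                      # ['ATGCG', 'AATTCTTAGGCG', 'AATTCAAT']
```

Joining the fragments back together gives the original string, which is a quick way to check that nothing was lost in the cut.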

After the DNA is cut, the task is to determine the sequence of a short piece, which may contain a hundred or several hundred nucleotides. This is where the Sanger method comes in.

Special adapters are attached to both ends of the resulting fragment, because the restriction enzyme leaves uneven ends. The adapter is synthetic, so its sequence is one we choose ourselves. After the adapters are attached, each fragment has sequences at its ends that are known to us. We can use these known sequences to attach synthetic primers (short nucleic acid fragments) to the fragment; starting from the primer, a complementary chain is synthesized along the existing DNA sequence.
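The role of the known adapter sequences can be sketched as follows. This is only an illustration under simplifying assumptions: the adapter sequences, the fragment and all names are hypothetical, strands are plain strings written 5' to 3', and the primer is taken to be the reverse complement of the adapter at the far end of the template.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, written in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Hypothetical adapter sequences: in practice they are chosen by the experimenter.
ADAPTER_LEFT = "ACGTAC"
ADAPTER_RIGHT = "CTGAAG"

fragment = "TTAGGC"                      # the unknown piece we want to read
template = ADAPTER_LEFT + fragment + ADAPTER_RIGHT

# The primer pairs with the known right-hand adapter on the template.
primer = reverse_complement(ADAPTER_RIGHT)

# The fully synthesized complementary chain begins with the primer
# and continues along the template.
new_strand = reverse_complement(template)
assert new_strand.startswith(primer)
print(primer, new_strand)
```

Because the adapter is known, the primer can be designed without knowing anything about the fragment itself.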

Sanger's idea was that, during this synthesis, specially modified nucleoside triphosphates should be added to the mixture of normal nucleotide precursors (which are themselves nucleoside triphosphates). Once a modified nucleotide is incorporated, the chain can no longer be elongated.

As a result, synthesis stops at the position of one or another "letter". We thus obtain molecules with a set of lengths that tells us exactly where each "letter" sits. All that remains is to separate these molecules by length, which is done using gel electrophoresis.
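The link between stopping points and fragment lengths can be shown with a small sketch. It assumes one reaction per "letter" and idealizes the chemistry: in a real reaction each molecule stops at a random occurrence of the modified nucleotide, and it is the whole population of molecules that covers every position. The function name and the example strand are illustrative, and the length of the primer is ignored.

```python
def termination_lengths(new_strand):
    """For each letter, list the fragment lengths produced when synthesis
    stops at every occurrence of that letter (idealized model)."""
    lengths = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        # A chain terminated at this position is exactly `position` letters long.
        lengths[base].append(position)
    return lengths


print(termination_lengths("GATTACA"))
# {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
```

Each length occurs in exactly one of the four lists, so the four sets of lengths together pin down the letter at every position.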

A special gel, that is, a polymer mesh, is prepared, and a constant electric field is applied to it. Under the influence of the field, the negatively charged DNA molecules crawl through the polymer mesh, and the longer the molecule, the more slowly it moves through the gel. This lets us separate a mixture of molecules by length. For the nucleotide we are currently studying, every point where synthesis stopped shows up as a fragment of a particular length, and that length gives the position of the nucleotide in the sequence. In this way we can read the whole sequence.
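Reading the gel amounts to ordering the fragments by length and noting which "letter" each length belongs to. Here is a minimal sketch, assuming four lanes, one per letter, with each band reported as a fragment length; the function name `read_gel` and the band data are hypothetical.

```python
def read_gel(lanes):
    """Reconstruct the sequence from per-letter lists of fragment lengths.

    Shorter fragments travel farther in the gel, so sorting the bands by
    length reads the synthesized chain from its start to its end.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


# Hypothetical band lengths observed in the four lanes.
lanes = {"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]}
print(read_gel(lanes))  # GATTACA
```

The sketch assumes clean bands with no gaps; a real gel resolves only a limited range of lengths, which is one reason the DNA is first cut into short pieces.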

This wonderful, ingenious method, thanks to which we were able to sequence the human genome, was invented by Sanger. The first human genome was read at the very beginning of this century, at a cost of about three billion dollars. Since then the method has been modified and automated, and today determining a sequence costs incomparably less: the price of decoding the DNA of a particular person is approaching $1000.

The absolutely fantastic development of DNA sequencing methods has led to incredible progress both in understanding the molecular nature of life and in biotechnological and medical applications.

About the author:
Maxim Frank-Kamenetsky is a Doctor of Physical and Mathematical Sciences, professor at the Faculty of Biomedical Engineering at Boston University.

Portal "Eternal youth" http://vechnayamolodost.ru 01.06.2015
