19 September 2012

Scientists said they have figured out the functions of 80 percent of the human genome

Slightly exaggerated

Irina Yakutenko, "Lenta.Ru"

Scientists have found that the bulk of the human genome is not junk at all, as was previously thought.

A large consortium of researchers has found that 80 percent of DNA has some kind of biological function. This discovery is worthy of being called epoch-making, but have scientists really discovered exactly what is reported in press releases?

Not just genes

The project whose participants made such a high-profile discovery is called ENCODE (short for Encyclopedia of DNA Elements). It has set itself a truly comprehensive goal: to describe all the sequences of the human genome that have a particular function.

The task is very important: human DNA consists of three billion "letters" (nucleotides), while there are only 20 thousand genes in it, and their total length slightly exceeds 1 percent of the total size of the genome. Why the remaining 99 percent is needed was unclear for a long time.
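The proportions in this paragraph can be checked with a few lines of arithmetic. The following is a minimal Python sketch that uses only the rounded figures quoted in the text (three billion letters, 20 thousand genes, about 1 percent); the function and variable names are illustrative, and the per-gene average is a rough derived estimate, not ENCODE data.

```python
# Rounded figures quoted in the article; not measured values.
GENOME_LETTERS = 3_000_000_000   # total nucleotides in human DNA
GENE_COUNT = 20_000              # approximate number of genes
GENE_PERCENT = 1                 # genes make up roughly 1 percent of the genome


def gene_letters_estimate(genome_letters: int, gene_percent: int) -> int:
    """Return how many letters the given percentage of the genome covers."""
    return genome_letters * gene_percent // 100


gene_letters = gene_letters_estimate(GENOME_LETTERS, GENE_PERCENT)
letters_per_gene = gene_letters // GENE_COUNT
other_letters = GENOME_LETTERS - gene_letters

print(gene_letters)      # letters covered by genes
print(letters_per_gene)  # rough average per gene
print(other_letters)     # letters outside genes
```

With these rounded inputs, genes account for about 30 million letters, or roughly 1,500 letters per gene on average, while nearly three billion letters lie outside them. Since the text says the share "slightly exceeds" 1 percent, the real gene total is somewhat higher than this estimate.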

One possible answer to this question, extremely popular among biologists at the end of the last century, is reflected in the name that scientists gave this entire incomprehensible array of genetic data in the 1970s: "junk" DNA. With further study of the mechanisms of the living cell, however, it became clear that genes are far from the only thing that determines how the hereditary information embedded in the cell is put to use.

Even if in the distant future scientists find out what each of the 20 thousand genes of Homo sapiens and their countless variants are responsible for, a printout of a particular person's genes will still not let them fully describe what he looks like, how well his heart, lungs and other organs work, what diseases he suffers from or how many points he will score on an IQ test. To create an exhaustive portrait of a person, in addition to knowing exactly which genes he has, you need at least to understand which regulatory elements control the work of these genes (it is also desirable to take into account the conditions in which the person grew up, but today this factor is considered much less significant than before). Regulators "attract" specific proteins that can turn a gene on or off.

Strictly speaking, the fact that all kinds of regulatory sequences, in addition to genes, control the normal operation of living systems was already known in the middle of the 20th century. In 1961, French biochemists Francois Jacob and Jacques Monod described such a sequence in the genome of E. coli. The system, called the lactose operon, regulates the cell's uptake of lactose (milk sugar) and has been included in all textbooks of molecular biology. In 1965, Jacob and Monod were awarded the Nobel Prize for their work on the "genetic control of the synthesis of enzymes and viruses," but the real importance and scale of the use of regulatory elements by cells became clear only later, when scientists had collected more data on how they function.

But just knowing the sequence of genes and regulatory elements is still not enough to form a complete picture of how the genome works. Another important factor is the spatial arrangement within the nucleus of the DNA fragments that carry genes and regulators. It may well be that the sequence that triggers a gene lies tens of thousands of "letters" away from it, but if the DNA strand bends into a loop so that the two touch, the gene will function actively.

Finally, the work of a cell depends on how its DNA is modified. Scientists have long known that various chemical groups can be attached to the "letters" of the genetic code, but the true significance of such changes has become apparent only in the last couple of decades. Some modifications (for example, methylation) "turn off" genes; other "improvements", on the contrary, make them work without stopping. Besides the DNA itself, the histones, specific proteins around which nuclear DNA is wrapped, can also undergo chemical changes. The intricate packing of the DNA helix makes it possible to "cram" long strands into a tiny nucleus: the total length of human DNA, for example, is about two meters, while the diameter of the nucleus does not exceed 20 micrometers (a micrometer is a millionth of a meter).
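To get a feel for how tight this packing is, the two figures just quoted can be compared directly. The sketch below is a minimal Python illustration using only the numbers from the text (about two meters of DNA, a nucleus at most 20 micrometers across); the function and constant names are assumptions made for this example.

```python
# Figures quoted in the article, expressed in micrometers to keep integer math.
DNA_LENGTH_UM = 2_000_000     # about two meters of DNA
NUCLEUS_DIAMETER_UM = 20      # upper bound on nucleus diameter from the text


def length_to_diameter_ratio(dna_length_um: int, nucleus_diameter_um: int) -> int:
    """Return how many nucleus diameters the stretched-out DNA would span."""
    return dna_length_um // nucleus_diameter_um


ratio = length_to_diameter_ratio(DNA_LENGTH_UM, NUCLEUS_DIAMETER_UM)
print(ratio)
```

With these inputs the stretched-out DNA is about 100,000 times longer than the nucleus is wide. Because 20 micrometers is an upper bound on the diameter, the ratio for a smaller nucleus would be even larger.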

Gradually, researchers accumulated more and more data on what other subtleties besides the actual sequence of genes affect the implementation of the information embedded in the body. The organizers of the ENCODE project set out to record all these "additions" to the human genome. In other words, they decided to isolate from the array of "junk" DNA all the fragments that may be significant for the work of the cell.

Not junk

The project started in 2003, and about 400 scientists from all over the world were invited to take part.

Separate groups worked on different "branches" of the research, and then all their results were combined, compared and analyzed. The first significant milestone was passed in 2007, when the consortium presented the results of an analysis of one percent of the human genome. Half of the fragments were taken from well-studied sections of chromosomes, and half "represented" the genomic terra incognita. The assumption was that the DNA characterized in this way would serve as a kind of sample from which general conclusions about the entire genome could be drawn.

The current result of the ENCODE project is much more significant: this time the researchers did not limit themselves to one percent, but went through the entire human genome. In total, scientists have published 30 articles in Nature, one of the most prestigious scientific journals, plus a dozen more papers in less prominent journals. Specialists searched for all the potentially significant DNA sites listed above and, in addition, estimated what percentage of the "letters" RNA is synthesized from. This process is called transcription, and it is a direct indicator that a gene is active. Genes contain information about proteins in encoded form, but the enzymes that build proteins cannot decipher this code directly. Instructions for them are recorded on a special intermediary molecule called messenger RNA (mRNA). Special proteins read the information recorded in DNA and "translate" it into the language of mRNA, which the protein-building enzymes can understand.

At the dawn of genome research, it was believed that transcription could start only from genes. One of the discoverers of the structure of DNA, Francis Crick, even enshrined this belief in his famous central dogma of molecular biology, which describes how genetic information is realized in a cell. The classic version of the central dogma states that information is passed only along the path from DNA to RNA to protein, and only in that direction. Later it turned out that the scheme proposed by Crick is actually somewhat more complicated and that information can flow between its three main components in more than one way.

In particular, it turned out that RNA molecules are formed not only from genes, but also from sections of that same "junk" DNA. These ribonucleic acids are another type of genome regulator and can directly affect the activity of genes, determining exactly how the information recorded in them will be read. Thus, sections of DNA outside genes from which such transcription occurs can no longer be classified as "junk". According to the results of the current ENCODE stage, RNA is synthesized in one way or another from 60 percent of all human DNA. About 20 percent more of the genome meets other criteria for potentially significant sites. It was the resulting figure of 80 percent that appeared in all the press releases and popular science articles, and most of the publications claimed that all this DNA, previously considered unnecessary, has some biological function. Such an interpretation of the results is inaccurate, to say the least.

Slight exaggeration

The thing is that the molecular machines responsible for the inner workings around DNA are imperfect.

Despite the amazing accuracy and precision of their work, a level that even the most complex human technology is still far from reaching, these machines sometimes make mistakes. Enzymes that copy DNA insert the wrong "letter", proteins that repair breaks in the genome suddenly skip whole pieces or, conversely, add a couple of extra fragments, and so on. Often such mistakes turn out to be fatal for the organism or lead to a serious illness, but, on the other hand, they supply the pool of mutations without which evolutionary change is impossible.

Many of these mutations do not affect the work of the cell in any way, and such errors, being neutral for the organism, fall out of the "field of view" of natural selection. If an extra copy of a gene does not interfere with the work of the others, it can remain in the genome indefinitely. Mutations will gradually accumulate in it, but as long as the "letters" at the beginning of the gene remain unchanged, the unnecessary copy will continue to be transcribed. The enzymes that synthesize RNA recognize precisely the front parts of genes (promoters) and cannot tell whether the gene in front of them is "right" or "wrong". They begin transcription even if the gene breaks off in the middle or only a short "stub" of it remains. As a result, short RNAs are formed, which are immediately disposed of by special cellular "scavengers". Such unfinished RNAs do no useful work in the cell, yet by the ENCODE project's criteria the DNA section from which they are transcribed counts as having some kind of function.

Since enzymes can make mistakes at any moment, over millions of years, not only extra genes have appeared in the genome, but also extra copies of regulatory sites that control the work of genes. And although such a regulator has not been "tied" to any meaningful DNA fragment for a long time, it will still be recognized by special proteins, which means that it will fall into the list of potentially functional ENCODE sites.

There are other questions about the project that are not related to the definition of the term "functionality". Some doubts are raised by the choice of cell lines on which the scientists conducted their experiments. Most of the cells that the researchers worked with are "immortal" cancer cells that reproduce well in culture, but their DNA is full of all kinds of mutations and rearrangements. Is it possible to transfer the results obtained on such lines directly to "normal" human cells? Pavel Georgiev, director of the Institute of Gene Biology of the Russian Academy of Sciences, believes that the basic molecular biological mechanisms that control the work of living systems are the same in both cancer and ordinary cells. "The main thing was to get some primary result, and it is much easier to work with cancer cells. Later, individual laboratories will study the cells they need using the current ENCODE results," he says.

In fact, the phrase "biological function" itself appeared only in press releases. In the original articles, the scientists use the much more neutral term "biochemical function", which does not directly indicate how significant this "function" is for the organism.

"The task of separating truly functional elements from sites that are transcribed by chance will require much more work than has already been done," says Evgeny Kunin, a leading researcher at the US National Center for Biotechnology Information. "Moreover, it is theoretically impossible for 80 percent of the genome sequences to perform one or another biological function. Such a significant part of the genome cannot be subject to selection," Kunin explained to <url>.

Ewan Birney, who coordinated the ENCODE data analysis, acknowledges that the ultimate marker of functionally important DNA sites should be their stability when passed from generation to generation. However, the scientist notes that some sequences may fall outside this rule: for example, the genes responsible for the shape of the nose are very likely not under strict selection pressure. But this does not mean that they are of no interest to researchers who study human genetics.

But if the participants of the ENCODE project themselves are guided by a figure of 20 percent, why did all the press releases (including the press release of the journal Nature) carry a figure four times higher? Birney explains this oddity as follows: "At first I insisted on [using both figures]. But to put two percentages in one paragraph at once means to require too much effort from readers. They need to understand why there is such a big difference between these two figures, and the corresponding explanations may turn out to be a little longer than most people can tolerate." As a result, the researchers settled on the 80 percent option, because it includes data from all ENCODE experiments. In addition, it better emphasizes the main idea of the project, namely that the genome is not an array of "dead" DNA, but a structure filled with life and activity.

A spoonful of honey

A small substitution of concepts, which ENCODE participants allowed themselves, does not detract from the merits of the project.

This is a study of colossal scale, which gives scientists a great deal of valuable information to work with. Some practical results are already available today. After analyzing mutations that increase the likelihood that a person will develop a particular disease (so-called single-nucleotide polymorphisms, or SNPs), experts found that up to a third of such changes are located in regulatory sites, and many of these sites are active only in immune cells. Such data already allow researchers to look for the causes of a disease in one specific type of cell, rather than sorting through all possible options.

"Such large-scale genomic projects are always positive," says Pavel Georgiev, director of the Institute of Gene Biology of the Russian Academy of Sciences. "They allow you to save a lot of money. In such projects, a large community of scientists purposefully solves one specific task, and this is always more effective compared to the situation when some individual laboratories with rather limited capabilities are trying to do something on their own."

All such projects, with the exception of perhaps the most "obvious" ones, like the Human Genome Project, have one common problem: how to convey their value to the public. The results of the researchers' work are often very specialized, and it takes a lot of time to explain their essence. Hence the loud headlines and distortions like the 80 percent functionality of the genome. Deliberately misleading readers is, of course, not good, but, on the other hand, sometimes only such a technique makes it possible to draw attention to a truly important project. So the crafty scientists can probably be forgiven.

Portal "Eternal youth" http://vechnayamolodost.ru 19.09.2012
