28 September 2015

Mutating humanity

What have we learned about our mutations in the 15 years of the genomic era?

Alexander Markov, Elements 

In the 15 years since the first rough draft of the human genome, our knowledge of the mutations that arise in our genes has grown many times over, and the methods for studying them have become far more effective and diverse. A review article by American geneticists, published in the journal Science, sums up what has been learned about the mutation process in Homo sapiens during the first decade and a half of the "genomic era".

Mutagenesis is one of the most fundamental biological phenomena. All living beings, without exception, are subject to mutations. Random changes in the DNA nucleotide sequence occur both during replication (copying) of DNA molecules and between rounds of replication (for example, under the influence of ultraviolet radiation and other mutagens). Most mutations are immediately corrected by dedicated DNA repair systems, but even the most complex and advanced repair systems do not provide one hundred percent protection.

The rate of mutagenesis varies greatly among different organisms, but it is not zero in any creature – from viruses to humans. This fact may seem trivial, but in fact it requires an explanation, and here's why.

Among non-neutral mutations (those that affect fitness), harmful ones as a rule far outnumber beneficial ones. The reasons are purely probabilistic (see Anna Karenina's Principle): there are many more ways to break a working system than to improve it. Consequently, the lower an organism's mutation rate, the higher, all other things being equal, the average fitness of its descendants. It would seem, therefore, that selection should always favor a lower mutation rate, driving the evolution of ever more accurate replication and repair systems. Why, then, have these systems never become completely error-free?

There are probably two reasons. The first can loosely be called "economic". Ultra-precise replication and repair systems would most likely be too costly: cumbersome, energy-hungry, and liable to slow down replication or cause other harmful side effects. In addition, the more accurately these systems work, the weaker the selection pressure to improve them further.

The second reason is, of course, that beneficial mutations, rare as they are, do occur now and then. So although the average fitness of the descendants of mutating organisms will always be lower than that of non-mutating ones, both the spread of fitness and the maximum fitness among them will be higher. In many situations, especially when the environment changes, mutating organisms will therefore have a clear advantage not only in the long term (thousands and millions of years) but also in the short term (over one or a few generations). For some organisms, the direct harm of mutating too slowly has been demonstrated experimentally.

In any case, mutations are inevitable, harmful ones far outnumber useful ones, and all living things have to put up with this (and, moreover, adapt to it). We humans are no exception. On the contrary, together with other mammals we are far ahead of most living beings in the number of mutations per individual per generation (Fig. 1).


Fig. 1. The rate of mutagenesis in different organisms. Vertical axis: the rate of nucleotide substitutions (per billion base pairs per generation); horizontal axis: genome size (in millions of base pairs). The left graph shows viruses and prokaryotes, the right graph shows eukaryotes (together with the average values for bacteria and archaea). The red line corresponds to one mutation per genome per generation. In viruses and prokaryotes, the larger the genome, the lower the rate of mutagenesis, whereas in eukaryotes the relationship is reversed. Mammals stand out for their extremely high number of mutations per genome per generation. Legend categories: Eubacteria, Archaea, Double-stranded DNA viruses, Single-stranded DNA viruses, RNA viruses, Mammals, Invertebrates, Plant, Unicellular eukaryotes. Image from the article: Lynch M., 2010. Evolution of the mutation rate // Trends in Genetics (modified).

The first rough draft of the Homo sapiens genome was read about 15 years ago, opening a new era in the study of the mutation process in our species. Thanks to rapid progress in genome sequencing and sequence analysis, we now know immeasurably more than we did at the beginning of the 21st century about the rate of mutation, the patterns of distribution of mutations across the genome, their role in various pathologies (including cancer), and other features of human mutagenesis of both theoretical and practical importance.

A selection of articles on this hot topic has been published in the latest issue of the journal Science. Most of them deal with narrowly specialized questions, including medical ones, but one, written by geneticists from the University of Washington in Seattle, gives a general overview of what we have learned about our mutations over the past 10-15 years (Jay Shendure and Joshua M. Akey. The origins, determinants, and consequences of human mutations).

1. The rate of germline mutations. All mutations can be divided into somatic mutations (arising in somatic, that is, non-germ, cells of the body at various stages of development) and germline mutations, which change the genome of germ cells and are inherited by descendants. Both types are extremely important from a medical point of view; from an evolutionary point of view, germline mutations naturally matter more.

The first estimates of the mutation rate in the human germ line were made long before the genomic era, but they were not very accurate. Today several approaches are used. One is to study pedigrees and count newly arising mutations with a clear phenotypic effect and high penetrance (that is, mutations that change the phenotype in a strictly defined way and do so almost without fail). As a rule, the mutations used for this are those causing congenital pathologies: "Mendelian" hereditary diseases. They are inherited according to Mendel's laws because they are caused by single mutations rather than by intricate combinations of dozens or hundreds of "risk alleles" together with environmental factors. One recent study based on this approach estimated the rate of single-nucleotide substitutions in humans at 1.28 mutations per 100 million base pairs per generation (1.28 × 10⁻⁸ per nucleotide per generation) (M. Lynch, 2010. Rate, molecular spectrum, and consequences of human mutation). Since the human diploid genome contains approximately 6 billion base pairs, this corresponds to about 77 new mutations per genome per generation.

Another approach is based on comparing the genomes of humans and other primates. The differences in neutral (not subject to selection) regions of the genome are counted and then set against the time elapsed since the last common ancestor of the species being compared (as far as it can be estimated from paleontological data). According to the neutral theory of molecular evolution, the rate at which neutral genetic differences accumulate in a lineage should ideally just equal the rate of neutral mutagenesis (in each lineage, as many neutral substitutions become fixed per generation as new neutral mutations arise in each individual). Therefore, knowing the time of divergence, the rate of mutagenesis can be calculated by the formula m = D/(2t), where D is the number of neutral differences between the species and t is the time since their last common ancestor, measured in generations. The factor of two appears because both species accumulated mutations independently of each other after they diverged.
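As a rough illustration of the formula m = D/(2t), the sketch below computes m from a per-site neutral divergence and a split time. The function name and all input values (1.2% divergence, a split 6 million years ago, 20 years per generation) are illustrative assumptions, not figures from the review. D is expressed here as differences per nucleotide, so that m comes out per nucleotide per generation.

```python
def mutation_rate_from_divergence(d_per_site, years_since_split, generation_years):
    """Molecular-clock estimate m = D / (2t), with t measured in generations."""
    t_generations = years_since_split / generation_years
    # Factor 2: both lineages accumulated substitutions independently after the split.
    return d_per_site / (2 * t_generations)


# Hypothetical inputs, chosen only to show the arithmetic.
d_per_site = 0.012             # assumed neutral divergence: 1.2 differences per 100 sites
years_since_split = 6_000_000  # assumed age of the last common ancestor, in years
generation_years = 20          # assumed average generation time, in years

m = mutation_rate_from_divergence(d_per_site, years_since_split, generation_years)
print(f"m = {m:.2e} substitutions per nucleotide per generation")
```

With these assumed inputs the result is about 2 × 10⁻⁸. Changing the assumed split time or generation length shifts the estimate proportionally, which is one reason the method is sensitive to dating errors.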

However, many factors disturb the steady ticking of the "molecular clock", and paleontological dating of last common ancestors is, to put it mildly, not always accurate. The reliability of this method is therefore low, and it is not surprising that it gave somewhat different results. For example, after the chimpanzee genome was read, the human mutation rate was estimated at 2.2 × 10⁻⁸ substitutions per nucleotide per generation, or 132 new mutations in each newborn, almost twice the figure obtained from the analysis of hereditary diseases (The Chimpanzee Sequencing and Analysis Consortium, 2005. Initial sequence of the chimpanzee genome and comparison with the human genome).

In recent years, thanks to the sharp fall in the cost of whole-genome analysis, it has become possible to measure the mutation rate directly, simply by comparing the genomes of parents with those of their children and counting the new mutations. Other approaches have also emerged, in particular ones based on paleogenetic data. For example, reading the genomes of Neanderthals and other ancient people made it possible to estimate the mutation rate from the number of "missing mutations" in these genomes, that is, from how many fewer genetic differences from the common ancestor a lineage that died out tens of thousands of years ago managed to accumulate compared with people living today. These and other new methods give estimates of 1.0 × 10⁻⁸ to 1.2 × 10⁻⁸ substitutions per nucleotide per generation, that is, 60-72 new mutations in each newborn. This seems to be close to the truth.
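The per-genome figures quoted in this section all come from the same multiplication: the per-nucleotide rate times the roughly 6 billion base pairs of the diploid genome. A minimal sketch of that conversion, using only the rates given in the text (the dictionary labels and function name are made up for illustration):

```python
DIPLOID_GENOME_BP = 6e9  # approximate size of the human diploid genome, as given in the text

# Per-nucleotide, per-generation rates quoted in the text, keyed by method.
rate_estimates = {
    "Mendelian disease pedigrees": 1.28e-8,
    "human-chimpanzee comparison": 2.2e-8,
    "newer methods, lower bound": 1.0e-8,
    "newer methods, upper bound": 1.2e-8,
}


def new_mutations_per_genome(rate_per_site):
    """Expected number of new point mutations per diploid genome per generation."""
    return rate_per_site * DIPLOID_GENOME_BP


per_genome = {name: new_mutations_per_genome(r) for name, r in rate_estimates.items()}
for name, count in per_genome.items():
    print(f"{name}: about {count:.0f} new mutations per newborn")
```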

Of course, these are averages: in any particular individual the number of new mutations can be lower or substantially higher. In any case, there is no doubt that we are all inveterate mutants. We are a far cry from small fry such as bacteria or yeast, in which a single new mutation may arise once in a thousand or even ten thousand "newborns" (cell divisions) (Fig. 1).

In mammals, 5-10% of the genome is "under selection", and the rest is mostly junk (or, to put it more politely, regions in which all or almost all mutations are neutral, that is, do not affect fitness and are invisible to selection). Consequently, of the 60-70 new mutations in the genome of an average newborn, about 3-7 are harmful. The rate at which beneficial mutations arise is not known precisely, but they are certainly rare enough to be neglected when speaking of one average person.
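The "3-7 harmful mutations" range follows from combining the two ranges in the paragraph above. The sketch below reproduces that arithmetic under the simplifying assumption, implicit in the text, that new mutations land uniformly across the genome and that every mutation in a selected region is harmful (variable names are illustrative):

```python
# Ranges taken from the text.
selected_fraction = (0.05, 0.10)    # share of the genome "under selection"
new_mutations_per_newborn = (60, 70)

# Simplifying assumption: mutations are spread uniformly over the genome and
# any mutation that hits a selected region counts as harmful.
harmful_low = selected_fraction[0] * new_mutations_per_newborn[0]
harmful_high = selected_fraction[1] * new_mutations_per_newborn[1]

print(f"about {harmful_low:.0f} to {harmful_high:.0f} new harmful mutations per newborn")
```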

Three to seven new harmful mutations per person per generation is dangerously many. Strong purifying selection is needed to avoid degeneration, that is, the steady accumulation of genetic load. If selection cannot be relied on, the only remaining hope is advanced biotechnology: genetic engineering, gene therapy, in vitro fertilization with artificial selection of embryos, and the like. The review itself, however, does not discuss the threat of degeneration (though the problem is discussed in articles its authors cite).

In addition to single-nucleotide substitutions, there are insertions and deletions ("indels", see Indel), inversions (180° rotations) and duplications of DNA fragments of various lengths. Such mutations occur less often than single-nucleotide substitutions, but they affect more nucleotides and, of course, also influence the likelihood of developing all kinds of diseases. According to the available estimates, which are not yet very accurate, each person carries on average about three new small (1-20 base pairs) insertions and deletions and 0.16 larger ones (>20 base pairs).

Knowing the mutation rate, population size and birth rate, we can roughly estimate the overall scale of genetic polymorphism in modern humanity. The scale is impressive: during the lifetime of the latest generation alone, more than 10¹¹ new point mutations must have arisen in the human population, far more than there are nucleotides in the genome! Apparently, every possible point mutation (except those incompatible with life) is currently present in at least a hundred or two people living on the planet. Of course, far fewer polymorphisms have actually been recorded, because we are still a long way from sequencing everyone's genome.
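A back-of-the-envelope version of this estimate can be sketched as follows. The population size of 7 billion is an assumption introduced here, as is treating everyone alive as a single generation. The per-person count is the lower bound quoted earlier in the text, and the haploid genome size is taken as half of the 6 billion base pairs given for the diploid genome.

```python
POPULATION = 7e9             # assumed number of living people (not a figure from the review)
NEW_MUTATIONS_PER_PERSON = 60  # lower bound of the 60-72 range given in the text
HAPLOID_GENOME_BP = 3e9      # half of the ~6 billion bp diploid genome

# Total new point mutations carried by one generation of this size.
total_new_mutations = POPULATION * NEW_MUTATIONS_PER_PERSON

# Each site can change to any of the three other nucleotides.
possible_point_mutations = 3 * HAPLOID_GENOME_BP

# Average number of independent new occurrences per possible substitution,
# under the crude assumption that all substitutions are equally likely.
mean_occurrences = total_new_mutations / possible_point_mutations

print(f"new point mutations in one generation: {total_new_mutations:.1e}")
print(f"distinct possible point mutations:     {possible_point_mutations:.1e}")
print(f"mean new occurrences per substitution: {mean_occurrences:.0f}")
```

This counts only mutations that arose anew in one generation; variants inherited from earlier generations add to the number of carriers, and real mutation rates differ strongly between sites, so the result is an order-of-magnitude guide at best.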

2. Patterns of mutation distribution across the genome. Mutations, as is well known, are random, at least to a first approximation. This does not mean, however, that all mutations are equally probable or that mutagenesis is completely chaotic in every respect. The "randomness" of mutations means something quite specific: whether a mutation is useful or harmful has no direct influence on the probability that it occurs. Living beings have no mechanism for working out which mutation would be useful under given conditions and then introducing it into their genome. There are, however, mechanisms that slightly raise the likelihood of beneficial mutations (for example, somatic hypermutation of immunoglobulin genes in lymphocytes) and lower the likelihood of harmful ones. In particular, the mutation frequency in humans turned out to be linked to the timing of chromosome replication: fewer mutations arise in regions that replicate early than in regions that replicate last. This is advantageous because the early-replicating regions are, as a rule, gene-rich, so mutations in them are often harmful, whereas the last regions to replicate are dominated by "junk", where most mutations are neutral.

The strongest bias in the distribution of mutations across the human genome is that the nucleotides mutated most often are cytosines (C) followed by a guanine (G), that is, cytosines in CG dinucleotides. Cytosine is a "weak link" in DNA in general, since it tends to turn into uracil (U) through spontaneous deamination. The repair systems, however, vigilantly make sure that DNA contains no uracil and quickly correct most mutations that arise this way.

CG dinucleotides are special in that their cytosines are often methylated. When methylated cytosine is deaminated, it turns not into uracil but into thymine, a "legitimate" base that is a normal part of DNA. Such a mutation is much harder for the repair systems to detect.

As a result, the frequency of cytosine mutation in CG dinucleotides is about 10 times higher than usual. This, in turn, makes some amino acid substitutions more frequent than others. Of the 20 amino acids in human proteins, arginine is the most "vulnerable". As it turned out, more than 16% of all amino acid substitutions leading to hereditary diseases are substitutions of arginine (Fig. 2). This fact, mysterious at first glance, has a very simple explanation. A look at the table of the genetic code shows that all 4 codons starting with the mutation-prone CG dinucleotide encode arginine.
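The claim about the genetic code is easy to check by building the standard codon table and listing the codons that begin with CG. The sketch below does that and also shows what a C→T change in the CG dinucleotide does to each such codon, both on the coding strand (C→T at the first position) and on the opposite strand (read as G→A at the second position). The compact table encoding and the variable names are illustrative choices, not part of the review.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Codons beginning with the CG dinucleotide, and all codons for arginine (R).
cg_codons = sorted(c for c in CODON_TABLE if c.startswith("CG"))
arginine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "R")

print("codons starting with CG:", {c: CODON_TABLE[c] for c in cg_codons})
print("arginine codons:", arginine_codons)

# Deamination of the coding-strand C gives TGN; deamination of the C paired
# with the G on the other strand appears as G->A on the coding strand, giving CAN.
coding_strand_hits = {c: CODON_TABLE["T" + c[1:]] for c in cg_codons}
opposite_strand_hits = {c: CODON_TABLE["CA" + c[2]] for c in cg_codons}

print("after C->T at position 1:", coding_strand_hits)
print("after G->A at position 2:", opposite_strand_hits)
```

Every one of these single-nucleotide changes replaces arginine with another amino acid or a stop codon, so none of them is synonymous.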


Fig. 2. Relative frequency of amino acid substitutions in humans. Arginine (R) holds the record for the number of substitutions, because 4 of the 6 triplets encoding it begin with a CG dinucleotide (the data are based on an analysis of 4,000 non-synonymous mutations that cause hereditary diseases). Figure from the discussed article in Science.

CG dinucleotides are more common in coding sequences than in the genome on average, which contributes to a higher mutation rate in coding sequences than in non-coding ones. However, there is also a mechanism that lowers the mutation frequency of coding sequences, at least of those that are transcribed often: transcribed sequences are repaired more thoroughly. Because of this, inherited mutations arise less often in genes that are active in germline cells than in the genome on average.

Obviously, the mutation pattern, like other traits of an organism, may well evolve under the influence of mutation, selection and drift. It may differ slightly not only between humans and chimpanzees but even between human populations. And this is not just theory. Relatively recently, 40 to 80 thousand years ago, the mutation pattern changed in the ancestors of present-day Europeans, who had just split from the ancestors of Asians: the rate of mutations in TCC trinucleotides increased, and these trinucleotides began to turn into TTC more often (5'-TCC-3' → 5'-TTC-3') (K. Harris, 2015. Evidence for recent, population-specific evolution of the human mutation rate).

Such mutations are known to occur most often in skin cells under the influence of ultraviolet light, and they are especially characteristic of melanoma cells. In the course of evolution the skin of Europeans became more transparent to ultraviolet light than that of other human populations, so the increase in such mutations in skin cells is easy to explain. How these mutations get into the germ line, however, is not known for certain. One hypothetical possibility is that ultraviolet light raises the frequency of this type of mutation in both skin and germ cells by the same indirect route, by promoting the degradation of folic acid. A deficiency of this vitamin can lead to errors during DNA synthesis. Whatever the mechanism, this finding clearly shows that the mutation pattern in human populations really is subject to evolutionary change.

3. Middle-aged fathers are the main source of hereditary mutations. It is now firmly established that people receive the lion's share of new hereditary mutations from their fathers, and the older the man, the more mutations his spermatozoa carry. About 95% of the variation among offspring in the number of new mutations is explained by the father's age. Worse, it turned out that in older fathers the dependence of mutation frequency on replication timing described above is weakened. As a result, the proportion of mutations falling in "meaningful" parts of the genome grows, and among such mutations the share of harmful ones is higher.

The number of mutations in a woman's eggs does not increase with her age, but the probability of having children with chromosomal disorders, such as Down syndrome, does.

In short, it is still better to have children while young. The trend in developed countries toward a higher average age of fatherhood and motherhood further increases the risk of genetic degeneration of humankind.

4. Somatic mutations and their medical significance. Over a person's lifetime, the cells of the body divide trillions of times. Every division carries a risk of somatic mutations, and DNA molecules can also be damaged between replications. In tissues whose cells divide especially intensively (for example, the intestinal epithelium), almost every possible point mutation should be present in at least one cell by the age of 60. Somatic mutations are more diverse than hereditary ones: to avoid being immediately eliminated by selection, a somatic mutation only has to be compatible with the life of a single cell, whereas a hereditary one has to be compatible with the life of the whole organism.

Although somatic mutations are not inherited, their medical significance is very great. They have long been known to play a key role in the development of various types of cancer. In recent years it has become clear that somatic mutations cause many other diseases as well (R. P. Erickson, 2010. Somatic gene mutation and human disease other than cancer: An update). For example, somatic mutations in the genes PIK3CA, AKT3 and mTOR turned out to cause hemimegalencephaly, an enlargement and dysfunction of one hemisphere of the brain that also increases the risk of epilepsy. A relatively small proportion of mutant cells can disrupt the work of vast areas of the cortex: in patients with dysfunction of an entire hemisphere, as few as 8 to 35% of brain cells may carry the mutation. Apparently, somatic mutations underlie many other pathologies of the central nervous system (A. Poduri et al., 2013. Somatic Mutation, Genomic Variation, and Neurological Disease).

5. Towards understanding the phenotypic effects of mutations. The ultimate goal of studying human mutations, hardly achievable in the foreseeable future, is a complete catalog of them that specifies the effect of each mutation on the phenotype. Ideally, one would also want to understand how the effects of different mutations influence one another, but a complete catalog of such interactions is further off still.

The effects of mutations are studied at three levels, which can loosely be called molecular, medical and evolutionary. The first concerns how a particular mutation affects gene expression or protein function. The second concerns how the mutation affects the likelihood of developing particular diseases. The third concerns the effect of mutations on fitness (reproductive success). These things are of course interrelated, but the correlations between them are not strict. For example, mutations that cause diseases of old age are unlikely to be "harmful" from an evolutionary point of view, since they do not affect reproductive success. A weakened enzyme function may affect human health under some environmental conditions and not show itself at all under others, and so on. All three kinds of study involve great methodological difficulties, so for now we have more or less detailed information on only a relatively small number of mutations.

For example, working out the molecular effects of mutations is extremely painstaking work, although it can at least be done in vitro. After an enormous expenditure of effort and money, one can obtain a more or less complete map of how mutations affect the function of a single protein. Figure 3 shows the results of a study of how different amino acid substitutions affect the ubiquitin ligase function (see Ubiquitin ligase) of the regulatory protein BRCA1. By attaching ubiquitin to other proteins, this protein regulates DNA repair and plays an important role in protecting against cancer.


Figure 3. How the ubiquitin ligase function of BRCA1 is affected by amino acid substitutions at each of the 103 amino acid positions of the studied protein fragment. The amino acid positions run along the horizontal axis and are labeled at the bottom of the diagram. The vertical axis lists the 20 amino acids with which the researchers replaced, one at a time, the original amino acid at each position. The effect of each substitution is shown by color. Yellow rectangles mark the "wild type" amino acids, that is, those found at that position in the normal, non-mutant protein. Blue indicates a weakening of the function, red indicates its strengthening above the normal level, and white indicates that the original level of protein activity is preserved (see the color scale on the right; a value of one corresponds to the initial state, that is, "normal"). Gray indicates that no data were obtained for that substitution. The number of white and nearly white rectangles gives an idea of the protein's tolerance to mutations, that is, the range of non-synonymous (amino acid changing) mutations that do not drastically alter its function. Image from an article by L. M. Starita et al., 2015. Massively parallel functional analysis of BRCA1 RING domain variants.

This study is impressive in its scale, but it also prompts gloomy thoughts about how much effort it would take to obtain even such simple, one-step "adaptive landscapes" for all human proteins and all single non-synonymous mutations, not to mention their combinations. Nevertheless, this work is necessary if we want to get closer to answering the key question of how genotype relates to phenotype. Without such knowledge, we will not be able to develop technologies that would one day allow us to replace weakening natural selection with targeted genome editing, stop the accumulation of genetic load and even, perhaps, improve human nature.

Portal "Eternal youth" http://vechnayamolodost.ru