28 March 2016

Living wage

What is the minimal genome, and how does it propel us towards the creation of synthetic life?

Alexander Ershov, N+1

In 2010, a team of biologists led by Craig Venter – the man who read the human genome faster than the international Human Genome Project consortium – reported the creation of the "first fully synthetic organism". The essence of that work was to chemically synthesize the genome of the bacterium Mycoplasma mycoides and load it – roughly like new firmware into a computer – into another cell. Today the same team reported the next step towards "designer life": the scientists managed to strip everything unnecessary out of version 1.0 and grow a new organism with a completely self-sufficient and at the same time minimal genome. We talked about how this was done, and what the new work means for science, with Professor Mikhail Gelfand, a well-known Russian bioinformatician and deputy director of the Institute for Information Transmission Problems (IITP RAS).

Craig Venter is a figure as controversial in biology as he is legendary. For readers unfamiliar with his biography, one can safely recommend the old but solid book "The Genome War". It describes in detail the story that made our hero famous: how Venter outpaced the entire world scientific community and read the human genome faster and cheaper than academic scientists did. Not out of any special love for humanity: the actual plan was to patent the genes as quickly as possible and profit from them. Here, fortunately, everything turned out exactly according to Goethe – the force that wills evil yet works good: instead of patents on other people's DNA, Venter received only fame and the nickname "Darth Venter". Humanity, for its part, acquired a new genome sequencing technology (the "shotgun method"), which remains the standard in its field to this day.

But all that is the work of days gone by. For the past 15 years, Venter and the team of the institute he heads (naturally, the Venter Institute) have been working on the creation of "synthetic life". What is it? It should be said right away that behind all the headlines about "playing God" or creating "artificial life", we are in fact talking about things far more prosaic than they may seem at first glance. Neither Venter nor any other scientist can – or intends to – create fundamentally new organisms from components fundamentally different from those found in any cell on Earth. This is not about creating some hypothetical "silicon", "arsenic" or digital life.

In the real synthetic biology of 2016, three main directions can be loosely distinguished. The first is the creation and modeling of artificial genetic constructs that could become the elementary components of future large systems. They are meant to be rough analogues of diodes, switches or capacitors in electronics – a set of simple, predictable parts that can be introduced into GM organisms and used to create vaccines, synthetic fuels, and so on. In essence, this is still the same genetic engineering, only much further removed from natural components than in the 70s and 80s. A somewhat different direction, xenobiology, is the introduction of artificial components into ordinary model organisms. One example is converting one of the stop codons of the genetic code into a triplet encoding a new amino acid that does not occur in nature. This not only expands the range of what can be built into a protein, but also ties the organism to the laboratory, from which it can no longer escape – outside it there are no artificial amino acids that it needs to live. This is very important for creating strains that synthesize something potentially dangerous for the environment. Another example is the work in which scientists created two additional artificial letters for nucleic acids – bases that do not exist in nature. These may find many applications in biotechnology that are still difficult to predict.

The third direction, which only Venter and his colleagues are currently pursuing, is the chemical synthesis of complete genomes. This is, in a sense, a top-down approach: not assembling a new organism from simple, well-understood synthetic elements (there are only a few of those so far), but creating a complete genome all at once, even without a detailed analysis of what it includes. Do it first, figure it out later.

What comes to the fore here is not deep analysis or an elegant idea, but a task that is simple to state and very hard to carry out: synthesize a very large DNA molecule without making mistakes. Few people had faced this task before Venter. Chemical synthesis of DNA fragments 50-70 bases long is now routine: it is done on automatic synthesizers, of which there are several dozen in Moscow alone. But if you need DNA hundreds or thousands of bases long, the task immediately becomes non-trivial and requires multi-stage assembly of fragments by hand. If we are talking about DNA hundreds of thousands of nucleotides long (and the Mycoplasma genome is a little over a million bases), then you have to switch to other model organisms and develop new molecular-biology methods. In addition, as the length of the synthesized DNA grows, the probability of making an error rises rapidly, which means you need an effective system for finding and correcting errors. And that, as experience has shown, can be difficult.
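To get a feel for why error correction becomes the bottleneck, here is a small illustrative calculation (the per-base error rate is an assumed round number, not a figure from the paper): under independent per-base errors, the chance that a molecule comes out entirely error-free falls off exponentially with its length.

```python
# Illustrative only: probability that a synthesized DNA fragment contains
# zero errors, assuming independent per-base errors.
# The error rate below is an assumed round number, not a figure from the paper.

def error_free_probability(length_bases: int, per_base_error_rate: float) -> float:
    """P(no errors) = (1 - p)^N for N bases with per-base error rate p."""
    return (1.0 - per_base_error_rate) ** length_bases

p = 1e-4  # assumed per-base error rate for illustration

for n in (70, 1_000, 100_000, 1_000_000):
    print(f"{n:>9} bases: P(error-free) = {error_free_probability(n, p):.3g}")

# With these assumptions a 70-base oligo is almost always correct, while an
# error-free megabase molecule is essentially impossible to get in one shot -
# hence the multi-stage assembly with verification along the way.
```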

The first success of Venter's group in this direction was the synthesis of the genome of the bacteriophage φX174. This virus, incidentally, was not chosen by chance: decades earlier it had become the first organism to have its genome fully read. That genome is only 5,386 bases long, but for 2003 its synthesis was still a great achievement. The same team then developed genome-transplantation methods that make it possible to introduce an entirely new genome into a cell and subsequently destroy the old one. In the computer analogy, this is akin to installing new software on old hardware. The difference in biological systems is that here the new software gradually builds new hardware for itself: in a bacterium with a "substituted" genome, the composition of RNA, proteins, membrane lipids – in fact, almost everything in the cell – changes. The efforts of Venter and his colleagues eventually culminated in the 2010 article describing the creation of the first living organism with a synthetic genome, Mycoplasma mycoides JCVI-syn1.0.

The genome of this organism was almost identical to that of natural mycoplasma, except for a small number of marker genes and the authors' "digital signatures". Yet, despite the impressive technical achievements behind that work, it inevitably raised a simple question – "why manufacture Spinozas artificially when any woman can give birth to one whenever she likes?" (to borrow from Bulgakov's Heart of a Dog). Why make a bacterium with an artificial genome if it is almost indistinguishable from the natural one? In itself, the 2010 work would have been meaningless had it not been followed by precisely the continuation that has now been published.

The idea behind this whole cycle of work is that sooner or later biologists must move beyond a purely descriptive understanding of life and learn to create new, "designer" organisms. To some extent this is already happening – think of GMOs. But there we are talking only about single, deliberately introduced functions; in essence the organism remains what evolution made it. Evolution, however, had tasks of its own, which left their historical imprint on the genome – an imprint we do not need.

What if we throw out of a model organism – say, a simple bacterium – everything that is not absolutely necessary for it to reproduce? Then, by adding one or another ready-made construct, we could turn it into an organism that copes perfectly with one specific task: producing a needed antibiotic or therapeutic protein, for example, or making synthetic fuel – one of Venter's favorite topics. The smaller the system, the more predictable it should be, meaning that full control over the organism can be achieved only once its genome is minimized. Whether this is actually true is still unknown. Many researchers are skeptical of the very idea that an organism with a minimal genome will necessarily be the most efficient and predictable. But one way or another, it is this logic that underlies the whole idea of chemical genome synthesis as Venter's team understands it. So the task of minimization was set as soon as a genome-synthesis pipeline could be developed.

How was the genome minimization actually carried out in the new work? In two ways. First, the scientists had a transposon-mutagenesis system, developed, incidentally, by the same team back in 1999. It allows you to switch off one gene in one organism and then see what happens. If you sequence the genomes of thousands of colonies in which such mutations have occurred, you can map the genome and find out where the transposons never land.

It is like the famous case of survivorship bias: if the bombers returning from raids have holes in their wings and tail but never damage around the cockpit and fuel tank, it does not mean that bullets never hit the fuel tank or cockpit. It means that a plane hit there does not return to base. The same approach works with the genome: if, during random mutagenesis, you never see insertions in a certain gene, that gene is most likely necessary for the cell's survival. Second, in addition to transposon mutagenesis, Venter's team had a system that let them check whether the bacterium remains viable when large chunks of its genome are removed: whole cassettes of genes could be deleted to see how this affects the organism's survival. Previously this system had been used to control random mutations in the genome; now it was adapted for introducing deliberate ones.
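To make the logic of such a screen concrete, here is a toy sketch (all gene names, coordinates and insertion positions are invented for illustration and are not taken from the study): pool the mapped transposon insertion sites from many colonies, count how many land in each gene, and flag genes that never receive an insertion as candidates for being essential.

```python
# Toy illustration of transposon-mutagenesis analysis: genes that never receive
# an insertion across many sequenced colonies are candidates for being essential.
# All gene names, coordinates and insertion positions below are invented.

from collections import Counter

# Hypothetical genes as (name, start, end) on a simplified linear chromosome.
genes = [
    ("dnaA", 0, 1500),      # replication initiator: we expect no insertions here
    ("rpsL", 1500, 1900),   # ribosomal protein: we expect no insertions here
    ("metX", 1900, 3200),   # metabolic gene: insertions tolerated
    ("abcT", 3200, 4700),   # transporter: insertions tolerated
]

# Hypothetical insertion coordinates pooled from many sequenced colonies.
insertions = [2050, 2100, 2310, 2888, 3001, 3333, 3410, 3950, 4200, 4666]

hits = Counter()
for pos in insertions:
    for name, start, end in genes:
        if start <= pos < end:
            hits[name] += 1
            break

for name, _, _ in genes:
    status = "candidate essential (no insertions seen)" if hits[name] == 0 else "dispensable?"
    print(f"{name}: {hits[name]} insertions -> {status}")
```

In the actual study the picture is finer-grained – the authors also had to deal with genes whose disruption merely slows growth – but the core of the inference is exactly this absence-of-insertions signal.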

So what do we have in the end – what kind of organism did they manage to obtain? First, the mycoplasma genome has been nearly halved: it is 531 thousand bases long versus 1,079 thousand in the original Mycoplasma mycoides JCVI-syn1.0, and it contains 473 genes, including several genes for regulatory RNAs. Second, the syn3.0 strain divides almost three times more slowly than natural mycoplasma and has rather irregular cell morphology: the cells differ greatly in size and can form filaments.

Third, data on the composition of the minimal genome were obtained. In it, about 40 percent of the genes are responsible for the expression of genetic information – the synthesis of RNA and proteins – and 7 percent for preserving that information: replication and DNA-repair systems. Another 18 percent of the genes maintain the structure of the membrane, and about as many are engaged in metabolism.
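As a back-of-the-envelope check, these percentages can be turned into approximate gene counts (a simple multiplication by 473; the category labels follow the summary above and do not add up to 100 percent):

```python
# Back-of-the-envelope conversion of the published percentages into approximate
# gene counts out of the 473 genes in syn3.0. Categories are as summarized in
# the text above; they do not sum to 100%, and the remainder is not broken down here.

total_genes = 473

fractions = {
    "expression of genetic information (RNA and protein synthesis)": 0.40,
    "preservation of information (replication, DNA repair)": 0.07,
    "cell membrane structure": 0.18,
    "metabolism (roughly the same share as membrane)": 0.18,
}

for category, fraction in fractions.items():
    print(f"{category}: ~{round(total_genes * fraction)} genes")
```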

Figure from Hutchison et al., "Design and synthesis of a minimal bacterial genome" (Science, 2016) – VM

Most interesting of all, 149 of the 473 genes have no unambiguously established function. Some of them resemble well-known classes of proteins, but almost 17 percent cannot be assigned any known function at all, even in the most general sense.

On the one hand, this is a rather unusual result, as Venter himself repeatedly noted at a teleconference for journalists: it is strange to see a third of the genes in the minimal genome with unknown function. On the other hand, it does not surprise experts – many had predicted that the minimal genome would inevitably contain genes of unknown function, if only because our understanding of the genomic universe is still far from complete. The proportion of such genes – almost a third – may be unusual, but not the fact of their presence. We asked Mikhail Gelfand, one of the leading Russian specialists in microbial genomics, to tell us what else of interest was found in the minimal syn3.0 genome.

"N+1": Is there something unexpected among the received genes?

M.G.: There are several genes there about which it is not entirely clear why they are so indispensable. For example, there are a number of transporters that export something out of the cell, and it is not clear why that is so important.

In addition, there are more than a hundred genes with unknown functions in the syn3.0 genome. Unknown in what sense? For some of them it is clear what type of chemical reaction the enzyme encoded by such a gene catalyzes. But neither its specificity (which substances it works with) nor its biological context – in which situations this enzyme "switches on" – is clear. And this is interesting because it shows that we do not understand cell biochemistry as well as we would like.

Among these genes of unknown function there are many quite conserved ones, preserved all the way up to humans. Clearly, if they are that old, they are probably quite important. And vice versa – there are essential genes specific to mycoplasmas, that is, apparently evolutionarily quite young. It would be interesting to try to work out their evolutionary history.

Another thing that may be interesting but is not covered in the article is the composition of the genes that were, after all, thrown out of the minimal genome. What proportion of them are conserved all the way up to humans? It would be interesting to look, because conservation usually indicates that a gene is important, and if some gene is conserved yet absent from the minimal genome, it would be interesting to know why. Of course, this syn3.0 lives in pampered conditions; it does not need to be able to do much of what cells need in the wild. But it could still be instructive.

"N+1": In the article by Kunin and Mushegyan (A minimal gene set for cellular life derived by comparison of complete bacterial genomes, PNAS, 1996), where the concept of a minimal genome is put forward for the first time, this theoretically minimal genome turned out to be almost half as small as in real syn3.0. So, they were you wrong?

M.G.: This is a classic story of a beautiful, interesting idea colliding with ugly biological reality. It is like the universal genetic code – that beautiful table printed in every biology textbook – about which it is nevertheless useful to know that there are organisms whose genetic code differs from the "universal" one.

One reason for the difference between the predicted set and the experimentally working one is that completely unrelated proteins can perform the same function – and Koonin and Mushegian, of course, understood this perfectly well. If one protein performs some vital function in Mycoplasma and a different one performs it in Haemophilus, then by comparing the two genomes you simply cannot establish that at least one of them must be in the minimal set.

In fact, a whole line of research has grown out of this observation. Koonin called the phenomenon non-orthologous gene displacement – when the function of a gene in a genome is taken over by another, unrelated gene, introduced, for example, by horizontal transfer. If we want to find such genes, we can look at their distribution across the genomes of different organisms. If a pair of genes really are "doubles" of each other, they should co-occur in the same genome less often than expected (having both is simply redundant), so by analyzing the distribution you can catch such pairs. Mushegian later published work of this kind.
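The distribution analysis Gelfand describes can be sketched in a few lines of code (the genomes and gene-family names below are invented for illustration): build presence/absence profiles of gene families across a set of genomes and look for pairs that never occur together yet between them cover every genome – candidate "doubles" that displace each other.

```python
# Toy sketch of the distribution-based search for non-orthologous gene displacement:
# two unrelated gene families performing the same function should rarely co-occur
# in one genome, while almost every genome carries at least one of them.
# The genomes and family names below are invented for illustration.

from itertools import combinations

# Presence (1) / absence (0) of gene families across eight hypothetical genomes.
profiles = {
    "lysA_type1": [1, 1, 0, 0, 1, 0, 1, 0],
    "lysA_type2": [0, 0, 1, 1, 0, 1, 0, 1],   # complementary to type1: candidate "double"
    "rpoB":       [1, 1, 1, 1, 1, 1, 1, 1],   # universal gene, co-occurs with everything
}

n_genomes = len(next(iter(profiles.values())))

for (name_a, a), (name_b, b) in combinations(profiles.items(), 2):
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if both == 0 and either == n_genomes:
        print(f"{name_a} and {name_b}: never together, but one of them in every genome "
              f"-> candidate displacement pair")
```

Real analyses of this kind work with thousands of genomes and have to correct for phylogenetic relatedness, but the anti-correlation signal they look for is the same.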

"N+1": But, as far as I understand, this kind of analysis was not used in the work of Venter and the company – did they prefer a pure experimental "brute force"?

Brute force, "brute force method" – search for a solution using a complete search of all possible solutions to the problem – VM

M.G.: That is actually unknown. They write that, in addition to the transposon-mutagenesis data, they also looked at gene functions when selecting genes for the minimal set. And that, it seems to me, is exactly where all these analyses may be hiding. The problem is that such analysis is poorly algorithmized and hard to describe in the "Methods" section. If you cannot describe in detail exactly how a decision was made, it is often easier to say it was "done by expert judgment" and leave it at that.

As for the very concept of "minimality", there are two things to keep in mind. First, there are bacteria whose gene set is even smaller than this minimum; Nancy Moran's group studies them a lot. There is an insect endosymbiont bacterium, Tremblaya princeps, that has only 121 genes. In the syn3.0 genome, for example, all the ribosomal proteins were retained, whereas in these endosymbionts even ultraconserved ribosomal proteins begin to disappear and the structure of the ribosome changes, which is quite non-trivial (these are our as-yet unpublished data). Of course, such bacteria cannot live on their own.

Second, a minimal genome is not an absolute notion. The threshold of minimality is rather arbitrary, since it depends strongly on growth rate. A couple more genes could have been thrown out of the syn3.0 genome, but then it would divide even more slowly.

A beautiful result is that essential and dispensable genes are not mixed together in the genome but come in clusters. Not that this is entirely unexpected – if I were offered a bet, I would have bet that it should be so. But when it is actually demonstrated, it is still amusing to see.

"N+1": If you choose one main result from this article, what will it be?

M.G.: The scientific component here is rather small, and the technological one is colossal. If you ask what has changed in biology after this article, I would answer: by and large, nothing. There are, of course, the lists of essential and dispensable genes, which will be useful to look at, but they are nothing earth-shattering. Technologically, though, the work is of course astonishing.

The main result of this work is, of course, the demonstration that such a thing can be done at all. That is the most interesting part. Everything else is either technical solutions or data from which beautiful biology can probably be extracted, though the authors were not concerned with that. It is like the flight to the Moon: the scientific data obtained there were probably not all that sensational. The sensation was that such a task could, in principle, have an engineering solution. And that is the most important thing.

Portal "Eternal youth" http://vechnayamolodost.ru  28.03.2016
