04 June 2019

With the right to correspond

How and why scientists synthesize genomes of living organisms

Daria Spasskaya, N+1

In mid-May, Nature published an article in which Cambridge scientists described the de novo synthesis and assembly in a living organism of the largest genome to date: the circular chromosome of the bacterium Escherichia coli. The researchers not only created a bacterium with a synthetic genome, but also removed from use two codons encoding the amino acid serine, thus obtaining an organism with a reduced genetic code.

Compared to the media hype that Craig Venter (a pioneer of human genome research and director of the institute named after him) created around his synthetic genomics projects (above all, the first organism with a synthetic genome and a bacterium with a minimal genome), the news about the "replacement" of the Escherichia coli chromosome passed almost unnoticed. Nevertheless, it is an important milestone in synthetic biology, because we are talking about four million base pairs that were produced in the laboratory from a solution of nucleotides and then placed into a living cell.

We explain how a genome is synthesized in the laboratory, why scientists do such things at all, and what has happened in synthetic genomics since Craig Venter's team reported the "first synthetic living organism" in 2010.

Synthesis of whole genomes is a branch of synthetic biology, which aims to create organisms with specified properties using modern methods of genetic engineering.

In fact, the principles of synthetic biology have long been used in industrial microbiology to create microbes that are superproducers of valuable molecules, in ecology (for example, to create biosensors), and in other fields of biology. What distinguishes synthetic biology is rather that it is an interdisciplinary field uniting several major projects, such as building computing devices based on living systems or creating organisms with designer genomes synthesized de novo.

How it is done

Synthesizing DNA in the laboratory is not difficult in itself: techniques for building a chain with a given nucleotide sequence were developed back in the 1980s. However, from a methodological point of view, synthetic genomics today resembles sequencing at the start of the Human Genome Project: the technologies exist, but their throughput is still extremely low. As a result, to synthesize even the shortest genome you have to follow the scheme "small pieces, larger pieces, large pieces, the whole molecule".

The most popular and cheapest approach, the phosphoramidite method of DNA synthesis on a solid support, allows no more than 100-200 cycles of sequential nucleotide addition, and the probability that the chain contains an error grows with each cycle. Thus the output consists of short single-stranded DNA molecules (oligonucleotides) of at most 200 nucleotides.
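Why errors cap the length of an oligonucleotide can be shown with a little arithmetic. The sketch below is only an illustration: it assumes that every coupling cycle succeeds independently with the same probability, and the 99% efficiency used in the demo is an assumed figure, not one taken from the article. The function name `full_length_fraction` is hypothetical.

```python
def full_length_fraction(length, coupling_efficiency):
    """Fraction of chains that reach `length` nucleotides with no failed coupling.

    Growing a chain of `length` nucleotides takes (length - 1) coupling
    cycles; each cycle is assumed to succeed independently with the
    same probability (a simplification).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    # 0.99 is an assumed per-cycle efficiency, chosen only for illustration.
    for n in (20, 100, 200):
        share = full_length_fraction(n, 0.99)
        print(f"{n:>3} nt: {share:.1%} full-length chains")
```

Under this assumption the share of complete chains drops from roughly 83% at 20 nucleotides to about 37% at 100 and about 14% at 200, which is consistent with the practical limit of a couple of hundred nucleotides.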

An alternative may be enzymatic synthesis using terminal deoxynucleotidyl transferase (TdT). This enzyme works like a DNA polymerase, extending the chain by attaching new nucleotides, but it does not need a template. The enzymatic approach can increase the accuracy of synthesis but, apparently, not the length of the finished product: so far, the longest chain obtained this way is 150 nucleotides. In addition, the technology is not yet commercially available.

Despite the length limit, this level of synthesis productivity satisfied most users until recently. Short DNA strands are used in a variety of biological applications: as primers for the polymerase chain reaction, as probes for sequence detection, and for suppression of gene expression and mutagenesis. Nevertheless, the emergence of synthetic biology has increased competition in the synthesis market, and prices have fallen markedly as a result. Most research laboratories can now afford to order the synthesis of an entire gene several hundred or several thousand nucleotides long.

To synthesize even a single gene, you have to start with a large number of short oligonucleotides. At the design stage, the DNA sequence is split so that the sequences of neighboring oligonucleotides overlap. The oligonucleotides are then mixed several at a time and joined using the polymerase chain reaction. This method was invented back in the 1980s and spurred the development of molecular biology so much that it earned a Nobel Prize.
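The design step described here, breaking a target sequence into overlapping pieces, can be sketched in a few lines. This is a minimal illustration, not the procedure used by any particular laboratory: the function name `split_into_oligos` and the default lengths (60-nucleotide pieces with 20-nucleotide overlaps) are assumptions, and real designs typically also alternate between the two DNA strands and tune overlaps for melting temperature, which is ignored here.

```python
def split_into_oligos(sequence, oligo_len=60, overlap=20):
    """Tile `sequence` with pieces of up to `oligo_len` nucleotides.

    Each piece shares its first `overlap` nucleotides with the end of
    the previous piece. All pieces are taken from the same strand,
    which is a simplification.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(sequence[start:start + oligo_len])
        if start + oligo_len >= len(sequence):
            break  # the last piece reaches the end of the sequence
        start += step
    return oligos


if __name__ == "__main__":
    # A made-up 100-nucleotide target, used only to show the tiling.
    target = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG" * 2 + "ACGTTGCAAGGCTTAACCGGTA"
    for piece in split_into_oligos(target, oligo_len=40, overlap=15):
        print(len(piece), piece)
```

Because each piece begins with the last nucleotides of the previous one, neighboring oligonucleotides can anneal to each other, and the polymerase fills in the rest.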

The reaction consists of multiple cycles in which the DNA strands are separated and then annealed to one another, and it uses a thermostable DNA polymerase that completes the second strand on a single-stranded template. As a result, short overlapping pieces can be not only joined together but also copied many times.

Figure: Polymerase chain reaction scheme (Wikimedia Commons).

Next, the pieces of DNA are combined into large fragments by enzymatic assembly or, more simply, by using a baker's yeast cell as a "seamstress". Thanks to its enhanced ability to recombine DNA, yeast can join many overlapping sequences to one another. This is how the genome of the "synthetic mycoplasma" was assembled, with yeast joining 25 pieces of DNA at once.
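The logic of joining fragments by their overlapping ends can be imitated on a computer. The sketch below is a toy model, not a description of what happens in yeast: it assumes the fragments are already listed in the correct order and that the overlaps are exact and unique, and the names `overlap_length` and `assemble` are hypothetical.

```python
def overlap_length(left, right, min_overlap=4):
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    for k in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble(fragments, min_overlap=4):
    """Join ordered fragments, merging each shared overlap only once."""
    if not fragments:
        return ""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        k = overlap_length(assembled, fragment, min_overlap)
        if k == 0:
            raise ValueError("neighboring fragments do not overlap")
        assembled += fragment[k:]  # append only the new, non-overlapping part
    return assembled


if __name__ == "__main__":
    # Three made-up fragments with 8-nucleotide overlaps.
    pieces = [
        "ATGGCCATTGTAATGGGCCG",
        "ATGGGCCGCTGAAAGGGTGC",
        "AAGGGTGCCCGATAG",
    ]
    print(assemble(pieces))
```

In a cell, the order does not have to be specified in advance: each fragment finds its neighbors through the homologous ends, which is why the overlaps have to be designed to be unique.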

Large fragments of the E. coli genome were assembled in the same way. However, to create intermediate forms of the chromosome, the researchers had to rely on recombination in the cells of the bacterium itself. Small DNA fragments are usually easy to insert into a bacterial genome, but inserting pieces of about 100 thousand base pairs required a CRISPR-based method that introduces double-strand breaks.

What is synthesized

Readers have most likely heard about the experiments with mycoplasma genomes conducted by Venter's team: for example, we wrote about the creation of a bacterium with a minimal genome in 2016.

Mycoplasmas are the smallest and most simply organized bacteria, with a small circular chromosome. The genome of Mycoplasma genitalium, 582,970 base pairs long, became the first project to synthesize the DNA of a living organism. Its results were published in 2008, and in 2010 scientists from the Venter Institute reported in Science that the Mycoplasma mycoides genome, twice as large, had not only been synthesized but also introduced into a cell, where it began to replicate. This gave the authors reason to speak of the "first synthetic living organism".

Figure: The assembly of the mycoplasma genome in yeast from 25 separate overlapping fragments. Daniel Gibson et al / PNAS 2008.

Nevertheless, the starting point of synthetic genomics is considered to be not Venter's experiments but experiments with virus genomes (traditionally, viruses are not considered living organisms, since they cannot reproduce outside a host cell).

In 2002, an article was published on the synthesis of DNA corresponding to the RNA genome of poliovirus type 1 (the causative agent of polio). The DNA copy of the viral genome was only 7,500 base pairs long, but at the time it was the largest DNA fragment created exclusively by biochemical means. The synthetic genome contained 27 nucleotide substitutions that served as genetic markers, or identification marks.

The poliovirus was followed by the influenza virus, whose genome was synthesized in 2005. The researchers first reconstructed, bit by bit, the genome sequence of the virus that caused the Spanish flu epidemic of 1918-1919. This was done using reverse transcription of short RNA fragments isolated from the preserved tissues of victims. Then the DNA copy of the virus genome was recreated in the laboratory as part of a safe circular molecule (a plasmid). Subsequently, scientists used individual genes of this strain in laboratory models to analyze why this type of virus was so lethal.

An interim record for the size of a synthesized genome (more precisely, again, of its DNA copy) was set in 2008, when scientists trying to understand the causes of the SARS outbreak produced in the laboratory the genome sequence, almost 30 thousand base pairs long, of a SARS coronavirus from a bat.

However, in the same year Daniel Gibson's article on the synthesis of the mycoplasma genome, more than half a million base pairs long, was published. The new sequence assembly methods described in the article were adopted by other researchers, and since then obtaining small viral genomes has ceased to be an outstanding achievement. It is even surprising that, after the "Spanish flu" and SARS, a relatively recent article on the reconstruction of the extinct horsepox virus caused such wide resonance and public concern.

So, the first living organism with a designer genome was a mycoplasma. Even before synthesis, while the experiment was being planned, substitutions were introduced into its genome sequence that not only left identification marks but also encoded new information, for example, the e-mail address of an institute employee. However, mycoplasma is not a very useful organism in the long term. It was much more important to develop a method for synthesizing the Escherichia coli genome.

What is important about E. coli

Escherichia coli is not only a classic object of molecular biology, but also a "workhorse" of biotechnology. Hundreds of E. coli strains have been created to produce proteins, amino acids, vitamins, and various compounds that are cheaper to produce biotechnologically than to isolate from natural raw materials or to obtain using chemical synthesis.

However, the "wild-type" E. coli genome contains four and a half million base pairs, and assembling it is far more expensive and complicated than assembling the mycoplasma genome.

In the 2000s, with the support of industry, a project was launched in Japan to reduce the E. coli genome by genetic engineering in order to discard all "superfluous" elements and create a rapidly growing microorganism with a stable genome, ideal as a basis for producer strains of any kind. One of the intermediate stages of this project, a strain called MDS42, lost 600 thousand base pairs and attracted the attention of synthetic biologists.

One of the goals of creating a "synthetic" Escherichia coli was to recode the genome, that is, to reduce the number of natural codons used to encode amino acids, or even to reassign some codons to non-canonical amino acids. Recoding is possible because the genetic code is degenerate: as many as 64 codons encode only the 20 canonical amino acids that make up proteins, plus a stop signal.
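The degeneracy of the code is easy to check by counting. The sketch below builds the standard genetic code table from its usual compact notation (bases in the order T, C, A, G, one letter per amino acid, `*` for a stop signal) and counts how many codons stand for each amino acid. The variable names are chosen for this illustration only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many different codons encode each amino acid (or the stop signal).
codons_per_residue = Counter(CODON_TABLE.values())

if __name__ == "__main__":
    print("codons in total:", len(CODON_TABLE))
    for residue, count in sorted(codons_per_residue.items(), key=lambda kv: -kv[1]):
        print(residue, count)
```

Serine (`S`), leucine and arginine each have six codons, while methionine and tryptophan have only one, so it is the amino acids with many synonyms that leave room for removing a codon.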

Theoretically, it is possible to reduce the number of codons for some amino acids and thus free up codons for new ones. In addition, recoding changes the original gene sequences while preserving the amino acid sequences of the proteins they encode. This makes it possible, for example, to prevent viruses from using the cell's genetic machinery to replicate.
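A small sketch shows what synonymous recoding means in practice: chosen codons are swapped for synonyms, the DNA sequence changes, and the encoded protein stays the same. The replacement scheme used here (serine codons TCG and TCA replaced by AGC and AGT, the stop codon TAG replaced by TAA) follows our understanding of the Cambridge design and should be treated as an assumption of this example. The function names and the test sequence are invented, and a real genome also requires handling overlapping genes and regulatory regions, which this sketch ignores.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed recoding scheme: each removed codon is mapped to a synonym.
RECODING = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def split_codons(orf):
    """Split an open reading frame into consecutive triplets."""
    if len(orf) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def translate(orf):
    """Translate a DNA reading frame into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(orf))


def recode(orf):
    """Replace every targeted codon with its synonymous substitute."""
    return "".join(RECODING.get(codon, codon) for codon in split_codons(orf))


if __name__ == "__main__":
    original = "ATGTCGTCAAGCTAG"  # invented example: Met-Ser-Ser-Ser-stop
    recoded = recode(original)
    print(original, translate(original))
    print(recoded, translate(recoded))
```

Both sequences translate to the same protein, but the recoded one no longer contains any of the three targeted codons in its reading frame.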

The results of one such large-scale recoding effort were published in 2016 with the participation of George Church, one of the pioneers of synthetic biology. The authors of the Science article did a great deal of bioinformatic work, planned the assembly of the genome from small pieces, and decided to remove as many as seven codons, replacing them throughout the genome with synonymous ones (that is, ones encoding the same thing).

The final chromosome was to contain more than 60 thousand substitutions. However, only 60 percent of the sequences were verified experimentally, by creating a number of "semi-synthetic" E. coli strains. Some of them, apparently, proved unviable.

Figure: Scheme of sequential assembly of the synthetic genome of E. coli from fragments. Julius Fredens et al / Nature 2019.

The authors of the recent paper, led by Jason Chin from Cambridge, were more modest and replaced only three codons in the E. coli genome. The experiment was preceded by large-scale computer design, including splitting the genome into short sequences, removing some elements, and introducing identification marks. The key stage of the assembly was the creation of several "semi-synthetic" strains, which combined their synthetic DNA through recombination during the bacterial "sexual process" (conjugation).

The experiment was a success, producing an E. coli with a synthetic genome of just under four million base pairs.

And baker's yeast

Meanwhile, researchers are working on the synthesis of the genome of another model organism, the baker's yeast Saccharomyces cerevisiae. Yeast is one of the simplest eukaryotic organisms and, like E. coli, is very widely used in basic research and biotechnology. The genome of baker's yeast contains as many as 12 million base pairs and, moreover, consists not of one chromosome, as in bacteria, but of 16.

The project to synthesize the first eukaryotic genome started in 2011. Since then, the consortium behind it, called "Synthetic Yeast 2.0", or Sc2.0, which includes representatives of institutes from the USA, Great Britain, Australia, China and Singapore, has reported the de novo creation of six yeast chromosomes. As part of the project, the Build-A-Genome training course was launched, which involved university and high school students in the routine work of assembling fragments.

The organizers of the project plan not only to recreate the genome of the microorganism, but also to shrink it by eight percent by removing various kinds of "genetic garbage", such as repetitive sequences, transposons and introns. In addition, recombination sites recognized by the viral enzyme Cre recombinase are inserted into the synthetic chromosomes; by "turning on" the enzyme, you can start shuffling the genome and drive directed evolution using the SCRaMbLE technology.

Figure: Recreating in yeast cells the synthesis pathway of an aromatic compound that smells of raspberries, in order to improve the taste of wine. To recreate the synthesis pathway of 4-(4-hydroxyphenyl)butan-2-one, several heterologous genes from plants and other microorganisms had to be introduced into the yeast genome. Pretorius I / Critical Reviews in Biotechnology 2016.

In 2003, the Human Genome Project, which set out to determine the sequence of the human genome, was successfully completed. Now, following in the footsteps of this international project, researchers have proposed a second part, this time dedicated to the synthesis of the human genome. The concept of the Genome Project-write was published in 2016, and more recently the organizers announced its main goal: the creation of an Ultra-Safe Cell that, thanks to recoding, will be resistant to viral infection. The founders hope that, just as the project to decipher the human genome led to the emergence of cheap sequencing technologies, the project to synthesize it will lead to the development of DNA synthesis technologies and make them cheaper.

Why it is necessary

Jason Chin, under whose leadership the synthetic genome of E. coli was assembled, admitted in a comment for The New York Times that he was motivated primarily by research interest. He wanted to know whether a degenerate genetic code is really necessary and whether it is possible to create a viable organism with a stripped-down codon table. Nevertheless, synthetic genomics is also of purely applied interest: it is no accident that the project to trim the E. coli genome was initiated by industry.

Synthetic viruses can be used to create vaccines, and the synthesis and recoding of the genomes of microorganisms in demand in biotechnology can be used to produce proteins with new properties that do not exist in nature. In addition, recoding will make it possible to genetically isolate modified organisms and prevent them from "escaping" into the environment.

Right now, industry needs cheap DNA synthesis technologies for metabolic engineering and for the production of enzymes, drugs and biofuels. Projects to recreate entire genomes may still look like a pointless waste of resources, but the related technologies they yield can prove very significant.

Portal "Eternal youth" http://vechnayamolodost.ru

