15 October 2018

Proteomics

Why study proteins and how to do it

Sergey Moshkovsky, Post-science

Soviet high-school biology textbooks opened with a statement by Friedrich Engels: "Life is the mode of existence of protein bodies." Like many people of his time, Engels was well educated, and there is some truth in his words. If you take any living creature and remove all the water from it, protein will rank first by weight among all the macromolecules in the dry residue. In that sense, life consists of protein. All the work in cells is done by proteins. They perform every function except, perhaps, the transmission of information. With their help, working living systems can be tuned, so proteins must be studied.

Proteomics1.jpg

What a protein looks like

Every organism has a certain number of protein-coding genes. They determine the number of basic protein entities that perform various functions. But genes and proteins cannot simply be equated, because once proteins have come off the synthetic conveyor of messenger RNA and ribosomes, they can be modified, degraded, or cleaved to form new functional fragments. Various chemical groups are attached to proteins and serve as signals or confer new functions. If the protein is an enzyme, cofactors and coenzymes are added. Insulin, for example, is initially synthesized in a completely different form and becomes functional only after it is cleaved.

A cofactor is a non-protein compound, often a metal ion, that an enzyme needs to perform its biological function. Cofactors are often called helper molecules that participate in biochemical transformations.

The structure and function of many protein products encoded in the human genome are unknown. No one has ever seen or isolated them. As a rule, a protein cannot simply be synthesized chemically from its amino acid sequence: most often it will precipitate, and that will be the end of it. This is because artificially synthesized proteins do not adopt the required spatial shape. No more than 10% of all protein products known in nature spontaneously adopt the required conformation. The working conformation is achieved only in a living cell, under the influence of proteostasis systems, the so-called chaperones.

Chaperones are proteins whose main function is to restore the native tertiary or quaternary structure of a protein and to assist in the formation and dissociation of protein complexes. Chaperones exist in every living organism, and their mechanism of action (non-covalent binding to the protein and its "unwinding" using the energy of ATP hydrolysis) is also conserved. The more primitive the organism, the simpler this system is, although bacteria have it too. It is estimated that the cell spends about 30% of its energy on proteostasis.

Proteomics2.jpg
Structure of the yeast chaperone

Some proteins need a very sophisticated proteostasis system: 20 ATP molecules are spent to bring a single protein molecule into a functional state, so intricate is its conformation. If this system is switched off, the protein will precipitate right inside the cell and become toxic. Broadly speaking, prion diseases and the formation of amyloids in Alzheimer's disease are failures of the proteostasis system.

Almost all human proteins can be expressed in a homologous expression system, that is, in human cells. Human proteins can also be produced in bacteria, but for this their genes must be stripped of introns and placed into suitable vectors. Even then, these proteins may turn out to be toxic to the bacterium or fold incorrectly and form inclusion bodies, which will prevent you from determining their correct spatial structure.

An intron is a section of DNA whose copy is removed from the primary transcript and is absent from the mature messenger RNA. Introns are found predominantly in eukaryotic genes.

Computer modeling works well only when the protein differs from a known structure by a few amino acids at most. Everything else has to be isolated in pure form and crystallized for X-ray diffraction analysis. Nuclear magnetic resonance methods are also well developed for this purpose. Cryo-electron microscopy is now on the rise: it is a fairly powerful method that also reveals the spatial structure of a protein. But each method has its own limitations. For example, membrane proteins crystallize poorly because lipids interfere a great deal. It is therefore quite difficult to study the structure of membrane channels or receptors, and this is a very important part of modern biology.

Attempts to create a protein with predetermined properties began as soon as genetic engineering appeared, in the 1980s and 1990s. The problem is that you first need to create a protein that will not precipitate and will adopt a stable conformation, and only then can you work on its properties. It is therefore easier to approach the task from the other end: sequencing the genomes of known organisms is faster, and you can take proteins with the desired properties from there. For example, if you need a thermostable protein, you dive to an underwater volcano, find the inhabitants of hot springs and determine which gene encodes the protein you need. It is easier to find an organism with the desired properties than to invent a protein, because nature has had more time for selection than you and I have.

How to determine how much protein is in the body

Approximately 10 thousand proteins are detected in a cell culture, that is, the products of 10 thousand genes. They give rise to many more proteoforms, because there are modifications. The diversity of the proteome does not depend much on the complexity of the organism. Of course, bacteria have smaller genomes (mycoplasma has only 700 genes), so the diversity of their proteome is lower. But in general, the link between the diversity of proteins and the degree of organization of a living being is weak.

There are two ways to determine protein concentration. The most common is to obtain antibodies to the protein of interest, test them on a standard, that is, on a series of samples whose protein concentration is known from an independent method, and determine the protein content of the sample from the degree of binding. This method is called immunoassay, and it is very widespread: it is even used in pregnancy tests. In effect, we use one protein to measure the content of another.
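The calibration step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the standard values are invented, the function name `interpolate_concentration` is hypothetical, and straight-line interpolation between neighbouring standards stands in for the curve fitting a real laboratory would use.

```python
def interpolate_concentration(standards, signal):
    """Estimate a concentration from a measured signal by linear
    interpolation between the two neighbouring standard points.

    standards: list of (concentration, signal) pairs with distinct signals.
    """
    points = sorted(standards, key=lambda p: p[1])  # order by signal
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo <= signal <= s_hi:
            fraction = (signal - s_lo) / (s_hi - s_lo)
            return c_lo + fraction * (c_hi - c_lo)
    raise ValueError("signal lies outside the calibrated range")


# Hypothetical standards: (concentration in ng/mL, optical density).
STANDARDS = [(0.0, 0.05), (1.0, 0.15), (5.0, 0.55), (10.0, 1.05)]

# A sample reading halfway between the 1 and 5 ng/mL standards
# comes out at about 3 ng/mL.
estimate = interpolate_concentration(STANDARDS, 0.35)
print(f"{estimate:.2f} ng/mL")
```

The sketch refuses to extrapolate beyond the standards, which mirrors the practical rule that a reading outside the calibrated range cannot be trusted.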

The second method is mass spectrometry. Pure proteins have to be isolated and split into fragments, and these can then be identified directly in the mass spectrometer, that is, without a binding agent. Although such a system seems more accurate and does not depend on reagents, there is still experimental error. Trypsin, for instance, does not always work the same way, because an enzyme is a kind of living system, and living systems are flexible.
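The fragmentation step can be imitated on a computer. The sketch below assumes the commonly used idealized rule that trypsin cuts after lysine (K) or arginine (R) unless the next residue is proline (P); the function name `trypsin_digest` and the example sequence are hypothetical. A real digest also contains missed and non-specific cleavages, which is exactly the variability mentioned above.

```python
def trypsin_digest(sequence):
    """Split a protein sequence into peptides using the idealized
    trypsin rule: cut after K or R, but not when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Whatever remains after the last cut is the C-terminal peptide.
        peptides.append(sequence[start:])
    return peptides


# Made-up sequence: the R at position 6 is followed by P, so it is not cut.
print(trypsin_digest("AAKGGRPLLKMM"))  # ['AAK', 'GGRPLLK', 'MM']
```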

Proteomics3.jpg
Trypsin is an enzyme that cleaves peptides and proteins

What if you need to determine the protein content of a single cell in a whole tissue? Cells in culture can easily be counted under a microscope. Then you take the whole culture, break the cells open, obtain an extract and measure the protein content in it. The resulting value can be recalculated per cell, knowing the initial number of cells and the volume of one cell. Of course, the result is something like the average temperature across a hospital, because not all cells are the same: in some the protein may not be produced at all, while others may be literally stuffed with it.
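The per-cell recalculation is simple arithmetic. The sketch below uses invented numbers (1 ng of a 50 kDa protein recovered from a million cells of 2 pL each) and hypothetical function names; it returns the population average discussed above and says nothing about the spread between individual cells.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_cell(protein_mass_g, molar_mass_g_per_mol, cell_count):
    """Average number of protein molecules per cell in the culture."""
    moles_total = protein_mass_g / molar_mass_g_per_mol
    return moles_total * AVOGADRO / cell_count


def molar_concentration_in_cell(copies, cell_volume_l):
    """Average intracellular concentration in mol/L."""
    return copies / AVOGADRO / cell_volume_l


# Assumed example values, not measurements.
copies = copies_per_cell(
    protein_mass_g=1e-9,         # 1 ng of the protein in the extract
    molar_mass_g_per_mol=5e4,    # a 50 kDa protein
    cell_count=1e6,              # cells counted before lysis
)
concentration = molar_concentration_in_cell(copies, cell_volume_l=2e-12)

print(f"about {copies:.0f} copies per cell")           # roughly 12 thousand
print(f"about {concentration * 1e9:.0f} nM on average")  # roughly 10 nM
```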

How can protein be measured in single cells? You can try to introduce fluorescent antibodies into the cell and use a microscope to estimate how many antibody molecules have bound to the target protein.

Antibodies are glycoproteins found on the surface of B-lymphocytes as membrane-bound receptors and in blood plasma; they are formed in response to bacteria, viruses, protein toxins and other antigens entering the body of a human or a warm-blooded animal. By binding bacteria or viruses with their active sites, antibodies prevent them from reproducing or neutralize the toxic substances they release.

If we assume that the antibodies have bound with one hundred percent efficiency, we can determine the concentration of the protein of interest. This approach is the future, because everything needs to be done in single cells. The cutting edge of science today is determining the transcriptome of single cells. This is how new cell types are discovered: researchers take tissue from the spleen or another organ, sort the cells and see that they separate very clearly by transcriptome. What we thought was one entity turns out to be several different cell types. Each type can then be distinguished morphologically, and its location in the tissue can be described.

This is possible thanks to PCR and the amplification of nucleic acids. The proteome of a single cell is much harder to obtain, but you can cheat a little: take some giant cell and try to identify all the proteins in it. The Roman snail (Helix pomatia) has giant neurons measuring 0.4 mm, a very suitable object for this task.

The variability of human proteins is known at the level of about 100 thousand exomes, that is, the protein-coding parts of the genome, and is described better than for any other organism. Different proteins live in the body for different lengths of time. Albumin is secreted by liver cells and lives for two weeks, and some proteins can persist for months. Some proteins live for hours. Antibodies are synthesized and destroyed all the time, and a lifelong immune response is possible not because of a constant population of proteins but because of the cell population. If a protein turns out to be defective, ideally it is degraded immediately. The lifetime of a protein depends on external stimuli and on its sequence. For example, there is the N-end rule, according to which the lifetime of a protein depends on which amino acid is at the N-terminus of the molecule.

There are limits to the variability of protein profiles within the human population. This has to do with genomic polymorphism, which is partly related to natural selection. People differ from each other by a number of mutations, or variants, that is small relative to the whole genome, and some of these variants are coding. Some populations are adapted to some conditions, others to others. Polymorphism is also associated with the concentration of certain proteins. Even if the sequence remains conserved, the level of protein synthesis may vary.

Conserved sequences are similar or identical sequences found in biological polymers: in nucleic acids and in the primary structure of proteins. It is widely believed that a mutation in a conserved sequence leads either to a non-viable organism or to a phenotype that is eliminated by natural selection.

From a medical point of view, the dangerous mutations are not those that greatly change the structure of a protein, because such an embryo does not survive and we never see it, but those that leave a person viable yet lead to disease. In this, humans now differ from other species, which live by the principle of "all or nothing". Our purifying selection is less effective than that of our ancestors, but apparently this is our path of development.

In addition to congenital mutations, there are acquired ones. The main example is cancer, which most often occurs due to the fact that the burden of mutations has accumulated. In elderly people who do not have cancer, many mutant cells persist in the bloodstream. The older you are, the more mistakes you accumulate, and often this quantity turns into quality. Eventually, a breakthrough occurs, and cancer clones begin to grow at a high rate. Thus, different diseases can be diagnosed by proteins. But now DNA diagnostics remains in the first place.

Regions of nucleic acid that are specific to a disease can be amplified from just one copy, thanks to the polymerase chain reaction. The sensitivity of methods for detecting proteins in human biological fluids, by contrast, is limited: proteins cannot be amplified. For example, if a tumor the size of a small nut has formed, some dangerous mutant protein enters the blood in an amount of only several copies, and it is impossible to identify it.

Of course, proteins are measured in the clinic, but only those that accumulate in large quantities: C-reactive protein and various enzymes such as alkaline phosphatase, aspartate aminotransferase (AST) and alanine aminotransferase (ALT). The spread of concentrations is usually large enough not to give false positive results. Difficulties in determining protein content arise at different levels. For example, a mass spectrometer has a sensitivity of 6000 molecules, provided they are guaranteed to get into the instrument. But obstacles arise at every stage of transferring the molecules to the mass spectrometer. With the right tools it is certainly possible to detect proteins at a concentration of 10^-9 to 10^-10 mol per liter, but this depends on the sample in which we are looking for them.
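To put these figures side by side, here is a small back-of-the-envelope calculation. The sample volume of one microliter is an assumption chosen for illustration, and the function names are hypothetical; the 6000-molecule sensitivity and the 10^-9 to 10^-10 mol per liter range are the ones quoted above.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_in_sample(molar_concentration, volume_l):
    """Number of molecules in a sample of the given volume."""
    return molar_concentration * volume_l * AVOGADRO


def volume_for_molecules(n_molecules, molar_concentration):
    """Sample volume (in liters) that contains n_molecules."""
    return n_molecules / (molar_concentration * AVOGADRO)


# One microliter at 10^-9 mol/L holds hundreds of millions of molecules.
print(f"{molecules_in_sample(1e-9, 1e-6):.2e} molecules")

# At 10^-10 mol/L, 6000 molecules fit into roughly a tenth of a nanoliter.
print(f"{volume_for_molecules(6000, 1e-10) * 1e9:.2f} nL")
```

Under these assumptions the instrument itself would need only a tiny fraction of the sample, which is consistent with the point that the losses occur on the way to the mass spectrometer.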

In scientific journals you can often find studies that claim to detect, say, 10 protein molecules, but such technologies cannot yet be put into routine use: when we start testing the method from a metrological point of view, today we see 10 molecules, tomorrow 20, and the day after tomorrow nothing at all. So far, the best result achieved in the laboratory is the detection of proteins in a biological fluid at a concentration of several molecules in one liter.

Hence the question of whether proteins are good biomarkers. In oncology, because nucleic acids are much more convenient to detect, methods are being developed to find nucleic acids circulating in blood plasma that carry mutations specific to cancer cells. Cancer cells break down, so fragments of their DNA can be detected in the body, because the permeability of organs and tissues is not zero. This has not yet fully entered practice, and for now the old markers remain in use: proteins, of course, including glycoproteins, mucins and enzymes that change their activity and indicate various conditions.

Why proteins can perform many different functions

Nucleic acid consists of four residues, with modifications: adenine, guanine, thymine and cytosine, plus a fifth, uracil, which replaces thymine in RNA. This is enough to ensure the transmission of hereditary information in the form of the genetic code. Protein consists of twenty such building blocks, and they all have different properties. They have side groups that carry these properties. Some are large and hydrophobic, others are small and soluble in water; some are positively charged, others negatively. Different combinations of properties are possible: small hydrophobic, small charged, large charged, heterocyclic, prone to form hydrogen bonds and enter into other weak interactions, capable of forming covalent bonds, and many more.
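One rough way to feel the difference between a four-letter and a twenty-letter alphabet is to count the possible sequences of a given length. The sketch below is purely illustrative: the function name is hypothetical, and counting sequences ignores the fact that only a small share of them would fold into anything useful.

```python
import math


def sequence_count(alphabet_size, length):
    """Number of distinct linear sequences of the given length."""
    return alphabet_size ** length


for length in (3, 10):
    nucleic = sequence_count(4, length)
    protein = sequence_count(20, length)
    print(f"length {length}: {nucleic} nucleotide vs {protein} amino acid sequences")

# Information carried by one position of each alphabet, in bits.
print(f"{math.log2(4):.2f} bits per nucleotide, {math.log2(20):.2f} bits per amino acid")
```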

In fact, we have 21 amino acids, because there is also selenocysteine, and some microorganisms even have 22: pyrrolysine is added as well. In the genome, selenocysteine is encoded not by a codon of its own but by a special hairpin, that is, a secondary structure of RNA. In the presence of such a hairpin, selenocysteine is inserted at a UGA codon, which would otherwise be read as a stop signal. We can say that this conformation is also recorded at the DNA level, because there is a certain pattern that programs the formation of the hairpin, affects the ribosome and leads to the insertion of selenocysteine.

Selenocysteine is an analog of cysteine in which the sulfur atom is replaced by a selenium atom. It is part of the active site of the enzyme glutathione peroxidase and of other selenoproteins.

Apparently, such a set of amino acids is enough for everything. The decision is made by natural selection: what is not needed is simply discarded, and everything useful is preserved. Evolution does not ask, "What should I do? Will I have enough opportunities to implement all the functions?" What we have now is the result of almost 4 billion years of protein evolution. This was enough to go through a great many options and arrive at spider silk, skin, claws and teeth.

Proteins are a product of evolution because they are directly encoded in the genome. They therefore have more room for evolutionary combinatorics, through the introduction of mutations into the genome. The tools for synthesizing lipids and carbohydrates are limited by enzymatic systems, so these compounds probably had fewer opportunities for combinatorics. In addition, the amino acid composition of proteins allows them to act as enzymes themselves, that is, to lower the activation energy of reactions. Carbohydrates and lipids have either a structural function or a recognition function, that is, a receptor function. They cannot be enzymes themselves.

There are also ribozymes, RNA molecules with enzymatic properties. RNA is also encoded in the genome and has also had opportunities to evolve, but proteins are chemically more stable, while RNA degrades quickly. The whole world is full of RNase, because in the fight against viruses all organisms have acquired this enzyme to cleave viral genomes. RNAs have therefore given way to proteins in the race for functionality, so to speak. But this is true only of the external environment. Inside cells, RNAs perform a huge number of different functions. We have learned about all these properties only recently, because we have learned to work in conditions free of RNase and DNase. When you do experiments with nucleic acids, you cannot touch anything with your bare hands, because there is also a huge amount of RNase on our skin.

The question "What is more important: proteins or non-proteins?" is akin to the disputes in the old Academy of Sciences, when one scientist would stand up and say, "Proteins need to be investigated," and another would answer, "Why bother with these proteins? I have been studying nucleic acids all my life, and there is a lot more information in them." Deciding which molecules are more important is politics. For science, everything is important. You need to know the whole structure of the cell.

About the author:
Sergey Moshkovsky – Doctor of Biological Sciences, Professor of the Department of Biochemistry of the Faculty of Medicine and Biology of the Russian National Research Medical University named after N. I. Pirogov of the Ministry of Health of Russia, Head of the Department of Personalized Medicine of the V. N. Orekhovich Research Institute of Biomedical Chemistry of the Russian Academy of Medical Sciences.

Portal "Eternal youth" http://vechnayamolodost.ru

