31 January 2013

Bioinformatics, useful and "useless"

"Stretched out, the DNA molecule would be several meters long"
Scientist and opposition figure Mikhail Gelfand on what bioinformatics does and why there is no need to be afraid of GMOs

What "useful" bioinformatics is and what is interesting about the "useless" kind, why one should not be afraid of GMOs, and how the cell can be studied as a whole: these are the subjects of a lecture for Gazeta.Ru by Professor Mikhail Gelfand, deputy director of the Institute for Information Transmission Problems of the Russian Academy of Sciences and a member of the public council under the Ministry of Education and Science and of the opposition's Coordination Council.

– Please tell us briefly: what does bioinformatics do? – Bioinformatics deals with biology; I generally tend to think of it as a branch of biology.

There are two main ways to do bioinformatics. There is "useful" bioinformatics, which tells biologists something useful for their work. For example, a standard task is to take a gene that encodes a protein and work out the function of that protein. This task is quite difficult. It turns out that by comparing genomes and applying other techniques, it is often possible to make quite specific predictions, and the experiment then comes down to simply checking them. The opposite also happens: you know that the cell performs some function, but you don't know which protein is responsible for it. Experimentally this task is even harder. But here too it is possible to propose candidates computationally, and then they only need to be checked. This is one side of bioinformatics, the one closest to molecular biology, and my group does a lot of it.

Another direction, also useful, has become especially relevant in the last ten years thanks to advances in technology: the attempt to look at how the cell works as a whole.

In traditional molecular biology, the object of research is one specific protein or gene. Now there is a technical opportunity to look at all the interactions of proteins and DNA at once, and these are quite large amounts of data. This is a way to look at the whole cell, and it is very useful to compare different data of this kind: on the binding of proteins and DNA, DNA modification, the spatial structure of DNA in the cell.

For traditional bioinformatics, DNA is a text, a long linear molecule. In fact, it is useful to understand that this molecule is not linear, but has a complex three–dimensional structure.

If the DNA molecule were stretched out in a line, it would be several meters long. In the cell, it is packed into a tight ball, and this is no accident. For example, it turned out that sections of DNA that are close together in space "like" to be in a similar state as far as gene activity is concerned. Today this complex science is at the very beginning of its development: quite a lot of data has already been accumulated, and the very first attempts are being made to make sense of it.

However, there is no doubt that the future belongs to it: eventually we will be able to talk about how cellular systems work as a whole. And this is very instructive, because twenty years ago methodological and philosophical arguments were popular that biology had been reduced to reductionism: we take the whole cell apart and study the parts, but this is a dead end, and that way we will never understand how the cell works. Now we are making an attempt to understand exactly that. The role of bioinformatics here is twofold: first, purely technical, in the primary processing and storage of data; and second, substantive, because to describe the cell as a whole you need a fairly formal language. You need to draw graphs or write differential equations, that is, do something thoroughly mathematical, only at the level of the whole cell.

That was all "useful" bioinformatics. And then there is "useless" bioinformatics, which for me, for example, is the most interesting.

It is connected with evolution and answers a very basic question: how did what we see around us come about? How can we model the mechanisms that led to the formation of the genomes we observe today? There is a famous saying of Dobzhansky: "Nothing in biology makes sense except in the light of evolution." A "proper" biologist always keeps in mind that he is observing an instantaneous slice of a very long process. This process also needs to be described and modeled, and we need to try to understand how it worked in the past and how it works now. The prediction of gene function that I mentioned earlier is largely based on evolutionary considerations. They are simply so familiar that they are used as if they were self-evident, but in fact they are very deep evolutionary statements.

– How did bioinformatics develop in Russia? – Bioinformatics is a very young science.

As I have said many times, when I started doing this in 1986, after university, I went to the Library of Natural Sciences once a week and looked through all the articles on bioinformatics published in the world that week. I didn't read everything, but I looked through everything, and that was physically possible. Now, of course, it no longer is: the number of articles has grown tenfold. In the 80s it was a very young science that had emerged as an independent field at the turn of the 70s and 80s.

Then you could come up with an idea, and with a very high probability it turned out that you were the first to come up with it.

And if you had some favorite technique in another area, you could try to bring it and apply it here.

– Please tell us about the Russian specialists working in bioinformatics. – A lot of strong scientists, world leaders in bioinformatics are from Russia, and many of them maintain contacts with Russia.

I will name three of the most striking examples of such scientists of Russian origin. The first is Eugene Koonin; he works in the USA, and a few years ago we did some joint work with him, but he does not specifically collaborate with Russia. Next are Pavel Pevzner, professor at the University of California, San Diego, and Alexey Simonovich Kondrashov, professor at the University of Michigan. The latter two are very active in Russia: they hold "megagrants", with which they have set up successful Russian laboratories. They stand at opposite poles of this science. Pevzner works on specific algorithms designed to process specific data, algorithmic problems very close to practice. He has an amazing flair for good practical problems out of which beautiful mathematics grows. Pevzner's Russian laboratory is in St. Petersburg. Kondrashov, by contrast, is a biologist, a very bright evolutionist, and his laboratory in Moscow works precisely on evolutionary problems. As a very good biologist, he knows unexpected and beautiful biological objects and builds his evolutionary arguments on them.

That is about those who left. And then there are those who stayed and kept working here through the 90s. In February we will mark the twentieth anniversary of the Moscow bioinformatics seminar.

It started in 1993, when only two or three people remained in what had been fairly strong groups, and none of the groups could sustain a seminar of its own any longer.

Then we decided to join forces and set up a regular citywide seminar. People from different laboratories met every two weeks and told each other about their work. The seminar is still very much alive, but in a different mode: the Moscow-wide seminar has become a venue for visiting speakers. Strong groups can afford their own seminars again. There are several world-class groups.

It should be borne in mind that bioinformatics is still in a much easier position than experimental biology, because we have no difficulties with reagents, animals, or the transportation of biological samples across the border.

– Aren't supercomputers needed? – Supercomputers are needed, but there is no shortage of them.

The most acute problem of bioinformatics is not computers or even money, but strong scientists. Biologists have now realized that every good biological laboratory should have a person responsible for bioinformatics, and there is quite a big demand for such people. Moscow State University has a Faculty of Bioengineering and Bioinformatics, and its graduates have no problems finding a job.

We ran a two-year evening school in bioinformatics: at first it was a division of the Yandex School of Data Analysis, and now, apparently, we will try to run it separately. Last year, when we announced the first intake, we admitted 50 people, and 80 to 100 came to the interviews. They were split half and half between mathematicians and biologists: mathematicians who wanted to enter a new field, and experimental biologists who realized that they needed this skill.

– And who is better in this field, mathematicians or biologists? – Half and half.

If you look at my students, both the most successful ones and the simply good ones, it comes out about half and half.

– Tell us what tasks you are currently working on. – One direction in the work of our laboratory is the prediction of gene functions.

We are engaged in "useless" bioinformatics: we study how the regulatory systems of bacteria evolve, and along the way we make a lot of practical predictions about which protein performs which function. There is a laboratory at the Burnham Institute for Medical Research in San Diego, California, and several of my students spend a significant part of their time there. This is Andrei Osterman's biochemical laboratory, which experimentally verifies bioinformatic predictions, both ours and its own.

Another direction is the analysis of large-scale data on how genes work in different mammals. For example, we have a joint project with Philipp Khaitovich from Shanghai, in which we study how the activity of genes in the brains of humans and monkeys changes with age.

The third thing we are doing is my project on the evolution of bacteria over very short timescales. If you take two strains of E. coli, it turns out that their shared sequences are 99% identical, but each strain has large chunks of genome that the other lacks entirely. Both are the same species, E. coli, yet they can differ in as much as a third of the genome, and the fragments in which they differ are quite often responsible for pathogenicity. Drug resistance is also transmitted between strains: some have it, while others do not. So the problem is of great practical importance, but so far we understand very poorly how the evolution of bacteria works.

It would be useful to know how pathogenicity mechanisms or, on the other hand, antibiotic resistance genes are transmitted between bacteria.
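The comparison of gene content between two strains described above can be illustrated with a small sketch. It assumes each strain has already been reduced to a set of gene identifiers; the gene names and the function `compare_gene_content` are invented for illustration and do not come from the laboratory's actual work.

```python
def compare_gene_content(strain_a: set, strain_b: set) -> dict:
    """Split the genes of two strains into shared and strain-specific sets."""
    return {
        "shared": strain_a & strain_b,   # genes present in both strains
        "only_a": strain_a - strain_b,   # genes found only in strain A
        "only_b": strain_b - strain_a,   # genes found only in strain B
    }


# Hypothetical toy gene sets, not real E. coli annotations.
strain_a = {"core1", "core2", "core3", "core4", "tox1", "tox2"}
strain_b = {"core1", "core2", "core3", "core4", "res1"}

result = compare_gene_content(strain_a, strain_b)
specific_fraction_a = len(result["only_a"]) / len(strain_a)

print("shared genes:", sorted(result["shared"]))
print("only in A:", sorted(result["only_a"]))
print("only in B:", sorted(result["only_b"]))
print(f"fraction of A not found in B: {specific_fraction_a:.2f}")
```

In this toy case a third of strain A's genes are absent from strain B, which is the kind of difference the text describes. Real comparisons have to decide first which genes count as "the same", and that is a much harder step than the set arithmetic shown here.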

– Please give specific examples of the practical application of bioinformatics. – They are the same as those of biology as a whole.

First, obviously, there is medicine. Progress in cancer treatment is largely tied to progress in understanding its molecular mechanisms. This is not my field, but quite a lot of papers are now being published on the molecular diagnosis of cancer and on identifying the molecular mechanisms that lead to malignant transformation. This is a development of the last two or three years. Many patients with the same diagnosis are taken, the genome is sequenced in tumor cells and in healthy cells, and one looks at what has changed. Cancer is a disease of the genome; it is caused by mutations in DNA, and you can simply "write these mutations out in a column". Now that the genome sequence of tumor cells can be determined, it is possible to diagnose which signaling pathways in the cell have broken down. In general, cancer is a disease of the "cellular bureaucracy". Ten percent of cell proteins are engaged in transmitting various signals and determining the state of the cell. A normal cell cannot divide uncontrollably, because the "bureaucracy" does not allow it: the cell understands that it is "squeezed" between neighbors, knows what tissue it is in, and behaves accordingly. There are special signaling pathways that regulate the work of genes depending on the tissue context.
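The idea of writing the mutations "out in a column" can be shown with a toy sketch. It assumes the healthy and tumor sequences are already aligned and of equal length, and it only finds single-letter substitutions; the sequences and the function `list_mutations` are invented for illustration, and real tumor analysis is far more involved.

```python
def list_mutations(normal: str, tumor: str) -> list:
    """Return (position, normal_base, tumor_base) for every mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are 1-based.
    """
    if len(normal) != len(tumor):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, n, t)
        for pos, (n, t) in enumerate(zip(normal, tumor), start=1)
        if n != t
    ]


# Invented toy sequences: the tumor copy carries two substitutions.
normal = "ATGGCCTTAGCA"
tumor = "ATGGACTTAGTA"

# Print the mutations one per line, "in a column".
for pos, n, t in list_mutations(normal, tumor):
    print(f"{pos}\t{n}>{t}")
```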

In cancer, the cell ceases to "recognize its neighbors", ceases to understand what tissue it is in, it returns to an undifferentiated state, begins to "travel" through the body, forming metastases.

It turns out that what we previously considered to be one diagnosis may be several different ones. Two tumors look similar, but what is "broken" in them is different, so the molecular diagnosis is different. This is essential for the prognosis and the choice of medicines. Modern cancer drugs act on these signaling pathways, on their individual components: they suppress them or, conversely, activate them. Here is an example of the practical applications that are just beginning to develop. Suppose you have a drug that is effective in 10% of cases (for some cancers that is a good result) but has severe side effects. If we don't know in advance whom the medicine will help, we won't use it. But if we know in advance which one patient in ten will benefit, we will give it to that patient, and the side effects will be justified. And we will not torment the others in vain. Of course, this is an idealized case; it only roughly shows the direction in which medicine will develop.
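The arithmetic behind this idealized example can be spelled out in a short sketch. It assumes a perfect predictive test and that every treated patient suffers the side effects; both are simplifications, and the function `treatment_outcomes` and the cohort of 100 patients are illustrative choices only.

```python
def treatment_outcomes(patients: int, response_rate: float, use_test: bool) -> dict:
    """Count who is treated, who benefits, and who suffers in vain.

    Idealized: the test (if used) identifies responders perfectly,
    and every treated patient experiences the side effects.
    """
    responders = round(patients * response_rate)
    treated = responders if use_test else patients
    return {
        "treated": treated,
        "helped": responders,
        "side_effects_without_benefit": treated - responders,
    }


# 100 patients, drug effective in 10% of cases, as in the example above.
print("treat everyone:  ", treatment_outcomes(100, 0.10, use_test=False))
print("treat responders:", treatment_outcomes(100, 0.10, use_test=True))
```

Without the test, 90 of 100 patients would bear the side effects for nothing, which is why such a drug would not be used at all; with the test, only the 10 who benefit are treated.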

And the second aspect is that in different types of cancer, the same signaling pathways are "broken".

The cancers will still be different, because the same pathway may have broken down in the cells of one tissue or another: the influence of the genomic context, that is, the fact that the outcome depends not on a single gene but on which other genes are active, has not gone away. But the main mechanism in which the breakdown occurred is the same. And now the first clinical papers of this kind are appearing: you take a drug that has already been approved for some type of cancer (there is no need to run safety trials, since it is already in medical practice) and begin to apply it to a cancer for which it has not been used before, because the molecular mechanism is the same. In terms of technique, this work is pure bioinformatics: the analysis of large amounts of data on signaling pathways.

Let me stress once again that this is not my own area, and that we are talking about exploratory research; district hospitals do not do this routinely. This is a difficult, very subtle science, and these are individual examples, not common practice across oncology, but they show the direction in which it is possible to move.

In addition, practical applications of biology can be found in agriculture and genetic engineering. Agriculture needs significant progress because there are many people living on the earth. That is, either we all agree to eat less, or we have to grow more – and that's where genetic engineering helps.

There are also technologies related to the production of medicines – bacterial biotechnologies, the same genetic engineering, only at the level of microorganisms.

In this context, it is impossible to keep silent about the hysteria around genetically modified organisms.

Everyone is terribly afraid of genetically modified potatoes, and no one is afraid of the genetically modified bacteria that make human insulin, yet it is this bacterially produced human insulin that diabetics are treated with.

This is the simplest example. A more subtle example is one of the most beautiful recent discoveries: the system that protects bacteria from viruses, bacterial immunity. This is an absolutely fundamental thing, and no one expected it to exist. The people from Danone made the most progress in this research. They have a big problem: there are starter cultures for yogurt, and from time to time viral epidemics kill these cultures. Therefore, understanding how bacteria protect themselves from viruses is critically important for Danone.

All of this is general information about the benefits of biology, and in each field of biology bioinformatics allows biologists to work much more efficiently. The consumers of biology are medicine, agriculture, and biotech, while our consumers are biologists. We are one step further from practice.

– May I ask a technical question? If you don't have your own strains, your own organisms, where do you get your material? Do you order it from laboratories? – If necessary, we can order it, but really it's all on the Internet.

There are colossal databases, and there are whole "factories" that generate these sequences. In China, there is the Beijing Genomics Institute, a colossal "factory" for producing them. In addition, there is an international rule for DNA sequences, adopted back in the 80s: no serious journal will publish your article unless you have deposited all the sequences you write about in a standard repository. After that, anyone can use them.

In addition, we have quite a lot of collaborations with experimenters: they produce the data, and we process it together. There are collaborations in Russia, Germany, America and China.

Portal "Eternal youth" http://vechnayamolodost.ru 31.01.2013
