22 May 2018

Lectures on bioinformatics

Data analysis, neural networks, and their application in biology and medicine

Nikolay Vyakhkhi, Geektimes

Almost a year ago, in the summer of 2017, the Institute of Bioinformatics held its traditional summer school at MIPT. The main topic of the school that year was data analysis. Why? The amount of data obtained in biology and medicine is growing at an incredible rate. It is physically impossible to find previously unknown things in such a volume of information by hand (and it is already too hard even with classical algorithms), so you have to use statistics and supplement natural intelligence with artificial intelligence.

This is exactly what the participants of the summer school were actively engaged in.

For those who want to develop in the field of bioinformatics, applications for this year's summer school (2018) are open until May 27. The school itself will be held on July 23-28 near St. Petersburg. There is still a chance to jump on the last carriage and, a year from now, proudly show everyone the post reviewing the lectures and say that you saw them in person.

In 2017, the school was held with the support of our long-standing partners JetBrains, BIOCAD and EPAM Systems, for which we thank them very much.

Below are 21 video recordings of last year's school lectures with slides and a description for anyone interested in the topic of data analysis in bioinformatics.

Lectures that can be watched without prior background are marked with an asterisk "*" (more than half of them).

1*. Introduction to Bioinformatics

(Alexander Predeus, Institute of Bioinformatics)
Video | Slides The lecture examines the main areas in which bioinformaticians work in science and industry, the distinctive features of bioinformatics and the reasons for its popularity today.

2*. Introduction to Machine Learning

(Grigory Sapunov, Intento)
Video | Slides The constant growth in the amount of data drives the development of ever more complex methods for processing, searching and extracting information.
One way to solve such problems is to use artificial intelligence. This lecture is a brief introduction to the basics of machine learning. Grigory explained the general terminology of the field and described the types of tasks that machine learning solves. The lecture also introduces the main stages of machine learning, the types of models and the metrics used to assess the quality of the results.
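As a rough illustration of what such quality metrics look like for a binary classifier, here is a minimal Python sketch. It is not taken from the lecture: the function names and the toy labels are assumptions chosen for this example, and the formulas are the standard definitions of accuracy, precision, recall and F1.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: 1 = pathogenic variant, 0 = benign variant.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced biomedical data, accuracy alone can be misleading, which is why precision and recall are usually reported alongside it.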

3*. Introduction to Deep Learning

(Grigory Sapunov, Intento)
Video | Slides Deep learning is currently gaining popularity because, instead of prescribing a specific algorithm for each problem, it relies on representation learning.
The growth in the computing power of processors also helps these methods develop. The lecture covers the basics of neural networks: their types (fully connected networks, autoencoders, convolutional and recurrent networks) and the tasks they solve. Grigory also outlined the current state of the field and its trends.

4*. Introduction to oncogenomics and analysis of omics data in oncology

(Mikhail Pyatnitsky, V.N.Orekhovich Research Institute of Biomedical Chemistry)
Video | Slides Sequencing of the human genome, the study of human genetic variation, sequencing of the human metagenome, transcriptomic analysis of human tissues: applied to "Big Data", all these biological methods have given scientists a large amount of valuable information about what distinguishes humans from other animals.
This lecture is devoted to the "omics" disciplines and their practical use. Mikhail also discussed how these data are used in oncology.

5. Multiomics in biology: technology integration

(Konstantin Okonechnikov, German Cancer Research Center)
Video | Slides The rapid development of experimental technologies in molecular biology, such as sequencing, has made it possible to study together a wide range of functional processes occurring in cells, organs or even the whole body.
The lecture discusses how to properly combine large-scale experimental data from genomics, transcriptomics and epigenomics to establish links between the components of ongoing biological processes. Illustrative examples of the use of multiomics are drawn from the highly sought-after field of cancer research, with a focus on pediatric oncology.

6. Quantitative genetics: history and prospects

(Yuri Aulchenko, Laboratory of Theoretical and Applied Functional Genomics of the FEN NSU, group of Methods of Genetic analysis, ICIG SB RAS)
Video | Slides Quantitative genetics is an exact science based on a small number of key observations and basic models that make it possible to describe natural (micro)evolutionary phenomena quantitatively and to predict the results of genetic experiments.
It uses a powerful mathematical apparatus. Many modern statistical methods were originally developed to solve the problems of quantitative genetics. The breakthrough in molecular biological technologies over the past decade has made it possible to characterize hundreds of thousands of living organisms by millions of genomic and other "omics" parameters. The total number of experiments conducted and the amount of data already accumulated are enormous. A pressing task of modern quantitative genetics is to develop models that can describe the inheritance of multilevel, high-dimensional phenotypes. In his lecture, Yuri gave a brief overview of the history of quantitative genetics and the problems facing this science.

7*. Sequencing technologies

(Kirill Grigoriev, Caribbean Genome Center, University of Puerto Rico)
Video | Slides The development and evolution of sequencing processes are inextricably linked with the evolution of technological capabilities.
The lecture shows the history and development of sequencing technologies from Sanger to the present day. Kirill spoke separately about the advantages and disadvantages of each of the currently existing methods, as well as the nature of the data obtained and their application in various fields.

8. Transcriptomics: practical methods and algorithms used

(Alexander Predeus, Institute of Bioinformatics)
Video | Slides Transcriptomics has confidently taken its place in the list of the most popular tasks facing NGS bioinformatics.
Differential analysis of gene expression, clustering of expression data, and interpretation of the obtained data in terms of metabolic and signaling cascades allow us to obtain the richest information about almost any system. The lecture discusses the best pipelines, the main problem areas in the design of experiments and processing, as well as practical cases of successful application of transcriptomic approaches.

9. Analysis of NGS data in medical genetics: identification, annotation and interpretation of genetic variants

(Yuri Barbitov, St. Petersburg State University, Alexander Predeus, Institute of Bioinformatics)
Video | Slides The use of next-generation sequencing has long gone beyond classical science and has been successfully applied in many other fields, including healthcare.
The lecture is devoted to the key aspects of analyzing next-generation sequencing data in medical genetics. Yuri showed the whole path from obtaining raw reads to making a diagnosis, mentioning the difficulties encountered in identifying, annotating and interpreting genetic variants. He also discussed the common mistakes made at each stage of data processing. The lecture concludes with a brief overview of promising areas of research that can improve the accuracy of diagnosis based on high-throughput sequencing.

10. Practical application of ChIP-Seq and related methods

(Alexander Predeus, Institute of Bioinformatics)
Video | Slides ChIP-Seq methods, as well as "genomic footprinting" methods (ATAC-Seq, FAIRE-Seq, DNase-Seq), are widely used to uncover the mechanisms that regulate biological processes, in particular transcriptional regulation.
The space of factors that could be studied is highly multidimensional, but a selective approach yields rich information about regulation in the system from just a few experiments. Using the example of conflicting modern theories, Alexander showed the main difficulties of interpreting regulatory information and ways to consolidate the results obtained.

11*. What can I do with iScan data

(Tatiana Tatarinova, University of La Verne)
Video | Slides Illumina produces a large number of devices for various needs.
Genotyping on chips makes it possible to quickly detect single nucleotide polymorphisms (SNPs) in a large number of samples. The lecture gives an overview of the chips used with iScan and their application in clinical diagnostics.

12. Deep learning in computational biology

(Dmitry Fishman, University of Tartu)
Video | Slides Deep learning is actively used not only to improve machine translation or speech recognition, but also to solve many problems in the field of computational biology.
The lecture is devoted to the application of deep learning methods to specific biological examples. Dmitry spoke about what deep learning has newly made possible in biology and medicine, and whether it is fair to say that machines are revolutionizing these fields.

13*. Application of machine learning methods to search for potential pathogenic mutations in the human genome

(Anna Yershova, MIPT, Research Institute of Physico-Chemical Biology, Lomonosov Moscow State University, N.F. Gamalei Institute of Epidemiology and Microbiology)
Video | Slides The search for pathogenic mutations became relevant once the human genome had been sequenced.
However, it is simply impossible to do this by hand. The lecture is devoted to how machine learning can help with the task.

14*. Immunoinformatics

(Vadim Nazarov, HSE, IBH RAS)
Video | Slides Machine learning has been actively used in various spheres of life for quite a long time, but it has only recently found a place in immunology.
In this lecture, Vadim spoke about several examples of the use of machine and deep learning in immunology, including the task of predicting the binding of MHC-peptide complexes and analyzing the repertoire of T-cell receptors.

15*. Study of host adaptation and resistance development in HIV and hepatitis C viruses using structural bioinformatics methods

(Olga Kalinina, Max Planck Institute for Informatics)
Video | Slides Human immunodeficiency virus (HIV) and hepatitis C virus cause severe diseases that are difficult to treat.
Like many other retro- and RNA viruses, these viruses evolve rapidly and, thus, can adapt both to the effects of specific antiviral drugs and to an adaptive immune response from the host organism. In this lecture, Olga showed how, by combining the analysis of viral protein sequences with the analysis of their spatial structure, it is possible to make predictions about the development of resistance mechanisms and the interaction of viruses with the host immune system.

16. Prediction of mutation effect

(Vasily Ramensky, MIPT)
Video | Slides Modern sequencing methods provide a huge amount of information about genome polymorphism, that is, the differences between individual genomes.
These differences (variants) arise as a result of mutations during DNA replication and are partially fixed in the population. The prevalence, localization and functional effect of genomic variants vary greatly – from complete lethality to the absence of any effect on the individual phenotype. The lecture discusses modern approaches to predicting the functional effect of variants used in personalized medicine, medical and population genetics.

17. Multiscale modeling and design of biological molecules

(Nikolay Doholyan, University of North Carolina at Chapel Hill)
Video The life of biological molecules spans time and length scales from the atomic to the cellular.
Therefore, new approaches to molecular modeling should be inherently multiscale. In his lecture, Nikolay described several methodologies developed in his laboratory: an algorithm for fast discrete molecular dynamics simulation, and tools for protein design and structure refinement. He then described several applications of these methodologies: shedding light on the molecular etiology of cystic fibrosis and finding new pharmaceutical strategies to combat the disease, modeling the three-dimensional structure of RNA, and developing new approaches to controlling proteins in living cells and organisms.

18. Homologous folding of proteins

(Pavel Yakovlev, BIOCAD)
Video Modern structural biology has a number of computational methods that characterize biological molecules, their similarities and differences, their modes of interaction and their functions with high reliability.
The spatial structure of the protein is always an input to such calculations, but obtaining it can be difficult despite half a century of progress in crystallography. The lecture is devoted to solving this problem with homology modeling of protein structures, that is, building three-dimensional structures from similar fragments. As an example, the lecture considers the variable domains of antibodies, proteins with a unique structural diversity of variable loops.

19. How to stop meditating and start modeling

(Artur Zalevsky, Lomonosov Moscow State University)
Video | Slides The large amount of data obtained by NGS makes it possible not only to draw biological conclusions but also to use the data for modeling.
The resulting models help us understand the biological data better and extract even more biological meaning from an experiment. The lecture is devoted to modeling and the initial stages of this process.

20*. Standing on the shoulders of giants, or why do we need consortia

(German Demidov, Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Universitat Pompeu Fabra)
Video | Slides Over the past decades, the development of biology has been tied to the accumulation of data sets so huge that individual research groups could no longer cope with their bioinformatic analysis.
To solve this problem, consortia of dozens of laboratories began to be created, such as the Human Genome Project, 1000GP, ENCODE and others. Thanks to such collaborations, data of various types obtained with various technologies are now openly available. As a result, comparing new experimental data with existing data has become a standard part of any study. Consortia produce not only data but also bioinformatic pipelines for processing them, standard formats and quality assessment procedures. This lecture discusses how consortia work, how to use the results of their work, and what to do if you suddenly find yourself a member of such a consortium and need to process terabytes of data and then share the results with all the other participants.

21*. Overview of bioinformatics companies in Russia and the world

(Andrey Afanasyev, yRisk)
Video | Slides In the modern world, science and business are increasingly intertwined.
Bioinformatics has not escaped this trend either. Andrey spoke about the expectations and the reality of the market, about success stories and failures, and about the people and places connected with bioinformatics.

By the way, a post with lectures from the schools before last

All bioinformatics!

Portal "Eternal youth" http://vechnayamolodost.ru