13 February 2017

Machine learning will speed up the search for drug targets

Nadezhda Bessonova, N+1

Canadian scientists have applied machine learning methods to reconstruct the three-dimensional shape of protein molecules from two-dimensional images obtained by electron cryomicroscopy. The high resolution, accuracy and speed of the new method promise to significantly simplify the development of drugs for a wide range of diseases, including cancer and Alzheimer's disease. The work is described in the journal Nature Methods (Punjani et al., cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination).

One direction in modern medicine is targeted therapy, which relies on identifying the molecular features of a disease: the drug finds atypical protein molecules, binds to them and changes their shape, thereby changing how the protein behaves in the body. An ideal drug binds only to the specific proteins whose shape is characteristic of a particular disease, which avoids the side effects that occur when a drug binds to other proteins in the body. Developing a new drug therefore resembles assembling a puzzle: without knowing the three-dimensional shape of the protein, the problem is practically unsolvable.

One promising approach to reconstructing the three-dimensional structure of proteins relies on two-dimensional images obtained by electron cryomicroscopy (cryo-EM). In this method, electron microscopes take tens of thousands of images of frozen protein samples from different angles. The two-dimensional images then have to be combined into an accurate, high-resolution 3D model.

Existing methods need several days, or even weeks, on a cluster of powerful computers to accomplish this task. They also require an initial expert estimate of the structure of the molecule being reconstructed.

The new approach is based on stochastic gradient descent (SGD), together with optimization algorithms that use maximum likelihood methods and the branch and bound method. These machine learning methods are combined in the cryoSPARC (cryo-EM Single-Particle Ab initio Reconstruction and Classification) program, which runs on graphics processing units (GPUs). The program determines the structure of a molecule in several hours or even minutes. Its main innovation is that it does not require prior expert knowledge about the structure of the protein molecule, which makes it possible to obtain, among other things, completely unexpected structures of macromolecules.


Standard gradient descent methods used to approximate three-dimensional models are sensitive to initialization: an arbitrary starting structure can lead to a local minimum of the error function far from the desired 3D model, whereas a correct initialization leads to the correct model (the global minimum). This is why a preliminary expert estimate of the desired structure is important. In addition, the classical approach uses all of the original two-dimensional images at every step, which significantly slows down the process. The modified stochastic gradient descent method applied in the new work instead uses a randomly selected subset of the two-dimensional images at each iteration and computes the gradients for the 3D model from that subset alone. This helps the method avoid getting stuck in a local minimum, and it allows the reconstructed model to be updated many times in a single pass over the entire set of two-dimensional images.
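The difference between one update per pass and many updates per pass can be shown with a minimal sketch. This is not the cryoSPARC algorithm: the "model" here is a hypothetical three-number vector, each "image" is simply a noisy copy of it, and the function names, learning rate and batch size are assumptions chosen for illustration. The sketch only demonstrates that mini-batch SGD performs many updates in a single pass over the data, while the classical approach performs one.

```python
import random

def make_images(true_model, n_images, noise, rng):
    """Hypothetical stand-in for cryo-EM data: each 'image' is a noisy copy of the model."""
    return [[v + rng.gauss(0.0, noise) for v in true_model] for _ in range(n_images)]

def gradient(model, batch):
    """Gradient of the mean squared error between the model and a batch of images."""
    n = len(batch)
    return [2.0 * sum(model[i] - img[i] for img in batch) / n
            for i in range(len(model))]

def full_batch_pass(model, images, lr):
    """Classical gradient descent: one update that uses every image."""
    g = gradient(model, images)
    return [m - lr * gi for m, gi in zip(model, g)], 1

def sgd_pass(model, images, lr, batch_size, rng):
    """Mini-batch SGD: one update per random subset, many updates per pass."""
    order = list(range(len(images)))
    rng.shuffle(order)
    updates = 0
    for start in range(0, len(order), batch_size):
        batch = [images[j] for j in order[start:start + batch_size]]
        g = gradient(model, batch)
        model = [m - lr * gi for m, gi in zip(model, g)]
        updates += 1
    return model, updates

def error(model, true_model):
    """Euclidean distance between the current estimate and the true model."""
    return sum((m - t) ** 2 for m, t in zip(model, true_model)) ** 0.5

rng = random.Random(0)                 # fixed seed so the run is repeatable
true_model = [1.0, -2.0, 0.5]          # assumed toy "structure"
images = make_images(true_model, 1000, 0.5, rng)
start = [0.0, 0.0, 0.0]                # arbitrary initialization

gd_model, gd_updates = full_batch_pass(start, images, lr=0.1)
sgd_model, sgd_updates = sgd_pass(start, images, lr=0.1, batch_size=50, rng=rng)

print("full-batch updates per pass:", gd_updates, "error:", error(gd_model, true_model))
print("SGD updates per pass:", sgd_updates, "error:", error(sgd_model, true_model))
```

With the same learning rate and the same single pass over the data, the mini-batch version ends up much closer to the true vector simply because it takes more steps. The toy error function here has a single minimum, so the sketch does not illustrate the escape from local minima that the article attributes to the method.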

The method was tested on known datasets for ribosome and proteasome molecules. The resulting models had a resolution of about three angstroms (one angstrom is 10^-10 meters) and were built in two hours and in 70 minutes, respectively. With comparable existing tools, building these models takes about 20 hours.

cryoSPARC2.jpg

The optimization methods make it possible to obtain high-resolution structures. The figure from the Nature Methods article shows a proteasome model that the program produced in 70 minutes from 49,954 original two-dimensional images.

The scientists expect that the new method will offer a new way of studying the objects of structural biology and will help in the creation of new drugs.

Portal "Eternal youth" http://vechnayamolodost.ru  13.02.2017

