16 July 2020

A decoder for viruses

Bioinformatics specialists at St. Petersburg State University have developed an assembler for decoding coronavirus genomes

coronaSPAdes, a new tool from the Center for Algorithmic Biotechnology of St. Petersburg State University, assembles the genomes of RNA viruses, primarily coronaviruses. According to preliminary data, it has already been used to reconstruct the genome sequences of previously unknown coronaviruses.

coronaSPAdes is a special mode of the SPAdes assembler (St. Petersburg Assembler), the flagship product of the Center for Algorithmic Biotechnology laboratory of St. Petersburg State University, which is known all over the world. Scientists in many countries have used SPAdes to analyze the pathogens behind the outbreaks of Middle East Respiratory Syndrome (MERS) in Saudi Arabia, Ebola in the Congo, gonorrhea in England, meningitis in Ghana, dengue fever in Sumatra, and dozens of others.

The SPAdes assembler and its various modes make it possible to decode the genomes of living organisms and of viruses. Biologists still cannot read a genome the way we read a book, from beginning to end. Instead, they "read" small fragments and then assemble them into the full text. Assembling a genome is therefore much like assembling a jigsaw puzzle of a million pieces. It is one of the most complex algorithmic problems in bioinformatics, and solving it requires special tools: genome assemblers.
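The puzzle analogy can be made concrete with a toy example. The Python sketch below is only an illustration and is not the algorithm used by SPAdes, which is built on de Bruijn graphs and handles sequencing errors, repeats, and uneven coverage. It assumes error-free fragments and uses a simple greedy strategy: repeatedly merge the two fragments with the longest suffix-to-prefix overlap. The function names `overlap` and `greedy_assemble` and the example fragments are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: merge the best-overlapping pair until none is left.

    Returns the remaining sequences (contigs). Assumes error-free reads.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap(a, b, min_len)
                if k > best_k:
                    best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing overlaps any more
        # Glue the two fragments together, writing the shared part only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping fragments of a made-up 15-letter "genome".
reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

Even in this toy form the difficulty is visible: every fragment is compared with every other, and a repeated stretch of sequence can make the greedy choice wrong, which is part of why real assemblers rely on more elaborate graph-based methods.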

"We were prompted to create coronaSPAdes by requests from the scientific community. We received numerous questions from various laboratories about how best to assemble RNA viruses with the SPAdes family of tools. Among them were the European Bioinformatics Institute (EMBL-EBI), with which we hold a joint grant from the Russian Foundation for Basic Research, and Serratus, a scientific collaboration that searches public data for new coronaviruses and other viruses. Since the existing SPAdes modes did not give a tangible advantage over competing programs, we set out to create a new module that takes into account the unique features of coronavirus genome structure and of the sequencing data. Scientists began actively using coronaSPAdes right away, but it is difficult for us to assess how widely it is used, because we do not track all users. coronaSPAdes is an open-source program that anyone can download and use. According to our data, in addition to EMBL-EBI, such large research communities as Serratus, the MetaSUB Consortium, and Nextflow have shown interest in the assembler," said Anton Korobeynikov, a researcher at the Center for Algorithmic Biotechnology of St. Petersburg State University and one of the main authors of the new product.

The decisive role in this development belongs to Dmitry Meleshko, a researcher at the Center for Algorithmic Biotechnology of St. Petersburg State University. It is also important to note that coronaSPAdes builds on the laboratory's earlier work and on the code base of the SPAdes family of assemblers (metaSPAdes, rnaSPAdes, metaviralSPAdes, biosyntheticSPAdes). Without that groundwork, the module could not have been created.

The first version of coronaSPAdes was developed in a couple of weeks. Test data provided by the Serratus collaboration helped to complete the work in such a short time. The authors are still improving the assembler, but it can already reconstruct coronavirus genomes de novo much more efficiently and with better quality than alternative approaches. For example, according to preliminary data, full-length genomes of previously unknown coronaviruses have been assembled from some datasets.

coronaSPAdes takes into account the features of RNA sequencing data and implements unique algorithmic solutions aimed at improving the recovery of coronavirus genome sequences. Moreover, the approaches behind coronaSPAdes can be used in the future to develop new assemblers that draw on information about the structure of other types of genomes.

According to Alla Lapidus, deputy director of the Center for Algorithmic Biotechnology of the Institute of Translational Biomedicine of St. Petersburg State University, the laboratory has created several new programs in a short time. Their purpose is fast, high-quality processing of the genomic data needed to analyze disease-causing viruses, primarily coronaviruses, as well as other pathogens.

"In 2020, the epidemiological situation in the world gives scientists and doctors no chance to relax: they had not yet managed to cope with the coronavirus when reports appeared of a possibly new strain of swine flu, called G4 EA H1N1," said Alla Lapidus. "Analysis of its genome is the first step toward finding out whether this strain is really new or a previously known seasonal strain. And recently there were reports of cases of bubonic plague in China, caused by the bacterium Yersinia pestis. In such a difficult situation, the need grows not only for analytical methods but also for competent specialists. This year the master's program in Bioinformatics at St. Petersburg State University graduated its first class, and I wish our graduates great scientific achievements and discoveries."

"Eternal Youth" portal, http://vechnayamolodost.ru
