26 May 2020

Sequencing of the metavirome

MetaviralSPAdes: bioinformaticians have created a new assembler for decoding virus genomes

St. Petersburg State University Press Service

Bioinformaticians from the Center for Algorithmic Biotechnology at St. Petersburg State University and the University of California, San Diego have developed metaviralSPAdes, a new assembler that finds and assembles viral genomes among many other sequences. The tool will help decode the genomes of pathogens faster and more conveniently, which in turn will make it possible to start developing test systems and vaccines against dangerous infections sooner.

The article by Antipov et al., "metaviralSPAdes: assembly of viruses from metagenomic data", is published in the journal Bioinformatics.

When humanity encounters a new virus, the first thing biologists do is decipher its genome: this is a prerequisite for diagnosing the disease and developing a vaccine. However, a problem arises when sequencing has to be performed during an outbreak of a new pathogen. For example, the saliva of a patient with COVID-19, which was used for the very first decoding of the SARS-CoV-2 coronavirus, contained the genomes of many other, in most cases harmless, viruses, not to mention the hundreds of bacteria that live in a person's mouth and make it difficult to find viral sequences.

This example shows how important it is to be able to solve a much more complex computational task than decoding a single genome: assembling metagenomes, sets of hundreds of different genomes of microorganisms living in the same environment. The difficulty is that such work can yield thousands of sequences, including fragments of the genetic code of both viruses and bacteria, and it is not easy to tell which data belong to the pathogen of interest.

In addition, scientists inevitably face another task, sequencing the metavirome, which comes down to identifying the viral sequences hidden among much longer bacterial fragments. Bioinformaticians then have to literally piece together the complete genome of the virus that caused the outbreak.

Until recently, researchers had no dedicated tool for assembling viral metagenomes. However, a group of Russian and American scientists from St. Petersburg State University and the University of California, San Diego has developed the metaviralSPAdes assembler, which turns the analysis of metavirome sequencing results into a simple task.

The laboratory "Center for Algorithmic Biotechnology" was established at St. Petersburg State University at the end of 2014. It was headed by Pavel Pevzner, a professor at the University of California and Candidate of Physical and Mathematical Sciences. The laboratory's flagship product, the SPAdes genome assembly algorithm (Saint Petersburg Assembler), is used by thousands of genomics specialists around the world.

Biologists still cannot read an entire genome the way we read a book, from beginning to end. Instead, they read small fragments, so assembling a genome is not much different from assembling a puzzle of a million pieces. This task is often considered one of the most complex algorithmic problems in bioinformatics. It can nevertheless be solved: for example, SPAdes (Saint Petersburg Assembler), the most widely used genome assembler, also created by a Russian-American team of scientists, has been used in almost 9,000 studies to date. With its help, scientists have analyzed the pathogens that caused outbreaks of Middle East Respiratory Syndrome (MERS) in Saudi Arabia, Ebola in the Congo, gonorrhea in England, meningitis in Ghana, dengue fever in Sumatra and dozens of other outbreaks in the eight years since SPAdes was created.
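The puzzle analogy can be made concrete with a toy example. The sketch below is only an illustration and not the SPAdes implementation: it assumes error-free reads from a single short linear sequence with no repeats, and the function names (build_de_bruijn, assemble_linear) and the example reads are invented for this illustration. It splits reads into overlapping k-mers, links them in a de Bruijn graph (the kind of structure SPAdes-family assemblers are built around), and walks the single unambiguous path to recover the original sequence.

```python
from collections import defaultdict


def kmers(read, k):
    """Return all overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_linear(graph):
    """Walk the single unambiguous path through the graph.

    Assumes exactly one start node (no incoming edges) and no branching,
    which holds only for idealized, repeat-free toy data.
    """
    targets = {t for successors in graph.values() for t in successors}
    starts = [node for node in graph if node not in targets]
    if len(starts) != 1:
        raise ValueError("toy assembler expects exactly one start node")

    node = starts[0]
    contig = node
    seen = {node}
    # Extend the contig while the next step is unambiguous.
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    # Hypothetical overlapping reads taken from the sequence ATGGCGTGCA.
    reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
    print(assemble_linear(build_de_bruijn(reads, k=4)))
```

Real data contain sequencing errors, repeats and, in the metagenomic case, fragments from many genomes at once, so the graph branches heavily and a simple walk like this one no longer suffices.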

Keep in mind that assembling a metagenome of 1,000 genomes is much harder than assembling a single genome. In this case, there are 1,000 separate puzzles instead of one: each "picture" has to be assembled from pieces that are mixed with billions of pieces from other puzzles. To solve this problem, three years ago the Russian-American team that created SPAdes developed the metaSPAdes assembler, which in turn became the leading metagenomic assembler. It made extracting viral sequences from huge amounts of data easier, but the new-generation metaviralSPAdes assembler can not only find fragments of viral genomes but also assemble them into a finished "puzzle": the genome of the pathogen.

The COVID-19 pandemic has been a wake-up call for biologists studying the transmission of viruses from animals to humans, and a reminder of how important it is to investigate various virus hosts, such as bats, whose exceptional immune system allows them to coexist with a variety of pathogens capable of killing people. We need to know what bats are sick with before, not after, a pandemic strikes.

Of course, conducting a census of the virus genomes found in a wide variety of animals is a complex computational problem. However, with metaviralSPAdes at hand, biologists can now more easily reconstruct the genomes of viruses from bats or from any other potential source of future pandemics.

Portal "Eternal youth" http://vechnayamolodost.ru

