12 February 2020

Epidemic Protocol

What can bioinformatics say about the nCoV outbreak

Georgy Bazykin, "Trinity Variant"

We are in the midst of an epidemic of a new and dangerous infection, yet there is little reliable information about it. Molecular epidemiology helps in this situation. The way an evolving virus accumulates mutations lets us estimate important parameters of the epidemic even when official data are inaccurate. The situation is changing very quickly; all estimates given below are correct as of February 9th.

How did it all start?

By comparing organisms with each other, evolutionary biologists can learn a lot about their history. Most of the material for comparison is contained in nucleotide sequences; in the case of coronaviruses, in RNA sequences. The current outbreak is perhaps the first caused by a new pathogen for which genetic data have been available almost from the very beginning. The first sequence of the "new" coronavirus (nCoV) appeared in the public domain just a few weeks after the first cases were described [1]; for comparison, in the SARS outbreak of 2002/2003 (severe acute respiratory syndrome, also caused by a coronavirus), the first sequences took many months to appear.

Figure 1 shows the evolutionary tree of coronaviruses. As in ordinary evolutionary trees, the distance between any two branches corresponds to the time elapsed since the two lineages diverged from their last common ancestor (LCA).


Fig. 1. The evolutionary tree of coronaviruses (from [4]). The numbers near the branches indicate their statistical reliability on a scale from 0 to 100. The strains of the Wuhan outbreak (Wuhan) are highlighted in red, as well as the strain closest to them isolated from a bat (Bat CoV RaTG13).

It can be seen that the closest relative of the group of viral strains that gave rise to this outbreak is a coronavirus isolated from a bat; about 96% of nucleotides are identical between it and the epidemic strains. This picture may mean that the virus was transmitted from bats, although it is too early to draw final conclusions about the transmission path; in the case of SARS, it took years to figure out. (Information about even more closely related strains from pangolins is currently available only in the form of a press release [2].) In any case, all available evidence suggests that the infection came from a natural reservoir. Somewhat more distant are the SARS strains, with which ~80% of the nucleotides are identical; further still are the strains of MERS (Middle East respiratory syndrome) [3, 4]. There is no evidence of artificial recombination, insertion of unusual fragments, or any other tampering with the nCoV sequences; all reports to the contrary that have appeared on preprint servers in recent days have been withdrawn and/or refuted.

All nCoV lineages are closer to each other than to any other known viral sequence. This apparently means that the virus was introduced into the human population only once. This is not always the case: for example, MERS outbreaks in different years were caused by new transfers from a natural reservoir, camels [5].

Knowing the rate of evolution (see below), it is possible to date the LCA. Apparently, it existed in late November or early December. The LCA could have been in a single person who then infected others, or in an animal from which several people were infected; this is difficult to establish. The first reported cases date back to early December, which means that the outbreak was detected almost immediately. Virus samples isolated from the market in Wuhan are very close to those from the earliest patients in Wuhan; this confirms that the first people were infected there.

It is unknown how the mutations that distinguish nCoV from strains circulating in animals have changed its properties, or whether they have changed them at all. Perhaps the jump was an accident that was not accompanied by any change in the genetic characteristics of the virus compared to its ancestors in wild animals.

What's going on now?

All biological entities change over generations due to random mutations. Having "jumped" into humans, the virus continued to evolve, "sprouting" a branching evolutionary tree within the human population (Fig. 2). Unfortunately, only very few "leaves" of this tree are known today. The newest sequences from Wuhan date back to January 3, and 42 of the 73 sequences known today were obtained outside China (even though 99% of confirmed cases are in China [6]). Nevertheless, quite a lot can be learned by studying the tree.


Fig. 2. The nCoV evolutionary tree [7] (left) and the distance between each sample and the root of the tree as a function of the date the sample was obtained (right).

First, it is possible to determine the rate of evolution of the virus. To do this, you compare the dates of the samples with the number of differences between each viral sequence and the "ancestral" one. According to current estimates, the rate is about 10⁻³ substitutions per nucleotide per year [7, 8]; this is comparable to that of other RNA viruses, for example, influenza [9]. This may mean that, as with influenza, it will be difficult to create a universal vaccine that protects against all strains, and the vaccine will require periodic updates.
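The comparison described above is often done as a "root-to-tip" regression: distance from the root is plotted against sampling date, the slope of the fitted line estimates the rate, and the point where the line crosses zero estimates the date of the common ancestor. The following is a minimal sketch of that idea, not the method used in [7, 8]; the function name and the five data points are hypothetical and were made up so that they lie exactly on a line.

```python
def fit_clock(dates, divergences):
    """Fit divergence = rate * (date - t_root) by ordinary least squares.

    dates        -- sampling dates in decimal years
    divergences  -- root-to-tip distances in substitutions per nucleotide
    Returns (rate in substitutions per nucleotide per year, estimated root date).
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    rate = sxy / sxx                    # slope of the regression line
    intercept = mean_d - rate * mean_t  # divergence extrapolated to year 0
    t_root = -intercept / rate          # date at which divergence is zero
    return rate, t_root


# HYPOTHETICAL data, invented for illustration only (not real nCoV samples).
sample_dates = [2019.98, 2020.00, 2020.03, 2020.06, 2020.09]
sample_divergences = [0.00006, 0.00008, 0.00011, 0.00014, 0.00017]

rate, t_root = fit_clock(sample_dates, sample_divergences)
print(f"rate = {rate:.2e} substitutions per nucleotide per year")
print(f"estimated date of the root = {t_root:.2f}")
```

Real data scatter around the line, so the fitted rate and root date come with considerable uncertainty, especially when only a few weeks of sampling are available.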

Secondly, it is possible to trace the transmission paths. The first reliable data on human-to-human transmission were obtained in this way. More subtle questions can be asked as well. How many people are infected by family members, how many at work, how many on public transport, how many in hospital? How effective are quarantine measures? How exactly does the virus spread between countries? Molecular data can help answer these questions. This approach helps a lot with other infections for which more such data are available, for example HIV [10].

Thirdly, it is possible to estimate the rate of spread of the virus, namely the indicator R0, which has become famous in recent days. R0, or the basic reproductive number, is the number of people infected by one infected person over the entire course of the disease in a fully susceptible population. Obviously, this is a key indicator: if R0 is less than one, the epidemic will decline, and if it is greater than one, the epidemic will grow. Different diseases are characterized by very different values of R0, from 1.3 for influenza to >10 for measles. For the closest relative of nCoV, SARS, R0 was about 3 at the beginning of the outbreak and about 0.3 at its end [11].
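The threshold at one can be illustrated with simple arithmetic: if every case infects R0 others on average, the expected number of new cases in the k-th generation of transmission is R0 to the power k. The sketch below is an idealization (the function name is hypothetical, and it ignores the depletion of susceptible people and any control measures), using the two SARS values quoted above.

```python
def expected_cases_by_generation(r0, generations, initial_cases=1):
    """Expected new cases in generations 0..generations: initial_cases * r0**k."""
    return [initial_cases * r0 ** k for k in range(generations + 1)]


# R0 of about 3 (start of the SARS outbreak): each generation is three times larger.
print(expected_cases_by_generation(3, 4))    # [1, 3, 9, 27, 81]

# R0 of about 0.3 (end of the SARS outbreak): the chain of transmission dies out.
print(expected_cases_by_generation(0.3, 4))
```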

How can R0 be measured? It is possible to build "traditional" epidemic models and estimate the rate of increase in the number of cases. Unfortunately, this is difficult to do accurately. The beginning of the outbreak is difficult to date, and at the height of the epidemic many mild cases are deliberately not diagnosed; even the speed of diagnosis of severe cases may depend on the capacity of the healthcare system, for example, on the availability of test kits.

Evolutionary methods provide other ways to estimate how fast the epidemic is spreading. The basic idea is this: the general statistical characteristics of the tree, for example, the ratio of the lengths of branches near the root to the lengths of branches near the leaves, should depend on whether the pathogen population stays approximately constant in size, grows, or shrinks. This is because the lengths of the branches of a tree built from a sample of a population are determined by the size of that population: the smaller the population, the faster the branches "meet" each other, since two randomly selected individuals are more likely to be close relatives. A growing outbreak is therefore characterized by relatively short branches near the root and relatively long ones near the leaves.
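This dependence of tree shape on population dynamics can be sketched with the standard coalescent, in which, while k sampled lineages remain, the expected wait until two of them "meet" is proportional to the population size divided by the number of pairs, k(k-1)/2. The code below is a rough heuristic, not the analysis used in [7, 8]: it maps the expected waiting times through a constant-size and an exponentially growing population history and compares how much of the tree's depth lies near the leaves and near the root. All function names and parameter values (10 samples, a population of 10,000, growth rate 0.1 per generation) are assumptions chosen for illustration.

```python
import math


def expected_interval_ends(n_samples):
    """Cumulative expected merger times, going back from the leaves.

    Times are in rescaled units (the integral of dt / N(t)), in which the
    expected wait with k lineages is 1 / (k choose 2) = 2 / (k * (k - 1)).
    """
    ends = []
    tau = 0.0
    for k in range(n_samples, 1, -1):
        tau += 2.0 / (k * (k - 1))
        ends.append(tau)
    return ends


def to_real_time_constant(tau, n_pop):
    """Constant population size: real time is just n_pop * tau generations."""
    return n_pop * tau


def to_real_time_growing(tau, n_now, growth_rate):
    """Exponential growth: N(t) = n_now * exp(-growth_rate * t), t generations ago.

    Inverts tau = (exp(growth_rate * t) - 1) / (growth_rate * n_now).
    """
    return math.log(1.0 + growth_rate * n_now * tau) / growth_rate


def interval_shares(real_times):
    """Share of total tree depth in the interval nearest the leaves and nearest the root."""
    depth = real_times[-1]
    leaf_share = real_times[0] / depth
    root_share = (real_times[-1] - real_times[-2]) / depth
    return leaf_share, root_share


taus = expected_interval_ends(10)  # assumed sample of 10 sequences
constant_shares = interval_shares([to_real_time_constant(t, 10_000) for t in taus])
growing_shares = interval_shares([to_real_time_growing(t, 10_000, 0.1) for t in taus])

print("constant size: leaf share %.2f, root share %.2f" % constant_shares)
print("growing:       leaf share %.2f, root share %.2f" % growing_shares)
```

Under these assumptions, the constant-size population puts most of the tree's depth into the deepest interval near the root, while the growing population puts a large share into the interval nearest the leaves. Real phylodynamic methods fit such models to the whole tree rather than comparing two intervals.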

Applying such an analysis to nCoV gives estimates of R0 in the region of 2 to 3 [7, 8]. Of course, this approach also rests on many assumptions, many of which are obviously incorrect: for example, that the sample of analyzed sequences is more or less random and that selection does not act on the virus. But the results obtained by different methods corroborate each other and allow us to say that R0 is apparently somewhere in the range from 2.2 to 3.3 [12].

What could be next?

In recent days, studies have appeared that model the development of the epidemic under various parameters [13, 14, 15]. It is important to understand that these models are not forecasts. Firstly, R0 is estimated far too imprecisely. At R0 ~ 3, in the absence of any measures and without pre-existing immunity, an outbreak in, for example, a city of ten million will be rapid and acute, will peak in two to three months, and at the peak tens of percent of the population will be infected simultaneously [15]. If R0 is below two, the peak will be spread over many months and will be flatter (Fig. 3).


Fig. 3. The number of infected in the simplest deterministic SIR model [16]. Parameters: N = 10⁷, γ = 0.119.
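Curves of this kind can be reproduced with a few lines of code. The sketch below integrates the simplest deterministic SIR model with a forward Euler step, using the population of ten million and the recovery rate γ = 0.119 per day from the caption, with the transmission rate set to β = R0 · γ. The single initial infection, the time step, and the simulated period are assumptions not given in the text, and the function names are hypothetical.

```python
def simulate_sir(r0, gamma=0.119, population=10**7, initial_infected=1.0,
                 days=730, dt=0.1):
    """Integrate the deterministic SIR equations with a forward Euler step.

    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I,
    where beta = r0 * gamma. Returns a list of (day, number infected) pairs.
    initial_infected, days and dt are illustrative assumptions.
    """
    beta = r0 * gamma
    s = population - initial_infected
    i = initial_infected
    history = [(0.0, i)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / population * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        history.append((step * dt, i))
    return history


def peak(history):
    """Return (day, number infected) at the maximum of the infected curve."""
    return max(history, key=lambda point: point[1])


for r0 in (3.0, 1.5):
    day, infected = peak(simulate_sir(r0))
    print(f"R0 = {r0}: peak on day {day:.0f}, "
          f"{100 * infected / 10**7:.0f}% of the population infected at once")
```

With R0 = 3 the peak arrives within a few months and involves tens of percent of the population at once; with R0 = 1.5 it comes much later and is several times lower, in line with the description above. A forward Euler step is the simplest possible integrator; a smaller step or a higher-order method would change the numbers only slightly here.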

Secondly, it is almost impossible to model factors such as the development of vaccines, the effectiveness of available drugs (which remains virtually unknown), as well as measures taken to reduce the rate of spread.

Thirdly, R0 by itself says little about how serious the epidemic will be: the R0 for rhinoviruses, which cause the common cold, is ~6, yet they are not nearly as significant a global health problem. The key parameters that remain unknown are the proportions of severe and fatal cases. Will fatal cases make up ~0.1% of all those infected, as with annual seasonal flu; 2.5%, as with the Spanish flu; or 10%, as with SARS? These scenarios differ greatly. The data available today do not allow us to say which of them will come true.

Literature:

  1. Wuhan seafood market pneumonia virus isolate Wuhan-Hu-1, complete genome. (2020).
  2. Cyranoski D. Did pangolins spread the China coronavirus to people? Nature (2020) doi: 10.1038/d41586-020-00364-2.
  3. Lu R. et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet 10 (2020).
  4. Zhou P. et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 1–4 (2020) doi: 10.1038/s41586-020-2012-7.
  5. Zhang Z., Shen L. & Gu X. Evolutionary Dynamics of MERS-CoV: Potential Recombination, Positive Selection and Transmission. Sci Rep 6, (2016).
  6. Coronavirus 2019-nCoV global cases by Johns Hopkins CSSE.
  7. Rambaut A. Phylodynamic Analysis | 67 genomes | 08 Feb 2020. Virological (2020).
  8. Bedford T. Nextstrain / narratives / ncov / sit-rep / 2020-01-30 (2020).
  9. Peck K. M. & Lauring A. S. Complexities of Viral Mutation Rates. Journal of Virology 92, (2018).
  10. Poon A. F. Y. et al. Near real-time monitoring of HIV transmission hotspots from routine HIV genotyping: an implementation case study. Lancet HIV 3, e231-238 (2016).
  11. World Health Organization. Consensus document on the epidemiology of severe acute respiratory syndrome (SARS). (2003).
  12. ncov-R0. Google Docs.
  13. Read J. M., Bridgen J. R., Cummings D. A., Ho A. & Jewell C. P. Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions. medRxiv 2020.01.23.20018549 (2020) doi: 10.1101/2020.01.23.20018549.
  14. Riou J. & Althaus C. L. Pattern of early human-to-human transmission of Wuhan 2019-nCoV. bioRxiv 2020.01.23.917351 (2020) doi: 10.1101/2020.01.23.917351.
  15. Wu J. T., Leung K. & Leung G. M. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. The Lancet (2020) doi: 10.1016/S0140-6736(20)30260-9.
  16. Compartmental models in epidemiology. Wikipedia (2020).

Portal "Eternal youth" http://vechnayamolodost.ru

