 
On the origin and continuing evolution of SARS-CoV-2
Posted on March 4, 2020, 20:47:18 to [世界游戏论坛] (World Game Forum)

RESEARCH ARTICLE MICROBIOLOGY 

On the origin and continuing evolution of SARS-CoV-2 

Xiaolu Tang (1,7), Changcheng Wu (1,7), Xiang Li (2,3,4,7), Yuhe Song (2,5,7), Xinmin Yao (1), Xinkai Wu (1), Yuange Duan (1), Hong Zhang (1), Yirong Wang (1), Zhaohui Qian (6), Jie Cui (2,3,*), and Jian Lu (1,*)

1. State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, 100871, China
2. CAS Key Laboratory of Molecular Virology & Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, China
3. Center for Biosafety Mega-Science, Chinese Academy of Sciences, China
4. University of Chinese Academy of Sciences, China
5. School of Life Sciences, Shanghai University, China
6. NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing
7. These authors contributed equally to this work.

*Corresponding authors: Jian Lu, Email: LUJ@pku.edu.cn; Jie Cui, Email: jcui@ips.ac.cn

Downloaded from https://academic.oup.com/nsr/advance-article-abstract/doi/10.1093/nsr/nwaa036/5775463 by guest on 04 March 2020


ABSTRACT

The SARS-CoV-2 epidemic started in late December 2019 in Wuhan, China, and has since affected a large portion of China and raised major global concern. Herein, we investigated the extent of molecular divergence between SARS-CoV-2 and other related coronaviruses. Although we found only 4% variability in genomic nucleotides between SARS-CoV-2 and a bat SARS-related coronavirus (SARSr-CoV; RaTG13), the difference at neutral sites was 17%, suggesting that the divergence between the two viruses is much larger than previously estimated. Our results suggest that the new variations in functional sites in the receptor-binding domain (RBD) of the spike seen in SARS-CoV-2 and in pangolin SARSr-CoVs were likely caused by mutations and natural selection, in addition to recombination. Population genetic analyses of 103 SARS-CoV-2 genomes indicated that these viruses evolved into two major types (designated L and S) that are well defined by two SNPs showing nearly complete linkage across the viral strains sequenced to date. Although the L type (~70%) is more prevalent than the S type (~30%), the S type was found to be the ancestral version. Whereas the L type was more prevalent in the early stages of the outbreak in Wuhan, its frequency decreased after early January 2020. Human intervention may have placed more severe selective pressure on the L type, which might be more aggressive and spread more quickly. On the other hand, the S type, which is evolutionarily older and less aggressive, might have increased in relative frequency due to relatively weaker selective pressure. These findings strongly support an urgent need for further immediate, comprehensive studies that combine genomic data, epidemiological data, and chart records of the clinical symptoms of patients with coronavirus disease 2019 (COVID-19).
Keywords: SARS-CoV-2, virus, molecular evolution, population genetics

Received: 25-Feb-2020; Revised: 28-Feb-2020; Accepted: 29-Feb-2020.


INTRODUCTION

The coronavirus disease 2019 (COVID-19) epidemic started in late December 2019 in Wuhan, the capital of Central China's Hubei Province. Since then, it has spread rapidly across China and to other countries, raising major global concerns. The etiological agent is a novel coronavirus, SARS-CoV-2, named for the similarity of its symptoms to those induced by severe acute respiratory syndrome. As of February 28, 2020, 78,959 cases of SARS-CoV-2 infection had been confirmed in China, with 2,791 deaths. Worryingly, there had also been more than 3,664 confirmed cases outside China in 46 countries and areas (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/), raising significant doubts about the likelihood of successful containment. Further, the genomic sequences of SARS-CoV-2 viruses isolated from a number of patients share more than 99.9% sequence identity, suggesting a very recent host shift into humans [1-3]. Coronaviruses are naturally hosted and evolutionarily shaped by bats [4, 5]. Indeed, it has been postulated that most of the coronaviruses in humans are derived from the bat reservoir [6, 7]. Unsurprisingly, several teams have recently confirmed the genetic similarity between SARS-CoV-2 and a bat betacoronavirus of the subgenus Sarbecovirus [8-13]. The novel virus shares 96.2% whole-genome sequence identity with a bat SARS-related coronavirus (SARSr-CoV; RaTG13) collected in Yunnan Province, China [2, 14], but is much less similar to SARS-CoV (about 79%) or MERS-CoV (about 50%) [1, 15]. It has also been confirmed that SARS-CoV-2 uses the same receptor as SARS-CoV, angiotensin-converting enzyme 2 (ACE2) [11].
Although the specific route of transmission from natural reservoirs to humans remains unclear [5, 13], several studies have shown that pangolins may have provided a partial spike gene to SARS-CoV-2: the critical functional sites in the spike protein of SARS-CoV-2 are nearly identical to those in a virus isolated from a pangolin [16-18]. Despite these recent discoveries, several fundamental issues related to the evolutionary patterns and driving forces behind this outbreak of SARS-CoV-2 remain unexplored [19]. Herein, we investigated the extent of molecular divergence between SARS-CoV-2 and other related coronaviruses and carried out population genetic analyses of 103 sequenced genomes of SARS-CoV-2. This work provides new insights into the factors driving the evolution of SARS-CoV-2 and its pattern of spread through the human population.

RESULTS

Molecular phylogeny and divergence between SARS-CoV-2 and related coronaviruses

For each annotated ORF in the reference genome of SARS-CoV-2 (NC_045512), we extracted the orthologous sequences in human SARS-CoV, four bat SARS-related coronaviruses (SARSr-CoV: RaTG13, ZXC21, ZC45, and BM48-31), one pangolin SARSr-CoV from Guangdong (GD) [17], and six pangolin SARSr-CoV genomes from Guangxi (GX) [18] (Table S1). We aligned the coding sequences (CDSs) based on the protein alignments (see Materials and Methods). Most ORFs annotated in SARS-CoV-2 were conserved in the other viruses, except for ORF8 and ORF10 (Table 1). The protein sequence of SARS-CoV-2 ORF8 shared very low similarity with the sequences in SARS-CoV and BM48-31, and ORF10 had a premature stop codon in both SARS-CoV and BM48-31 (Fig. S1). A one-base deletion caused a frameshift mutation in ORF10 of ZXC21 (Fig. S1).
To investigate the phylogenetic relationships between these viruses at the genomic scale, we concatenated the CDSs of the nine conserved ORFs (orf1ab, E, M, N, S, ORF3a, ORF6, ORF7a, and ORF7b) and reconstructed the phylogenetic tree using the synonymous sites (Fig. 1A). We also used CODEML in the PAML package [20] to infer the ancestral sequence of each node and calculated the dN (nonsynonymous substitutions per nonsynonymous site), dS (synonymous substitutions per synonymous site), and dN/dS (ω) values for each branch (Fig. 1A). In parallel, we calculated the pairwise dN, dS, and ω values between SARS-CoV-2 and each of the other viruses (Table 1). The genome-wide phylogenetic tree indicated that SARS-CoV-2 was closest to RaTG13, followed by GD Pangolin SARSr-CoV, then by GX Pangolin SARSr-CoVs, then by ZC45 and ZXC21, then by human SARS-CoV, and finally by BM48-31 (Fig. 1A). Notably, we found that the nucleotide divergence at synonymous sites between SARS-CoV-2 and other viruses was much higher than previously anticipated. For example, although the overall genomic nucleotide difference between SARS-CoV-2 and RaTG13 is only ~4%, the genomic average dS was 0.17, meaning that the divergence at neutral sites is 17% between these two viruses (Table 1). This is because nonsynonymous sites are usually under stronger negative selection than synonymous sites, so calculating sequence differences without separating these two classes of sites may underestimate the extent of molecular divergence severalfold. The dS value also varied considerably across genes in SARS-CoV-2 and the other viruses analyzed. In particular, the spike gene (S) consistently exhibited larger dS values than other genes (Table 1). This pattern became clear when we calculated the dS value for each branch in Fig. 1A for the spike gene versus the concatenated sequences of the remaining genes (Fig. S2).
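The gap between a ~4% overall difference and dS = 0.17 arises because synonymous and nonsynonymous sites are counted separately. The sketch below illustrates this with a simplified counting estimator in the spirit of Nei-Gojobori (1986) plus a Jukes-Cantor correction. It is not the CODEML maximum-likelihood method used in the study: it skips codon pairs that differ at more than one position, and the function names are hypothetical.

```python
import math
from itertools import product

# Standard genetic code built from the conventional TCAG ordering ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (Nei-Gojobori-style counting)."""
    aa = CODE[codon]
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b == codon[pos]:
                continue
            mutant = codon[:pos] + b + codon[pos + 1:]
            if CODE[mutant] == aa:
                s += 1 / 3  # each of the 3 alternative bases is weighted equally
    return s


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits; undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Return (dN, dS) for two aligned, gap-free CDSs of equal length.

    Simplification (hypothetical, not the study's method): codon pairs that
    differ at more than one position are skipped, and the input is assumed
    to contain no stop codons.
    """
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1) - len(seq1) % 3, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        diffs = sum(a != b for a, b in zip(c1, c2))
        if diffs > 1:
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2  # average over both codons
        S += s
        N += 3 - s
        if diffs == 1:
            if CODE[c1] == CODE[c2]:
                Sd += 1
            else:
                Nd += 1
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

For example, `dn_ds("CTG" * 10, "CTG" * 9 + "CTA")` returns dN = 0 and dS of roughly 0.079, whereas the raw difference is only 1 in 30 nucleotides: the single change is divided by about 13.3 synonymous sites rather than by the whole sequence.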
In each branch, the dS of spike was 2.22 ± 1.35 (mean ± SD) times as large as that of the other genes. This extremely elevated dS of spike could be caused either by a high mutation rate or by natural selection that favors synonymous substitutions. Synonymous substitutions may serve as another layer of genetic regulation, guiding the efficiency of mRNA translation by changing codon usage [21]. If positive selection is the driving force behind the higher synonymous substitution rate seen in spike, we would expect the frequency of optimal codons (FOP) of spike to differ from that of other genes. However, our codon usage bias analysis (Table S2) suggests that the FOP of spike was only slightly higher than the genomic average (0.717 versus 0.698; see Materials and Methods). Thus, we believe that the elevated synonymous substitution rate measured in spike is more likely caused by a higher mutation rate; however, the underlying molecular mechanism remains unclear. Both SARS-CoV and SARS-CoV-2 bind to ACE2 through the RBD of the spike protein in order to initiate membrane fusion and enter human cells [1, 2, 22-26]. Five of the six critical amino acid (AA) residues in the RBD differed between SARS-CoV-2 and SARS-CoV (Fig. 1B), and a 3D structural analysis indicated that the spike of SARS-CoV-2 has a higher binding affinity to ACE2 than that of SARS-CoV [23]. Intriguingly, these same six critical AAs are identical between GD Pangolin-CoV and SARS-CoV-2 [16]. In contrast, although the genomes of SARS-CoV-2 and RaTG13 are more similar overall, only one of the six functional sites is identical between the two viruses (Fig. 1B). It has been proposed that the RBD region of the SARS-CoV-2 spike protein might have resulted from recent recombination events in pangolins [16-18].
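FOP is conventionally computed as the number of optimal codons divided by the number of optimal plus non-optimal codons, ignoring codons outside either class. The sketch below shows that calculation on a toy sequence. Only the AGT (preferred) versus AGC (unpreferred) assignment comes from this article; the other serine codons in the sets are hypothetical placeholders, and the study's actual optimal-codon table (Table S2) is not reproduced here.

```python
from collections import Counter

# Hypothetical optimal / non-optimal codon sets for illustration only.
# AGT (preferred) and AGC (unpreferred) follow the text; the TC* codons are
# placeholders. A real analysis would cover every degenerate amino acid.
OPTIMAL = {"AGT", "TCT", "TCA"}
NON_OPTIMAL = {"AGC", "TCC", "TCG"}


def codons(cds):
    """Split a CDS into complete codons (a trailing partial codon is dropped)."""
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def fop(cds, optimal=OPTIMAL, non_optimal=NON_OPTIMAL):
    """Frequency of optimal codons: optimal / (optimal + non-optimal).

    Codons in neither set (e.g. ATG, TGG, stop codons) are ignored.
    """
    counts = Counter(codons(cds))
    n_opt = sum(counts[c] for c in optimal)
    n_non = sum(counts[c] for c in non_optimal)
    if n_opt + n_non == 0:
        raise ValueError("no informative codons in sequence")
    return n_opt / (n_opt + n_non)
```

For instance, `fop("ATGAGTAGCAGTTGG")` counts two AGT and one AGC codon (ATG and TGG are ignored) and returns 2/3. Comparing such values for spike against the genome-wide average is the kind of contrast summarized by 0.717 versus 0.698.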
Although several ancient recombination events have been described in spike [27, 28], it also seems likely that the identical functional sites in SARS-CoV-2 and GD Pangolin-CoV may actually be the result of coincidental convergent evolution [18]. If the functional AA residues in the SARS-CoV-2 RBD region had been acquired from GD Pangolin-CoV in a very recent recombination event, we would expect the nucleotide sequences of this region to be nearly identical between the two viruses. However, for the CDS that spans five critical AA sites in the SARS-CoV-2 spike (codons 484 to 507, covering five adjacent functional sites: F486, Q493, S494, N501, and Y505; Fig. S3), we estimated dS = 0.411, dN = 0.019, and ω = 0.046 between SARS-CoV-2 and GD Pangolin-CoV. Assuming a synonymous substitution rate (u) of 1.67-4.67 × 10^-3 substitutions/site/year, as estimated in SARS-CoV [29], the recombination/introgression, if it occurred at all, would have happened approximately 19.8-55.4 years ago. Here, divergence time was calculated as T = dS / (2u), with u scaled by the 2.22-fold elevation of the spike dS to account for the increased mutation rate of spike. Thus, it seems very unlikely that SARS-CoV-2 originated from GD Pangolin-CoV through a very recent recombination event. Instead, it seems more likely that a high mutation rate in spike, coupled with strong natural selection, has shaped the identical functional AA residues in these two viruses, as proposed previously [18]. Although these sites are maintained in SARS-CoV-2 and GD Pangolin-CoV, mutations may have changed the residues in the RaTG13 lineage after it diverged from SARS-CoV-2 (the blue arrow in Fig. 1A).
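The 19.8-55.4-year range can be reproduced with the standard pairwise estimate T = dS / (2u), where the factor 2 reflects substitutions accumulating on both lineages since their split. The check below assumes, as the numbers imply, that u is multiplied by the 2.22-fold spike elevation of dS; `divergence_time` is a hypothetical helper name.

```python
def divergence_time(ds, u, rate_multiplier=1.0):
    """T = dS / (2 * u * rate_multiplier), in years if u is per site per year.

    The factor 2 accounts for substitutions accumulating independently on
    both lineages after they split.
    """
    return ds / (2 * u * rate_multiplier)


DS_RBD = 0.411      # dS between SARS-CoV-2 and GD Pangolin-CoV, codons 484-507
SPIKE_FOLD = 2.22   # mean spike / other-gene dS ratio per branch

# Synonymous rate range estimated for SARS-CoV [29]
for u in (4.67e-3, 1.67e-3):
    t = divergence_time(DS_RBD, u, SPIKE_FOLD)
    print(f"u = {u:.2e}: T ~ {t:.1f} years")
```

Run as-is, it prints about 19.8 years for the faster rate and about 55.4 years for the slower rate. Without the 2.22 multiplier the same formula gives roughly 44 to 123 years, which still argues against a very recent recombination.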
In summary, the shared identity of critical AA sites between SARS-CoV-2 and GD Pangolin-CoV might be due to random mutations coupled with natural selection, and not necessarily to recombination.

Selective constraints and positive selection during the evolution of SARS-CoV-2 and related coronaviruses

The genome-wide ω value between SARS-CoV-2 and other viruses ranged from 0.044 to 0.124 (Table 1), indicative of strong negative selection on nonsynonymous sites. In other words, 87.6% to 95.6% of nonsynonymous mutations were removed by negative selection during viral evolution. To determine the extent of positive selection, we concatenated the CDSs of the nine conserved ORFs in all the viruses in Fig. 1A and fitted the M7 (beta: neutral and negative selection) and M8 (beta + ω > 1: neutral, negative selection, and positive selection) models using CODEML (Materials and Methods). The M8 model (lnL = -104,813.732, np = 18) fit significantly better than the M7 model (lnL = -105,063.284, np = 16) (P < 10^-10), suggesting that some AA substitutions were favored by positive Darwinian selection (though not necessarily in the SARS-CoV-2 lineage). Under the M8 model, 98.48% (p0) of the nonsynonymous substitutions were estimated to be under neutral evolution or purifying selection (0 ⩽ ω ⩽ 1), and 1.52% (p1) were under positive selection (ω = 1.50). A Bayes Empirical Bayes (BEB) analysis suggested that 10 AA sites showed strong signals of positive selection; interestingly, three of these were located in the RBD of spike, including one critical site (Fig. 1C and Fig. S4). Thus, although these coronaviruses were generally under very strong negative selection, positive selection also contributed to the evolution of their protein sequences. The putatively positively selected sites might serve as candidates for further functional studies.
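The M7 versus M8 comparison is a likelihood-ratio test: twice the log-likelihood difference is compared with a chi-square distribution whose degrees of freedom equal the difference in free parameters (18 - 16 = 2). The sketch below reproduces that arithmetic from the reported lnL values, together with the 1 - ω approximation behind the 87.6%-95.6% figures. `lrt_df2` is a hypothetical helper, not a PAML function.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood-ratio statistic and P value for nested models differing by 2 parameters.

    The chi-square survival function with 2 degrees of freedom is exactly
    exp(-x / 2), so no statistics library is required.
    """
    stat = 2 * (lnl_alt - lnl_null)
    return stat, math.exp(-stat / 2)


# M7 (null, np = 16) versus M8 (alternative, np = 18): df = 2
stat, p = lrt_df2(-105063.284, -104813.732)
print(f"2*delta lnL = {stat:.3f}, P = {p:.1e}")

# Share of nonsynonymous mutations removed by purifying selection, approximated as 1 - omega
for omega in (0.124, 0.044):
    print(f"omega = {omega}: {1 - omega:.1%} removed")
```

The statistic is 499.104 and the P value is on the order of 10^-109, consistent with the reported bound of P < 10^-10.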
Mutations in 103 SARS-CoV-2 genomes

We downloaded 103 publicly available SARS-CoV-2 genomes, aligned the sequences, and identified the genetic variants. For ease of visualization, we labeled each virus strain by the location and date of isolation in the format "Location_Date" throughout this study (see Table S1 for details; no ID contains information on the patient's race or ethnicity). Although SARS-CoV-2 is an RNA virus, for simplicity we present our results in terms of DNA sequences throughout this study (i.e., the nucleotide T (thymine) represents U (uracil) in SARS-CoV-2). For each variant, the ancestral state was inferred from the genome and CDS alignments of SARS-CoV-2 (NC_045512), RaTG13, and GD Pangolin-CoV (Materials and Methods). In total, we identified mutations at 149 sites across the 103 sequenced strains. Ancestral states were unambiguously inferred for 43 synonymous, 83 nonsynonymous, and two stop-gain mutations. The frequency spectra of synonymous and nonsynonymous mutations are shown in Fig. 2. Most derived mutations were singletons (67.4% (29/43) of synonymous mutations and 84.3% (70/83) of nonsynonymous mutations), indicating either a recent origin [30] or population growth [31]. In general, the derived alleles of synonymous mutations were significantly skewed towards higher frequencies than those of nonsynonymous ones (P < 0.01, Wilcoxon rank-sum test; Fig. 2), suggesting that nonsynonymous mutations tended to be selected against. However, 16.3% (7 of 43) of synonymous mutations and one nonsynonymous mutation (ORF8 S84L, position 28,144) had a derived frequency of ≥ 70% across the SARS-CoV-2 strains.
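The text does not spell out the exact rule used to polarize each variant. One simple, hypothetical rule, sketched below, calls an allele ancestral only when the outgroups (here assumed to be RaTG13 and GD Pangolin-CoV) agree and carry one of the two alleles seen in SARS-CoV-2; the derived allele count, its frequency, and singleton status then follow directly. Function names are illustrative.

```python
from collections import Counter


def polarize(alleles, outgroups):
    """Return (ancestral, derived) for a biallelic site, or None if ambiguous.

    alleles:   nucleotides observed at the site, one per SARS-CoV-2 strain.
    outgroups: nucleotides at the orthologous site in the outgroup genomes
               (hypothetical rule: all outgroups must agree).
    """
    observed = set(alleles)
    if len(observed) != 2 or len(set(outgroups)) != 1:
        return None
    anc = outgroups[0]
    if anc not in observed:
        return None
    (der,) = observed - {anc}
    return anc, der


def derived_frequency(alleles, outgroups):
    """Derived allele count and frequency, or None if the site is unpolarized."""
    pol = polarize(alleles, outgroups)
    if pol is None:
        return None
    count = Counter(alleles)[pol[1]]
    return count, count / len(alleles)
```

A site where 102 strains carry C, one carries T, and both outgroups carry C gives `derived_frequency(...) == (1, 1/103)`, i.e. a singleton. Sites with ambiguity codes or discordant outgroups return None, mirroring the "unambiguously inferred" restriction in the text.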
The nonsynonymous mutations with derived alleles in at least two SARS-CoV-2 strains affected six proteins: orf1ab (A117T, I1607V, L3606F, I6075T), S (H49Y, V367F), ORF3a (G251V), ORF7a (P34S), ORF8 (V62L, S84L), and N (S194L, S202N, P344S).

Two major types of SARS-CoV-2 are defined by two SNPs that show complete linkage

To detect possible recombination among SARS-CoV-2 viruses, we used Haploview [32] to analyze and visualize the patterns of linkage disequilibrium (LD) between variants whose minor alleles occur in at least two SARS-CoV-2 strains (Fig. 3A). Since most mutations were at very low frequencies, it is not surprising that many pairs had a very low r² or LOD value (Fig. 3B-C). Consistent with another recent report [31], we did not find evidence of recombination between SARS-CoV-2 strains. However, we found that the SNPs at positions 8,782 (orf1ab: T8517C, synonymous) and 28,144 (ORF8: C251T, S84L) showed significant linkage, with an r² value of 0.954 (Fig. 3B, red) and a LOD value of 50.13 (Fig. 3C, red). Among the 103 SARS-CoV-2 strains, 101 exhibited complete linkage between the two SNPs: 72 strains exhibited a "CT" haplotype (defined as the "L" type because T28,144 is in a leucine codon) and 29 strains exhibited a "TC" haplotype (defined as the "S" type because C28,144 is in a serine codon) at these two sites. Thus, we categorized the SARS-CoV-2 viruses into two major types, with L being the major type (~70%) and S the minor type (~30%).

The evolutionary history of L and S types of SARS-CoV-2

Although we defined the L and S types based on two tightly linked SNPs, strikingly, the separation between the L (blue) and S (red) types was maintained when we reconstructed the haplotype networks using all the SNPs in the SARS-CoV-2 genomes (Fig. 4A; the number of mutations between two neighboring haplotypes was inferred parsimoniously).
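The L/S assignment and the r² statistic for the two sites can be illustrated with a small sketch that works on two-letter haplotypes (base at 8,782 followed by base at 28,144). It uses the textbook definition r² = D² / (pA(1 - pA)pB(1 - pB)); it is not Haploview's implementation, and the helper names are hypothetical.

```python
from collections import Counter


def classify(b8782, b28144):
    """Two-SNP type as defined in the text: CT -> L, TC -> S, anything else -> other."""
    return {("C", "T"): "L", ("T", "C"): "S"}.get((b8782, b28144), "other")


def r_squared(haplotypes):
    """LD r^2 between two biallelic sites from a list of 2-letter haplotypes.

    r^2 = D^2 / (pA (1 - pA) pB (1 - pB)), with D = pAB - pA * pB, where A and B
    are the alleles carried by the first haplotype at sites 1 and 2.
    Assumes both sites are biallelic and polymorphic; haplotypes containing
    ambiguous bases (e.g. 'Y') should be removed beforehand.
    """
    n = len(haplotypes)
    a, b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


haps = ["CT"] * 72 + ["TC"] * 29
print(Counter(classify(h[0], h[1]) for h in haps))  # Counter({'L': 72, 'S': 29})
print(round(r_squared(haps), 3))
```

With only CT and TC haplotypes, r² is 1. Adding a single discordant CC haplotype (such as the one reported later for a Shenzhen strain) is enough to pull r² slightly below 1, to about 0.953.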
The haplotype network analysis further supports the idea that the two linked SNPs at sites 8,782 and 28,144 adequately define the L and S types of SARS-CoV-2. To determine whether the L or S type is ancestral, we examined the genomic alignments of SARS-CoV-2 and other highly related viruses. Strikingly, the nucleotides of the S type at sites 8,782 and 28,144 were identical to those at the orthologous sites in the most closely related viruses (Fig. 4B). Remarkably, both sites were also highly conserved in other viruses. Hence, although the L type (~70%) was more prevalent than the S type (~30%) in the SARS-CoV-2 viruses we examined, the S type is actually the ancestral version of SARS-CoV-2. To further examine the relationships among strains of the L and S types, we reconstructed a phylogenetic tree of all 103 SARS-CoV-2 viruses based on their whole-genome sequences. This tree also clearly shows the separation of the two types (Fig. 5): viruses of the L type (blue) clustered together, and likewise, viruses of the S type (red) were more closely related to each other. Therefore, our whole-genome comparisons further confirm the separation of the L and S types. Thus far, we found that, although the L type is derived from the S type, L (~70%) is more prevalent than S (~30%) among the sequenced SARS-CoV-2 genomes we examined. This pattern suggests that L has a higher transmission rate than the S type. Furthermore, our mutational load analysis indicated that the L type had accumulated significantly more derived mutations than the S type (P < 0.0001, Wilcoxon rank-sum test; Fig. S5). We propose that, although the L type newly evolved from the ancient S type, it transmits faster or replicates faster in human populations, causing it to accumulate more mutations than the S type.
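A per-genome mutational load can be sketched as the number of polarized sites at which a strain carries the derived allele; the two types are then compared with a rank-sum test. The sketch below computes only the loads and the Mann-Whitney U statistic (equivalent to the Wilcoxon rank-sum statistic); the P value itself would come from a statistics package. Names and the input layout (a dict of 0-based alignment positions to ancestral bases) are assumptions.

```python
def derived_load(genome, ancestral_states):
    """Count derived alleles carried by one aligned genome.

    ancestral_states maps 0-based alignment positions to the inferred
    ancestral nucleotide. Ambiguous or missing calls (anything other than
    A/C/G/T, e.g. 'N' or 'Y') are skipped rather than counted.
    """
    count = 0
    for pos, anc in ancestral_states.items():
        base = genome[pos]
        if base in "ACGT" and base != anc:
            count += 1
    return count


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y (ties count as 0.5)."""
    return sum((xi > yi) + 0.5 * (xi == yi) for xi in x for yi in y)
```

Given lists of loads for L-type and S-type genomes, a U value close to len(L) × len(S) would indicate that L genomes almost always carry more derived mutations than S genomes.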
Thus, our results suggest that the L type might be more aggressive than the S type due to potentially higher transmission and/or replication rates. To test whether the two types of SARS-CoV-2 differed in their temporal and spatial distributions, we stratified the viruses by the locations and dates of isolation (Table S1). Among the 27 viruses isolated from Wuhan, 26 (96.3%) were L type, and only 1 (3.7%) was S type. However, among the other 73 viruses isolated outside Wuhan, 45 (61.6%) were L type, and 28 (38.4%) were S type. This comparison suggests that the L type is significantly more prevalent in Wuhan than in other places (P = 0.0004, Fisher's exact test; Fig. 6 and Table S3). All 26 samples isolated before January 7, 2020, were from Wuhan, and among the 74 samples collected on or after January 7, 2020, only one was from Wuhan, 33 were from other places in China, and 40 were from patients outside China. Thus, it is not surprising that the L type was significantly more prevalent before January 7, 2020 (96.2%; 25 L and 1 S) than after (62.2%; 46 L and 28 S) (P = 0.0008, Fisher's exact test; Fig. 6 and Table S3). If the L type is more aggressive than the S type, why did the relative frequency of the L type decrease compared to the S type in other places after the initial outbreak in Wuhan? One possible explanation is that, since January 2020, the Chinese central and local governments have taken rapid and comprehensive prevention and control measures. These human intervention efforts might have exerted severe selective pressure against the L type, which might be more aggressive and spread more quickly. The S type, on the other hand, might have experienced weaker selective pressure from human intervention, leading to an increase in its relative abundance among SARS-CoV-2 viruses.
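Both P values can be checked with a two-sided Fisher's exact test on the 2 × 2 counts given above. The sketch below uses the common convention (the one followed by R's `fisher.test`) of summing the hypergeometric probabilities of all tables with the same margins whose probability does not exceed that of the observed table. `fisher_exact_two_sided` is a hypothetical name; only the standard library is used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    total = comb(n, row1)

    def prob(x):
        # probability that the top-left cell equals x, with all margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    p_obs = prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # small relative tolerance guards against floating-point ties
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Rows: Wuhan vs elsewhere; columns: L vs S
print(f"{fisher_exact_two_sided(26, 1, 45, 28):.4f}")
# Rows: before vs on/after January 7, 2020; columns: L vs S
print(f"{fisher_exact_two_sided(25, 1, 46, 28):.4f}")
```

The two calls give approximately 0.0004 and 0.0008, in line with the values reported in the text.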
Thus, we hypothesize that the two types of SARS-CoV-2 viruses might have experienced different selective pressures due to different epidemiological features. Of note, the above analyses were based on very patchy SARS-CoV-2 genomic data collected from different locations and time points. More comprehensive genomic data are required to further test our hypothesis.

Heteroplasmy of SARS-CoV-2 viruses in patients

It is currently unclear how the L type evolved from the S type during the development of SARS-CoV-2. However, we found that the sequence of viruses isolated from one patient living in the United States on January 21 (USA_2020/01/21.a, GISAID ID: EPI_ISL_404253) had the genotype Y (C or T) at both positions 8,782 and 28,144, differing from the general pattern of carrying either C or T. Although novel mutations could lead to this result, the most parsimonious explanation is that this patient was infected by both the L and S types (Fig. 7A). The USA_2020/01/21.a sample was collected from a 63-year-old female patient living in Chicago (from GISAID). Based on a report from the United States Centers for Disease Control and Prevention (https://www.cdc.gov/media/releases/2020/p0124-second-travel-coronavirus.html), we inferred that this patient returned to the United States from Wuhan on January 13, 2020. However, whether the co-existence of L and S types in this patient was due to multiple infections during her visit to Wuhan is currently unclear. Notably, the viruses identified from a patient in Australia on January 28, 2020 (Australia_2020/01/28.a, GISAID ID: EPI_ISL_407894) had multiple degenerate nucleotides. This sample was collected from a 44-year-old male patient in Gold Coast, Australia (from GISAID). Based on a report in the Courier Mail (January 30, 2020), we inferred that this patient had traveled from Wuhan to the Gold Coast before being diagnosed. As shown in Fig. 7B, we inferred that this patient might have been infected by at least two different strains of SARS-CoV-2. To further investigate the heteroplasmy of SARS-CoV-2 viruses in patients, we searched 12 deep-sequencing libraries of SARS-CoV-2 genomes deposited in the Sequence Read Archive (SRA) (Table S4, Materials and Methods). We found 17 genomic sites that showed evidence of heteroplasmy of SARS-CoV-2 in five patients, but we did not find any other instances of the co-existence of L and S types in any patient (Table 2). These findings highlight the growing complexity of SARS-CoV-2 evolution during infection. Further studies investigating how different alleles of SARS-CoV-2 viruses compete with each other will be of significant value.

DISCUSSION

In this study, we investigated the patterns of molecular divergence between SARS-CoV-2 and other related coronaviruses. Although the genomic analyses suggested that SARS-CoV-2 was closest to RaTG13, their difference at neutral sites was much higher than previously realized. Our results provide novel insights into tracing the intermediate natural host of SARS-CoV-2. With population genetic analyses of 103 genomes of SARS-CoV-2, we found that SARS-CoV-2 viruses evolved into two major types (L and S types), and that the two types were well defined by just two SNPs that show nearly complete linkage across SARS-CoV-2 strains. Although the L type (~70%) was more prevalent than the S type (~30%) in the SARS-CoV-2 viruses we examined, our evolutionary analyses suggested that the S type was most likely the more ancient version of SARS-CoV-2. Our results also support the idea that the L type is more aggressive than the S type.
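The heteroplasmy evidence described in the Results comes from two kinds of data: IUPAC ambiguity codes in consensus genomes (such as Y = C or T at sites 8,782 and 28,144) and mixed base calls in deep-sequencing reads. The sketch below shows a minimal screen for each. The depth and minor-allele-fraction thresholds are hypothetical placeholders, not the criteria the study used (those are in its Materials and Methods), and N is treated as missing data rather than a mixture.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degenerate_sites(consensus):
    """Yield (1-based position, code, possible bases) for ambiguous calls.

    'N' is skipped because it usually marks missing data, not a mixture.
    """
    for i, code in enumerate(consensus.upper(), start=1):
        if code in IUPAC and code != "N" and len(IUPAC[code]) > 1:
            yield i, code, IUPAC[code]


def is_heteroplasmic(counts, min_depth=100, min_minor_frac=0.05):
    """Flag a site whose second most common base reaches a read fraction.

    counts: mapping of base -> read count at one genomic position.
    min_depth and min_minor_frac are hypothetical thresholds.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return False
    ranked = sorted(counts.values(), reverse=True)
    minor = ranked[1] if len(ranked) > 1 else 0
    return minor / depth >= min_minor_frac
```

For example, `list(degenerate_sites("ACYTN"))` returns `[(3, "Y", "CT")]`, and a site with 600 C reads and 400 T reads is flagged while one with 990 C and 10 T reads is not.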
Since nonsynonymous sites are usually under stronger negative selection than synonymous sites, calculating sequence differences without separating these two classes of sites could significantly underestimate the degree of molecular divergence. For example, although the overall nucleotide sequences differed by only ~4% between SARS-CoV-2 and RaTG13, the genomic average dS value, which is usually a neutral proxy, was 0.17 between these two viruses (Table 1). Of note, the genome-wide dS value is 0.012 between humans and chimpanzees [33] and 0.08 between humans and rhesus macaques [34]. Thus, the neutral molecular divergence between SARS-CoV-2 and RaTG13 is 14 times larger than that between humans and chimpanzees, and about twice as large as that between humans and macaques. The genomic average dS value between SARS-CoV-2 and GD Pangolin-CoV is 0.475, which is comparable to that between humans and mice (0.5) [35], and the dS value between SARS-CoV-2 and GX Pangolin-CoV is even larger (0.722). The scale of these measures suggests that, when tracing the origin and natural intermediate host of SARS-CoV-2, we should perhaps consider the difference at neutrally evolving sites rather than the difference across all nucleotide sequences. Our analyses of molecular evolution and population genetics suggested that some amino acid changes might have been favored by natural selection during the evolution of SARS-CoV-2 and other related viruses. However, negative selection appears to be the predominant force acting on these viruses. Interestingly, the virus isolated from one patient in Shenzhen on January 13, 2020 (SZ_2020/01/13.a, GISAID ID: EPI_ISL_406592) had C at both positions 8,782 and 28,144, belonging to neither the L nor the S type (Fig. 4A and 5).
Notably, this strain had one stop-gain mutation in orf1ab and had accumulated 20 silent and 5 nonsynonymous mutations after diverging from the ancestral haplotype (Fig. 4A). Thus, it is possible that functional constraints on the genomic sequence were weakened after the disruption of orf1ab in this strain. In addition, viruses isolated from a patient living in South Korea (Skorea_2020/01.a, GISAID ID: EPI_ISL_411929) had acquired six nonsynonymous mutations relative to the most recent common ancestor of SARS-CoV-2: orf1ab (M902I and T6891M), S (S221W), ORF3a (W128L and G251V), and E (L37H). If these changes are not due to sequencing errors, it would be interesting to test whether and how these mutations affect the transmission and pathogenesis of SARS-CoV-2. In this work, we propose that SARS-CoV-2 can be divided into two major types (L and S types): the S type is ancestral, and the L type evolved from the S type. Intriguingly, the S and L types can be clearly defined by just two tightly linked SNPs at positions 8,782 (orf1ab: T8517C, synonymous) and 28,144 (ORF8: C251T, S84L). However, it is currently unclear whether the L type evolved from the S type in humans or in intermediate hosts. It is also unclear whether the L type is more virulent than the S type. orf1ab, which encodes the replicase/transcriptase, is required for viral genome replication and might also be important for viral pathogenesis [36]. Although the T8517C mutation in orf1ab does not change the protein sequence (it changes the codon AGT (Ser) to AGC (Ser)), we hypothesize that this mutation might affect orf1ab translation, since AGT is a preferred codon whereas AGC is unpreferred (Table S2). ORF8 promotes the expression of ATF6, the ER unfolded protein response factor, in human cells [37]. Thus, it will be interesting to investigate the function of the S84L AA change in ORF8, as well as the combined effect of these two mutations on SARS-CoV-2 pathogenesis.
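The mutation names T8517C and C251T use CDS coordinates. Mapping them to codons shows that T8517C falls at the third position of orf1ab codon 2839, where AGT to AGC keeps serine, and C251T falls at the second position of ORF8 codon 84, consistent with S84L. The sketch below performs that mapping and labels codon changes. The specific serine codon at ORF8 position 84 is not given in the text, so the TCA to TTA example is hypothetical.

```python
from itertools import product

# Standard genetic code built from the conventional TCAG ordering ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_position(cds_pos):
    """Map a 1-based CDS coordinate to (codon number, position within codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def classify_change(ref_codon, alt_codon):
    """Label a codon change as synonymous, stop-gain, or nonsynonymous."""
    ref_aa, alt_aa = CODE[ref_codon], CODE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop-gain"
    return "nonsynonymous"


print(codon_position(8517), classify_change("AGT", "AGC"))  # (2839, 3) synonymous
print(codon_position(251), classify_change("TCA", "TTA"))   # (84, 2) nonsynonymous
```

Any C-to-T change at the second position of a TCN serine codon yields TTN, which encodes leucine for TCA and TCG, so both possible serine codons at that position are compatible with S84L.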
In summary, our analyses of 103 sequenced SARS-CoV-2 genomes suggest that the L type is more aggressive than the S type and that human interference may have shifted the relative abundance of L and S type soon after the SARS-CoV-2 outbreak. As previously noted [19], the data examined in this study are still very limited, and follow-up analyses of a larger set of data are needed to have a better understanding of the evolution and epidemiology of SARS-CoV-2. There is a strong need for further immediate, comprehensive studies that combine genomic data, epidemiological data, and chart records of the clinical symptoms of patients with SARS-CoV-2. 
