First report of COVID-19 in South America
We report genomic analysis of a sample retrieved from a nasopharyngeal swab from a 61-year-old male patient (SPBR1) from São Paulo city, southeast Brazil, who received a confirmed diagnosis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection on 26 February 2020. The patient resides in the south of São Paulo city and traveled to Lombardy, northern Italy, between 9 and 21 February 2020. Upon returning to Brazil, the patient presented with symptoms including fever, dry cough, sore throat, and runny nose.
Diagnostic screening for COVID-19 was initially conducted at the Hospital Albert Einstein, São Paulo, on 25 February 2020. Following national guidelines from the Brazilian Ministry of Health, a sample was sent to the regional reference laboratory, Instituto Adolfo Lutz, São Paulo, where the diagnosis was confirmed using a real-time RT-PCR protocol to detect SARS-CoV-2. RT-PCR results were communicated on 26 February 2020. The sample from patient SPBR1 had a cycle threshold value of 30.
Currently, only one genome sequence is available from Italy (GISAID accession number EPI_ISL_410546). The sequence is from a female tourist from Hubei Province who visited Rome around 31 January 2020, and it does not represent local transmission in the city. The first cases of local transmission in Italy were reported on 21 February 2020 in Lombardy, northern Italy (Figure 1). However, no genome data exist from these cases yet. Here we present the first complete virus genome analysis from patient SPBR1, who acquired infection in Lombardy.
Figure 1. Number of daily notified COVID-19 infections in Italy and summary of the travel history of patient SPBR1. Case counts were compiled from the World Health Organization situation reports.
A complete virus genome of the positive case was generated by the CADDE project at the Instituto Adolfo Lutz, São Paulo, using the novel coronavirus sequencing and bioinformatics protocols developed by the ARTIC network. The sequencing protocol, multiplex sequencing primers and bioinformatic protocols are described in detail at https://artic.network/ncov-2019. cDNA synthesis was conducted in duplicate for the BR1 sample. The concentration of PCR products was measured using a Qubit dsDNA High Sensitivity kit on a Qubit 3.0 fluorometer (ThermoFisher). Concentrations were as follows: Aliquot 1: Pool A - 4.08 ng/µL, Pool B - 11.0 ng/µL; Aliquot 2: Pool A - 11.6 ng/µL, Pool B - 9.58 ng/µL. Library DNA concentrations were as follows: Library 1 - 15.3 ng/µL; Library 2 - 14.8 ng/µL. The presence of correctly sized bands was confirmed on an E-Gel electrophoresis machine.
Library preparation was conducted without a barcoding step and libraries were sequenced on an R9.4.1 flow cell. Sequencing of library 1 was conducted in MinKNOW version 19.10.1 for 14 hours. A total of 11.79 million reads were generated. The open-source software RAMPART version 10.5 was used to assign and map reads in real time. Raw files were basecalled with Guppy, demultiplexed and trimmed with Porechop, and mapped against reference strain Wuhan-Hu-1 (GenBank accession number MN908947). Considering only regions covered by more than 20 reads, genome coverage was 96%. Variants were called using nanopolish 0.11.3 and accepted if they had a log-likelihood score greater than 200. Low-coverage regions were masked with N characters. Example read data are available for inspection at https://cadde.s3.climb.ac.uk/covid-19/BR1.sorted.bam. We expect the genome to be updated and to become more complete as more data are collected during the sequencing run, which is ongoing.
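The coverage, masking and variant-acceptance rules described above can be illustrated with a minimal Python sketch. This is not the ARTIC pipeline code; the function names, the per-position depth list and the "score" key are hypothetical, and the toy values are made up rather than taken from the sequencing run. Only the thresholds (more than 20 reads, score greater than 200) come from the text.

```python
def mask_low_coverage(consensus, depth, min_depth=20):
    """Replace bases at positions NOT covered by more than min_depth reads with 'N'."""
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(
        base if d > min_depth else "N"
        for base, d in zip(consensus, depth)
    )


def coverage_fraction(depth, min_depth=20):
    """Fraction of positions covered by more than min_depth reads."""
    if not depth:
        return 0.0
    return sum(1 for d in depth if d > min_depth) / len(depth)


def filter_variants(variants, min_score=200.0):
    """Keep variants whose score is strictly greater than min_score.

    Each variant is assumed (hypothetically) to be a dict with a 'score' key.
    """
    return [v for v in variants if v["score"] > min_score]


# Toy example with made-up numbers (not data from the run).
toy_consensus = "ACGTACGTAC"
toy_depth = [5, 25, 30, 20, 21, 100, 0, 45, 60, 19]
toy_variants = [
    {"pos": 2, "score": 350.0},
    {"pos": 7, "score": 120.5},
    {"pos": 9, "score": 200.0},
]

masked = mask_low_coverage(toy_consensus, toy_depth)
covered = coverage_fraction(toy_depth)
accepted = filter_variants(toy_variants)

print(masked)    # positions with depth <= 20 appear as N
print(covered)   # share of positions with depth > 20
print(accepted)  # only variants scoring above 200
```

Note that both thresholds are applied as strict inequalities, so a position with exactly 20 reads is masked and a variant scoring exactly 200 is rejected.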
To put our new Brazil/SPBR1/2020 genome in context, we appended it to a total of 127 complete genomes from 17 different countries where COVID-19 cases have been reported. Multiple sequence alignment was conducted using MAFFT and manually curated. The final aligned dataset was 29870 base pairs long. A maximum likelihood (ML) phylogenetic analysis was conducted using PhyML version 3.0 with a general time-reversible nucleotide substitution model and a proportion of invariant sites. Branch support was measured using an approximate likelihood ratio test (aLRT). Additional reconstructions were conducted with BEAST version 1.10.
Preliminary genetic analysis indicates that the Brazil/SPBR1/2020 genome differs from the Wuhan-Hu-1 reference strain by three mutations. Two of these mutations are shared with its closest sequence, Germany/BavPat1/2020 (GISAID accession ID EPI_ISL_406862), a strain recovered from a male patient belonging to a transmission cluster in Munich, Bavaria, Germany.
Figure 2. Maximum likelihood tree of SARS-CoV-2. Node aLRT support above 0.80 is indicated by a filled circle. The clade containing the Brazil/SPBR1/2020 sequence is zoomed and highlighted on the right-hand side. Scale bar is in substitutions per nucleotide site.
The estimated phylogeny consistently places the Brazil/SPBR1/2020 genome in a strongly supported cluster with the Germany/BavPat1/2020 strain (aLRT support = 0.92) (Figure 2). Relative to the Germany/BavPat1/2020 genome, the Brazil/SPBR1/2020 genome has one unique C to T mutation in the nsp11 coding region.
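The mutation comparisons reported above (differences from the reference, and which of them are shared with the closest sequence) can be sketched as follows. This is an illustrative Python example under stated assumptions, not the analysis actually used: the function names are hypothetical, the sequences are short toy strings rather than real genomes, and all three sequences are assumed to be already aligned to the same length.

```python
def differences(ref, query, ignore="N-"):
    """Return {1-based position: (ref_base, query_base)} for two aligned sequences.

    Positions where either sequence has a masked base ('N') or a gap ('-')
    are skipped, since no call can be compared there.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    diffs = {}
    for pos, (r, q) in enumerate(zip(ref.upper(), query.upper()), start=1):
        if r in ignore or q in ignore:
            continue
        if r != q:
            diffs[pos] = (r, q)
    return diffs


def shared_and_unique(ref, a, b):
    """Split the differences of `a` from `ref` into those also seen in `b` and the rest.

    Caveat: a difference is reported as unique to `a` whenever `b` does not
    show the same change, including when `b` is masked at that position.
    """
    da = differences(ref, a)
    db = differences(ref, b)
    shared = {p: c for p, c in da.items() if db.get(p) == c}
    unique_a = {p: c for p, c in da.items() if db.get(p) != c}
    return shared, unique_a


# Toy aligned sequences (made up; they only mimic the pattern in the text).
toy_ref = "ACGTACGTAC"
toy_a = "ATGTACTTAT"   # three differences from toy_ref
toy_b = "ATGTACTTAC"   # two differences from toy_ref

shared, unique_a = shared_and_unique(toy_ref, toy_a, toy_b)
print(differences(toy_ref, toy_a))
print(shared)
print(unique_a)
```

In the toy data, the first query differs from the reference at three positions, two of which it shares with the second query, leaving one unique C to T change.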
In conclusion, we present raw data, a consensus genome sequence and preliminary analysis from the first case reported in South America, confirmed in São Paulo, Brazil, on 26 February 2020. We find that the generated sequence is most closely related to a virus genome from Europe. Additional data from Germany and Italy will be important to understand the origins and dynamics of the virus in Italy. Continued surveillance of new suspected cases will be critical to detect new virus importations into Brazil and to identify initial clusters of local transmission in the country.
Disclaimer
The new sequence has been deposited in GISAID with accession ID EPI_ISL_412964. Please feel free to download, share, use, and analyze the data from the Brazil/SPBR1/2020 strain from GISAID. We ask that you communicate with us if you wish to publish results that use these data in a journal. If you have any other questions, please also contact us directly.
Contributing authors
Jaqueline Goes de Jesus, Claudio Sacchi, Ingra Claro, Flávia Salles, Daniela da Silva, Terezinha Maria de Paiva, Margarete Pinho, Katia Correa de Oliveira Santos, Filipe Romero, Fabiana dos Santos, Claudia Gonçalves, Maria do Carmo Timenetsky, Joshua Quick, Nick Loman, Andrew Rambaut, Ester Cerdeira Sabino, Nuno Rodrigues Faria
Affiliations
Strategic Laboratory, Instituto Adolfo Lutz, São Paulo, Brazil
Centro Nacional de Influenza, Instituto Adolfo Lutz, São Paulo, Brazil
University of Birmingham, United Kingdom
Centro de Virologia, Instituto Adolfo Lutz, São Paulo, Brazil
Universidade Federal do Rio de Janeiro, Brazil
Institute of Tropical Medicine, Universidade de São Paulo, Brazil
Institute of Evolutionary Biology, University of Edinburgh, United Kingdom
Department of Zoology, University of Oxford, United Kingdom
Funding
FAPESP-Medical Research Council Brazil-UK CADDE partnership award (MR/S0195/1), a Wellcome Trust and Royal Society Sir Henry Dale Fellowship (204311/Z/16/Z), Oxford Martin School, and a Wellcome Trust Collaborators Award 206298/Z/17/Z (ARTIC network).
Acknowledgments
We would like to thank all the authors who have kindly deposited and shared genome data on GISAID. A table with genome sequence acknowledgments can be found here.
Contact information
Professor Ester Cerdeira Sabino, MD, PhD
Instituto de Medicina Tropical, University of São Paulo, Brazil
Email: sabinoec@gmail.com
Professor Nuno Rodrigues Faria, PhD
Associate Professor, University of Oxford, United Kingdom
Email: nuno.faria@zoo.ox.ac.uk