Description

Attempts to infer phylogenetic relationships, sites under selection, or evidence of recombination from SARS-CoV-2 genome sequences can be led astray by sequencing errors, contamination, and hypermutable sites. To make reliable inferences, it is important to identify probable errors and susceptible sites in the genome sequences, consider carefully how they might affect the specific analysis to be performed, and, where appropriate, exclude problematic sites from the analysis.

This track shows locations in the SARS-CoV-2 genome that have been identified as problematic for analysis for various reasons. They have been collected in the GitHub repository https://github.com/W-L/ProblematicSites_SARS-CoV2/. Locations are separated into two subtracks, Mask and Caution, according to the recommended action, and are colored by level of severity.

Each location is labeled with a term that indicates the type of potential problem.

Methods

Multiple groups applied various methods (De Maio, Walker et al.; De Maio, Gozashti et al.; Turakhia et al.) to identify sites that were homoplasic, likely to be contaminated, likely to be sequencing errors, and/or observed in multiple virus lineages by only one or a few laboratories. They contributed their observations and recommendations to the GitHub repository https://github.com/W-L/ProblematicSites_SARS-CoV2/. UCSC downloaded the collection, split the sites into Mask and Caution subsets according to the recommended action, and reformatted the data for display in the Genome Browser.
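The split into Mask and Caution subsets can be sketched in Python. This is a minimal illustration, not the pipeline UCSC actually ran. It assumes that the recommended action is stored in the VCF FILTER column as "mask" or "caution"; the function name split_by_recommendation and the example records are hypothetical and invented for illustration only.

```python
from collections import defaultdict


def split_by_recommendation(vcf_text):
    """Group VCF records by the value of the FILTER column (7th column).

    Assumption: FILTER holds the recommended action, e.g. "mask" or "caution".
    Returns a dict mapping each action to a list of (position, info) tuples.
    """
    subsets = defaultdict(list)
    for line in vcf_text.splitlines():
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        position = int(fields[1])           # 1-based POS column
        recommendation = fields[6].lower()  # FILTER column
        subsets[recommendation].append((position, fields[7]))
    return dict(subsets)


# Made-up records for illustration; these are NOT taken from the real file.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_045512.2\t10\t.\tA\t.\t.\tmask\tEXC=hypothetical_reason_a",
    "NC_045512.2\t20\t.\tC\t.\t.\tmask\tEXC=hypothetical_reason_b",
    "NC_045512.2\t30\t.\tG\t.\t.\tcaution\tEXC=hypothetical_reason_c",
])

subsets = split_by_recommendation(EXAMPLE_VCF)
for action, records in sorted(subsets.items()):
    print(action, [position for position, _ in records])
```

To run the same function on the real file, read the downloaded problematic_sites_sarsCov2.vcf as text and pass it in; check the file's header first to confirm where the recommendation is actually stored.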

Data Access

The original data file was downloaded from GitHub: https://raw.githubusercontent.com/W-L/ProblematicSites_SARS-CoV2/master/problematic_sites_sarsCov2.vcf. You can download the bigBed files underlying this track (problematicSites*.bb) from our Download Server. The data can be explored interactively with the Table Browser or the Data Integrator, and can be accessed from scripts through our API.
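As a rough sketch of script access, the Python below builds a request URL for the Genome Browser REST API's getData/track endpoint. Several values are assumptions that do not appear on this page: the assembly name "wuhCor1", the sequence name "NC_045512v2", and the track name passed in the usage line, which is a hypothetical placeholder. Check the track's actual name in the Table Browser before relying on it.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(track, genome="wuhCor1", chrom="NC_045512v2",
                    start=None, end=None):
    """Build a getData/track URL.

    Assumptions: the default genome and chrom names, and whatever track
    name the caller supplies, are not confirmed by this page.
    """
    params = {"genome": genome, "track": track, "chrom": chrom}
    # Only restrict to a range when both ends are given.
    if start is not None and end is not None:
        params["start"] = start
        params["end"] = end
    return API_BASE + "?" + urlencode(params)


def fetch_track(track, **kwargs):
    """Fetch the track data as parsed JSON (requires network access)."""
    with urllib.request.urlopen(build_track_url(track, **kwargs)) as response:
        return json.load(response)


# "problematicSitesMask" is a hypothetical track name used for illustration.
url = build_track_url("problematicSitesMask", start=0, end=100)
print(url)
```

Building the URL separately from fetching it makes it easy to inspect the request before sending it.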

References

De Maio N, Walker C, Borges R, Weilguny L, Slodkowicz G, Goldman N. Issues with SARS-CoV-2 sequencing data. virological.org. 2020 May 5.

De Maio N, Gozashti L, Turakhia Y, Walker C, Lanfear R, Corbett-Detig R, Goldman N. Updated analysis with data from 12th June 2020. virological.org. 2020 July 14.

Turakhia Y, Thornlow B, Gozashti L, Hinrichs AS, Fernandes JD, Haussler D, Corbett-Detig R. Stability of SARS-CoV-2 Phylogenies. bioRxiv. 2020 June 9.