Description

Given a phylogenetic tree of related sequences, a parsimony score can be computed for each mutation as the minimum number of nucleotide changes along branches of the tree that would lead to the observed sample genotypes at the leaves of the tree. For example, if every leaf descending from one branch has a mutation, and no other leaf of the tree has it, then the mutation presumably occurred once on that branch and the parsimony score is one. However, when a mutation appears on leaves scattered across several branches whose other leaves lack it, the mutation must have changed on multiple branches of the tree, which increases the parsimony score. Mutations whose parsimony score is relatively high, especially when compared to the alternate allele count (the number of samples/leaves with the mutation), may be of interest when identifying systematic errors and/or sites of recurrent mutation.
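To make the two cases above concrete, here is a minimal sketch of a per-site parsimony score using the classic Fitch algorithm. It is an illustration only, not the code used to build this track: it assumes a strictly bifurcating tree written as nested tuples, and the sample names, alleles, and function names (fitch_score, alt_allele_count) are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# Assumed toy representation: a leaf is a sample name, an internal node is a
# pair of subtrees. Real trees may contain polytomies, which this sketch
# does not handle.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, alleles: Dict[str, str]) -> int:
    """Minimum number of nucleotide changes needed at one site (Fitch)."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            # A leaf contributes its observed allele and no changes.
            return {alleles[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # The children can agree on an allele: no extra change here.
            return common, left_cost + right_cost
        # The children disagree: one change is needed on a child branch.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def alt_allele_count(alleles: Dict[str, str], ref: str) -> int:
    """Number of leaves whose allele differs from the reference allele."""
    return sum(1 for allele in alleles.values() if allele != ref)


tree: Tree = ((("s1", "s2"), ("s3", "s4")), (("s5", "s6"), ("s7", "s8")))
samples = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]

# Case 1: the mutation (C>T) is confined to a single clade (s1, s2).
clade_site = {s: ("T" if s in {"s1", "s2"} else "C") for s in samples}

# Case 2: the same number of mutated leaves, but on two distant branches.
scattered_site = {s: ("T" if s in {"s1", "s5"} else "C") for s in samples}

print(fitch_score(tree, clade_site), alt_allele_count(clade_site, "C"))          # 1 2
print(fitch_score(tree, scattered_site), alt_allele_count(scattered_site, "C"))  # 2 2
```

Both toy sites have an alternate allele count of 2, but only the scattered site needs two changes on the tree. A score that is high relative to the alternate allele count is the pattern described above as a possible sign of systematic error or recurrent mutation.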

This track shows, as a bar graph, the parsimony score of each single nucleotide substitution found in sequences available from NCBI Virus / GenBank, COG-UK and the China National Center for Bioinformation; the height of each bar indicates the score. Scores are computed on a phylogenetic tree based on sarscov2phylo release 13-11-20 (Lanfear), with more recent public sequences added using UShER (Turakhia et al.). (The Phylogeny: Public track displays the phylogenetic tree and sample genotypes from which the parsimony scores were generated.)

Methods

Parsimony scores were extracted from the output of the find_parsimonious_assignments program from the strain_phylogenetics package (Turakhia et al.). See the Phylogeny: Public track for details about the phylogenetic tree and mutations.

Data Access

You can download the bigWig file underlying this track from our Download Server. The data can be explored interactively with the Table Browser or the Data Integrator. The data can be accessed from scripts through our API.

Credits

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. Special thanks to Rob Lanfear for developing, running and sharing the sarscov2phylo pipeline and results.

Data usage policy

The data presented here is intended to rapidly disseminate analysis of important pathogens. Unpublished data is included with permission of the data generators, and does not impact their right to publish. Please contact the respective authors if you intend to carry out further research using their data.

References

Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883

Turakhia Y, Thornlow B, Gozashti L, Hinrichs AS, Fernandes JD, Haussler D, and Corbett-Detig R. Stability of SARS-CoV-2 Phylogenies. bioRxiv. 2020 June 9.

Turakhia Y, Thornlow B, Hinrichs AS, De Maio N, Gozashti L, Lanfear R, Haussler D, and Corbett-Detig R. Ultrafast Sample Placement on Existing Trees (UShER) Empowers Real-Time Phylogenetics for the SARS-CoV-2 Pandemic. bioRxiv. 2020 September 28.