sarsCov2PhyloPub Phylogeny: Public vcfTabix Phylogenetic Tree and Nucleotide Substitution Mutations in Sequences in Public Databases 0 0.1 0 0 0 127 127 127 0 0 0

Description

This track displays a phylogenetic tree relating public SARS-CoV-2 genome sequences available from NCBI Virus / GenBank, COG-UK and the China National Center for Bioinformation, contributed by laboratories around the world, and mutations found in those sequences. By default, only very common mutations (alternate allele found in at least 1% of samples) are displayed, but other subtracks may be made visible in order to see rarer mutations.

The phylogenetic tree is inferred by the sarscov2phylo pipeline (Lanfear). For display in the narrow space to the left of the main genome browser image, nodes in the tree are collapsed unless a mutation is associated with a node; i.e., the only branching points displayed are those at which mutations occurred.
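The collapsing rule can be illustrated with a minimal Python sketch. It is not the code used to build the track: the Node class, the collapse function and the example mutation names are hypothetical, and the sketch only assumes that an internal node with no associated mutation is removed and its children are attached to its parent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a name, the mutations on its branch, its children."""
    name: str
    mutations: list = field(default_factory=list)
    children: list = field(default_factory=list)


def collapse(node, is_root=True):
    """Collapse internal nodes that carry no mutation.

    Returns the list of nodes that should replace `node` in its parent.
    The root and all leaves (samples) are always kept.
    """
    new_children = []
    for child in node.children:
        new_children.extend(collapse(child, is_root=False))
    node.children = new_children
    if is_root or not node.children or node.mutations:
        return [node]
    # Internal node without mutations: promote its children to the parent.
    return node.children


# Example: X has no mutation, so its leaves move up; Y has one and is kept.
root = Node("root", children=[
    Node("X", children=[Node("a", ["A1G"]), Node("b")]),
    Node("Y", ["C241T"], [Node("c")]),
])
collapse(root)
print([child.name for child in root.children])
```

After collapsing, the only branching points left below the root are nodes where a mutation occurred, which is the property described above.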

The tree is colored by Pangolin lineage (Rambaut et al.). The coloring scheme is adapted from Figure 1 of Alm et al., which presents a unified view of a simplified phylogenetic tree, Pangolin lineages, Nextstrain clades and GISAID clades.

| color | Pangolin lineage(s) | Nextstrain clade | GISAID clade |
|  | A | 19B | S |
|  | B.n (n > 1) | 19A | L |
|  | n/a (color not used when coloring by lineage; overlaps on tree with B.4 - B.7) | n/a (overlaps on tree with 19A) | O |
|  | n/a (color not used when coloring by lineage; overlaps on tree with B.2) | n/a (overlaps on tree with 19A) | V |
|  | B.1.5, B.1.6, B.1.8, other B.1.n that overlap GISAID clade G | 20A (partial) | G |
|  | B.1.9, B.1.13, B.1.22, B.1.36, B.1.37 | 20A (partial) | GH (partial) |
|  | B.1.3, B.1.12, B.1.26, other B.1.n that overlap GISAID clade GH | 20C | GH (partial) |
|  | B.1.1 | 20B | GR |

Display Conventions

In "dense" mode, a vertical line is drawn at each position where there is a mutation. In "squish" and "pack" modes, the display shows a plot of all samples' mutations, with samples ordered using the phylogenetic tree in order to highlight patterns of linkage. "Full" display mode shows each mutation on its own row, ordered by position instead of lineage.

Each sample is placed in a horizontal row of pixels; when the number of samples exceeds the number of vertical pixels for the track, multiple samples fall in the same pixel row and pixels are averaged across samples.
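The averaging of samples into pixel rows can be sketched as follows. This is an illustration only: the function name, the 0/1 allele encoding and the even binning of samples into rows are assumptions, not the Genome Browser's drawing code.

```python
def average_into_rows(alleles, height):
    """Average per-sample alleles into `height` pixel rows.

    alleles: one value per sample in tree order, 0 = reference allele,
             1 = non-reference allele (an assumed encoding).
    Returns one value in [0, 1] per pixel row: the fraction of samples in
    that row carrying the non-reference allele.
    """
    n = len(alleles)
    rows = []
    for r in range(height):
        start = r * n // height
        end = (r + 1) * n // height
        chunk = alleles[start:end]
        # A row with no samples (height > n) is drawn as all-reference.
        rows.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return rows


# Six samples drawn into three pixel rows: two samples share each row.
print(average_into_rows([1, 1, 0, 0, 1, 0], 3))
```

With tens of thousands of samples and a track a few hundred pixels tall, each row would summarize on the order of a hundred samples in this scheme.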

Each mutation is a vertical bar at its position in the SARS-CoV-2 genome, with white (invisible) representing the reference allele; the non-reference allele is shown in red if it changes the protein sequence of a gene, green if it falls within a gene but does not change the protein, and black if it does not fall within a gene. Tick marks are drawn at the top and bottom of each mutation's vertical bar to make the bar more visible when most alleles are reference alleles. Only single-nucleotide substitutions are displayed, not insertions or deletions.

The phylogenetic tree showing inferred relationships between the samples is depicted in the left column of the display. Mousing over the tree shows the sample identifiers. At the default track height, about 100 samples are averaged into each row of pixels. The track height can be adjusted in the track controls, which can be reached by clicking on the gray button to the left of the tree or by right-clicking on the image.

Methods

Rob Lanfear regularly runs the sarscov2phylo pipeline on all complete, high-coverage sequences available from GISAID EpiCoV™. The pipeline aligns all sequences to the same reference genome used by the Genome Browser (RefSeq NC_045512.2, GenBank MN908947.3, GISAID sample hCoV-19/Wuhan/Hu-1/2019|EPI_ISL_402125|2019-12-31) using MAFFT (Katoh et al.). It masks sites identified as problematic by the ProblematicSites_SARS-CoV2 repository (De Maio et al.), as well as sites that are Ns or gaps in >50% of samples. FastTree (Price et al.) is used to infer the phylogenetic tree; sequences on very long branches are removed using TreeShrink (Mai et al.). The tree is re-rooted to hCoV-19/Wuhan/WH04/2020|EPI_ISL_406801|2020-01-05.
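The masking step can be illustrated with a short Python sketch. It is not the sarscov2phylo implementation: the function name, the 0-based column indices and the choice to overwrite masked columns with N are assumptions made for the example.

```python
def mask_columns(seqs, problematic=(), max_missing=0.5):
    """Mask alignment columns that are problematic or mostly missing.

    seqs:        aligned sequences of equal length
    problematic: 0-based column indices to mask unconditionally
                 (hypothetical stand-in for a problematic-sites list)
    max_missing: a column is masked when the fraction of N or gap
                 characters is strictly greater than this value
    Returns (masked sequences, sorted list of masked columns).
    """
    length = len(seqs[0])
    mask = set(problematic)
    for col in range(length):
        missing = sum(1 for s in seqs if s[col] in "Nn-")
        if missing / len(seqs) > max_missing:
            mask.add(col)
    masked = [
        "".join("N" if i in mask else base for i, base in enumerate(s))
        for s in seqs
    ]
    return masked, sorted(mask)


# Column 1 is N or gap in 2 of 3 samples (>50%), so it is masked everywhere.
print(mask_columns(["AN", "A-", "AC"]))
```

Note that a column missing in exactly 50% of samples is kept, matching the ">50%" wording above.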

For full details, see the sarscov2phylo documentation.

UCSC makes a reduced version of the tree that contains only samples from fully public databases (GenBank, COG-UK direct release, CNCB) that do not prohibit UCSC from offering sequence mutations for download (see Data Access). UCSC also makes several adjustments to the phylogenetic tree for compact display:

Data Access

Files are available from our Download Server:

The VCF data can be explored interactively with the Table Browser or the Data Integrator, and accessed from scripts through our API.

The sarscov2phylo repository includes all releases of the full phylogenetic tree.

Credits

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge the authors, the originating laboratories where the clinical specimens or virus isolates were first obtained, and the submitting laboratories where the sequence data on which this research is based were generated and submitted to public databases.

Special thanks to Rob Lanfear for developing, running and sharing the sarscov2phylo pipeline and results.

Data usage policy

The data presented here is intended to rapidly disseminate analysis of important pathogens. Unpublished data is included with permission of the data generators, and does not impact their right to publish. Please contact the respective authors if you intend to carry out further research using their data. Authors and/or institutions that provided the sequences are listed in acknowledgements.tsv.gz.

References

Lanfear, R. A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI: 10.5281/zenodo.3958883. 2020.

Rambaut A, Holmes EC, O'Toole Á, Hill V, McCrone JT, Ruis C, du Plessis L, Pybus OG. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020 Nov;5(11):1403-1407. PMID: 32669681

Alm E, Broberg EK, Connor T, Hodcroft EB, Komissarov AB, Maurer-Stroh S, Melidou A, Neher RA, O'Toole Á, Pereyaslov D et al. Geographical and temporal distribution of SARS-CoV-2 clades in the WHO European Region, January to June 2020. Euro Surveill. 2020 Aug;25(32). PMID: 32794443; PMC: PMC7427299

Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013 Apr;30(4):772-80. PMID: 23329690; PMC: PMC3603318

De Maio N, Walker C, Borges R, Weilguny L, Slodkowicz G, Goldman N. Masking strategies for SARS-CoV-2 alignments. virological.org. 2020 May 13.

De Maio N, Gozashti L, Turakhia Y, Walker C, Lanfear R, Corbett-Detig R, Goldman N. Updated analysis with data from 12th June 2020. virological.org. 2020 July 14.

Turakhia Y, Thornlow B, Hinrichs AS, De Maio N, Gozashti L, Lanfear R, Haussler D, Corbett-Detig R. Ultrafast Sample Placement on Existing Trees (UShER) Empowers Real-Time Phylogenetics for the SARS-CoV-2 Pandemic. bioRxiv. 2020 September 28.

Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010 Mar 10;5(3):e9490. PMID: 20224823; PMC: PMC2835736

Mai U, Mirarab S. TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees. BMC Genomics. 2018 May 8;19(Suppl 5):272. PMID: 29745847; PMC: PMC5998883

\ varRep 1 compositeTrack on\ dataVersion /gbdb/wuhCor1/sarsCov2PhyloPub/public.all.version.txt\ geneTrack ncbiGeneBGP\ group varRep\ hapClusterColorBy function\ hapClusterEnabled on\ hapClusterHeight 500\ hapClusterMethod treeFile /gbdb/wuhCor1/sarsCov2PhyloPub/public.all.nwk\ longLabel Phylogenetic Tree and Nucleotide Substitution Mutations in Sequences in Public Databases\ priority 0.1\ sampleColorFile Pangolin_lineage=/gbdb/wuhCor1/sarsCov2PhyloPub/public.all.lineageColors.gz\ shortLabel Phylogeny: Public\ track sarsCov2PhyloPub\ type vcfTabix\ vcfDoFilter off\ vcfDoQual off\ visibility hide\ sarsCov2Phylo Phylogeny: GISAID vcfTabix Phylogenetic Tree and Nucleotide Substitution Mutations in High-coverage Sequences in GISAID EpiCoV TM 0 0.2 0 0 0 127 127 127 0 0 0

Description

This track displays a phylogenetic tree inferred from SARS-CoV-2 genome sequences collected by GISAID, and mutations found in the sequences. By default, only very common mutations (alternate allele found in at least 1% of samples) are displayed, but other subtracks may be made visible in order to see rarer mutations. The phylogenetic tree is inferred by Rob Lanfear's sarscov2phylo pipeline. For display in the narrow space to the left of the main genome browser image, nodes in the tree are collapsed unless a mutation is associated with a node; i.e., the only branching points displayed are those at which mutations occurred.

Two options for coloring the tree, by Pangolin lineage (Rambaut et al.) or GISAID clade, are available. Both coloring schemes are adapted from Figure 1 of Alm et al., which presents a unified view of a simplified phylogenetic tree, Pangolin lineages, Nextstrain clades and GISAID clades.

| color | lineage(s) | Nextstrain clade | GISAID clade |
|  | A | 19B | S |
|  | B.n (n > 1) | 19A | L |
|  | n/a (color not used when coloring by lineage; overlaps on tree with B.4 - B.7) | n/a (overlaps on tree with 19A) | O |
|  | n/a (color not used when coloring by lineage; overlaps on tree with B.2) | n/a (overlaps on tree with 19A) | V |
|  | B.1.5, B.1.6, B.1.8, other B.1.n that overlap GISAID clade G | 20A (partial) | G |
|  | B.1.9, B.1.13, B.1.22, B.1.36, B.1.37 | 20A (partial) | GH (partial) |
|  | B.1.3, B.1.12, B.1.26, other B.1.n that overlap GISAID clade GH | 20C | GH (partial) |
|  | B.1.1 | 20B | GR |

Display Conventions

In "dense" mode, a vertical line is drawn at each position where there is a mutation. In "pack" mode, the display shows a plot of all samples' mutations, with samples ordered using the phylogenetic tree in order to highlight patterns of linkage.

Each sample is placed in a horizontal row of pixels; when the number of samples exceeds the number of vertical pixels for the track, multiple samples fall in the same pixel row and pixels are averaged across samples.

Each mutation is a vertical bar at its position in the SARS-CoV-2 genome, with white (invisible) representing the reference allele; the non-reference allele is shown in red if it changes the protein sequence of a gene, green if it falls within a gene but does not change the protein, and black if it does not fall within a gene. Tick marks are drawn at the top and bottom of each mutation's vertical bar to make the bar more visible when most alleles are reference alleles. Only single-nucleotide mutations are displayed, not insertions or deletions.

The phylogenetic tree showing inferred relationships between the samples is depicted in the left column of the display. Mousing over the tree shows the GISAID identifiers for the different samples. At the default track height, about 100 samples are averaged into each row of pixels. The track height can be adjusted in the track controls, which can be reached by clicking on the gray button to the left of the tree or by right-clicking on the image.

Methods

Rob Lanfear regularly runs the sarscov2phylo pipeline on all complete, high-coverage sequences available from GISAID. The pipeline aligns all sequences to the same reference genome used by the Genome Browser (RefSeq NC_045512.2, GenBank MN908947.3, GISAID sample hCoV-19/Wuhan/Hu-1/2019|EPI_ISL_402125|2019-12-31) using MAFFT (Katoh et al.). It masks sites identified as problematic by the ProblematicSites_SARS-CoV2 repository (De Maio et al., Turakhia et al.), as well as sites that are Ns or gaps in >50% of samples. FastTree (Price et al.) is used to infer the phylogenetic tree; sequences on very long branches are removed using TreeShrink (Mai et al.). The tree is re-rooted to hCoV-19/Wuhan/WH04/2020|EPI_ISL_406801|2020-01-05.

For full details, see the sarscov2phylo documentation.

Nodes that do not have an associated mutation are collapsed using strain_phylogenetics (Turakhia et al.).

Data Access

You can download the VCF files underlying this track (gisaid.*.vcf.gz) from our Download Server. The data can be explored interactively with the Table Browser or the Data Integrator.

Note: while the VCF files contain mutations found in sequences collected by GISAID, they are not sufficient to reconstruct the original sequences available from GISAID, because ambiguous IUPAC bases are treated as missing information in the VCF and insertion and deletion mutations are omitted. Additionally, the subtracks that are filtered to include only mutations found in a minimum percentage of samples give very incomplete representations of samples. Researchers wishing to work with SARS-CoV-2 genomic sequences should register with GISAID and download the full sequences.
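A small Python sketch shows why the original sequence cannot be recovered from substitution calls alone. It is an illustration under stated assumptions, not the code that produced the VCF files: the function name and the toy sequences are hypothetical, and the sketch simply skips gaps, insertions and any non-ACGT base.

```python
def substitutions(ref, sample):
    """List single-nucleotide substitutions of `sample` relative to `ref`.

    ref, sample: two rows of the same alignment (equal length, '-' for gaps).
    Returns (position, ref_base, alt_base) tuples with 1-based positions in
    the ungapped reference. Insertions, deletions and ambiguous IUPAC bases
    are skipped, so they leave no trace in the output.
    """
    calls = []
    pos = 0
    for r, s in zip(ref.upper(), sample.upper()):
        if r == "-":
            continue  # insertion relative to the reference: omitted
        pos += 1
        if s not in "ACGT":
            continue  # deletion or ambiguous base: treated as missing
        if r in "ACGT" and s != r:
            calls.append((pos, r, s))
    return calls


# The inserted A and the ambiguous N are both lost; only C2T is reported.
print(substitutions("ACG-TAC", "ATGATNC"))
```

Two different samples, one with the insertion and the N and one without, would produce the same list of calls here, which is the loss of information the note above describes.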

\ \

Credits

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. Sequences are collected by GISAID and may be downloaded by registered users.

Special thanks to Rob Lanfear for developing, running and sharing the sarscov2phylo pipeline and results.

Data usage policy

The data presented here is intended to rapidly disseminate analysis of important pathogens. Unpublished data is included with permission of the data generators, and does not impact their right to publish. Please contact the respective authors if you intend to carry out further research using their data. Author contact info is available via https://github.com/roblanf/sarscov2phylo/tree/master/acknowledgements.

References

Rambaut A, Holmes EC, O'Toole Á, Hill V, McCrone JT, Ruis C, du Plessis L, Pybus OG. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020 Nov;5(11):1403-1407. PMID: 32669681

Alm E, Broberg EK, Connor T, Hodcroft EB, Komissarov AB, Maurer-Stroh S, Melidou A, Neher RA, O'Toole Á, Pereyaslov D et al. Geographical and temporal distribution of SARS-CoV-2 clades in the WHO European Region, January to June 2020. Euro Surveill. 2020 Aug;25(32). PMID: 32794443; PMC: PMC7427299

Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013 Apr;30(4):772-80. PMID: 23329690; PMC: PMC3603318

De Maio N, Walker C, Borges R, Weilguny L, Slodkowicz G, Goldman N. Masking strategies for SARS-CoV-2 alignments. virological.org. 2020 May 13.

De Maio N, Gozashti L, Turakhia Y, Walker C, Lanfear R, Corbett-Detig R, Goldman N. Updated analysis with data from 12th June 2020. virological.org. 2020 July 14.

Turakhia Y, Thornlow B, Gozashti L, Hinrichs AS, Fernandes JD, Haussler D, Corbett-Detig R. Stability of SARS-CoV-2 Phylogenies. bioRxiv. 2020 June 9.

Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010 Mar 10;5(3):e9490. PMID: 20224823; PMC: PMC2835736

Mai U, Mirarab S. TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees. BMC Genomics. 2018 May 8;19(Suppl 5):272. PMID: 29745847; PMC: PMC5998883

\ varRep 1 compositeTrack on\ dataVersion /gbdb/wuhCor1/sarsCov2Phylo/sarscov2phylo.version.txt\ geneTrack ncbiGeneBGP\ group varRep\ hapClusterColorBy function\ hapClusterEnabled on\ hapClusterHeight 500\ hapClusterMethod treeFile /gbdb/wuhCor1/sarsCov2Phylo/sarscov2phylo.ft.linCol.nh\ longLabel Phylogenetic Tree and Nucleotide Substitution Mutations in High-coverage Sequences in GISAID EpiCoV TM\ priority 0.2\ sampleColorFile Pangolin_lineage=/gbdb/wuhCor1/sarsCov2Phylo/sarscov2phylo.lineageColors.gz GISAID_clade=/gbdb/wuhCor1/sarsCov2Phylo/sarscov2phylo.gisaidColors.gz Nextstrain_clade=/gbdb/wuhCor1/sarsCov2Phylo/sarscov2phylo.nextstrainColors.gz\ shortLabel Phylogeny: GISAID\ tableBrowser off\ track sarsCov2Phylo\ type vcfTabix\ vcfDoFilter off\ vcfDoQual off\ visibility hide\ nextstrainClade Nextstrain Clades bigBed 12 + Nextstrain year-letter clade designations (19A, 19B, 20A, etc.) 0 0.5 0 0 0 127 127 127 0 0 0 https://nextstrain.org/ncov?d=tree&m=div&label=clade:$$

Description

Nextstrain.org displays data about mutations that occur in the current 2019/2020 outbreak. Nextstrain has a powerful user interface for viewing the timestamped molecular phylogeny tree that it infers from the patterns of mutations in sequences worldwide using the TreeTime algorithm.

Nextstrain defines clades named by year and letter, anticipating that SARS-CoV-2 will become a seasonal virus in the coming years. For more information about the rationale and properties of current clades, see https://github.com/nextstrain/ncov/blob/master/docs/src/reference/naming_clades.md.

The Nextstrain Mutations track contains all mutations shown on Nextstrain, of which this track's mutations are a very small subset.

Methods

Nextstrain downloads SARS-CoV-2 genomes from GISAID as they are submitted by labs worldwide. Nextstrain identifies mutations that define clades of interest and puts them in the file clades.tsv. The genome sequences and metadata, including clades.tsv, are processed by an automated pipeline, and annotations are written to a data file that UCSC downloads and from which it extracts annotations for display.

Data Access

You can download the bigBed file underlying this track (nextstrainClade.bb) from our Download Server. The data can be explored interactively with the Table Browser or the Data Integrator. The data can also be accessed from scripts through our API.
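As a starting point for script access, the following Python sketch builds a request URL for this track. It rests on assumptions that should be checked against the API documentation: the base URL https://api.genome.ucsc.edu, the /getData/track endpoint with semicolon-separated genome, track, chrom, start and end parameters, and the sequence name NC_045512v2 for the wuhCor1 assembly. The function name is hypothetical.

```python
from urllib.parse import quote

# Assumed base URL of the UCSC REST API.
API_BASE = "https://api.genome.ucsc.edu"


def track_data_url(genome, track, chrom=None, start=None, end=None):
    """Build a /getData/track request URL (parameter names are assumptions)."""
    params = [("genome", genome), ("track", track)]
    if chrom is not None:
        params.append(("chrom", chrom))
    if start is not None and end is not None:
        params += [("start", start), ("end", end)]
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params)
    return f"{API_BASE}/getData/track?{query}"


url = track_data_url("wuhCor1", "nextstrainClade", chrom="NC_045512v2")
print(url)

# To fetch the data (requires network access), something like:
#   import json, urllib.request
#   with urllib.request.urlopen(url) as response:
#       data = json.load(response)
```

The sketch only constructs the URL, so it can be run offline; the commented lines indicate how the JSON response might then be retrieved.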

Credits

Thanks to nextstrain.org for sharing its analysis of genomes collected by GISAID, and to researchers worldwide for sharing their SARS-CoV-2 genome sequences.

References

Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher RA. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018 Dec 1;34(23):4121-4123. PMID: 29790939; PMC: PMC6247931

\ varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainClade.bb\ group varRep\ itemRgb on\ longLabel Nextstrain year-letter clade designations (19A, 19B, 20A, etc.)\ pennantIcon Updated red ../goldenPath/newsarch.html#071720 "Now updated daily"\ priority 0.5\ shortLabel Nextstrain Clades\ track nextstrainClade\ type bigBed 12 +\ url https://nextstrain.org/ncov?d=tree&m=div&label=clade:$$\ urlLabel View in Nextstrain:\ visibility hide\ nextstrainSamplesViewAll All Samples vcfTabix Nextstrain Subset of GISAID EpiCoV TM Sample Mutations 4 0.51 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Nextstrain Subset of GISAID EpiCoV TM Sample Mutations\ parent nextstrainSamples\ shortLabel All Samples\ track nextstrainSamplesViewAll\ view all\ visibility squish\ nextstrainSamples Nextstrain Mutations vcfTabix Nextstrain Subset of GISAID EpiCoV TM Sample Mutations 0 0.51 0 0 0 127 127 127 0 0 0

Description

Nextstrain.org displays data about mutations in the SARS-CoV-2 RNA and protein sequences that have occurred in different samples of the virus during the current 2019-2021 outbreak. Nextstrain has a powerful user interface for viewing the evolutionary tree that it infers from the patterns of mutations in sequences worldwide, but does not offer a detailed plot of mutations along the genome that can be correlated with other molecular information. We have therefore processed its data into this track to display the mutations called by Nextstrain for each sample that Nextstrain has obtained from GISAID.

Click on the vertical column in the display for any position in the SARS-CoV-2 genome to see more details about the mutation(s) that occur at that position, including the protein change (if applicable; protein changes use gene names in the Nextstrain Genes track), the number of samples with the mutation, and a list giving the nucleotide (allele) for that position in each GISAID sample.

Nextstrain identifies certain clades within the phylogenetic tree according to a set of defining mutations. The Nextstrain Clades track provides more information about these clades and serves as a useful color key for the clade colors in the phylogenetic tree display.

This track is composed of several subtracks so that different subsets of mutations may be viewed:

Display Conventions

In "dense" mode, a vertical line is drawn at each position where there is a mutation. In "pack" mode, the display shows a plot of all samples' mutations, with samples ordered using Nextstrain's phylogenetic tree in order to highlight patterns of linkage.

Each sample is placed in a horizontal row of pixels; when the number of samples exceeds the number of vertical pixels for the track, multiple samples fall in the same pixel row and pixels are averaged across samples.

Each mutation is a vertical bar at its position in the SARS-CoV-2 genome, with white (invisible) representing the reference allele and black representing the non-reference allele(s). Tick marks are drawn at the top and bottom of each mutation's vertical bar to make the bar more visible when most alleles are reference alleles. Insertions and deletions are not shown, as these are removed from the data by Nextstrain.

The phylogenetic tree for the samples built by Nextstrain is depicted in the left column of the display. Mousing over the tree shows the GISAID identifiers for the different samples. When the vertical height of the track is set sufficiently high (10 pixels per sample with the default font), sample names are drawn to the right of the tree; however, with thousands of samples in the Nextstrain tree and a maximum track height of 2500 pixels, the full Nextstrain tree is too large for sample names to be displayed. In the track controls, the user can choose to display subtracks containing the phylogenetic trees and mutations for individual clades. Some clades have few enough samples that they can be made tall enough to display sample names. Branches of the phylogenetic tree are colored by clade using the same color scheme as nextstrain.org.

Methods

Nextstrain downloads SARS-CoV-2 genomes from GISAID as they are submitted by labs worldwide, and downsamples to a subset of several thousand sequences in order to provide an interactive display. The selected subset of GISAID sequences is processed by an automated pipeline, producing an annotated phylogenetic tree data structure underlying the Nextstrain display; UCSC downloads the results and extracts annotations for display.

Data Access

SARS-CoV-2 mutations displayed by Nextstrain are derived from a subset of GISAID sequences, and the GISAID Terms and Conditions prohibit the redistribution of GISAID-derived data. They also require that the submitters of all sequences be acknowledged when the mutations are used. Nextstrain.org offers phylogenetic trees, author credits and other files: scroll to the bottom of the page and click "DOWNLOAD DATA", and a dialog with download options appears.

All GISAID SARS-CoV-2 genome sequences and metadata are available for download from GISAID EpiCoV™ by registered users. We provide a program, faToVcf, that can extract VCF from a multi-sequence FASTA alignment such as the "msa_date" download file from GISAID. faToVcf is available for Linux and MacOSX on the download server: https://hgdownload.soe.ucsc.edu/admin/exe. It requires at least 4GB of memory to process the complete msa_date file. Here are some steps to get started using faToVcf:

Credits

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. Special thanks to nextstrain.org for sharing its analysis of genomes collected by GISAID.

Data usage policy

The data presented here is intended to rapidly disseminate analysis of important pathogens. Unpublished data is included with permission of the data generators, and does not impact their right to publish. Please contact the respective authors if you intend to carry out further research using their data. Author contact info is available via nextstrain.org: scroll to the bottom of the page, click "DOWNLOAD DATA" and click "ALL METADATA (TSV)" in the resulting dialog.

References

Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher RA. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018 Dec 1;34(23):4121-4123. PMID: 29790939; PMC: PMC6247931

Sagulenko P, Puller V, Neher RA. TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol. 2018 Jan;4(1):vex042. PMID: 29340210; PMC: PMC5758920

Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015 Jan;32(1):268-74. PMID: 25371430; PMC: PMC4271533

\ varRep 1 compositeTrack on\ geneTrack ncbiGeneBGP\ group varRep\ hapClusterColorBy function\ hapClusterEnabled on\ hapClusterHeight 500\ longLabel Nextstrain Subset of GISAID EpiCoV TM Sample Mutations\ pennantIcon Updated red ../goldenPath/newsarch.html#071720 "Now updated daily"\ priority 0.51\ shortLabel Nextstrain Mutations\ subGroup1 view Views all=All_Samples newClades=Year-Letter_Clades\ tableBrowser off\ track nextstrainSamples\ type vcfTabix\ vcfDoFilter off\ vcfDoQual off\ visibility hide\ nextstrainSamplesViewNewClades Year-Letter Clades vcfTabix Nextstrain Subset of GISAID EpiCoV TM Sample Mutations 0 0.51 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Nextstrain Subset of GISAID EpiCoV TM Sample Mutations\ parent nextstrainSamples\ shortLabel Year-Letter Clades\ track nextstrainSamplesViewNewClades\ view newClades\ visibility hide\ strainCons44way 44 Bat CoVs bed 4 Multiz Alignment & Conservation (44 Strains with bats as hosts) 0 1 0 0 0 127 127 127 0 0 0

Downloads for data in this track are available:

Description

This track shows multiple alignments of 44 virus sequences, aligned to the SARS-CoV-2 reference sequence NC_045512.2, genome assembly GCF_009858895.2. It also includes measurements of evolutionary conservation using two methods (phastCons and phyloP) from the PHAST package, for all 44 virus sequences. The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. Conserved elements identified by phastCons are also displayed in this track.

PhastCons (which has been used in previous Conservation tracks) is a hidden Markov model-based method that estimates the probability that each nucleotide belongs to a conserved element, based on the multiple alignment. It considers not just each individual alignment column, but also its flanking columns. By contrast, phyloP separately measures conservation at individual columns, ignoring the effects of their neighbors. As a consequence, the phyloP plots have a less smooth appearance than the phastCons plots, with more "texture" at individual sites. The two methods have different strengths and weaknesses. PhastCons is sensitive to "runs" of conserved sites, and is therefore effective for picking out conserved elements. PhyloP, on the other hand, is more appropriate for evaluating signatures of selection at particular nucleotides or classes of nucleotides (e.g., third codon positions, or first positions of miRNA target sites).

Another important difference is that phyloP can measure acceleration (faster evolution than expected under neutral drift) as well as conservation (slower than expected evolution). In the phyloP plots, sites predicted to be conserved are assigned positive scores (and shown in blue), while sites predicted to be fast-evolving are assigned negative scores (and shown in red). The absolute values of the scores represent -log p-values under a null hypothesis of neutral evolution. The phastCons scores, by contrast, represent probabilities of negative selection and range between 0 and 1.
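The sign and magnitude of a phyloP score can be unpacked with a short Python sketch. The function name is hypothetical, and the sketch assumes the logarithm is base 10, which the text above does not state explicitly.

```python
def phylop_to_pvalue(score):
    """Interpret a phyloP score as (p-value, direction).

    Assumes |score| = -log10(p) under the null hypothesis of neutral
    evolution; the base of the logarithm is an assumption of this sketch.
    """
    p_value = 10 ** (-abs(score))
    if score > 0:
        direction = "conserved"      # drawn in blue
    elif score < 0:
        direction = "accelerated"    # drawn in red
    else:
        direction = "neutral"
    return p_value, direction


for score in (2.0, -3.0, 0.0):
    print(score, phylop_to_pvalue(score))
```

Under this reading, scores of 2.0 and -2.0 correspond to the same p-value and differ only in whether the site evolves more slowly or more quickly than expected.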

Both phastCons and phyloP treat alignment gaps and unaligned nucleotides as missing data.

In the track display, the sequence is labeled using its NCBI Nucleotide accession number.

The mapping between sequence accession identifiers and more descriptive names is provided via a text file on our download server.

Display Conventions and Configuration

Pairwise alignments of each species to the SARS-CoV-2 genome are displayed as a series of colored blocks indicating the functional effect of polymorphisms (in pack mode), or as a wiggle (in full mode) that indicates alignment quality. In dense display mode, percent identity of the whole alignments is shown in grayscale using darker values to indicate higher levels of identity.

In pack mode, regions that align with 100% identity are not shown. Where identity is below 100%, blocks of four colors are drawn.

Checkboxes on the track configuration page allow selection of the species to include in the pairwise display. Configuration buttons are available to select all of the species (Set all), deselect all of the species (Clear all), or use the default settings (Set defaults).

To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment.

Base Level

When zoomed in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the SARS-CoV-2 sequence at those alignment positions relative to the longest non-SARS-CoV-2 sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+".
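The labeling rule for the Gaps line can be written as a small Python function. This is a sketch of the rule as described above, not the browser's code; the function name and the idea of measuring space in characters are assumptions.

```python
def gap_label(gap_size, chars_available):
    """Return the text shown on the Gaps line for one gap.

    gap_size:        length of the gap in bases
    chars_available: how many characters fit at this position
                     (an assumed way to model "sufficient space")
    """
    text = str(gap_size)
    if len(text) <= chars_available:
        return text  # enough room: show the size itself
    # Not enough room: "*" marks a multiple of 3, "+" any other size.
    return "*" if gap_size % 3 == 0 else "+"


print(gap_label(12, 2), gap_label(12, 1), gap_label(10, 1))
```

The "*" symbol is useful in coding regions because a gap whose length is a multiple of 3 does not shift the reading frame.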

Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. Then, select one of the following modes:

Methods

Pairwise alignments with the reference sequence were generated for each sequence using lastz version 1.04.00. Parameters used for each lastz alignment:

# hsp_threshold      = 2200
# gapped_threshold   = 4000 = L
# x_drop             = 910
# y_drop             = 3400 = Y
# gap_open_penalty   = 400
# gap_extend_penalty = 30
#        A    C    G    T
#   A   91  -90  -25 -100
#   C  -90  100 -100  -25
#   G  -25 -100  100  -90
#   T -100  -25  -90   91
# seed=1110100110010101111 w/transition
# step=1
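
To make the substitution matrix and hsp_threshold concrete, the following Python sketch scores an ungapped pair of aligned segments with the matrix listed above. It is an illustration rather than lastz itself: the function names are hypothetical, and the sketch assumes a segment is kept when its score is at least the threshold.

```python
# Substitution scores copied from the parameter listing above.
MATRIX = {
    "A": {"A": 91, "C": -90, "G": -25, "T": -100},
    "C": {"A": -90, "C": 100, "G": -100, "T": -25},
    "G": {"A": -25, "C": -100, "G": 100, "T": -90},
    "T": {"A": -100, "C": -25, "G": -90, "T": 91},
}
HSP_THRESHOLD = 2200


def ungapped_score(a, b):
    """Sum the substitution scores of two equal-length, gap-free segments."""
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    return sum(MATRIX[x][y] for x, y in zip(a.upper(), b.upper()))


def passes_hsp_threshold(a, b):
    """Assumed rule: keep the segment pair if its score reaches the threshold."""
    return ungapped_score(a, b) >= HSP_THRESHOLD


print(ungapped_score("ACGT", "ACGT"))   # four matches
print(ungapped_score("ACGT", "ATGT"))   # one C/T transition
print(passes_hsp_threshold("ACGT" * 6, "ACGT" * 6))
```

Transitions (A/G, C/T) are penalized less than transversions in this matrix, so alignments between diverged viral genomes lose less score at the more common substitution types.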
Pairwise alignments were then linked into chains using a dynamic programming algorithm that finds maximally scoring chains of gapless subsections of the alignments organized in a kd-tree. Parameters used in the chaining (axtChain) step: -minScore=10 -linearGap=loose

High-scoring chains were then placed along the genome, with gaps filled by lower-scoring chains, to produce an alignment net.

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
count | sample date | accession | phylogenetic distance | descriptive name
001 | 2019-12 | NC_045512.2 | 0.000000 | SARS-CoV-2/Wuhan-Hu-1
002 | 2013-07-24 | MN996532.1 | 0.111391 | Bat CoV RaTG13
003 | 2005-10-25 | DQ022305.2 | 0.756533 | Bat SARS CoV HKU3-1
004 | 2010-04-05 | GQ153542.1 | 0.758373 | Bat SARS CoV HKU3-7
005 | 2010-04-05 | GQ153547.1 | 0.758589 | Bat SARS CoV HKU3-12
006 | 2011-09 | JX993987.1 | 0.825373 | Bat CoV Rp/Shaanxi2011
007 | 2013 | KJ473814.1 | 0.844563 | BtRs-BetaCoV/HuB2013
008 | 2006-07-13 | DQ412043.1 | 0.861670 | Bat SARS CoV Rm1
009 | 2011 | JX993988.1 | 0.866485 | Bat CoV Cp/Yunnan2011
010 | 2006 | FJ588686.1 | 0.870015 | Bat SARS CoV Rs672/2006
011 | 2006-07-13 | DQ412042.1 | 0.873059 | Bat SARS CoV Rf1
012 | 2006-07-19 | DQ648856.1 | 0.874586 | Bat CoV (BtCoV/273/2005)
013 | 2013 | KJ473812.1 | 0.876344 | BtRf-BetaCoV/HeB2013
014 | 2016-09 | MK211375.1 | 0.880260 | CoV BtRs-BetaCoV/YN2018A
015 | 2013-04-17 | KY417147.1 | 0.883717 | Bat SARS-like CoV Rs4237
016 | 2012 | KY770860.1 | 0.884677 | Bat CoV Jiyuan-84
017 | 2013-04-17 | KY417149.1 | 0.886441 | Bat SARS-like CoV Rs4255
018 | 2012 | KU973692.1 | 0.886655 | UNVERIFIED: SARS-related CoV F46
019 | 2016-04-17 | KY938558.1 | 0.886844 | Bat CoV strain 16BO133
020 | 2016-09 | MK211378.1 | 0.887400 | CoV BtRs-BetaCoV/YN2018D
021 | 2014-05-12 | KY417142.1 | 0.888076 | Bat SARS-like CoV As6526
022 | 2013 | KY770858.1 | 0.889779 | Bat CoV Anlong-103
023 | 2016-09 | MK211377.1 | 0.890783 | CoV BtRs-BetaCoV/YN2018C
024 | 2012-09-18 | KY417145.1 | 0.891547 | Bat SARS-like CoV Rf4092
025 | 2013 | KJ473816.1 | 0.892938 | BtRs-BetaCoV/YN2013
026 | 2018-08-13 | NC_004718.3 | 0.896070 | SARS CoV
027 | 2013-04-17 | KY417148.1 | 0.897176 | Bat SARS-like CoV Rs4247
028 | 2012-09-18 | KY417143.1 | 0.898813 | Bat SARS-like CoV Rs4081
029 | 2013 | KJ473815.1 | 0.900478 | BtRs-BetaCoV/GX2013
030 | 2006-01-25 | DQ071615.1 | 0.903660 | Bat SARS CoV Rp3
031 | 2013-05-23 | KP886808.1 | 0.914845 | Bat SARS-like CoV YNLF_31C
032 | 2016-08 | MK211374.1 | 0.920214 | CoV BtRl-BetaCoV/SC2018
033 | 2016-09 | MK211376.1 | 0.932471 | CoV BtRs-BetaCoV/YN2018B
034 | 2012-09 | KF367457.1 | 0.935102 | Bat SARS-like CoV WIV1
035 | 2012-09-18 | KY417144.1 | 0.938296 | Bat SARS-like CoV Rs4084
036 | 2015-10-16 | KY417152.1 | 0.938841 | Bat SARS-like CoV Rs9401
037 | 2011 | KF569996.1 | 0.940405 | Rhinolophus affinis CoV LYRa11
038 | 2014-10-24 | KY417151.1 | 0.945367 | Bat SARS-like CoV Rs7327
039 | 2013-04-17 | KY417146.1 | 0.946050 | Bat SARS-like CoV Rs4231
040 | 2013-07-21 | KT444582.1 | 0.961789 | SARS-like CoV WIV16
041 | 2007-08 | KY352407.1 | 1.063753 | SARS-related CoV strain BtKY72
042 | 2008 | NC_014470.1 | 1.075344 | Bat CoV BM48-31/BGR/2008
043 | 2017-02 | MG772933.1 | 1.076854 | Bat SARS-like CoV bat-SL-CoVZC45
044 | 2015-07 | MG772934.1 | 1.106462 | Bat SARS-like CoV bat-SL-CoVZXC21
\

The multiple alignment was constructed from the resulting pairwise alignments, progressively aligned using multiz/autoMZ. The phylogenetic tree was calculated from 31-mer frequency similarity: the resulting distance matrix was turned into a tree by neighbor joining, using the neighbor command of the PHYLIP toolset. The reference sequence NC_045512v2 is at the top of the tree:

\
(((NC_045512v2 MN996532v1) ((((DQ022305v2 GQ153547v1) GQ153542v1)\
(MG772933v1 MG772934v1)) ((((((DQ071615v1 KJ473815v1)\
((((FJ588686v1 KY770858v1) ((((((KF367457v1 KY417144v1)\
(KY417151v1 KY417152v1)) ((KY417142v1 MK211377v1) (MK211376v1 MK211378v1)))\
((((KT444582v1 KY417143v1) KY417149v1) KY417146v1) (KY417147v1 KY417148v1)))\
(KJ473816v1 KY417145v1)) MK211375v1)) NC_004718v3) KP886808v1)) MK211374v1)\
(KF569996v1 KU973692v1)) JX993988v1) ((((DQ412042v1 DQ648856v1) (KJ473812v1\
KY770860v1)) KY938558v1) ((DQ412043v1 KJ473814v1) JX993987v1)))))\
(KY352407v1 NC_014470v1))\
\ Framing tables from the genes were constructed to enable\ visualization of codons in the multiple alignment display.
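The tree-building step in these Methods starts from a distance matrix based on 31-mer frequency similarity. The exact distance measure is not stated here, so the sketch below uses the Euclidean distance between k-mer frequency vectors purely as an assumed stand-in; all function names are hypothetical. The resulting matrix is the kind of input that a neighbor-joining program such as PHYLIP neighbor takes.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k=31):
    """Return the relative frequency of every k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def kmer_distance(seq_a, seq_b, k=31):
    """Euclidean distance between two k-mer frequency profiles (assumed metric)."""
    fa, fb = kmer_frequencies(seq_a, k), kmer_frequencies(seq_b, k)
    return sqrt(sum((fa.get(x, 0.0) - fb.get(x, 0.0)) ** 2 for x in set(fa) | set(fb)))


def distance_matrix(sequences, k=31):
    """Build a symmetric name -> name -> distance mapping for all sequence pairs."""
    names = list(sequences)
    matrix = {name: {name: 0.0} for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = kmer_distance(sequences[a], sequences[b], k)
            matrix[a][b] = d
            matrix[b][a] = d
    return matrix
```

An alignment-free distance like this avoids having to align the genomes before building the guide tree.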

\ \

Phylogenetic Tree Model

\

\ Both phastCons and phyloP are phylogenetic methods that rely\ on a tree model containing the tree topology, branch lengths representing\ evolutionary distance at neutrally evolving sites, the background distribution\ of nucleotides, and a substitution rate matrix.\ The\ all-species tree model for this track was\ generated using the phyloFit program from the PHAST package\ (REV model, EM algorithm, medium precision) using multiple alignments of\ 4-fold degenerate sites extracted from the 44-way alignment\ (msa_view). The 4d sites were derived from the NCBI gene set,\ filtered to select single-coverage long transcripts.

\

\ This same tree model was used in the phyloP calculations; however, the\ background frequencies were modified to maintain reversibility.\ The resulting tree model:\ all species.\

\ \

PhastCons Conservation

\

\ The phastCons program computes conservation scores based on a phylo-HMM, a\ type of probabilistic model that describes both the process of DNA\ substitution at each site in a genome and the way this process changes from\ one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and\ Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for\ conserved regions and a state for non-conserved regions. The value plotted\ at each site is the posterior probability that the corresponding alignment\ column was "generated" by the conserved state of the phylo-HMM. These\ scores reflect the phylogeny (including branch lengths) of the species in\ question, a continuous-time Markov model of the nucleotide substitution\ process, and a tendency for conservation levels to be autocorrelated along\ the genome (i.e., to be similar at adjacent sites). The general reversible\ (REV) substitution model was used. Unlike many conservation-scoring programs,\ phastCons does not rely on a sliding window\ of fixed size; therefore, short highly-conserved regions and long moderately\ conserved regions can both obtain high scores.\ More information about\ phastCons can be found in Siepel et al, 2005.

\

\ The phastCons parameters used were: expected-length=45,\ target-coverage=0.3, rho=0.3.

\ \

PhyloP Conservation

\

\ The phyloP program supports several different methods for computing\ p-values of conservation or acceleration, for individual nucleotides or\ larger elements (http://compgen.cshl.edu/phast/). Here it was used\ to produce separate scores at each base (--wig-scores option), considering\ all branches of the phylogeny rather than a particular subtree or lineage\ (i.e., the --subtree option was not used). The scores were computed by\ performing a likelihood ratio test at each alignment column (--method LRT),\ and scores for both conservation and acceleration were produced (--mode\ CONACC).

\ \

Conserved Elements

\

\ The conserved elements were predicted by running phastCons with the\ --most-conserved option. The predicted elements are segments of the alignment\ that are likely to have been "generated" by the conserved state of the\ phylo-HMM. Each element is assigned a log-odds score equal to its log\ probability under the conserved model minus its log probability under the\ non-conserved model. The "score" field associated with this track contains\ transformed log-odds scores, taking values between 0 and 1000. (The scores\ are transformed using a monotonic function of the form a * log(x) + b.) The\ raw log odds scores are retained in the "name" field and can be seen on the\ details page or in the browser when the track's display mode is set to\ "pack" or "full".
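The score transformation mentioned above has the form a * log(x) + b. The constants used for this track are not given here, so the sketch below takes `a` and `b` as hypothetical parameters; clamping the result to the 0 to 1000 range is also an assumption made for the illustration.

```python
import math


def transform_log_odds(raw_score, a, b):
    """Map a positive raw log-odds score onto the 0-1000 score range.

    a, b -- hypothetical constants of the monotonic transform a * log(x) + b
    """
    if raw_score <= 0:
        raise ValueError("raw log-odds score must be positive to take its logarithm")
    value = a * math.log(raw_score) + b
    # Assumption: values outside 0..1000 are clamped to the nearest bound.
    return int(min(1000, max(0, round(value))))
```

Because the transform is monotonic, the ranking of elements by transformed score matches their ranking by raw log-odds score, apart from ties introduced by rounding and clamping.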

\ \

Credits

\

This track was created using the following programs:\

\

\ \

References

\ \

\ Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\ Fullah M, Dudas G et al.\ Genomic surveillance elucidates Ebola virus origin and transmission\ during the 2014 outbreak.\ Science 2014 Sep 12;345(6202):1369-72.\ PMID: 25214632;\ Supplemental Materials and Methods\

\ \

Phylo-HMMs, phastCons, and phyloP:

\

\ Felsenstein J, Churchill GA.\ A Hidden Markov Model approach to\ variation among sites in rate of evolution.\ Mol Biol Evol. 1996 Jan;13(1):93-104.\ PMID: 8583911\

\ \

\ Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A.\ \ Detection of nonneutral substitution rates on mammalian phylogenies.\ Genome Res. 2010 Jan;20(1):110-21.\ PMID: 19858363; PMC: PMC2798823\

\ \

\ Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K,\ Clawson H, Spieth J, Hillier LW, Richards S, et al.\ Evolutionarily conserved elements in vertebrate, insect, worm,\ and yeast genomes.\ Genome Res. 2005 Aug;15(8):1034-50.\ PMID: 16024819; PMC: PMC1182216\

\ \

\ Siepel A, Haussler D.\ Phylogenetic Hidden Markov Models.\ In: Nielsen R, editor. Statistical Methods in Molecular Evolution.\ New York: Springer; 2005. pp. 325-351.\

\ \

\ Yang Z.\ A space-time process model for the evolution of DNA\ sequences.\ Genetics. 1995 Feb;139(2):993-1005.\ PMID: 7713447; PMC: PMC1206396\

\ \

Chain/Net:

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron:\ duplication, deletion, and rearrangement in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.\ PMID: 14500911; PMC: PMC208784\

\ \

Multiz:

\

\ Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM,\ Baertsch R, Rosenbloom K, Clawson H, Green ED, et al.\ Aligning multiple genomic sequences with the threaded blockset aligner.\ Genome Res. 2004 Apr;14(4):708-15.\ PMID: 15060014; PMC: PMC383317\

\ \

Lastz (formerly Blastz):

\

\ Chiaromonte F, Yap VB, Miller W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput. 2002:115-26.\ PMID: 11928468\

\ \

\ Harris RS.\ Improved pairwise alignment of genomic DNA.\ Ph.D. Thesis. Pennsylvania State University, USA. 2007.\

\ \

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-mouse alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.\ PMID: 12529312; PMC: PMC430961\

\ \

Funding

\

\ This annotation track in the UCSC SARS-CoV-2 genome browser is funded by generous private donors to\ the UC Santa Cruz Genomics Institute.

\ compGeno 1 compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html cons44way\ longLabel Multiz Alignment & Conservation (44 Strains with bats as hosts)\ priority 1\ shortLabel 44 Bat CoVs\ subGroup1 view Views align=Multiz_Alignments phyloP=Basewise_Conservation_(phyloP) phastcons=Element_Conservation_(phastCons)\ track strainCons44way\ type bed 4\ visibility hide\ A_bind_avg A_bind_avg bigWig DMS data for RBD Binding 1 1 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/A_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel A_bind_avg\ track A_bind_avg\ type bigWig\ visibility dense\ A_expr_avg_Expression A_expr_avg bigWig DMS data for RBD expression 1 1 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/A_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel A_expr_avg\ track A_expr_avg_Expression\ type bigWig\ visibility dense\ nextstrainFreqAll All bigWig Nextstrain, all samples: Alternate allele frequency 1 1 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples.bigWig\ longLabel Nextstrain, all samples: Alternate allele frequency\ parent nextstrainFreqViewAll\ priority 1\ shortLabel All\ subGroups view=all\ track nextstrainFreqAll\ type bigWig\ visibility dense\ variantAaMutsV2_B_1_1_7 Alpha AA Muts bigBed 4 Alpha VOC (B.1.1.7 UK Sep-2020) amino acid mutations in 9838 GISAID sequences (Feb 5, 2021) 1 1 63 76 203 159 165 229 0 0 0 https://outbreak.info/situation-reports?pango=B.1.1.7 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantAaMuts_B.1.1.7_2021_02_05.bb\ color 63,76,203\ longLabel Alpha VOC (B.1.1.7 UK Sep-2020) amino acid mutations in 9838 GISAID sequences (Feb 5, 2021)\ parent variantMuts off\ priority 1\ shortLabel Alpha AA Muts\ subGroups variant=A_B117 mutation=AA designation=VOC\ track variantAaMutsV2_B_1_1_7\ url https://outbreak.info/situation-reports?pango=B.1.1.7\ urlLabel 
B.1.1.7 Situation Report at outbreak.info\ bCellEpitopes B Cell bigBed 9 + B Cell Epitopes 1 1 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/epitopes/bCellEpitopes.bb\ configurable off\ configureByPopup off\ longLabel B Cell Epitopes\ parent epitopes on\ priority 1\ shortLabel B Cell\ track bCellEpitopes\ visibility dense\ strainCons44wayViewphyloP Basewise Conservation (phyloP) bed 4 Multiz Alignment & Conservation (44 Strains with bats as hosts) 0 1 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (44 Strains with bats as hosts)\ parent strainCons44way\ shortLabel Basewise Conservation (phyloP)\ track strainCons44wayViewphyloP\ view phyloP\ viewLimits -4:5\ viewLimitsMax -11.968:4.256\ visibility hide\ strainCons44wayViewphastcons Bat Element Conservation (phastCons) bed 4 Multiz Alignment & Conservation (44 Strains with bats as hosts) 1 1 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (44 Strains with bats as hosts)\ parent strainCons44way\ shortLabel Bat Element Conservation (phastCons)\ track strainCons44wayViewphastcons\ view phastcons\ visibility dense\ igm_COVID_408 COVID 408 bigBed 9 COVID 408 1 1 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_408.bb\ longLabel COVID 408\ parent igm on\ priority 1\ shortLabel COVID 408\ track igm_COVID_408\ type bigBed 9\ cpgIslandExt CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 3 1 0 100 0 128 228 128 0 0 0

Description

\ \

CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time,\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some other reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.

\ \

\ The unmasked version of the track displays potential CpG islands\ that exist in repeat regions and would otherwise not be visible\ in the repeat masked version.\

\ \

\ By default, only the masked version of the track is displayed. To view the\ unmasked version, change the visibility settings in the track controls at\ the top of this page.\

\ \

Methods

\ \

CpG islands were predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment was then\ evaluated for the following criteria:\ \

\

\

The entire genome sequence, masked areas included, was used to construct the Unmasked CpG track. The CpG Islands track is constructed from the sequence after all masked sequence is removed.
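The dinucleotide scoring described in these Methods (+17 for CG, -1 for any other dinucleotide) can be illustrated with a maximum-scoring-segment scan. This is a simplified sketch that reports only the single best segment and does not apply the subsequent evaluation criteria; it is not the cpg_lh program, and the function name is hypothetical.

```python
def best_cpg_segment(seq, cg_score=17, other_score=-1):
    """Return (score, start, end) of the highest-scoring segment, end exclusive.

    Each dinucleotide scores cg_score if it is CG and other_score otherwise.
    Returns (0, 0, 0) when no segment has a positive score.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    run_score = 0
    run_start = 0
    for i in range(len(seq) - 1):
        step = cg_score if seq[i:i + 2] == "CG" else other_score
        if run_score <= 0:
            # A non-positive running score cannot help: restart the segment here.
            run_score = step
            run_start = i
        else:
            run_score += step
        if run_score > best[0]:
            best = (run_score, run_start, i + 2)
    return best
```

For the sequence `ATATCGCGCGATAT`, the best segment is the `CGCGCG` run at positions 4 to 10 with a score of 49.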

\ \

The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length. The ratio of observed to expected \ CpG is calculated according to the formula (cited in \ Gardiner-Garden et al. (1987)):\ \

    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
\ \ where N = length of sequence.
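The three statistics defined above can be computed directly from a sequence. The sketch below follows the definitions as written (CpG count, percentage CpG as twice the count over the length, and the observed/expected ratio); the function name and the returned dictionary keys are hypothetical.

```python
def cpg_stats(seq):
    """Compute the CpG count, percentage CpG and Obs/Exp CpG ratio for a sequence."""
    seq = seq.upper()
    n = len(seq)
    cpg_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    c_count = seq.count("C")
    g_count = seq.count("G")
    # Percentage CpG: CpG nucleotide bases (twice the CpG count) relative to the length.
    per_cpg = 100.0 * 2 * cpg_count / n if n else 0.0
    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    obs_exp = cpg_count * n / (c_count * g_count) if c_count and g_count else 0.0
    return {"cpgNum": cpg_count, "perCpg": per_cpg, "obsExp": obs_exp}
```

For `CGCGATAT` this gives a CpG count of 2, a percentage CpG of 50.0 and an Obs/Exp ratio of 2 * 8 / (2 * 2) = 4.0.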

\

\ The calculation of the track data is performed by the following command sequence:\

\
twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
  | cpg_lh /dev/stdin 2> cpg_lh.err \
    |  awk '{$2 = $2 - 1; width = $3 - $2;  printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
     | sort -k1,1 -k2,2n > cpgIsland.bed
The unmasked track data is constructed in the same way, using twoBitToFa -noMask output for the twoBitToFa command.

\ \

Data access

\

The CpG Islands track and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog.

\

\ The source for the cpg_lh program can be obtained from\ src/utils/cpgIslandExt/.\ The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")\

\ \

Credits

\ \

This track was generated using a modification of a program developed by G. Miklem and L. Hillier \ (unpublished).

\ \

References

\ \

\ Gardiner-Garden M, Frommer M.\ \ CpG islands in vertebrate genomes.\ J Mol Biol. 1987 Jul 20;196(2):261-82.\ PMID: 3656447\

\ regulation 1 html cpgIslandSuper\ longLabel CpG Islands (Islands < 300 Bases are Light Green)\ parent cpgIslandSuper pack\ priority 1\ shortLabel CpG Islands\ track cpgIslandExt\ igg_Ctrl_NC67 Ctrl NC67 bigBed 9 Ctrl NC67 1 1 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_NC67.bb\ longLabel Ctrl NC67\ parent igg on\ priority 1\ shortLabel Ctrl NC67\ track igg_Ctrl_NC67\ type bigBed 9\ icshapeInvitro icSHAPE In-vitro bigWig icSHAPE In-vitro 2 1 100 80 40 177 167 147 0 0 0 rna 0 autoScale off\ bigDataUrl /gbdb/wuhCor1/bbi/icshape/invitro.bw\ color 100,80,40\ longLabel icSHAPE In-vitro\ maxHeightPixels 100:40:8\ parent icshape\ shortLabel icSHAPE In-vitro\ track icshapeInvitro\ type bigWig\ viewLimits 0.0:1.0\ visibility full\ treangen Intrahost SNPs bigBed Intrahost SNP patient data from Todd Treangen's group 0 1 0 0 0 127 127 127 0 0 0

Description

\

This track shows iSNPs (intrahost SNPs). These are SNPs that have\ evidence for variation within one host. That is, a single patient can \ have variation among the various SARS-CoV-2 viruses infecting their cells. \ This variation is lost\ when a single consensus genome sequence is reported for a patient. The data\ were published in Sapoval et al, 2020 "Hidden genomic\ diversity of SARS-CoV-2: implications for qRT-PCR diagnostics and\ transmission".

\

This track shows iSNPs from human patients in New York City and Houston.

\ \

Display Conventions and Configuration

\

The track contains a list of iSNPs found in patient data from New York City and Houston with nucleotide and amino acid changes, one feature per variant. The name field in this track represents the median observed allele frequency for patients meeting inclusion criteria in the VCFs provided by Sapoval et al. The bigBed track was created with bedToBigBed.

\

Interested users may wish to inspect each of the individual VCFs; for \ this track we have chosen to show a condensed version of all VCFs (see Methods).

\ \

Methods

\

VCF files were downloaded from the Rice University data repository. SARS-CoV-2 \ iSNPs from New York City and Houston patient data were parsed, and if the base \ position was modified in more than one sample then it was included. The frequency \ of observing a particular base (A,C,G,T) at the position when a change was recorded \ was then included, and the dominant base change was used to determine whether the \ base modification would also result in an amino acid change.
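The aggregation described above can be sketched roughly as follows. This is not the script used to build the track. It assumes each per-sample VCF stores the allele frequency as an `AF=` INFO tag (the layout of the actual files may differ), and all function names are hypothetical. Positions seen in only one sample are dropped, the most frequent alternate base is taken as the dominant change, and the median allele frequency is reported.

```python
from collections import Counter, defaultdict
from statistics import median


def parse_vcf_records(vcf_text):
    """Yield (pos, ref, alt, allele_frequency) for each data line of VCF text."""
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        pos, ref, alt, info = int(fields[1]), fields[3], fields[4], fields[7]
        tags = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        # Assumption: allele frequency is stored in an AF tag; records without it are skipped.
        if "AF" in tags:
            yield pos, ref, alt, float(tags["AF"])


def summarize_isnps(sample_vcfs):
    """Return (start, end, ref, dominant_alt, median_af) for positions seen in 2+ samples."""
    by_pos = defaultdict(list)
    for sample, text in sample_vcfs.items():
        for pos, ref, alt, af in parse_vcf_records(text):
            by_pos[pos].append((sample, ref, alt, af))
    rows = []
    for pos in sorted(by_pos):
        records = by_pos[pos]
        if len({sample for sample, _, _, _ in records}) < 2:
            continue
        dominant_alt = Counter(alt for _, _, alt, _ in records).most_common(1)[0][0]
        median_af = median(af for _, _, _, af in records)
        # VCF positions are 1-based; BED-style starts are 0-based.
        rows.append((pos - 1, pos, records[0][1], dominant_alt, median_af))
    return rows
```

Rows in this form could then be written out as BED and converted with bedToBigBed.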

\

The original data files are available from a shared box.com folder with VCF files.

\ \

References

\ \

\ Sapoval N, Mahmoud M, Jochum MD, Liu Y, Leo Elworth RA, Wang Q, Albin D, Ogilvie H, Lee MD, Villapol\ S et al.\ \ Hidden genomic diversity of SARS-CoV-2: implications for qRT-PCR diagnostics and transmission.\ bioRxiv. 2020 Jul 2;.\ PMID: 32637955; PMC: PMC7337385\

\ \ varRep 1 bigDataUrl /gbdb/wuhCor1/treangen/Treangen_iSNP.bb\ group varRep\ longLabel Intrahost SNP patient data from Todd Treangen's group\ priority 1\ shortLabel Intrahost SNPs\ track treangen\ type bigBed\ visibility hide\ M1_library M1 SARS-CoV peptides bigBed 4 T-Cell reactive epitopes: M1 SARS-CoV peptides mapped to SARS-Cov-2 0 1 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/bbi/M1_peptides.bb\ longLabel T-Cell reactive epitopes: M1 SARS-CoV peptides mapped to SARS-Cov-2\ parent targets\ shortLabel M1 SARS-CoV peptides\ track M1_library\ type bigBed 4\ COV2-2050Total MAB COV2-2050 total bigWig Bloom antibody escape - Total Score - COV2-2050 1 1 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2050.tot.bw\ longLabel Bloom antibody escape - Total Score - COV2-2050\ parent bloomEscTotal on\ shortLabel MAB COV2-2050 total\ track COV2-2050Total\ type bigWig\ visibility dense\ kimMgiLdr3p MGI Ld2Bd 3' bigWig MGISEQ Leader-to-body 3' Breakpoints 1 1 102 168 15 178 211 135 0 0 0 expression 0 alwaysZero on\ bigDataUrl /gbdb/wuhCor1/bbi/kim2020/kim-scv2-mgiseq-leader-to-body-breakpoints.bigWig\ color 102,168,15\ graphTypeDefault bar\ group expression\ longLabel MGISEQ Leader-to-body 3' Breakpoints\ maxHeightPixels 48:48:11\ parent kimNp\ shortLabel MGI Ld2Bd 3'\ smoothingWindow off\ track kimMgiLdr3p\ transformFunc LOG\ type bigWig\ visibility dense\ windowingFunction maximum\ strainCons44wayViewalign Multiz Alignments bed 4 Multiz Alignment & Conservation (44 Strains with bats as hosts) 3 1 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (44 Strains with bats as hosts)\ parent strainCons44way\ shortLabel Multiz Alignments\ track strainCons44wayViewalign\ view align\ viewUi on\ visibility pack\ pond Nat. Selection (Pond) bigBed 8 + Natural selection analysis from Sergei Pond's research group 0 1 0 0 0 127 127 127 0 0 0

Description

\

\

This track shows data from Sergei Pond's research group, updated several\ times between 2020 and 2022, with results published in 2022. The current\ dataset is from February 2022 and is scheduled to be updated soon. Contact us\ or Sergei if you believe that the data shown is too outdated for your analyses. \

\ \

The authors use several statistical techniques to identify selection sites\ of interest in SARS-CoV-2 data from GISAID.\

\ \

Display Conventions and Configuration

\ \

\ This track has two subtracks:\

\ \

Positive Selection: "On average along interior tree branches, this site has a dN/dS>1 [i.e., it] is accumulating non-synonymous changes (some of which might have a functional impact, but most probably don't) faster relative to synonymous changes than would be expected under neutral evolution."

\ \

Negative Selection: "On average along interior \ tree branches, this site has a dN/dS<1, meaning that it is conserved, i.e. \ non-synonymous changes might be selected against. Note that sites with no \ changes (i.e. perfectly conserved sites) cannot be detected by dN/dS based methods"

\ \

Methods

\

The CSV used to generate the genomic coordinates of selection sites was parsed \ and the position, gene, site_in_gene, score, and type fields were used to \ generate the resulting fields provided for each site in the data.
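A conversion of this kind might look like the sketch below. It is not the code used for the track. It assumes the CSV has a header row with the column names listed above, that `position` is a 1-based genomic coordinate, and that each site is written as a single-base interval on NC_045512v2; the function name is hypothetical.

```python
import csv
import io

CHROM = "NC_045512v2"  # sequence name used by the wuhCor1 assembly


def selection_csv_to_bed(csv_text):
    """Convert selection-site CSV text into BED-like tuples sorted by start."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        pos = int(rec["position"])
        name = "{}:{}".format(rec["gene"], rec["site_in_gene"])
        # Assumption: 1-based position becomes a 0-based, half-open one-base interval.
        rows.append((CHROM, pos - 1, pos, name, rec["score"], rec["type"]))
    rows.sort(key=lambda row: row[1])
    return rows
```

The `type` column could then be used to route each row to the positive or negative selection subtrack.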

\ \

References

\

Pond et al, 2020 \ "Natural selection analysis of SARS-CoV-2/COVID-19"

\ \

Pond et al, 2020 \ "Natural selection analysis of SARS-CoV-2/COVID-19, V2"

\ \

\ Martin DP, Lytras S, Lucaci AG, Maier W, Grüning B, Shank SD, Weaver S, MacLean OA, Orton RJ,\ Lemey P et al.\ \ Selection Analysis Identifies Clusters of Unusual Mutational Changes in Omicron Lineage BA.1 That\ Likely Impact Spike Function.\ Mol Biol Evol. 2022 Apr 11;39(4).\ PMID: 35325204; PMC: PMC9037384\

\ varRep 1 compositeTrack on\ exonArrows off\ filter.p 0:0.00001\ filterByRange.p on\ filterLabel.p p-Value range to filter\ filterLimits.p 0:0.01\ group varRep\ longLabel Natural selection analysis from Sergei Pond's research group\ maxItems 1000000\ priority 1\ shortLabel Nat. Selection (Pond)\ track pond\ type bigBed 8 +\ visibility hide\ ncbiGeneBGP NCBI Genes bigGenePred NCBI Genes from NC_045512.2 3 1 12 12 120 133 133 187 0 0 0

Description

\

\ The NCBI Gene track for the 13 Jan 2020\ SARS-CoV-2 virus/GCF_009858895.2 genome assembly is\ constructed from the NCBI nuccore entry for NC_045512.2 \ https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2\ \

\ \

Data Access

\

\ The raw data can be explored interactively with the \ Table Browser, or the \ Data Integrator. \ For automated analysis, the genome annotation is stored in \ a bigBed file that can be downloaded from \ the download server. \ \ Annotations can\ be converted to ASCII text by our tool bigBedToBed which can be compiled from \ the source code or downloaded as a precompiled binary for your system. \ Instructions for downloading source code and binaries can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range, \ for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/ncbiGene.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\ \
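For scripted access, the same range query can be wrapped in a small helper. This sketch only assembles and runs the command shown above; it assumes the bigBedToBed binary is installed and on the PATH, and the helper names are hypothetical.

```python
import subprocess


def bigbed_range_command(url, chrom, start, end):
    """Build the bigBedToBed argument list for one genomic range."""
    return [
        "bigBedToBed",
        url,
        "-chrom={}".format(chrom),
        "-start={}".format(start),
        "-end={}".format(end),
        "stdout",
    ]


def fetch_features(url, chrom, start, end):
    """Run bigBedToBed and return each BED line split into its tab-separated fields."""
    result = subprocess.run(
        bigbed_range_command(url, chrom, start, end),
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines()]
```

Calling `fetch_features("http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/ncbiGene.bb", "NC_045512v2", 0, 29902)` would issue the command from the example above.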

\ Please refer to our \ mailing list archives\ for questions, or our \ Data Access FAQ\ for more information.

\ \

Credits

\

\ This track was created by Max Haeussler and Brian Raney at UCSC, with help from Daniel Schmelter\ and many others. Thanks to NCBI and the US National Institutes of Health\ for making all data available for download.

\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ bigDataUrl /gbdb/wuhCor1/bbi/ncbi/genes.bb\ color 12,12,120\ exonNumbers off\ group genes\ labelFields geneName2,geneName,geneId,product\ longLabel NCBI Genes from NC_045512.2\ mouseOverField note\ priority 1\ searchIndex name\ searchTrix /gbdb/wuhCor1/ncbiSearch.ix\ shortLabel NCBI Genes\ track ncbiGeneBGP\ type bigGenePred\ urls product=https://www.ncbi.nlm.nih.gov/protein/$$ geneId=https://www.ncbi.nlm.nih.gov/gene/$$\ visibility pack\ ncbiProducts NCBI Proteins bigGenePred NCBI Proteins: annotated mature peptide products 0 1 12 12 120 133 133 187 0 0 0

Description

\

\ The NCBI Mature Proteins track for the 13 Jan 2020\ SARS-CoV-2 virus/GCF_009858895.2 genome assembly is\ constructed from the NCBI nuccore entry for NC_045512.2 \ https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2\ \

\

It shows the mature peptides, after cleavage, as annotated on the GenBank record.

\ \

Data Access

\

\ The raw data can be explored interactively with the \ Table Browser, or the \ Data Integrator. \ For automated analysis, the genome annotation is stored in \ a bigBed file that can be downloaded from \ the download server. \ \ Annotations can\ be converted to ASCII text by our tool bigBedToBed which can be compiled from \ the source code or downloaded as a precompiled binary for your system. \ Instructions for downloading source code and binaries can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range, \ for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/ncbiGene.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\ \

\ Please refer to our \ mailing list archives\ for questions, or our \ Data Access FAQ\ for more information.

\ \

Credits

\

\ This track was created by Max Haeussler and Brian Raney at UCSC, with help from Daniel Schmelter\ and many others. Thanks to NCBI and the US National Institutes of Health\ for making all data available for download.

\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ bigDataUrl /gbdb/wuhCor1/bbi/ncbi/peptides.bb\ color 12,12,120\ exonNumbers off\ group genes\ labelFields geneName,product\ longLabel NCBI Proteins: annotated mature peptide products\ mouseOverField note\ priority 1\ searchIndex name\ searchTrix /gbdb/wuhCor1/ncbiProducts.ix\ shortLabel NCBI Proteins\ track ncbiProducts\ type bigGenePred\ urls product=https://www.ncbi.nlm.nih.gov/protein/$$ geneId=https://www.ncbi.nlm.nih.gov/gene/$$\ visibility hide\ Negative_Selection Negative Selection bigBed 8 + Sites of negative selection implicated in data from Sergei Pond's research group 1 1 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/pond/neg.bb\ longLabel Sites of negative selection implicated in data from Sergei Pond's research group\ parent pond\ priority 1\ shortLabel Negative Selection\ track Negative_Selection\ type bigBed 8 +\ visibility dense\ ORFs ORF predictions bigBed 12 + Weizman ORF predictions 0 1 40 80 120 147 167 187 0 0 0

Description

\

\ The Weizman ORFs (Open Reading Frames) track shows previously unannotated ORF\ predictions based on Ribo-Seq and RNA-seq data. It is a collection of\ tracks (super track) \ that contains not only the predicted gene models, but also\ data supporting them.

\ \

Display Conventions and Configuration

The Predicted ORFs track shows the predicted exons. All other tracks show the signal as an x-y plot with bars.

Methods

\

\ Methods from Finkel et al:

\

\ To capture the full SARS-CoV-2 coding capacity, we applied a suite of ribosome\ profiling approaches to Vero cells infected with SARS-CoV-2 for 5 and 24 hours,\ and Calu3 cells infected for 7 hours. For each time point we prepared three\ different ribosome-profiling libraries, each one in two biological replicates.\ Two Ribo-seq libraries facilitate mapping of translation initiation sites, by\ treating cells with lactimidomycin (LTM) or harringtonine (Harr), two drugs\ with distinct mechanisms that prevent 80S ribosomes at translation initiation\ sites from elongating. The third Ribo-seq library was prepared from cells\ treated with the translation elongation inhibitor cycloheximide (CHX), and\ gives a snap-shot of actively translating ribosomes across the body of the\ translated ORF. In parallel, RNA-sequencing was applied to map viral\ transcripts.

\

\ The ORF prediction was done by using two computational tools, PRICE and\ ORF-RATER, that rely on different features of ribosome profiling data, and by\ manual inspection of the data. The predictions are based on Ribo-seq libraries\ from two time points (5 and 7 hpi) of two different cell lines (Vero E6 and\ Calu3 cells), infected with separate virus isolates.

\

\ The Ribo-Seq data of the 24 hours samples do not show the expected profile of\ read distribution on viral genes and therefore were not used for the procedure\ of ORF predictions.

\

For more details see the paper in the References section below.

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.

\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

\ Finkel Y, Mizrahi O, Nachshon A, Weingarten-Gabbay S, Morgenstern D, Yahalom-Ronen Y, Tamir H,\ Achdout H, Stein D, Israeli O et al.\ \ The coding capacity of SARS-CoV-2.\ Nature. 2020 Sep 9;.\ PMID: 32906143\

\ \ genes 1 bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/ORFs.bb\ color 40,80,120\ html weizmanOrfs\ itemRgb on\ longLabel Weizman ORF predictions\ noScoreFilter on\ parent weizmanOrfs pack\ priority 1\ shortLabel ORF predictions\ track ORFs\ type bigBed 12 +\ IgM_Z-score-_COVID-19_patients_P1 P1 bigBed 9 P1 1 1 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgM_Z-score-_COVID-19_patients/P1.bb\ longLabel P1\ parent IgM_Z-score-_COVID-19_patients on\ shortLabel P1\ track IgM_Z-score-_COVID-19_patients_P1\ type bigBed 9\ P1 P1 bigBed 9 P1 1 1 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P1.bb\ longLabel P1\ parent IgG_Z-score-_COVID-19_patients on\ shortLabel P1\ track P1\ type bigBed 9\ PhyloCSFgenes PhyloCSF Genes bigGenePred PhyloCSF Genes - curated conserved genes 3 1 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/wuhCor1/bbi/phyloGenes/PhyloCSFgenes.bb\ longLabel PhyloCSF Genes - curated conserved genes\ parent phyloGenes\ priority 10\ shortLabel PhyloCSF Genes\ track PhyloCSFgenes\ type bigGenePred\ visibility pack\ unipCov2FullSeq Precurs. Proteins bigGenePred UniProt Precursor Proteins (before cleavage into protein products) 4 1 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates. \ The data has been curated from scientific publications by the UniProt/SwissProt staff.\ The annotations are spread over multiple tracks, based on their "feature type" in UniProt:\

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
Track Name | Description
--- | ---
UCSC Alignment, SwissProt | Protein sequences from SwissProt mapped onto the genome. All other tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBL | Protein sequences from TrEMBL mapped onto the genome. All other tracks are (start,end) annotations mapped using this track. This track is hidden by default. To show it, click its checkbox on the track description page.
UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains | Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains | Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains | Polypeptide chain in mature protein after post-processing.
UniProt Domains | Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds | Disulfide bonds.
UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts | Differences between Genbank sequences and the UniProt sequence.
UniProt Repeats | Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations | All other annotations.
\ \

Display Conventions and Configuration

\ \

Genomic locations of UniProt/SwissProt annotations are labeled with a short name for the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide" etc.). Clicking on them shows the full annotation and provides a link to the UniProt/SwissProt record for more details. TrEMBL annotations are always shown in light blue, except in the Signal Peptides, Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. For variants, the mouse-over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

In the "UniProt Modifications" track, lipidation sites are highlighted in dark blue, glycosylation sites in dark green, and phosphorylation sites in light green.

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout \
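For scripted access, the command above can be wrapped and its BED output parsed. The following is a minimal Python sketch, not part of the UCSC tools: the helper names `build_command` and `parse_bed` are hypothetical, and the sketch assumes that bigBedToBed is installed and writes tab-separated BED lines whose first four columns are chrom, start, end and name.

```python
def build_command(url, chrom, start, end):
    """Assemble the bigBedToBed call shown above as an argument list."""
    return [
        "bigBedToBed", url,
        f"-chrom={chrom}", f"-start={start}", f"-end={end}",
        "stdout",
    ]


def parse_bed(text):
    """Parse tab-separated BED text into a list of feature dictionaries."""
    features = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        features.append({
            "chrom": fields[0],
            "start": int(fields[1]),  # BED starts are 0-based
            "end": int(fields[2]),    # BED ends are exclusive
            "name": fields[3] if len(fields) > 3 else "",
        })
    return features
```

The argument list can be passed to `subprocess.run(..., capture_output=True, text=True)`, and the captured standard output handed to `parse_bed`.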

\ \ Please refer to our\ mailing list archives\ for questions or our\ Data Access FAQ\ for more information. \

\ \

Credits

\

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Phil Berman, UCSC.\ Thanks to UniProt for making all data available for download.

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ uniprot 1 baseColorDefault genomicCodons\ bigDataUrl /gbdb/wuhCor1/uniprot/unipFullSeqCoV2.bb\ dataVersion /gbdb/$D/uniprot/version.txt\ exonNumbers off\ group uniprot\ html uniprotCov2\ itemRgb on\ labelFields protShortNames,geneName,uniprotName,acc,hgncSym,refSeq,refSeqProt,ensProt,uniprotName\ longLabel UniProt Precursor Proteins (before cleavage into protein products)\ mouseOverField comments\ priority 1\ searchIndex acc\ shortLabel Precurs. Proteins\ track unipCov2FullSeq\ type bigGenePred\ urls acc="http://www.uniprot.org/uniprot/$$" hgncId="https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$"\ visibility squish\ unipCov2AliSwissprot Protein Alignments bigPsl UCSC alignment of full-length SwissProt proteins to genome 0 1 2 12 120 128 133 187 0 0 0

Description

\ \

This track shows protein sequence annotations from the UniProt/SwissProt database, mapped to genomic coordinates. It shows how the protein sequences in this database map to the genome. This mapping was used to "lift" the UniProt protein annotations to the SARS-CoV-2 genome. The protein annotations themselves have been curated from scientific publications by the UniProt/SwissProt staff.

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name for\ the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide"\ etc.). A click on them shows the full annotation and provides a link to the UniProt/SwissProt\ record for more details. \

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. For variants, the mouse-over will\ show the full name of the UniProt disease acronym.\

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. For human and mouse, the\ alignments were filtered by retaining only proteins annotated with\ a given transcript in the Genome Browser table kgXref. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipAliSwissprotCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout \

\ \ Please refer to our\ mailing list archives\ for questions or our\ Data Access FAQ\ for more information. \

\ \

Credits

\

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Alejo\ Mujica, Regeneron Pharmaceuticals. Thanks to UniProt for making all data\ available for download.

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ uniprot 1 baseColorTickColor contrastingColor\ bigDataUrl /gbdb/wuhCor1/uniprot/unipAliSwissprotCov2.bb\ color 2,12,120\ dataVersion /gbdb/$D/uniprot/version.txt\ exonNumbers off\ group uniprot\ indelDoubleInsert on\ indelQueryInsert on\ itemRgb off\ labelFields protShortNames,geneName,uniprotName,acc,hgncSym,refSeq,refSeqProt,ensProt,uniprotName\ longLabel UCSC alignment of full-length SwissProt proteins to genome\ mouseOverField protFullNames\ priority 1\ searchIndex name,acc\ shortLabel Protein Alignments\ track unipCov2AliSwissprot\ type bigPsl\ urls acc="http://www.uniprot.org/uniprot/$$" hgncId="https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$"\ visibility hide\ Starr_Bloom_bind RBD Mut Bind bigWig S RBD Deep Mutational Scanning: ACE2 Binding (Jesse Bloom's Group) 0 1 0 0 0 127 127 127 0 0 0

Description

\

This track contains deep mutational scanning data measuring the effect of changing a wild-type allele to a mutant allele. The authors use a yeast display system to experimentally measure the effect of all possible point (amino acid) RBD mutations on protein expression and ACE2 affinity.

\ \

Display Conventions and Configuration

\

Each subtrack contains all the scores representing mutations to a particular amino acid (each annotation is an S codon). For instance, the A subtrack measures the change in ACE2 binding of the S RBD if the annotated amino acid is mutated to alanine (if the wild-type amino acid is A, then the score is 0). A positive score indicates increased binding; a negative score indicates a loss of binding.

\ \

\ Please see the interactive heatmap generated by the authors at this link. Structural \ visualizations of the data are available from the authors via dms-view here.

\ \

Methods

\

Table S2 from Starr et al. was downloaded and parsed into bedGraph format using the average value of both replicates reported. All NA values were filtered out.
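The conversion described above can be illustrated with a short Python sketch. It is not the script used to build the track. The column names (`site`, `mutant`, `rep1`, `rep2`), the function name and the 0-based start of the S coding sequence are assumptions made for illustration, and computing the mean from two replicate columns is one reading of "the average value of both replicates".

```python
import csv
import io

# Assumed 0-based genomic start of the S coding sequence on NC_045512v2;
# verify against a gene annotation before relying on it.
S_CDS_START = 21562


def rows_to_bedgraph(tsv_text, target_aa, chrom="NC_045512v2"):
    """Return bedGraph tuples (chrom, start, end, score) for one mutant amino acid."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        if row["mutant"] != target_aa:
            continue
        replicates = [row["rep1"], row["rep2"]]
        # Drop rows in which either replicate is missing (NA).
        if any(value in ("NA", "") for value in replicates):
            continue
        score = (float(replicates[0]) + float(replicates[1])) / 2
        # Each spike residue maps to one three-base codon.
        start = S_CDS_START + 3 * (int(row["site"]) - 1)
        records.append((chrom, start, start + 3, round(score, 4)))
    return records
```

Calling the function once per amino acid letter would yield one record list per subtrack.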

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.

\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

\ Starr TN, Greaney AJ, Hilton SK, Ellis D, Crawford KHD, Dingens AS, Navarro MJ, Bowen JE, Tortorici\ MA, Walls AC et al.\ \ Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and\ ACE2 Binding.\ Cell. 2020 Sep 3;182(5):1295-1310.e20.\ PMID: 32841599; PMC: PMC7418704\

\ immu 0 autoScale on\ compositeTrack on\ group immu\ longLabel S RBD Deep Mutational Scanning: ACE2 Binding (Jesse Bloom's Group)\ priority 1\ shortLabel RBD Mut Bind\ track Starr_Bloom_bind\ type bigWig\ visibility hide\ Starr_Bloom RBD Mut Expr bigWig S RBD Deep Mutational Scanning: Expression (Jesse Bloom's Group) 0 1 0 0 0 127 127 127 0 0 0

Description

\

This track contains deep mutational scanning data measuring how changing a wild-type allele to a mutant allele affects expression, ACE2 binding or antibody binding. The authors use a yeast display system to experimentally measure the effect of all possible point (amino acid) RBD mutations on protein expression, ACE2 affinity or antibody binding.

\ \

Display Conventions and Configuration

\

Each subtrack contains all the scores representing mutations to a particular amino acid (each annotation is an S codon). For instance, the A subtrack measures the change in expression of the S RBD if the annotated amino acid is mutated to alanine (if the wild-type amino acid is A, then the score is 0). A positive score indicates increased expression; a negative score indicates a loss of expression.

\ \

\ Please see the interactive heatmap generated by the authors at this link. Structural\ visualizations of the data are available from the authors via dms-view here.

\ \

Methods

\

Table S2 from Starr et al. was downloaded and parsed into bedGraph format using the average value of both replicates reported. All NA values were filtered out.

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.

\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

\ Starr TN, Greaney AJ, Hilton SK, Ellis D, Crawford KHD, Dingens AS, Navarro MJ, Bowen JE, Tortorici\ MA, Walls AC et al.\ \ Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and\ ACE2 Binding.\ Cell. 2020 Sep 3;182(5):1295-1310.e20.\ PMID: 32841599; PMC: PMC7418704\

\ \ immu 0 autoScale on\ compositeTrack on\ group immu\ longLabel S RBD Deep Mutational Scanning: Expression (Jesse Bloom's Group)\ priority 1\ shortLabel RBD Mut Expr\ track Starr_Bloom\ type bigWig\ visibility hide\ nextstrainSamplesRb Rec Bi-allelic vcfTabix Recurrent Bi-allelic Mutations in Nextstrain Subset of GISAID EpiCoV TM Samples 4 1 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainRecurrentBiallelic.vcf.gz\ hapClusterHeight 500\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain.nh\ longLabel Recurrent Bi-allelic Mutations in Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewAll on\ priority 1\ shortLabel Rec Bi-allelic\ subGroups view=all\ track nextstrainSamplesRb\ SHAPE SHAPE Reactivity bigWig SHAPE Reactivity (VeroE6 cells, virus isolate USA-WA1/2020) 2 1 0 0 0 127 127 127 0 0 0 rna 0 bigDataUrl /gbdb/wuhCor1/pyle/SHAPE_Reactivity.bw\ longLabel SHAPE Reactivity (VeroE6 cells, virus isolate USA-WA1/2020)\ parent pyle\ priority 1\ shortLabel SHAPE Reactivity\ track SHAPE\ type bigWig\ viewLimits -0.5:4\ visibility full\ pyle SHAPE Struct Pyle bigWig RNA SHAPE Structure from the Pyle group 0 1 0 0 0 127 127 127 0 0 0

Description

\

This track shows data from Anna Pyle's lab at Yale University describing the RNA secondary structure of the SARS-CoV-2 genome. The authors performed experimental measurements using SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) as well as in silico analysis.

\ \

These data are described in depth in \ de Cesaris Araujo Tavares et al \ and Huston, Wan, et al.

\ \

Display Conventions and Configuration

\ \

Two tracks are available:

\

SHAPE reactivity: A low SHAPE score (<0.4) indicates that the nucleotide is\ not accessible (folded between complementary regions - base-paired). A mid \ (0.4<SHAPE<0.85) or high SHAPE reactivity\ (>0.85) indicates the nucleotide is flexible (single-stranded). A score of -999\ means no data for that nucleotide was recovered experimentally. For\ visualization purposes the minimum display value has been set to -0.5 and the\ max has been set to 4.
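The thresholds above translate directly into a small classifier. This Python sketch is illustrative only: the function names are hypothetical, and assigning the exact boundary values 0.4 and 0.85 to the "mid" class is an assumption, since the text leaves the boundaries open.

```python
NO_DATA = -999  # sentinel for nucleotides without experimental data


def classify_shape(score):
    """Map a SHAPE reactivity score to the categories described above."""
    if score == NO_DATA:
        return "no data"
    if score < 0.4:
        return "low (base-paired)"
    if score <= 0.85:  # boundary handling is an assumption
        return "mid (flexible)"
    return "high (flexible)"


def clamp_for_display(score, minimum=-0.5, maximum=4.0):
    """Clip a score to the display range used for visualization."""
    return max(minimum, min(maximum, score))
```

With these limits, a no-data value of -999 is drawn at the minimum display value of -0.5.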

\ \

Full Length Shannon Entropy: A genome-wide Shannon entropy profile derived from base pairing probabilities was computed using SuperFold with in vivo SHAPE reactivity as constraints. A low Shannon entropy (near 0) is evidence for well-determined RNA conformation.

\ \

Note that the authors provide further data (including secondary structure \ predictions) at their github repository.

\ \

Methods

\

For Shannon entropy, SuperFold was used with default settings to predict secondary structure for the full-length SARS-CoV-2 RNA genome.

\

For SHAPE_reactivity, VeroE6 cells were infected with 10^5 PFU of SARS-CoV-2 isolate USA-WA1/2020. See the preprint for further details. wigToBigWig was used to convert wiggle files from the authors' GitHub repository to bigWig files after filtering out headers.

\ \

The Github repository with all raw data files can be found here: https://github.com/pylelab/SARS-CoV-2_SHAPE_MaP_structure.

\ \

References

\ \

\ Huston NC, Wan H, Araujo Tavares RC, Wilen C, Pyle AM.\ \ Comprehensive in-vivo secondary structure of the SARS-CoV-2 genome reveals novel regulatory motifs\ and mechanisms.\ bioRxiv. 2020 Jul 10;.\ PMID: 32676598; PMC: PMC7359520\

\

\ Araujo Tavares RC, Mahadeshwar G, Pyle AM.\ \ The global and local distribution of RNA structure throughout the SARS-CoV-2 genome.\ bioRxiv. 2020 Jul 7;.\ doi: 190660\

\ rna 0 compositeTrack on\ group rna\ longLabel RNA SHAPE Structure from the Pyle group\ priority 1\ shortLabel SHAPE Struct Pyle\ track pyle\ type bigWig\ visibility hide\ TRS_sites TRS sites bigBed 6 Transcription Regulatory Sequences (TRS) of Canonical Subgenomic Transcripts 0 1 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/wuhCor1/bbi/kim2020/TRS.bb\ longLabel Transcription Regulatory Sequences (TRS) of Canonical Subgenomic Transcripts\ parent transcriptome\ priority 1\ shortLabel TRS sites\ track TRS_sites\ type bigBed 6\ varskip1a VarSkip 1a bigBed 6 NEB VarSkip V1a 3 1 0 0 0 127 127 127 0 0 0 map 1 bigDataUrl /gbdb/wuhCor1/varskip/neb_vss1a.primer.bb\ longLabel NEB VarSkip V1a\ parent varskip\ priority 1\ shortLabel VarSkip 1a\ track varskip1a\ type bigBed 6\ visibility pack\ strainCons119way 119 Vertebrate CoVs bed 4 Multiz Alignment & Conservation (119 strains: strains with vertebrate hosts and human SARS-Cov2) 0 2 0 0 0 127 127 127 0 0 0

\ Downloads for data in this track are available:\

\ \

Description

\

This track shows multiple alignments of 119 virus sequences, aligned to the SARS-CoV-2 reference sequence SARS-CoV-2/NC_045512.2, genome assembly GCF_009858895.2_ASM985889v3. These 119 sequences are from very different coronavirus strains:

\ \ It also includes measurements of evolutionary conservation using\ two methods (phastCons and phyloP) from the\ \ PHAST package, for all 119 virus sequences.\ The multiple alignments were generated using multiz and\ other tools in the UCSC/Penn State Bioinformatics\ comparative genomics alignment pipeline.\ Conserved elements identified by phastCons are also displayed in\ this track.

\

\ PhastCons (which has been used in previous Conservation tracks) is a hidden\ Markov model-based method that estimates the probability that each\ nucleotide belongs to a conserved element, based on the multiple alignment.\ It considers not just each individual alignment column, but also its\ flanking columns. By contrast, phyloP separately measures conservation at\ individual columns, ignoring the effects of their neighbors. As a\ consequence, the phyloP plots have a less smooth appearance than the\ phastCons plots, with more "texture" at individual sites. The two methods\ have different strengths and weaknesses. PhastCons is sensitive to "runs"\ of conserved sites, and is therefore effective for picking out conserved\ elements. PhyloP, on the other hand, is more appropriate for evaluating\ signatures of selection at particular nucleotides or classes of nucleotides\ (e.g., third codon positions, or first positions of miRNA target sites).

\

\ Another important difference is that phyloP can measure acceleration\ (faster evolution than expected under neutral drift) as well as\ conservation (slower than expected evolution). In the phyloP plots, sites\ predicted to be conserved are assigned positive scores (and shown in blue),\ while sites predicted to be fast-evolving are assigned negative scores (and\ shown in red). The absolute values of the scores represent -log p-values\ under a null hypothesis of neutral evolution. The phastCons scores, by\ contrast, represent probabilities of negative selection and range between 0\ and 1.
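The relationship between a phyloP score and its p-value can be made concrete with a short Python sketch. The function name is hypothetical, and the base-10 logarithm is an assumption here, since the text above says only "-log p-values".

```python
def interpret_phylop(score):
    """Return (direction, p_value) for a phyloP score.

    Assumes |score| = -log10(p) under the null hypothesis of neutral evolution.
    """
    p_value = 10.0 ** (-abs(score))
    if score > 0:
        direction = "conserved"      # positive scores, shown in blue
    elif score < 0:
        direction = "accelerated"    # negative scores, shown in red
    else:
        direction = "neutral"
    return direction, p_value
```

Under this assumption, a score of 2 and a score of -2 carry the same p-value and differ only in the direction of the departure from neutrality.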

\

\ Both phastCons and phyloP treat alignment gaps and unaligned nucleotides as\ missing data.

\ \

\ In the track display, the sequence is labeled using its\ NCBI Nucleotide accession number.\

\

\ The mapping between sequence accession identifiers and more descriptive names\ is provided via a text file on our download server.\

\ \

Display Conventions and Configuration

\

\ Pairwise alignments of each species to the SARS-CoV-2 genome are\ displayed as a series of colored blocks indicating the functional effect of polymorphisms (in pack\ mode), or as a wiggle (in full mode) that indicates alignment quality.\ In dense display mode, percent identity of the whole alignments is shown in grayscale using\ darker values to indicate higher levels of identity.\

In pack mode, regions that align with 100% identity are not shown. When there is not 100% identity, blocks of four colors are drawn.

\

\ Checkboxes on the track configuration page allow selection of the\ species to include in the pairwise display.\ Configuration buttons are available to select all of the species\ (Set all), deselect all of the species (Clear all), or\ use the default settings (Set defaults).\

\ To view detailed information about the alignments at a specific\ position, zoom the display in to 30,000 or fewer bases, then click on\ the alignment.

\ \

Base Level

\

\ When zoomed-in to the base-level display, the track shows the base\ composition of each alignment.\ The numbers and symbols on the Gaps\ line indicate the lengths of gaps in the SARS-CoV-2 sequence at those\ alignment positions relative to the longest non-SARS-CoV-2 sequence.\ If there is sufficient space in the display, the size of the gap is shown.\ If the space is insufficient and the gap size is a multiple of 3, a\ "*" is displayed; other gap sizes are indicated by "+".
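The labeling rule for the Gaps line can be sketched in a few lines of Python. The function name is hypothetical and the sketch is not the browser's drawing code. "Sufficient space" is interpreted here as the printed gap size fitting into the number of available character columns, which is an assumption.

```python
def gap_label(gap_size, columns_available):
    """Choose the text drawn on the Gaps line for one gap."""
    text = str(gap_size)
    if len(text) <= columns_available:
        return text  # enough room to show the size itself
    # Not enough room: a multiple of 3 keeps the reading frame intact.
    return "*" if gap_size % 3 == 0 else "+"
```

A gap of 12 bases with a single free column would therefore be drawn as "*", and a gap of 10 bases as "+".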

\

\ Codon translation is available in base-level display mode if the\ displayed region is identified as a coding segment. To display this annotation, select the species\ for translation from the pull-down menu in the Codon\ Translation configuration section at the top of the page. Then, select one of\ the following modes:\

\ \

Methods

\

\ Pairwise alignments with the reference sequence were generated for\ each sequence using lastz version 1.04.00.\ Parameters used for each lastz alignment:\

\
# hsp_threshold      = 2200\
# gapped_threshold   = 4000 = L\
# x_drop             = 910\
# y_drop             = 3400 = Y\
# gap_open_penalty   = 400\
# gap_extend_penalty = 30\
#        A    C    G    T\
#   A   91  -90  -25 -100\
#   C  -90  100 -100  -25\
#   G  -25 -100  100  -90\
#   T -100  -25  -90   91\
# seed=1110100110010101111 w/transition\
# step=1\
\ Pairwise alignments were then linked into chains using a dynamic programming\ algorithm that finds maximally scoring chains of gapless subsections\ of the alignments organized in a kd-tree. Parameters used in\ the chaining (axtChain) step: -minScore=10 -linearGap=loose\

\

\ High-scoring chains were then placed along the genome, with\ gaps filled by lower-scoring chains, to produce an alignment net.\

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
count | sample date | accession | phylogenetic distance | descriptive name
--- | --- | --- | --- | ---
001 | 2019-12-30 | NC_045512.2 | 0.000000 | Wuhan-Hu-1
002 | 2020-01-02 | MN988668.1 | 0.000003 | 2019-nCoV WHU01
003 | 2019-12-30 | MN996528.1 | 0.000003 | WIV04
004 | 2019-12-30 | MT019532.1 | 0.000003 | BetaCoV/Wuhan/IPBCAMS-WH-04/2019
005 | 2020-01-01 | LR757996.1 | 0.000004 | BetaCoV/Wuhan/WH-03/2019
006 | 2019-12-30 | MN996530.1 | 0.000006 | WIV06
007 | 2020-01-20 | MT039873.1 | 0.000006 | BetaCoV/Hangzhou/HZ-1/2020 20cov-1L
008 | 2020-01-07 | NMDC60013002-07 | 0.000006 | BetaCoV/Wuhan/YS8011/2020
009 | 2020-01-01 | MT019533.1 | 0.000008 | BetaCoV/Wuhan/IPBCAMS-WH-05/2020
010 | 2020-02-10 | MT106053.1 | 0.000008 | 2019-nCoV/USA-CA8/2020
011 | 2020-02-23 | MT118835.1 | 0.000008 | 2019-nCoV/USA-CA9/2020
012 | 2019-12-30 | NMDC60013002-06 | 0.000008 | BetaCoV/Wuhan/WH19008/2019
013 | 2019-12-30 | MT019531.1 | 0.000010 | BetaCoV/Wuhan/IPBCAMS-WH-03/2019
014 | 2019-12-30 | NMDC60013002-10 | 0.000010 | BetaCoV/Wuhan/WH19005/2019
015 | 2019-12-30 | MT019530.1 | 0.000011 | BetaCoV/Wuhan/IPBCAMS-WH-02/2019
016 | 2020-01-13 | MT072688.1 | 0.000012 | SARS0CoV-2/61-TW/human/2020/NPL
017 | 2020-01-08 | MT093631.1 | 0.000013 | SARS-CoV-2/WH-09/human/2020/CHN
018 | 2020-01-31 | MT039887.1 | 0.000014 | 2019-nCoV/USA-WI1/2020
019 | 2020-01-29 | MT027064.1 | 0.000015 | 2019-nCoV/USA-CA5/2020
020 | 2020-01-22 | MN994468.1 | 0.000016 | 2019-nCoV/USA-CA2/2020
021 | 2020-01-01 | NMDC60013002-09 | 0.000017 | BetaCoV/Wuhan/WH19004/2020
022 | 2020-02-05 | MT066176.1 | 0.000018 | BetaCov/Taiwan/NTU02/2020
023 | 2020-02-10 | LC528232.1 | 0.000020 | SARS-CoV-2/Hu/DP/Kng/19-020
024 | 2020-02-10 | LC528233.1 | 0.000020 | SARS-CoV-2/Hu/DP/Kng/19-027
025 | 2019-12-30 | MN996531.1 | 0.000020 | WIV07
026 | 2019-12-30 | MN996529.1 | 0.000021 | WIV05
027 | 2020-01-27 | MT044258.1 | 0.000022 | SARS-CoV-2/CA6/human/2020/USA
028 | 2019-12-26 | LR757998.1 | 0.000023 | BetaCoV/Wuhan/WH-01/2019
029 | 2019-12-30 | MN996527.1 | 0.000024 | WIV02
030 | 2020-01-29 | MT027062.1 | 0.000025 | 2019-nCoV/USA-CA3/2020
031 | 2020-02-05 | MT123290.1 | 0.000026 | SARS-CoV-2/IQTC01/human/2020/CHN
032 | 2020-01-25 | LC521925 | 0.000027 | BetaCoV/Japan/AI/I-004/2020
033 | 2020-01-29 | LC522972 | 0.000027 | 2019-nCoV/Japan/KY/V-029/2020
034 | 2019-12-23 | MT019529.1 | 0.000027 | BetaCoV/Wuhan/IPBCAMS-WH-01/2019
035 | 2020-02-28 | MT126808.1 | 0.000028 | SARS-CoV-2/SP02/human/2020/BRA
036 | 2020-01-25 | MT007544.1 | 0.000030 | BetaCoV/Australia/VIC01/2020
037 | 2020-01-17 | MT049951.1 | 0.000030 | SARS-CoV-2/Yunnan-01/human/2020/CHN
038 | 2020-02-28 | MT123292.1 | 0.000031 | SARS-CoV-2/IQTC04/human/2020/CHN
039 | 2020-01-29 | MT020781.1 | 0.000032 | BetaCoV/Finland/1/2020
040 | 2020-02-06 | MT106052.1 | 0.000032 | 2019-nCoV/USA-CA7/2020
041 | 2020-01-26 | MT135041.1 | 0.000032 | SARS-CoV-2/105/human/2020/CHN
042 | 2020-01-28 | MT135043.1 | 0.000032 | SARS-CoV-2/233/human/2020/CHN
043 | 2020-01 | MN975262.1 | 0.000033 | 2019-nCoV_HKU-SZ-005b_2020
044 | 2020-03-05 | MT152824.1 | 0.000033 | SARS-CoV-2/WA2/human/2020/USA
045 | 2020-02-11 | MT039888.1 | 0.000034 | 2019-nCoV/USA-MA1/2020
046 | 2020-01-31 | LC522975 | 0.000035 | 2019-nCoV/Japan/TY/WK-521/2020
047 | 2020-01-29 | LC522973 | 0.000036 | 2019-nCoV/Japan/TY/WK-012/2020
048 | 2020-01-31 | LC522974 | 0.000036 | 2019-nCoV/Japan/TY/WK-501/2020
049 | 2020-01 | MN938384.1 | 0.000036 | 2019-nCoV_HKU-SZ-002a_2020
050 | 2020-01-19 | MN985325.1 | 0.000036 | 2019-nCoV/USA-WA1/2020
051 | 2020-01-22 | MN997409.1 | 0.000036 | 2019-nCoV/USA-AZ1/2020
052 | 2020-02-11 | MT106054.1 | 0.000036 | 2019-nCoV/USA-TX1/2020
053 | 2020-01-29 | MT123291.1 | 0.000036 | SARS-CoV-2/IQTC02/human/2020/CHN
054 | 2020-02-28 | MT123293.1 | 0.000036 | SARS-CoV-2/IQTC03/human/2020/CHN
055 | 2020-01-05 | LR757995.1 | 0.000037 | BetaCoV/Wuhan/WH-04/2019
056 | 2020-02-01 | MT066175.1 | 0.000037 | Taiwan/NTU01/2020
057 | 2020-01-23 | MN994467.1 | 0.000038 | 2019-nCoV/USA-CA1/2020
058 | 2020-01-28 | MT044257.1 | 0.000038 | SARS-CoV-2/IL2/human/2020/USA
059 | 2020-02-07 | MT093571.1 | 0.000038 | SARS-CoV-2/01/human/2020/SWE
060 | 2020-01-21 | MN988713.1 | 0.000039 | 2019-nCoV/USA-IL1/2020
061 | 2020-01 | MT039890.1 | 0.000040 | BetaCoV/Korea/SNU01/2020
062 | 2019-12-31 | LR757997.1 | 0.001411 | BetaCoV/Wuhan/WH-02/2019
063 | 2019-12-30 | NMDC60013002-05 | 0.002318 | BetaCoV/Wuhan/WH19002/2019
064 | 2013-07-24 | GWHABKP00000000 | 0.126322 | Bat CoV TG13
065 | 2019-03-01 | GWHABKW00000000 | 0.318494 | Pangolin-CoV-2020 MP789
066 | 2018-08-13 | NC_004718.3 | 0.885159 | SARS CoV
067 | 2018-08-13 | NC_014470.1 | 1.088135 | Bat CoV BM48-31/BGR/2008
068 | 2018-08-13 | NC_035191.1 | 1.483474 | Wencheng Sm shrew CoV Xingguo-101
069 | 2018-08-13 | NC_009019.1 | 2.026165 | Bat CoV HKU4-1
070 | 2018-08-13 | NC_010646.1 | 2.170350 | Beluga Whale CoV SW1
071 | 2018-08-13 | NC_016995.1 | 2.225972 | Wigeon CoV HKU20
072 | 2018-08-13 | NC_001451.1 | 2.263777 | Avian infectious bronchitis virus
073 | 2018-08-13 | NC_026011.1 | 2.265050 | BetaCoV HKU24 strain HKU24-R05005I
074 | 2018-08-13 | NC_010800.1 | 2.317576 | Turkey CoV
075 | 2018-08-24 | NC_039207.1 | 2.337043 | BetaCoV ErinaceusCoV/VMC/2012-174/GER/2012
076 | 2018-08-13 | NC_025217.1 | 2.384179 | Bat Hp-betaCoV/Zhejiang2013
077 | 2018-08-24 | NC_038294.1 | 2.434929 | BetaCoV England 1
078 | 2018-08-13 | NC_019843.3 | 2.434930 | MERS Middle East respiratory syndrome CoV
079 | 2018-08-13 | NC_017083.1 | 2.491178 | Rabbit CoV HKU14
080 | 2018-08-13 | NC_016994.1 | 2.498432 | Night-heron CoV HKU19
081 | 2018-08-13 | NC_003045.1 | 2.507057 | Bovine CoV
082 | 2019-03-10 | NC_034440.1 | 2.542837 | Bat CoV PREDICT/PDF-2180
083 | 2018-08-13 | NC_009020.1 | 2.551785 | Bat CoV HKU5-1
084 | 2019-02-21 | NC_006213.1 | 2.589639 | Human CoV OC43 strain ATCC VR-759
085 | 2018-08-13 | NC_016996.1 | 2.598443 | Common-moorhen CoV HKU21
086 | 2018-08-24 | NC_011547.1 | 2.612548 | Bulbul CoV HKU11-934
087 | 2018-08-13 | NC_006577.2 | 2.649716 | Human CoV HKU1
088 | 2018-08-13 | NC_009021.1 | 2.778940 | Bat CoV HKU9-1
089 | 2018-08-13 | NC_012936.1 | 2.785289 | Rat CoV Parker
090 | 2018-08-13 | NC_028811.1 | 2.786411 | BtMr-AlphaCoV/SAX2011
091 | 2018-08-13 | NC_001846.1 | 2.792354 | Mouse hepatitis virus strain MHV-A59 C12 mutant
092 | 2018-08-13 | NC_016993.1 | 2.828026 | Magpie-robin CoV HKU18
093 | 2018-08-13 | NC_018871.1 | 2.831499 | Rousettus bat CoV HKU10
094 | 2018-08-13 | NC_030886.1 | 2.885630 | Rousettus bat CoV GCCDC1 356
095 | 2018-08-13 | NC_016992.1 | 2.887272 | Sparrow CoV HKU17
096 | 2018-08-24 | NC_039208.1 | 2.902341 | Porcine CoV HKU15 strain HKU15-155
097 | 2018-08-13 | NC_011550.1 | 2.936050 | Munia CoV HKU13-3514
098 | 2018-08-13 | NC_002645.1 | 2.983896 | Human CoV 229E
099 | 2018-08-13 | NC_016991.1 | 3.002680 | White-eye CoV HKU16
100 | 2018-08-13 | NC_034972.1 | 3.004210 | Coronavirus AcCoV-JC34
101 | 2018-08-13 | NC_028752.1 | 3.008138 | Camel alphaCoV camel/Riyadh/Ry141/2015
102 | 2018-08-13 | NC_005831.2 | 3.009141 | Human Coronavirus NL63
103 | 2018-08-13 | NC_023760.1 | 3.023450 | Mink CoV strain WD1127
104 | 2018-08-13 | NC_030292.1 | 3.041640 | Ferret CoV FRCoV-NL-2010
105 | 2018-08-13 | NC_003436.1 | 3.068747 | Porcine epidemic diarrhea virus
106 | 2018-08-13 | NC_009657.1 | 3.074701 | Scotophilus bat CoV 512
107 | 2018-08-13 | NC_032107.1 | 3.086646 | NL63-related bat CoV strain BtKYNL63-9a
108 | 2018-08-13 | NC_010438.1 | 3.120462 | Bat CoV HKU8
109 | 2018-08-13 | NC_022103.1 | 3.126101 | Bat CoV CDPHE15/USA/2006
110 | 2018-08-13 | NC_011549.1 | 3.154652 | Thrush CoV HKU12-600
111 | 2018-08-13 | NC_028814.1 | 3.185708 | BtRf-AlphaCoV/HuB2013
112 | 2018-08-13 | NC_032730.1 | 3.203052 | Lucheng Rn rat CoV Lucheng-19
113 | 2018-08-13 | NC_009988.1 | 3.204418 | Bat CoV HKU2
114 | 2018-08-13 | NC_028833.1 | 3.260958 | BtNv-AlphaCoV/SC2013
115 | 2018-08-13 | NC_028806.1 | 3.346396 | Swine enteric CoV strain Italy/213306/2009
116 | 2018-08-24 | NC_038861.1 | 3.359831 | Transmissible gastroenteritis virus
117 | 2018-08-13 | NC_010437.1 | 3.404316 | Bat CoV 1A
118 | 2018-08-13 | NC_002306.3 | 3.472931 | Feline infectious peritonitis virus
119 | 2018-08-13 | NC_028824.1 | 3.505645 | BtRf-AlphaCoV/YN2012
\

The multiple alignment was constructed from the resulting pairwise alignments, progressively aligned using multiz/autoMZ. The phylogenetic tree was calculated from 31-mer frequency similarity, applying neighbor joining to the resulting distance matrix with the neighbor command of the phylip toolset. The reference sequence NC_045512v2 is at the top of the tree:

\
((((((((((((((((((((((((((((((((((((((((((((((((((((((NC_045512v2 (MN996528v1\
MT019532v1)) MN988668v1) LR757996v1) (MN996530v1 NMDC60013002_07)) MT039873v1)\
(MT106053v1 NMDC60013002_06)) MT019533v1) MT118835v1) MT019531v1)\
NMDC60013002_10) MT019530v1) MT072688v1) MT093631v1) MT039887v1) MT027064v1)\
MN994468v1) NMDC60013002_09) MT066176v1) (LC528232v1 LC528233v1)) MN996531v1)\
MN996529v1) MT044258v1) LR757998v1) MN996527v1) MT027062v1)\
(LC522972 MT123290v1)) MT019529v1) LC521925) MT126808v1)\
(((((((LC522973 LC522974) LC522975) (((LR757995v1 MT066175v1) MN985325v1)\
(MN938384v1 MN997409v1))) MN975262v1) MT106052v1) (MT135041v1 MT135043v1))\
MT049951v1)) MT007544v1) MT123292v1) MT020781v1) MT152824v1) MT039888v1)\
(MT123291v1 MT123293v1)) MT106054v1) (MN994467v1 MT044257v1)) MT093571v1)\
MN988713v1) MT039890v1) NMDC60013002_05) LR757997v1) GWHABKP00000000)\
GWHABKW00000000) NC_004718v3) NC_014470v1) NC_035191v1) (NC_009019v1\
((NC_009020v1 ((NC_019843v3 NC_038294v1) NC_034440v1)) NC_039207v1)))\
((NC_001451v1 NC_010800v1) ((((NC_011547v1 (NC_011549v1 NC_016991v1))\
((NC_011550v1 NC_016993v1) (NC_016992v1 NC_039208v1))) NC_016996v1)\
NC_016994v1))) NC_025217v1) (((((NC_001846v1 NC_012936v1) ((NC_003045v1\
NC_006213v1) NC_017083v1)) NC_026011v1) NC_006577v2) NC_010646v1))\
((((((NC_002306v3 (NC_028806v1 NC_038861v1)) NC_003436v1) (NC_023760v1\
NC_030292v1)) (((NC_002645v1 NC_028752v1) (NC_005831v2 NC_032107v1))\
(((((NC_009657v1 NC_010437v1) NC_010438v1) ((NC_009988v1 NC_028824v1)\
NC_028833v1)) NC_022103v1) (NC_018871v1 NC_028814v1)))) (NC_028811v1\
(NC_032730v1 NC_034972v1))) (NC_009021v1 NC_030886v1))) NC_016995v1)\
\ Framing tables from the genes were constructed to enable\ visualization of codons in the multiple alignment display.
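
The 31-mer distance step used for the tree above can be sketched in Python as follows. This is an illustration only, not the pipeline that built the track: the Euclidean distance between frequency profiles is an assumption (the text does not name the metric), and all function names are hypothetical. The output is a square distance matrix in the layout read by the PHYLIP neighbor program.

```python
from collections import Counter
from itertools import combinations
import math


def kmer_frequencies(seq, k=31):
    """Relative frequency of every overlapping k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def kmer_distance(freq_a, freq_b):
    """Euclidean distance between two k-mer profiles (assumed metric)."""
    kmers = set(freq_a) | set(freq_b)
    return math.sqrt(sum((freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) ** 2
                         for m in kmers))


def distance_matrix(sequences, k=31):
    """sequences maps name -> sequence; returns a symmetric dict of dicts."""
    profiles = {name: kmer_frequencies(s, k) for name, s in sequences.items()}
    names = list(sequences)
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = kmer_distance(profiles[a], profiles[b])
    return matrix


def to_phylip(matrix):
    """Square PHYLIP distance matrix; names are cut or padded to 10 characters,
    so longer names must stay unique after truncation."""
    names = list(matrix)
    lines = [f"{len(names):5d}"]
    for a in names:
        row = " ".join(f"{matrix[a][b]:.6f}" for b in names)
        lines.append(f"{a[:10]:<10s} {row}")
    return "\n".join(lines)
```

With real genomes the default k=31 applies; a smaller k is only useful for toy input.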

\ \

Phylogenetic Tree Model

\

\ Both phastCons and phyloP are phylogenetic methods that rely\ on a tree model containing the tree topology, branch lengths representing\ evolutionary distance at neutrally evolving sites, the background distribution\ of nucleotides, and a substitution rate matrix.\ The\ all-species tree model for this track was\ generated using the phyloFit program from the PHAST package\ (REV model, EM algorithm, medium precision) using multiple alignments of\ 4-fold degenerate sites extracted from the 119-way alignment\ (msa_view). The 4d sites were derived from the NCBI gene set,\ filtered to select single-coverage long transcripts.

\

\ This same tree model was used in the phyloP calculations; however, the\ background frequencies were modified to maintain reversibility.\ The resulting tree model:\ all species.\

\ \

PhastCons Conservation

\

The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Unlike many conservation-scoring programs, phastCons does not rely on a sliding window of fixed size; therefore, short highly conserved regions and long moderately conserved regions can both obtain high scores. More information about phastCons can be found in Siepel et al., 2005.

\

\ The phastCons parameters used were: expected-length=45,\ target-coverage=0.3, rho=0.3.
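
The parameters above map onto the transition probabilities of the two-state phylo-HMM. The following Python sketch is illustrative only. It assumes the relations given for phastCons in Siepel et al. 2005 (expected length = 1/mu, target coverage = nu/(mu + nu)) and uses made-up per-column likelihoods in place of the phylogenetic emission probabilities that phastCons derives from the tree model. The function names are hypothetical.

```python
def transition_probs(expected_length, target_coverage):
    """mu = P(conserved -> non-conserved), nu = P(non-conserved -> conserved)."""
    mu = 1.0 / expected_length
    nu = mu * target_coverage / (1.0 - target_coverage)
    return mu, nu


def posterior_conserved(lik_cons, lik_noncons, mu, nu):
    """Scaled forward-backward on a two-state HMM.

    lik_cons[i] / lik_noncons[i] are the likelihoods of alignment column i
    under the conserved / non-conserved state (toy inputs here).
    Returns P(conserved state | all columns) for each column.
    """
    n = len(lik_cons)
    pi_c = nu / (mu + nu)            # stationary probability of the conserved state
    pi_n = 1.0 - pi_c

    # Forward pass, normalised at each column to avoid underflow.
    fc, fn = pi_c * lik_cons[0], pi_n * lik_noncons[0]
    fwd = [(fc / (fc + fn), fn / (fc + fn))]
    for i in range(1, n):
        pc, pn = fwd[-1]
        fc = (pc * (1 - mu) + pn * nu) * lik_cons[i]
        fn = (pc * mu + pn * (1 - nu)) * lik_noncons[i]
        fwd.append((fc / (fc + fn), fn / (fc + fn)))

    # Backward pass, also normalised at each column.
    bwd = [(1.0, 1.0)] * n
    for i in range(n - 2, -1, -1):
        bc_next, bn_next = bwd[i + 1]
        bc = (1 - mu) * lik_cons[i + 1] * bc_next + mu * lik_noncons[i + 1] * bn_next
        bn = nu * lik_cons[i + 1] * bc_next + (1 - nu) * lik_noncons[i + 1] * bn_next
        bwd[i] = (bc / (bc + bn), bn / (bc + bn))

    return [fc * bc / (fc * bc + fn * bn)
            for (fc, fn), (bc, bn) in zip(fwd, bwd)]


mu, nu = transition_probs(expected_length=45, target_coverage=0.3)
```

Because the per-column scaling factors are shared by both states, normalising the product of forward and backward terms at each column yields the posterior directly.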

\ \

PhyloP Conservation

\

\ The phyloP program supports several different methods for computing\ p-values of conservation or acceleration, for individual nucleotides or\ larger elements (http://compgen.cshl.edu/phast/). Here it was used\ to produce separate scores at each base (--wig-scores option), considering\ all branches of the phylogeny rather than a particular subtree or lineage\ (i.e., the --subtree option was not used). The scores were computed by\ performing a likelihood ratio test at each alignment column (--method LRT),\ and scores for both conservation and acceleration were produced (--mode\ CONACC).
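
How a likelihood ratio test turns into a signed per-base score can be sketched as below. This is a simplified illustration, not phyloP's implementation: it assumes the usual chi-square approximation with one degree of freedom, takes the two log-likelihoods and the fitted tree scale as given inputs, and follows the convention that conserved sites get positive scores and accelerated sites negative ones. All names are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """P(X > x) for a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def conacc_score(lnl_null, lnl_alt, scale):
    """Signed -log10 p-value for one alignment column.

    lnl_null: log-likelihood under the neutral tree model
    lnl_alt:  log-likelihood with the tree rescaled by `scale`
    scale < 1 indicates slower evolution (conservation),
    scale > 1 indicates faster evolution (acceleration).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p = chi2_sf_1df(stat)
    score = -math.log10(p) if p > 0 else float("inf")
    return score if scale < 1.0 else -score
```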

\ \

Conserved Elements

\

\ The conserved elements were predicted by running phastCons with the\ --most-conserved option. The predicted elements are segments of the alignment\ that are likely to have been "generated" by the conserved state of the\ phylo-HMM. Each element is assigned a log-odds score equal to its log\ probability under the conserved model minus its log probability under the\ non-conserved model. The "score" field associated with this track contains\ transformed log-odds scores, taking values between 0 and 1000. (The scores\ are transformed using a monotonic function of the form a * log(x) + b.) The\ raw log odds scores are retained in the "name" field and can be seen on the\ details page or in the browser when the track's display mode is set to\ "pack" or "full".
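
The score transformation described above can be sketched as follows. The coefficients a and b used for this track are not stated, so the sketch derives hypothetical values from an assumed pair of raw-score bounds and clamps the result to the 0 to 1000 range. It requires positive raw scores.

```python
import math


def fit_transform(raw_min, raw_max, lo=0, hi=1000):
    """Choose a, b so that a*log(raw_min)+b = lo and a*log(raw_max)+b = hi.
    raw_min and raw_max are hypothetical bounds, not values from the track."""
    a = (hi - lo) / (math.log(raw_max) - math.log(raw_min))
    b = lo - a * math.log(raw_min)
    return a, b


def transformed_score(raw, a, b, lo=0, hi=1000):
    """Apply a*log(raw)+b, clamp to [lo, hi] and round to an integer."""
    return int(round(min(hi, max(lo, a * math.log(raw) + b))))


a, b = fit_transform(raw_min=10, raw_max=10000)
```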

\ \

Credits

\

This track was created using the following programs:\

\

\ \

References

\ \

\ Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\ Fullah M, Dudas G et al.\ Genomic surveillance elucidates Ebola virus origin and transmission\ during the 2014 outbreak.\ Science 2014 Sep 12;345(6202):1369-72.\ PMID: 25214632;\ Supplemental Materials and Methods\

\ \

Phylo-HMMs, phastCons, and phyloP:

\

\ Felsenstein J, Churchill GA.\ A Hidden Markov Model approach to\ variation among sites in rate of evolution.\ Mol Biol Evol. 1996 Jan;13(1):93-104.\ PMID: 8583911\

\ \

\ Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A.\ \ Detection of nonneutral substitution rates on mammalian phylogenies.\ Genome Res. 2010 Jan;20(1):110-21.\ PMID: 19858363; PMC: PMC2798823\

\ \

\ Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K,\ Clawson H, Spieth J, Hillier LW, Richards S, et al.\ Evolutionarily conserved elements in vertebrate, insect, worm,\ and yeast genomes.\ Genome Res. 2005 Aug;15(8):1034-50.\ PMID: 16024819; PMC: PMC1182216\

\ \

\ Siepel A, Haussler D.\ Phylogenetic Hidden Markov Models.\ In: Nielsen R, editor. Statistical Methods in Molecular Evolution.\ New York: Springer; 2005. pp. 325-351.\

\ \

\ Yang Z.\ A space-time process model for the evolution of DNA\ sequences.\ Genetics. 1995 Feb;139(2):993-1005.\ PMID: 7713447; PMC: PMC1206396\

\ \

Chain/Net:

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron:\ duplication, deletion, and rearrangement in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.\ PMID: 14500911; PMC: PMC208784\

\ \

Multiz:

\

\ Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM,\ Baertsch R, Rosenbloom K, Clawson H, Green ED, et al.\ Aligning multiple genomic sequences with the threaded blockset aligner.\ Genome Res. 2004 Apr;14(4):708-15.\ PMID: 15060014; PMC: PMC383317\

\ \

Lastz (formerly Blastz):

\

\ Chiaromonte F, Yap VB, Miller W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput. 2002:115-26.\ PMID: 11928468\

\ \

\ Harris RS.\ Improved pairwise alignment of genomic DNA.\ Ph.D. Thesis. Pennsylvania State University, USA. 2007.\

\ \

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-mouse alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.\ PMID: 12529312; PMC: PMC430961\

\ compGeno 1 compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html cons119way\ longLabel Multiz Alignment & Conservation (119 strains: strains with vertebrate hosts and human SARS-Cov2)\ priority 2\ shortLabel 119 Vertebrate CoVs\ subGroup1 view Views align=Multiz_Alignments phyloP=Basewise_Conservation_(phyloP) phastcons=Element_Conservation_(phastCons) elements=Conserved_Elements\ track strainCons119way\ type bed 4\ visibility hide\ nextstrainSamplesAll All vcfTabix Mutations in Nextstrain Subset of GISAID EpiCov TM Samples 4 2 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples.vcf.gz\ hapClusterHeight 500\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain.nh\ longLabel Mutations in Nextstrain Subset of GISAID EpiCov TM Samples\ parent nextstrainSamplesViewAll off\ priority 2\ shortLabel All\ subGroups view=all\ track nextstrainSamplesAll\ variantNucMutsV2_B_1_1_7 Alpha Nuc Muts bigBed 4 Alpha VOC (B.1.1.7 UK Sep-2020) nucleotide mutations in 9838 GISAID sequences (Feb 5, 2021) 1 2 63 76 203 159 165 229 0 0 0 https://outbreak.info/situation-reports?pango=B.1.1.7 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantNucMuts_B.1.1.7_2021_02_05.bb\ color 63,76,203\ longLabel Alpha VOC (B.1.1.7 UK Sep-2020) nucleotide mutations in 9838 GISAID sequences (Feb 5, 2021)\ parent variantMuts off\ priority 111\ shortLabel Alpha Nuc Muts\ subGroups variant=A_B117 mutation=NUC designation=VOC\ track variantNucMutsV2_B_1_1_7\ url https://outbreak.info/situation-reports?pango=B.1.1.7\ urlLabel B.1.1.7 Situation Report at outbreak.info\ strainCons119wayViewphyloP Basewise Conservation (phyloP) bed 4 Multiz Alignment & Conservation (119 strains: strains with vertebrate hosts and human SARS-Cov2) 0 2 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (119 strains: strains with vertebrate hosts and human SARS-Cov2)\ parent strainCons119way\ shortLabel Basewise Conservation (phyloP)\ track 
strainCons119wayViewphyloP\ view phyloP\ viewLimits -4:5\ viewLimitsMax -7.405:20\ visibility hide\ C_bind_avg C_bind_avg bigWig DMS data for RBD Binding 1 2 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/C_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel C_bind_avg\ track C_bind_avg\ type bigWig\ visibility dense\ C_expr_avg_Expression C_expr_avg bigWig DMS data for RBD expression 1 2 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/C_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel C_expr_avg\ track C_expr_avg_Expression\ type bigWig\ visibility dense\ cd4Epitopes CD4+ T-Cell bigBed 9 + CD4+ T-Cell Epitopes 1 2 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/epitopes/cd4Epitopes.bb\ configurable off\ configureByPopup off\ longLabel CD4+ T-Cell Epitopes\ parent epitopes on\ priority 2\ shortLabel CD4+ T-Cell\ track cd4Epitopes\ visibility dense\ igm_COVID_13 COVID 13 bigBed 9 COVID 13 1 2 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_13.bb\ longLabel COVID 13\ parent igm on\ priority 2\ shortLabel COVID 13\ track igm_COVID_13\ type bigBed 9\ igg_Ctrl_LC168 Ctrl LC168 bigBed 9 Ctrl LC168 1 2 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_LC168.bb\ longLabel Ctrl LC168\ parent igg on\ priority 2\ shortLabel Ctrl LC168\ track igg_Ctrl_LC168\ type bigBed 9\ strainCons119wayViewphastcons Element Conservation (phastCons) bed 4 Multiz Alignment & Conservation (119 strains: strains with vertebrate hosts and human SARS-Cov2) 1 2 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (119 strains: strains with vertebrate hosts and human SARS-Cov2)\ parent strainCons119way\ shortLabel Element Conservation (phastCons)\ track strainCons119wayViewphastcons\ view phastcons\ visibility dense\ icshapeInvivo icSHAPE In-vivo bigWig icSHAPE In-vivo 2 2 100 
50 20 177 152 137 0 0 0 rna 0 autoScale off\ bigDataUrl /gbdb/wuhCor1/bbi/icshape/invivo.bw\ color 100,50,20\ longLabel icSHAPE In-vivo\ maxHeightPixels 100:40:8\ parent icshape\ shortLabel icSHAPE In-vivo\ track icshapeInvivo\ type bigWig\ viewLimits -0.4:4.0\ visibility full\ mRNAs Known Transcripts (gRNA and mRNA) bigBed 12 Canonical Subgenomic Transcripts (gRNA and mRNA) 0 2 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/wuhCor1/bbi/kim2020/transcripts.bb\ longLabel Canonical Subgenomic Transcripts (gRNA and mRNA)\ parent transcriptome\ priority 2\ shortLabel Known Transcripts (gRNA and mRNA)\ track mRNAs\ type bigBed 12\ M2_library M2 SARS-CoV peptides bigBed 4 T-Cell reactive epitopes: M2 SARS-CoV peptides mapped to SARS-Cov-2 0 2 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/bbi/M2_peptides.bb\ longLabel T-Cell reactive epitopes: M2 SARS-CoV peptides mapped to SARS-Cov-2\ parent targets\ shortLabel M2 SARS-CoV peptides\ track M2_library\ type bigBed 4\ COV2-2082Total MAB COV2-2082 total bigWig Bloom antibody escape - Total Score - COV2-2082 1 2 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2082.tot.bw\ longLabel Bloom antibody escape - Total Score - COV2-2082\ parent bloomEscTotal on\ shortLabel MAB COV2-2082 total\ track COV2-2082Total\ type bigWig\ visibility dense\ kimMgiNc3p MGI NC-Brkpt 3' bigWig Kim et al. MGISEQ Noncanonical 3' Breakpoints 1 2 102 168 15 178 211 135 0 0 0 expression 0 alwaysZero on\ bigDataUrl /gbdb/wuhCor1/bbi/kim2020/kim-scv2-mgiseq-3p-breakpoints.bigWig\ color 102,168,15\ graphTypeDefault bar\ group expression\ longLabel Kim et al. 
MGISEQ Noncanonical 3' Breakpoints\ maxHeightPixels 48:48:11\ parent kimNp\ shortLabel MGI NC-Brkpt 3'\ smoothingWindow off\ track kimMgiNc3p\ transformFunc LOG\ type bigWig\ visibility dense\ windowingFunction maximum\ strainCons119wayViewalign Multiz Alignments bed 4 Multiz Alignment & Conservation (119 strains: strains with vertebrate hosts and human SARS-Cov2) 3 2 0 0 0 127 127 127 0 0 0 compGeno 1 longLabel Multiz Alignment & Conservation (119 strains: strains with vertebrate hosts and human SARS-Cov2)\ parent strainCons119way\ shortLabel Multiz Alignments\ track strainCons119wayViewalign\ view align\ viewUi on\ visibility pack\ IgM_Z-score-_COVID-19_patients_P10 P10 bigBed 9 P10 1 2 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgM_Z-score-_COVID-19_patients/P10.bb\ longLabel P10\ parent IgM_Z-score-_COVID-19_patients on\ shortLabel P10\ track IgM_Z-score-_COVID-19_patients_P10\ type bigBed 9\ P10 P10 bigBed 9 P10 1 2 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P10.bb\ longLabel P10\ parent IgG_Z-score-_COVID-19_patients on\ shortLabel P10\ track P10\ type bigBed 9\ PhyloCSFrejected PhyloCSF Rejected Genes bigGenePred Genes rejected from PhyloCSF genes list 0 2 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/wuhCor1/bbi/phyloGenes/PhyloCSFrejectedGenes.bb\ itemRgb on\ longLabel Genes rejected from PhyloCSF genes list\ parent phyloGenes off\ priority 10\ shortLabel PhyloCSF Rejected Genes\ track PhyloCSFrejected\ type bigGenePred\ visibility hide\ Positive_Selection Positive Selection bigBed 8 + Sites of positive selection implicated in data from Sergei Pond's research group 1 2 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/pond/pos.bb\ longLabel Sites of positive selection implicated in data from Sergei Pond's research group\ parent pond\ priority 1\ shortLabel Positive Selection\ track Positive_Selection\ type bigBed 8 +\ visibility dense\ unipCov2Chain Protein Products bigGenePred 
UniProt Protein Products (Polypeptide Chains, after cleavage) 3 2 0 2 80 127 128 167 0 0 0

Description

\ \

\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates. \ The data has been curated from scientific publications by the UniProt/SwissProt staff.\ The annotations are spread over multiple tracks, based on their "feature type" in UniProt:\

Track Name: Description
UCSC Alignment, SwissProt: Protein sequences from SwissProt mapped onto the genome. All other tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBL: Protein sequences from TrEMBL mapped onto the genome. All other tracks are (start,end) annotations mapped using this track. This track is hidden by default. To show it, click its checkbox on the track description page.
UniProt Signal Peptides: Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains: Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains: Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains: Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains: Polypeptide chain in mature protein after post-processing.
UniProt Domains: Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds: Disulfide bonds.
UniProt Amino Acid Modifications: Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations: Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations: Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts: Differences between Genbank sequences and the UniProt sequence.
UniProt Repeats: Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations: All other annotations
\ \

Display Conventions and Configuration

\ \

Genomic locations of UniProt/SwissProt annotations are labeled with a short name for the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide" etc.). Clicking a feature shows the full annotation and provides a link to the UniProt/SwissProt record for more details. TrEMBL annotations are always shown in light blue, except in the Signal Peptides, Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. For variants, the mouse-over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

In the "UniProt Modifications" track, lipidation sites are highlighted in dark blue, glycosylation sites in dark green, and phosphorylation sites in light green.

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\
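
The core idea of lifting a protein annotation to genome coordinates can be sketched as below. This is a deliberately simplified illustration, not what pslMap does: it assumes a single ungapped coding region on the forward strand with no frameshift, whereas the real pipeline follows the full alignment. The function name and the example coordinates are hypothetical.

```python
def protein_feature_to_genome(cds_start, aa_start, aa_end):
    """Map a protein feature to genomic coordinates.

    cds_start: 0-based genomic position of the first base of the start codon
    aa_start, aa_end: 1-based, inclusive amino acid positions of the feature
    Returns a 0-based, half-open (start, end) pair, as used in BED files.
    """
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("amino acid range must satisfy 1 <= aa_start <= aa_end")
    start = cds_start + 3 * (aa_start - 1)   # first base of the first codon
    end = cds_start + 3 * aa_end             # one past the last base of the last codon
    return start, end


# Hypothetical CDS starting at genomic position 100: residues 1-10 cover 30 bases.
print(protein_feature_to_genome(100, 1, 10))
```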

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout \

\ \ Please refer to our\ mailing list archives\ for questions or our\ Data Access FAQ\ for more information. \

\ \

Credits

\

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Phil Berman, UCSC.\ Thanks to UniProt for making all data available for download.

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ uniprot 1 baseColorDefault genomicCodons\ bigDataUrl /gbdb/wuhCor1/uniprot/unipChainCov2.bb\ color 0,2,80\ dataVersion /gbdb/$D/uniprot/version.txt\ exonNumbers off\ group uniprot\ html uniprotCov2\ itemRgb on\ longLabel UniProt Protein Products (Polypeptide Chains, after cleavage)\ mouseOverField comments\ priority 2\ shortLabel Protein Products\ track unipCov2Chain\ type bigGenePred\ urls uniProtId="http://www.uniprot.org/uniprot/$$#ptm_processing" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility pack\ Entropy Shannon Entropy bigWig Shannon Entropy derived from the in vivo SHAPE constrained Superfold structure prediction 2 2 0 0 0 127 127 127 0 0 0 rna 0 bigDataUrl /gbdb/wuhCor1/pyle/Full_Length_Shannon_Entropy.bw\ longLabel Shannon Entropy derived from the in vivo SHAPE constrained Superfold structure prediction\ parent pyle\ priority 2\ shortLabel Shannon Entropy\ track Entropy\ type bigWig\ viewLimits 0.0:1.0\ visibility full\ cpgIslandExtUnmasked Unmasked CpG bed 4 + CpG Islands on All Sequence (Islands < 300 Bases are Light Green) 0 2 0 100 0 128 228 128 0 0 0

Description

\ \

CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time,\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some other reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.

\ \

\ The unmasked version of the track displays potential CpG islands\ that exist in repeat regions and would otherwise not be visible\ in the repeat masked version.\

\ \

\ By default, only the masked version of the track is displayed. To view the\ unmasked version, change the visibility settings in the track controls at\ the top of this page.\

\ \

Methods

\ \

CpG islands were predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment was then\ evaluated for the following criteria:\ \

\

\
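
The dinucleotide scoring scheme described above can be illustrated with a short Python sketch. It is not the cpg_lh program: it finds only the single highest-scoring segment of a sequence using a standard maximum-subarray scan, while cpg_lh reports multiple segments and then applies its own filtering criteria. The function name is hypothetical.

```python
def best_cpg_segment(seq, match=17, mismatch=-1):
    """Return (score, start, end) of the highest-scoring segment.

    Each dinucleotide scores +17 if it is CG and -1 otherwise.
    start/end are 0-based, half-open positions in seq.
    Returns (0, 0, 0) when the sequence contains no CG.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    cur_score, cur_start = 0, 0
    for i in range(len(seq) - 1):
        step = match if seq[i:i + 2] == "CG" else mismatch
        if cur_score <= 0:
            # A non-positive running score cannot help; restart here.
            cur_score, cur_start = 0, i
        cur_score += step
        if cur_score > best[0]:
            best = (cur_score, cur_start, i + 2)
    return best
```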

\ The entire genome sequence, masking areas included, was\ used for the construction of the track Unmasked CpG.\ The track CpG Islands is constructed on the sequence after\ all masked sequence is removed.\

\ \

The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length. The ratio of observed to expected \ CpG is calculated according to the formula (cited in \ Gardiner-Garden et al. (1987)):\ \

    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
\ \ where N = length of sequence.
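
The three quantities defined above can be computed for a single sequence with a few lines of Python. This is a minimal sketch for checking values by hand; the function name is hypothetical, and the zero returned when an island has no C or no G is an assumption made to avoid division by zero.

```python
def cpg_stats(seq):
    """CpG count, percentage CpG and observed/expected CpG ratio for seq."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    cpg = seq.count("CG")              # "CG" cannot overlap itself, so count() is exact
    c, g = seq.count("C"), seq.count("G")
    perc_cpg = 100.0 * 2 * cpg / n     # two bases per CpG, relative to the length
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"cpgNum": cpg, "perCpg": perc_cpg, "obsExp": obs_exp}
```

For example, the six-base sequence CGCGAT has two CpGs, two Cs and two Gs, giving Obs/Exp = 2 * 6 / (2 * 2) = 3.0.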

\

\ The calculation of the track data is performed by the following command sequence:\

\
twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
  | cpg_lh /dev/stdin 2> cpg_lh.err \
    |  awk '{$2 = $2 - 1; width = $3 - $2;  printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
     | sort -k1,1 -k2,2n > cpgIsland.bed
The unmasked track data is constructed in the same way, using the output of twoBitToFa -noMask in place of the twoBitToFa command.

\ \

Data access

\

CpG islands and their associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog.

\

\ The source for the cpg_lh program can be obtained from\ src/utils/cpgIslandExt/.\ The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")\

\ \

Credits

\ \

This track was generated using a modification of a program developed by G. Miklem and L. Hillier \ (unpublished).

\ \

References

\ \

\ Gardiner-Garden M, Frommer M.\ \ CpG islands in vertebrate genomes.\ J Mol Biol. 1987 Jul 20;196(2):261-82.\ PMID: 3656447\

\ regulation 1 html cpgIslandSuper\ longLabel CpG Islands on All Sequence (Islands < 300 Bases are Light Green)\ parent cpgIslandSuper hide\ priority 2\ shortLabel Unmasked CpG\ track cpgIslandExtUnmasked\ varskipVsl1a VarSkip Long 1a bigBed 6 NEB VarSkip Long V1a 3 2 0 0 0 127 127 127 0 0 0 map 1 bigDataUrl /gbdb/wuhCor1/varskip/neb_vsl1a.primer.bb\ longLabel NEB VarSkip Long V1a\ parent varskip\ priority 2\ shortLabel VarSkip Long 1a\ track varskipVsl1a\ type bigBed 6\ visibility pack\ variantAaMutsV2_B_1_351 Beta AA Muts bigBed 4 Beta VOC (B.1.351 SA Dec-2020) amino acid mutations in 793 GISAID sequences (Feb 5, 2021) 1 3 73 42 181 164 148 218 0 0 0 https://outbreak.info/situation-reports?pango=B.1.351 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantAaMuts_B.1.351_2021_02_05.bb\ color 73,42,181\ longLabel Beta VOC (B.1.351 SA Dec-2020) amino acid mutations in 793 GISAID sequences (Feb 5, 2021)\ parent variantMuts off\ priority 2\ shortLabel Beta AA Muts\ subGroups variant=B_B1351 mutation=AA designation=VOC\ track variantAaMutsV2_B_1_351\ url https://outbreak.info/situation-reports?pango=B.1.351\ urlLabel B.1.351 Situation Report at outbreak.info\ cd8Epitopes CD8+ T-Cell bigBed 9 + CD8+ T-Cell Epitopes 1 3 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/epitopes/cd8Epitopes.bb\ configurable off\ configureByPopup off\ longLabel CD8+ T-Cell Epitopes\ parent epitopes on\ priority 3\ shortLabel CD8+ T-Cell\ track cd8Epitopes\ visibility dense\ igm_Ctrl_LC181 Ctrl LC181 bigBed 9 Ctrl LC181 1 3 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_LC181.bb\ longLabel Ctrl LC181\ parent igm on\ priority 3\ shortLabel Ctrl LC181\ track igm_Ctrl_LC181\ type bigBed 9\ igg_Ctrl_LC182 Ctrl LC182 bigBed 9 Ctrl LC182 1 3 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_LC182.bb\ longLabel Ctrl LC182\ parent igg on\ priority 3\ shortLabel Ctrl LC182\ track igg_Ctrl_LC182\ type bigBed 9\ D_bind_avg D_bind_avg bigWig 
DMS data for RBD Binding 1 3 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/D_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel D_bind_avg\ track D_bind_avg\ type bigWig\ visibility dense\ D_expr_avg_Expression D_expr_avg bigWig DMS data for RBD expression 1 3 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/D_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel D_expr_avg\ track D_expr_avg_Expression\ type bigWig\ visibility dense\ unipCov2Interest Highlights bigGenePred UniProt highlighted "Regions of Interest" 3 3 2 12 100 128 133 177 0 0 0

Description

\ \

\ This track shows protein sequence annotations defined as "regions of interest"\ from the UniProt/SwissProt database,\ mapped to genomic coordinates.\ The data has been curated from scientific publications by the UniProt/SwissProt staff.\

\ \ \

Display Conventions and Configuration

\ \

Genomic locations of UniProt/SwissProt annotations are labeled with a short name. Clicking an item shows additional annotation details.

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. \

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipInterestCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout \ \

Credits

\

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Alejo\ Mujica, Regeneron Pharmaceuticals. Thanks to UniProt for making all data\ available for download.

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipInterestCov2.bb\ color 2,12,100\ dataVersion /gbdb/$D/uniprot/version.txt\ exonNumbers off\ group uniprot\ itemRgb on\ longLabel UniProt highlighted "Regions of Interest"\ mouseOverField comments\ priority 3\ shortLabel Highlights\ track unipCov2Interest\ type bigGenePred\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility pack\ Kim_RNAs Kim recomb. trans. bigBed 12 . Subgenomic Trans.: All recombined transcripts from Kim et al. 2020 0 3 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/wuhCor1/bbi/kim2020/Kim_recombination.bb\ longLabel Subgenomic Trans.: All recombined transcripts from Kim et al. 2020\ parent kimTranscripts\ priority 3\ scoreFilter 900\ shortLabel Kim recomb. trans.\ track Kim_RNAs\ type bigBed 12 .\ COV2-2094Total MAB COV2-2094 total bigWig Bloom antibody escape - Total Score - COV2-2094 1 3 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2094.tot.bw\ longLabel Bloom antibody escape - Total Score - COV2-2094\ parent bloomEscTotal on\ shortLabel MAB COV2-2094 total\ track COV2-2094Total\ type bigWig\ visibility dense\ kimMgiNc5p MGI NC-Brkpt 5' bigWig Kim et al. MGISEQ Noncanonical 5' Breakpoints 1 3 102 168 15 178 211 135 0 0 0 genes 0 alwaysZero on\ bigDataUrl /gbdb/wuhCor1/bbi/kim2020/kim-scv2-mgiseq-5p-breakpoints.bigWig\ color 102,168,15\ graphTypeDefault bar\ longLabel Kim et al. 
MGISEQ Noncanonical 5' Breakpoints\ maxHeightPixels 48:48:11\ parent kimNp\ shortLabel MGI NC-Brkpt 5'\ smoothingWindow off\ track kimMgiNc5p\ transformFunc LOG\ type bigWig\ visibility dense\ windowingFunction maximum\ IgM_Z-score-_COVID-19_patients_P11 P11 bigBed 9 P11 1 3 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgM_Z-score-_COVID-19_patients/P11.bb\ longLabel P11\ parent IgM_Z-score-_COVID-19_patients on\ shortLabel P11\ track IgM_Z-score-_COVID-19_patients_P11\ type bigBed 9\ P11 P11 bigBed 9 P11 1 3 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P11.bb\ longLabel P11\ parent IgG_Z-score-_COVID-19_patients on\ shortLabel P11\ track P11\ type bigBed 9\ unipCov2LocSignal Signal Peptides bigGenePred UniProt Signal Peptides 0 3 255 0 150 255 127 202 0 0 0

Description

\ \

\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates. \ The data has been curated from scientific publications by the UniProt/SwissProt staff.\ The annotations are spread over multiple tracks, based on their "feature type" in UniProt:\

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
Track Name: Description
UCSC Alignment, SwissProt: Protein sequences from SwissProt mapped onto the genome. All other\ tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBL: Protein sequences from TrEMBL mapped onto the genome. All other tracks\ are (start,end) annotations mapped using this track. This track is\ hidden by default. To show it, click its checkbox on the track description\ page.
UniProt Signal Peptides: Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains: Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains: Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains: Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains: Polypeptide chain in mature protein after post-processing.
UniProt Domains: Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds: Disulfide bonds.
UniProt Amino Acid Modifications: Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations: Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations: Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts: Differences between Genbank sequences and the UniProt sequence.
UniProt Repeats: Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations: All other annotations.
\ \

Display Conventions and Configuration

\ \

\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name for\ the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide",\ etc.). Clicking an annotation shows the full annotation and provides a link to the UniProt/SwissProt\ record for more details. TrEMBL annotations are always shown in \ light blue, except in the Signal Peptides,\ Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. For variants, the mouse-over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

\ In the "UniProt Modifications" track, lipidation sites are highlighted in \ dark blue, glycosylation sites in \ dark green, and phosphorylation sites in \ light green.

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout \
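For scripted access, the ranged query above can be assembled and its output parsed programmatically. The following is a minimal sketch, not part of the UCSC tools: the helper names `build_bigbedtobed_cmd` and `parse_bed_line` are hypothetical, the base URL and file path are copied from the example command, and the sample output line is invented for illustration only.

```python
import shlex

# Assumption: base URL taken from the example command above.
GBDB_BASE = "http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1"


def build_bigbedtobed_cmd(rel_path, chrom, start, end):
    """Return the argument list for a ranged bigBedToBed query (hypothetical helper)."""
    return [
        "bigBedToBed",
        f"{GBDB_BASE}/{rel_path}",
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        "stdout",
    ]


def parse_bed_line(line):
    """Split one tab-separated BED line into a dict of its first four columns."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),  # BED start is 0-based
        "end": int(fields[2]),    # BED end is exclusive
        "name": fields[3] if len(fields) > 3 else "",
    }


cmd = build_bigbedtobed_cmd("uniprot/unipStructCov2.bb", "NC_045512v2", 0, 29903)
print(shlex.join(cmd))

# Invented output line, for illustration only (not real track data).
example = "NC_045512v2\t100\t250\texampleFeature"
print(parse_bed_line(example))
```

If bigBedToBed is installed, the argument list could be passed to `subprocess.run(cmd, capture_output=True, text=True)` and each line of its standard output fed to `parse_bed_line`.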

\ \ Please refer to our\ mailing list archives\ for questions or our\ Data Access FAQ\ for more information. \

\ \

Credits

\

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Phil Berman, UCSC.\ Thanks to UniProt for making all data available for download.

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipLocSignalCov2.bb\ color 255,0,150\ dataVersion /gbdb/$D/uniprot/version.txt\ group uniprot\ html uniprotCov2\ itemRgb off\ longLabel UniProt Signal Peptides\ mouseOverField comments\ priority 3\ shortLabel Signal Peptides\ track unipCov2LocSignal\ type bigGenePred\ urls acc="http://www.uniprot.org/uniprot/$$" hgncId="https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$"\ visibility hide\ varskip2 VarSkip 2 bigBed 6 NEB VarSkip 2 3 3 0 0 0 127 127 127 0 0 0 map 1 bigDataUrl /gbdb/wuhCor1/varskip/neb_vss2a.primer.bb\ longLabel NEB VarSkip 2\ parent varskip\ priority 3\ shortLabel VarSkip 2\ track varskip2\ type bigBed 6\ visibility pack\ transcriptome Subgenomic Canonical bigBed 12 Canonical Subgenomic Transcripts 0 3.1 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows predicted and experimental representations of the\ SARS-CoV-2 transcriptome based on long-read Nanopore sequencing.

\ \

Display Conventions and Configuration

\

\ SARS-CoV-2 generates sub-genomic mRNAs (sgmRNAs) for all ORFs. The virus\ achieves this by recombination mechanisms in which replication machinery\ jumps from one of many TRS-B sites (transcription regulatory sequence, body) to\ the TRS-L (leader sequence) during negative strand synthesis.\ These negative strands are then used as templates for mRNA synthesis.

\ \

\ On these tracks we depict the predicted mRNAs with the excised sequence\ drawn like introns. The ORFs predicted to be translated by these mRNAs are\ shown in thick boxes. The thin bars function as UTRs for that particular mRNA\ species.
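The display convention above maps onto the standard BED12 columns: blocks are drawn as boxes, the gaps between blocks are drawn like introns (here, the excised sequence), and thickStart/thickEnd delimit the thick ORF region. The sketch below shows one way to read those pieces from a BED12 line. The function name `parse_bed12` is hypothetical, and the example record uses invented coordinates, not values from the track.

```python
def parse_bed12(line):
    """Extract blocks, gaps between blocks, and the thick region from a BED12 line."""
    f = line.rstrip("\n").split("\t")
    chrom_start = int(f[1])
    # blockSizes and blockStarts are comma-separated and may end with a comma.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    # blockStarts are relative to chromStart; convert to genome coordinates.
    blocks = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    # Gaps between consecutive blocks are drawn like introns (excised sequence).
    gaps = [(blocks[i][1], blocks[i + 1][0]) for i in range(len(blocks) - 1)]
    return {
        "name": f[3],
        "thick": (int(f[6]), int(f[7])),  # drawn as the thick box (predicted ORF)
        "blocks": blocks,
        "gaps": gaps,
    }


# Invented record for illustration: a short leader block joined to a 3' body block.
example = "\t".join([
    "NC_045512v2", "0", "29903", "example_sgmRNA", "1000", "+",
    "22000", "25000", "0", "2", "70,8000,", "0,21903,",
])
record = parse_bed12(example)
print(record["blocks"])  # [(0, 70), (21903, 29903)]
print(record["gaps"])    # [(70, 21903)]
```

Block portions outside the thick interval correspond to the thin bars described above.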

\ \

Multiple subtracks are available:

\ \ \

Methods

\ \ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\ \

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/kim2020/TRS.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

Credits

\

\ Thanks to Jason Fernandes (Haussler-lab, UCSC) for preparing this track.

\ \

References

\

\ Kim D, Lee JY, Yang JS, Kim JW, Kim VN, Chang H.\ \ The Architecture of SARS-CoV-2 Transcriptome.\ Cell. 2020 May 14;181(4):914-921.e10.\ PMID: 32330414; PMC: PMC7179501\

\ \ genes 1 compositeTrack on\ group genes\ longLabel Canonical Subgenomic Transcripts\ priority 3.1\ shortLabel Subgenomic Canonical\ track transcriptome\ type bigBed 12\ visibility hide\ kimNp Subgenomic Breakpts bigWig Subgenomic Transcript Breakpoints from Kim et al 2020: Nanopore and MGISeq 0 3.6 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the locations of RNA transcript breakpoints as determined by Nanopore and DNA Nanoball MGISEQ sequencing by\ \ Kim et al, Cell 2020.\

\ \

Display Conventions and Configuration

\

\ The height of each bar shows the frequency of the transcript breakpoint at that position.\ This track contains six subtracks. Each can be hidden, and its height\ and min/max bar settings can be changed, by clicking its "Configure" link above. You can also configure \ all tracks together with the controls at the top of the track configuration page.\

\ \

Credits

\ Thanks to Hyeshik Chang for preparing and sharing custom tracks.\

\ \

References

\ \

\ Kim D, Lee JY, Yang JS, Kim JW, Kim VN, Chang H.\ \ The Architecture of SARS-CoV-2 Transcriptome.\ Cell. 2020 May 14;181(4):914-921.e10.\ PMID: 32330414; PMC: PMC7179501\

\ \ genes 0 autoScale on\ compositeTrack on\ group genes\ longLabel Subgenomic Transcript Breakpoints from Kim et al 2020: Nanopore and MGISeq\ priority 3.6\ shortLabel Subgenomic Breakpts\ track kimNp\ type bigWig\ visibility hide\ kimTranscripts Subgenomic Observed bigBed 12 Subgenomic Transcripts found in long-read sequences by Kim et al. 2020 0 3.8 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows predicted and experimental representations of the\ SARS-CoV-2 transcriptome based on long-read Nanopore sequencing.

\ \

Display Conventions and Configuration

\

\ SARS-CoV-2 generates sub-genomic mRNAs (sgmRNAs) for all ORFs. The virus\ achieves this by recombination mechanisms in which replication machinery\ jumps from one of many TRS-B sites (transcription regulatory sequence, body) to\ the TRS-L (leader sequence) during negative strand synthesis.\ These negative strands are then used as templates for mRNA synthesis.

\ \

\ On these tracks, we depict the predicted mRNAs with the excised sequence\ drawn like introns. The ORFs predicted to be translated by these mRNAs are\ shown in thick boxes. The thin bars function as UTRs for that particular mRNA\ species.

\ \

Multiple subtracks are available:

\ \ \

Methods

\ \ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\ \

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/kim2020/Kim_TRS.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

Credits

\

\ Thanks to Jason Fernandes (Haussler-lab, UCSC) for preparing this track.

\ \

References

\

\ Kim D, Lee JY, Yang JS, Kim JW, Kim VN, Chang H.\ \ The Architecture of SARS-CoV-2 Transcriptome.\ Cell. 2020 May 14;181(4):914-921.e10.\ PMID: 32330414; PMC: PMC7179501\

\ \ genes 1 compositeTrack on\ group genes\ longLabel Subgenomic Transcripts found in long-read sequences by Kim et al. 2020\ priority 3.8\ shortLabel Subgenomic Observed\ track kimTranscripts\ type bigBed 12\ visibility hide\ kimIvtCov Nanopore coverage bigWig Nanopore coverage of in-vitro-transcribed RNA seq + PCR, Kim et al 2020 0 3.9 0 0 0 127 127 127 0 0 0

Description

\ \

This track shows the coverage of Nanopore sequences from \ Kim et al. 2020\ obtained after in-vitro reverse transcription and tiling PCR of SARS-CoV-2\ genomes. This is not direct RNA sequencing, but multiplex PCR on DNA followed by\ sequencing. The coverage shown here does not allow conclusions to be drawn about RNA\ modifications or RNA editing, but it indicates regions of the genome that are\ harder to sequence with Nanopore sequencing.\ \

Display Conventions and Configuration

\ \

\ Sequence coverage of every bp is shown. All reads were used.
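To make "coverage of every bp" concrete, the sketch below counts how many aligned reads overlap each base. It is only an illustration of the quantity being plotted, not the bamCoverage implementation, and it assumes reads are given as half-open (start, end) intervals on a toy genome. The function name `per_base_coverage` and the read coordinates are hypothetical.

```python
def per_base_coverage(reads, genome_length):
    """Return a list with the number of reads overlapping each base.

    reads: iterable of (start, end) pairs, 0-based, end exclusive (assumption).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    coverage = []
    depth = 0
    for i in range(genome_length):
        depth += diff[i]
        coverage.append(depth)
    return coverage


# Toy example on a 10 bp genome with three invented reads.
toy_reads = [(0, 5), (3, 8), (4, 6)]
toy_cov = per_base_coverage(toy_reads, 10)
print(toy_cov)  # [1, 1, 1, 2, 3, 2, 1, 1, 0, 0]
```

Bases with low values in such a profile correspond to the harder-to-sequence regions mentioned in the description.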

\ \

Related tracks

\ \ \

Methods

\

\ BAM files of Minimap2 alignments were processed with bamCoverage.\ \

References

\

\ Kim D, Lee JY, Yang JS, Kim JW, Kim VN, Chang H.\ \ The Architecture of SARS-CoV-2 Transcriptome.\ Cell. 2020 May 14;181(4):914-921.e10.\ PMID: 32330414; PMC: PMC7179501\

\ \ map 0 autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/IVT.bw\ group map\ longLabel Nanopore coverage of in-vitro-transcribed RNA seq + PCR, Kim et al 2020\ priority 3.9\ shortLabel Nanopore coverage\ track kimIvtCov\ type bigWig\ visibility hide\ kimRnaMod Subgenomic RNA Modif. bigBarChart Subgenomic RNA Modifications from Kim et al. 2020: gRNA S 3a E M 6 7a 7b 8 N 0 3.95 224 49 49 239 152 152 0 0 0

Description

\

\ This track shows the locations of RNA modifications as determined by Nanopore sequencing by\ \ Kim et al, Cell 2020.\

\ \

Display Conventions and Configuration

\

\ Very small tick marks indicate the position of the RNA modifications on the\ genome. Zoom in to base-pair level to see their exact extent. A\ small bar chart over each tick indicates the fraction of transcripts that are modified.\ E.g. frac=0.3 means that 30% of the transcripts\ were modified. There is one bar chart per transcript.\
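As a worked example of the frac value, the sketch below computes the modified fraction from read counts. It assumes, based on the "frac=0.3 means 30%" example, that the fraction is taken over all transcripts (modified plus unmodified). The function name `modified_fraction` and the counts are hypothetical.

```python
def modified_fraction(n_modified, n_unmodified):
    """Fraction of transcripts carrying the modification (assumed: modified / total)."""
    total = n_modified + n_unmodified
    if total == 0:
        return None  # no transcripts observed at this site
    return n_modified / total


# Invented counts: 30 modified and 70 unmodified transcripts give frac = 0.3.
frac = modified_fraction(30, 70)
print(frac)
```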

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\ \

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/kim2020/kim-scv2-drs-modifications.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \ \ \

Credits

\

\ Thanks to Hyeshik Chang for preparing and sharing custom tracks.\

\ \

References

\ \

\ Kim D, Lee JY, Yang JS, Kim JW, Kim VN, Chang H.\ \ The Architecture of SARS-CoV-2 Transcriptome.\ Cell. 2020 May 14;181(4):914-921.e10.\ PMID: 32330414; PMC: PMC7179501\

\ \ genes 1 barChartBars gRNA S ORF3a E M ORF6 ORF7a ORF7b ORF8 N\ barChartMetric median\ barChartSizeWindows 500 8000\ barChartUnit frac\ bedNameLabel Position on genome\ bigDataUrl /gbdb/wuhCor1/bbi/kim2020/kim-scv2-drs-modifications.bb\ color 224,49,49\ group genes\ longLabel Subgenomic RNA Modifications from Kim et al. 2020: gRNA S 3a E M 6 7a 7b 8 N\ maxLimit 1\ priority 3.95\ shortLabel Subgenomic RNA Modif.\ track kimRnaMod\ type bigBarChart\ visibility hide\ strainPhyloP44way Bat PhyloP wig -11.968 4.256 44 Bat virus strains Basewise Conservation by PhyloP 0 4 60 60 140 140 60 60 0 0 0 compGeno 0 altColor 140,60,60\ autoScale off\ color 60,60,140\ configurable on\ longLabel 44 Bat virus strains Basewise Conservation by PhyloP\ maxHeightPixels 100:50:11\ noInherit on\ parent strainCons44wayViewphyloP on\ priority 4\ shortLabel Bat PhyloP\ spanList 1\ subGroups view=phyloP\ track strainPhyloP44way\ type wig -11.968 4.256\ viewLimits -11.968:4.256\ windowingFunction mean\ variantNucMutsV2_B_1_351 Beta Nuc Muts bigBed 4 Beta VOC (B.1.351 SA Dec-2020) nucleotide mutations in 793 GISAID sequences (Feb 5, 2021) 1 4 73 42 181 164 148 218 0 0 0 https://outbreak.info/situation-reports?pango=B.1.351 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantNucMuts_B.1.351_2021_02_05.bb\ color 73,42,181\ longLabel Beta VOC (B.1.351 SA Dec-2020) nucleotide mutations in 793 GISAID sequences (Feb 5, 2021)\ parent variantMuts off\ priority 112\ shortLabel Beta Nuc Muts\ subGroups variant=B_B1351 mutation=NUC designation=VOC\ track variantNucMutsV2_B_1_351\ url https://outbreak.info/situation-reports?pango=B.1.351\ urlLabel B.1.351 Situation Report at outbreak.info\ igg_Ctrl_LC169 Ctrl LC169 bigBed 9 Ctrl LC169 1 4 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_LC169.bb\ longLabel Ctrl LC169\ parent igg on\ priority 4\ shortLabel Ctrl LC169\ track igg_Ctrl_LC169\ type bigBed 9\ igm_Ctrl_LC175 Ctrl LC175 bigBed 9 Ctrl LC175 1 4 0 0 0 127 127 127 0 0 0 
immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_LC175.bb\ longLabel Ctrl LC175\ parent igm on\ priority 4\ shortLabel Ctrl LC175\ track igm_Ctrl_LC175\ type bigBed 9\ E_bind_avg E_bind_avg bigWig DMS data for RBD Binding 1 4 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/E_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel E_bind_avg\ track E_bind_avg\ type bigWig\ visibility dense\ E_expr_avg_Expression E_expr_avg bigWig DMS data for RBD expression 1 4 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/E_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel E_expr_avg\ track E_expr_avg_Expression\ type bigWig\ visibility dense\ unipCov2LocExtra Extracellular bigGenePred UniProt Extracellular Domain 0 4 0 150 255 127 202 255 0 0 0

Description

\ \

\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates. \ The data has been curated from scientific publications by the UniProt/SwissProt staff.\ The annotations are spread over multiple tracks, based on their "feature type" in UniProt:\

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
Track Name: Description
UCSC Alignment, SwissProt: Protein sequences from SwissProt mapped onto the genome. All other\ tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBL: Protein sequences from TrEMBL mapped onto the genome. All other tracks\ are (start,end) annotations mapped using this track. This track is\ hidden by default. To show it, click its checkbox on the track description\ page.
UniProt Signal Peptides: Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains: Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains: Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains: Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains: Polypeptide chain in mature protein after post-processing.
UniProt Domains: Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds: Disulfide bonds.
UniProt Amino Acid Modifications: Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations: Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations: Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts: Differences between Genbank sequences and the UniProt sequence.
UniProt Repeats: Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations: All other annotations.
\ \

Display Conventions and Configuration

\ \

\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name for\ the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide",\ etc.). Clicking an annotation shows the full annotation and provides a link to the UniProt/SwissProt\ record for more details. TrEMBL annotations are always shown in \ light blue, except in the Signal Peptides,\ Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. For variants, the mouse-over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

\ In the "UniProt Modifications" track, lipidation sites are highlighted in \ dark blue, glycosylation sites in \ dark green, and phosphorylation sites in \ light green.

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout \

\ \ Please refer to our\ mailing list archives\ for questions or our\ Data Access FAQ\ for more information. \

\ \

Credits

\

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Phil Berman, UCSC.\ Thanks to UniProt for making all data available for download.

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipLocExtraCov2.bb\ color 0,150,255\ dataVersion /gbdb/$D/uniprot/version.txt\ group uniprot\ html uniprotCov2\ itemRgb off\ longLabel UniProt Extracellular Domain\ priority 4\ shortLabel Extracellular\ track unipCov2LocExtra\ type bigGenePred\ urls acc="http://www.uniprot.org/uniprot/$$" hgncId="https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$"\ visibility hide\ Kim_TRS_sgmRNAs Kim recomb. TRS trans. bigBed 12 . Subgenomic Trans.: Recombined transcripts from Kim et al. 2020 with a TRS 0 4 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/wuhCor1/bbi/kim2020/Kim_TRS.bb\ longLabel Subgenomic Trans.: Recombined transcripts from Kim et al. 2020 with a TRS\ parent kimTranscripts\ priority 4\ scoreFilter 900\ shortLabel Kim recomb. TRS trans.\ track Kim_TRS_sgmRNAs\ type bigBed 12 .\ COV2-2096Total MAB COV2-2096 total bigWig Bloom antibody escape - Total Score - COV2-2096 1 4 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2096.tot.bw\ longLabel Bloom antibody escape - Total Score - COV2-2096\ parent bloomEscTotal on\ shortLabel MAB COV2-2096 total\ track COV2-2096Total\ type bigWig\ visibility dense\ kimNpLdr3pBreak Nanop Ld2Bd 3' bigWig Nanopore Leader-to-body 3' Breakpoints 1 4 25 113 194 140 184 224 0 0 0 genes 0 alwaysZero on\ bigDataUrl /gbdb/wuhCor1/bbi/kim2020/kim-scv2-drs-leader-to-body-breakpoints.bigWig\ color 25,113,194\ graphTypeDefault bar\ longLabel Nanopore Leader-to-body 3' Breakpoints\ maxHeightPixels 48:48:11\ parent kimNp\ shortLabel Nanop Ld2Bd 3'\ smoothingWindow off\ track kimNpLdr3pBreak\ transformFunc LOG\ type bigWig\ visibility dense\ windowingFunction maximum\ IgM_Z-score-_COVID-19_patients_P15 P15 bigBed 9 P15 1 4 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgM_Z-score-_COVID-19_patients/P15.bb\ longLabel P15\ parent IgM_Z-score-_COVID-19_patients on\ shortLabel P15\ track IgM_Z-score-_COVID-19_patients_P15\ type bigBed 9\ P15 P15 bigBed 9 P15 1 
4 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P15.bb\ longLabel P15\ parent IgG_Z-score-_COVID-19_patients on\ shortLabel P15\ track P15\ type bigBed 9\ strainPhyloP119way PhyloP wig -7.405 20 119 virus strains Basewise Conservation by PhyloP 0 4 60 60 140 140 60 60 0 0 0 compGeno 0 altColor 140,60,60\ autoScale off\ color 60,60,140\ configurable on\ longLabel 119 virus strains Basewise Conservation by PhyloP\ maxHeightPixels 100:50:11\ noInherit on\ parent strainCons119wayViewphyloP on\ priority 4\ shortLabel PhyloP\ spanList 1\ subGroups view=phyloP\ track strainPhyloP119way\ type wig -7.405 20\ viewLimits -7.405:20\ windowingFunction mean\ varskip2b VarSkip 2b bigBed 6 NEB VarSkip 2b (2a + spike-ins) 3 4 0 0 0 127 127 127 0 0 0 map 1 bigDataUrl /gbdb/wuhCor1/varskip/neb_vss2b.primer.bb\ longLabel NEB VarSkip 2b (2a + spike-ins)\ parent varskip\ priority 4\ shortLabel VarSkip 2b\ track varskip2b\ type bigBed 6\ visibility pack\ artic ARTIC Primers V3 bigBed 6 + ARTIC V3 Oxford Nanopore sequencing primers 0 5 0 0 0 127 127 127 0 0 0

Description

\ \

This track shows the primers for the \ ARTIC network SARS-CoV-2 sequencing protocol, Version 3.\

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of primers are highlighted. Clicking a primer shows its primer pool.

\ \

Methods

\

\ ARTIC Network primer sequences were downloaded from the\ ARTIC GitHub BED file and converted to bigBed.\

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/artic.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \ map 1 bigDataUrl /gbdb/wuhCor1/bbi/artic.bb\ group map\ longLabel ARTIC V3 Oxford Nanopore sequencing primers\ noScoreFilter on\ priority 5\ shortLabel ARTIC Primers V3\ track artic\ type bigBed 6 +\ visibility hide\ articV4 ARTIC Primers V4 bigBed 6 + ARTIC V4 Oxford Nanopore sequencing primers 0 5 0 0 0 127 127 127 0 0 0

Description

\ \

This track shows the primers for the\ ARTIC network SARS-CoV-2 sequencing protocol, Version 4 (June 21, 2021).\

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of primers are highlighted. Clicking a primer shows its primer pool.

\ \

Methods

\

\ ARTIC Network primer sequences were downloaded from\ GitHub (file\ SARS-CoV-2.primer.bed) and converted to bigBed.\

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/articV4.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \ map 1 bigDataUrl /gbdb/wuhCor1/bbi/articV4.bb\ group map\ longLabel ARTIC V4 Oxford Nanopore sequencing primers\ noScoreFilter on\ priority 5\ shortLabel ARTIC Primers V4\ track articV4\ type bigBed 6 +\ visibility hide\ articV4_1 ARTIC Primers V4.1 bigBed 6 + ARTIC V4.1 Oxford Nanopore sequencing primers 0 5 0 0 0 127 127 127 0 0 0

Description

\ \

This track shows the primers for the\ ARTIC network SARS-CoV-2 sequencing protocol,\ Version 4.1 (January 7, 2022), which restores the functionality of some primers against\ the Omicron variant.\

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of primers are highlighted. Clicking a primer shows its primer pool.

\ \

Methods

\

\ ARTIC Network primer sequences were downloaded from\ GitHub (file\ SARS-CoV-2.primer.bed) and converted to bigBed.\

\ \

Data Access

\

\ The raw data can be explored interactively with the\ REST API or the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/articV4.1.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ map 1 bigDataUrl /gbdb/wuhCor1/bbi/articV4.1.bb\ group map\ longLabel ARTIC V4.1 Oxford Nanopore sequencing primers\ noScoreFilter on\ priority 5\ shortLabel ARTIC Primers V4.1\ track articV4_1\ type bigBed 6 +\ visibility hide\ igg_Ctrl_LC171 Ctrl LC171 bigBed 9 Ctrl LC171 1 5 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_LC171.bb\ longLabel Ctrl LC171\ parent igg on\ priority 5\ shortLabel Ctrl LC171\ track igg_Ctrl_LC171\ type bigBed 9\ igm_Ctrl_NC67 Ctrl NC67 bigBed 9 Ctrl NC67 1 5 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_NC67.bb\ longLabel Ctrl NC67\ parent igm on\ priority 5\ shortLabel Ctrl NC67\ track igm_Ctrl_NC67\ type bigBed 9\ F_bind_avg F_bind_avg bigWig DMS data for RBD Binding 1 5 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/F_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel F_bind_avg\ track F_bind_avg\ type bigWig\ visibility dense\ F_expr_avg_Expression F_expr_avg bigWig DMS data for RBD expression 1 5 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/F_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel F_expr_avg\ track F_expr_avg_Expression\ type bigWig\ visibility dense\ variantAaMutsV2_P_1 Gamma AA Muts bigBed 4 Gamma VOC (P.1 Brazil Nov-2020) amino acid mutations in 78 GISAID sequences (Feb 5, 2021) 1 5 66 113 206 160 184 230 0 0 0 https://outbreak.info/situation-reports?pango=P.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantAaMuts_P.1_2021_02_05.bb\ color 66,113,206\ longLabel Gamma VOC (P.1 Brazil Nov-2020) amino acid mutations in 78 GISAID sequences (Feb 5, 2021)\ parent variantMuts off\ priority 3\ shortLabel Gamma AA Muts\ subGroups variant=C_P1 mutation=AA designation=VOC\ track variantAaMutsV2_P_1\ url https://outbreak.info/situation-reports?pango=P.1\ urlLabel P.1 Situation Report at outbreak.info\ Kim_non-TRS_sgmRNAs Kim 
Recomb. Novel transcripts bigBed 12 . Recombined Subgenomic Trans.: Transcripts from Kim et al. 2020 without a TRS 0 5 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/wuhCor1/bbi/kim2020/Kim_notTRS.bb\ longLabel Recombined Subgenomic Trans.: Transcripts from Kim et al. 2020 without a TRS\ parent kimTranscripts\ priority 5\ scoreFilter 900\ shortLabel Kim Recomb. Novel transcripts\ track Kim_non-TRS_sgmRNAs\ type bigBed 12 .\ COV2-2165Total MAB COV2-2165 total bigWig Bloom antibody escape - Total Score - COV2-2165 1 5 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2165.tot.bw\ longLabel Bloom antibody escape - Total Score - COV2-2165\ parent bloomEscTotal on\ shortLabel MAB COV2-2165 total\ track COV2-2165Total\ type bigWig\ visibility dense\ kimNpNc3pBrk Nanop NC-Brkpt 3' bigWig Nanopore Noncanonical 3' Breakpoints 1 5 102 168 15 178 211 135 0 0 0 genes 0 alwaysZero on\ bigDataUrl /gbdb/wuhCor1/bbi/kim2020/kim-scv2-drs-3p-breakpoints.bigWig\ color 102,168,15\ graphTypeDefault bar\ longLabel Nanopore Noncanonical 3' Breakpoints\ maxHeightPixels 48:48:11\ parent kimNp\ shortLabel Nanop NC-Brkpt 3'\ smoothingWindow off\ track kimNpNc3pBrk\ transformFunc LOG\ type bigWig\ visibility dense\ windowingFunction maximum\ IgM_Z-score-_COVID-19_patients_P32 P32 bigBed 9 P32 1 5 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgM_Z-score-_COVID-19_patients/P32.bb\ longLabel P32\ parent IgM_Z-score-_COVID-19_patients on\ shortLabel P32\ track IgM_Z-score-_COVID-19_patients_P32\ type bigBed 9\ P32 P32 bigBed 9 P32 1 5 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P32.bb\ longLabel P32\ parent IgG_Z-score-_COVID-19_patients on\ shortLabel P32\ track P32\ type bigBed 9\ rapid RAPID/Midnight Primers bigBed 6 + RAPID/Midnight 1200bp amplicon Oxford Nanopore sequencing primers 0 5 0 0 0 127 127 127 0 0 0

Description

\ \

This track shows the primers for the \ RAPID \ SARS-CoV-2 sequencing protocol, also commonly referred to as Midnight.\ The primers enable amplification of the genome of SARS-CoV-2.\ This approach uses multiplexed 1200 base pair (bp) tiled amplicons. Briefly, two PCR \ reactions are performed for each SARS-CoV-2 positive patient sample to be sequenced. \ One PCR reaction contains thirty primers that generate the odd-numbered \ amplicons ("Pool 1"), while the second PCR reaction contains twenty-eight primers \ that generate the even-numbered amplicons ("Pool 2"). After PCR, the two amplicon \ pools are combined and can be used for a range of downstream sequencing approaches. All primers \ were designed using Primal Scheme,\ as described in \ Nature Protocols 2017. This primer set results in amplicons that exhibit lower levels of \ variation in coverage than other commonly used primer sets.\
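The odd/even pool split described above can be illustrated with a short sketch. The function names are hypothetical, and the total of 29 amplicons is an assumption: it is simply the number implied by thirty plus twenty-eight primers if every amplicon has exactly one forward and one reverse primer.

```python
def pool_for_amplicon(amplicon_number: int) -> int:
    """Return 1 for odd-numbered amplicons (Pool 1), 2 for even-numbered ones (Pool 2)."""
    if amplicon_number < 1:
        raise ValueError("amplicon numbers start at 1")
    return 1 if amplicon_number % 2 == 1 else 2


def primers_per_pool(n_amplicons: int) -> dict:
    """Count primers per pool, assuming one forward and one reverse primer per amplicon."""
    counts = {1: 0, 2: 0}
    for number in range(1, n_amplicons + 1):
        counts[pool_for_amplicon(number)] += 2
    return counts


# Assumption: 29 tiled amplicons in total (15 odd-numbered, 14 even-numbered).
print(primers_per_pool(29))  # {1: 30, 2: 28}
```

Under these assumptions the counts agree with the thirty and twenty-eight primers stated for Pool 1 and Pool 2.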

\ \

Display Conventions and Configuration

\

\ Genomic locations of primers are highlighted. A click on them shows the primer pool. \ This is one of the few tracks that may be best displayed in "full" mode.

\ \

Methods

\

\ RAPID primer sequences were downloaded from the\ Google Spreadsheet and converted to bigBed. More \ details are available in the paper referenced below or in the \ supplemental files on Zenodo.\

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/rapid.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\
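The text that bigBedToBed writes to stdout is tab-separated BED, so it can be read with a few lines of Python. The following is a minimal sketch, assuming the first six columns follow the standard BED layout (the track type is bigBed 6 +) and treating any further columns as opaque strings. The class name, the function name, and the two sample rows are hypothetical and are not real primer coordinates.

```python
from typing import List, NamedTuple


class BedFeature(NamedTuple):
    chrom: str
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive
    name: str
    score: int
    strand: str
    extra: List[str]  # any columns beyond the first six


def parse_bed6_plus(text: str) -> List[BedFeature]:
    """Parse tab-separated BED text with at least six columns per row."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns, got {len(fields)}")
        features.append(
            BedFeature(fields[0], int(fields[1]), int(fields[2]),
                       fields[3], int(fields[4]), fields[5], fields[6:])
        )
    return features


# Hypothetical rows for illustration only.
sample = (
    "NC_045512v2\t100\t125\texample_1_LEFT\t0\t+\t1\n"
    "NC_045512v2\t1300\t1324\texample_1_RIGHT\t0\t-\t1\n"
)
for feature in parse_bed6_plus(sample):
    # BED intervals are half-open, so the length is end - start.
    print(feature.name, feature.end - feature.start)
```

In practice the `text` argument would be the captured stdout of the bigBedToBed command shown above.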

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

\ Freed NE, Vlková M, Faisal MB, Silander OK.\ \ Rapid and inexpensive whole-genome sequencing of SARS-CoV-2 using 1200 bp tiled amplicons and\ Oxford Nanopore Rapid Barcoding.\ Biol Methods Protoc. 2020;5(1):bpaa014.\ PMID: 33029559; PMC: PMC7454405\

\ map 1 bigDataUrl /gbdb/wuhCor1/bbi/rapid.bb\ group map\ longLabel RAPID/Midnight 1200bp amplicon Oxford Nanopore sequencing primers\ noScoreFilter on\ priority 5\ shortLabel RAPID/Midnight Primers\ track rapid\ type bigBed 6 +\ visibility hide\ swift Swift Primers bigBed 6 + Swift BioSciences sequencing primers 0 5 0 0 0 127 127 127 0 0 0

Description

\ \

This track shows the primers for the \ Swift Amplicon ® SARS-CoV-2 Panel single-tube NGS assay:\

\ This kit leverages patented multiplex PCR technology, enabling library construction from\ 1st-strand or 2nd-strand cDNA using tiled primer pairs to target the entire 29.9 kb viral\ genome with a single pool of multiplexed primer pairs.\ Primers were designed against the NCBI Reference Sequence NC_045512.2 (Severe acute respiratory\ syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome).\ In silico analysis predicted zero off-target products from human host genome sequences.\
\

\ \

Display Conventions and Configuration

\

\ Genomic locations of primers are highlighted. A click on them shows the primer sequence. \ This is one of the few tracks that may be best displayed in "full" mode.

\ \

Methods

\

\ Primer sequences, names and genomic locations were\ downloaded from Swift\ and converted to bigBed. More details are available from\ Swift Biosciences.\

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/swift.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ map 1 bigDataUrl /gbdb/wuhCor1/bbi/swift.bb\ group map\ longLabel Swift BioSciences sequencing primers\ noScoreFilter on\ priority 5\ shortLabel Swift Primers\ track swift\ type bigBed 6 +\ visibility hide\ unipCov2LocTransMemb Transmem. Domains bigGenePred UniProt Transmembrane Domains 0 5 0 150 0 127 202 127 0 0 0

Description

\ \

\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates. \ The data has been curated from scientific publications by the UniProt/SwissProt staff.\ The annotations are spread over multiple tracks, based on their "feature type" in UniProt:\

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
Track NameDescription
UCSC Alignment, SwissProtProtein sequences from SwissProt mapped onto the genome. All other\ tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBLProtein sequences from TrEMBL mapped onto the genome. All other tracks\ are (start,end) annotations mapped using this track. This track is\ hidden by default. To show it, click its checkbox on the track description\ page.
UniProt Signal PeptidesRegions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular DomainsProtein domains with the comment "Extracellular".
UniProt Transmembrane DomainsProtein domains of the type "Transmembrane".
UniProt Cytoplasmic DomainsProtein domains with the comment "Cytoplasmic".
UniProt Polypeptide ChainsPolypeptide chain in mature protein after post-processing.
UniProt DomainsProtein domains, zinc finger regions and topological domains.
UniProt Disulfide BondsDisulfide bonds.
UniProt Amino Acid ModificationsGlycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid MutationsMutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure AnnotationsBeta strands, helices, coiled-coil regions and turns.
UniProt Sequence ConflictsDifferences between Genbank sequences and the UniProt sequence.
UniProt RepeatsRegions of repeated sequence motifs or repeated domains.
UniProt Other AnnotationsAll other annotations
\ \

Display Conventions and Configuration

\ \

\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name for\ the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide"\ etc.). A click on them shows the full annotation and provides a link to the UniProt/SwissProt\ record for more details. TrEMBL annotations are always shown in \ light blue, except in the Signal Peptides,\ Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. For variants, the mouse-over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

\ In the "UniProt Modifications" track, lipidation sites are highlighted in \ dark blue, glycosylation sites in \ dark green, and phosphorylation sites in \ light green.

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\
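The coordinate arithmetic behind projecting a protein annotation onto the genome can be sketched for the simplest case. This is not the pslMap algorithm: it assumes a single ungapped alignment block on the plus strand, with each amino acid covering exactly three nucleotides. The function name and the example numbers are hypothetical.

```python
def aa_range_to_genome(cds_start: int, aa_start: int, aa_end: int) -> tuple:
    """Map a 0-based, half-open amino-acid range to a 0-based, half-open genomic range.

    Assumes one ungapped, plus-strand coding block that begins at cds_start.
    """
    if not 0 <= aa_start <= aa_end:
        raise ValueError("need 0 <= aa_start <= aa_end")
    return cds_start + 3 * aa_start, cds_start + 3 * aa_end


# Hypothetical example: a coding block starting at genomic offset 1000,
# with an annotation covering residues 10 to 29 (half-open range 10..30).
print(aa_range_to_genome(1000, 10, 30))  # (1030, 1090)
```

Alignments with several blocks, gaps, or frameshifts need the full alignment-based mapping described in the paragraph above.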

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout \

\ \ Please refer to our\ mailing list archives\ for questions or our\ Data Access FAQ\ for more information. \

\ \

Credits

\

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Phil Berman, UCSC.\ Thanks to UniProt for making all data available for download.

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipLocTransMembCov2.bb\ color 0,150,0\ dataVersion /gbdb/$D/uniprot/version.txt\ group uniprot\ html uniprotCov2\ itemRgb off\ longLabel UniProt Transmembrane Domains\ mouseOverField comments\ priority 5\ shortLabel Transmem. Domains\ track unipCov2LocTransMemb\ type bigGenePred\ urls acc="http://www.uniprot.org/uniprot/$$" hgncId="https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$"\ visibility hide\ igm_COVID_404 COVID 404 bigBed 9 COVID 404 1 6 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_404.bb\ longLabel COVID 404\ parent igm on\ priority 6\ shortLabel COVID 404\ track igm_COVID_404\ type bigBed 9\ publicAnnots Crowd-Sourced Data bigBed 9 + Crowd-sourced data: annotations contributed via bit.ly/cov2annots 0 6 0 0 0 127 127 127 0 0 0

Description

\

\

This track shows annotations made via a public spreadsheet available at http://bit.ly/cov2annots.\ \

Originally, anyone could add annotations to this spreadsheet and they went live after 1-2 days.\ We switched off the automated updates from the spreadsheet to the track in mid-2021. To add changes to this track now, \ please contact us.\ \

Display Conventions and Configuration

\

\

Only start-end annotations can be shown. Contact us at genome-www@soe.ucsc.edu if you \ have feedback on the form, e.g. you need exon or intron lines or have datasets with \ more than 5-10 annotations.

\ map 1 bigDataUrl /gbdb/wuhCor1/bbi/public.bb\ filterType.extraField0 singleList\ filterValues.extraField0 genes,evolution,RNA,antibodies,CRISPR,primers,proteins\ group map\ longLabel Crowd-sourced data: annotations contributed via bit.ly/cov2annots\ mouseOverField extraField0\ priority 6\ shortLabel Crowd-Sourced Data\ track publicAnnots\ type bigBed 9 +\ igg_Ctrl_NC65 Ctrl NC65 bigBed 9 Ctrl NC65 1 6 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_NC65.bb\ longLabel Ctrl NC65\ parent igg on\ priority 6\ shortLabel Ctrl NC65\ track igg_Ctrl_NC65\ type bigBed 9\ unipCov2LocCytopl Cytoplasmic bigGenePred UniProt Cytoplasmic Domains 0 6 255 150 0 255 202 127 0 0 0

Description

\ \

\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates. \ The data has been curated from scientific publications by the UniProt/SwissProt staff.\ The annotations are spread over multiple tracks, based on their "feature type" in UniProt:\

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
Track NameDescription
UCSC Alignment, SwissProtProtein sequences from SwissProt mapped onto the genome. All other\ tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBLProtein sequences from TrEMBL mapped onto the genome. All other tracks\ are (start,end) annotations mapped using this track. This track is\ hidden by default. To show it, click its checkbox on the track description\ page.
UniProt Signal PeptidesRegions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular DomainsProtein domains with the comment "Extracellular".
UniProt Transmembrane DomainsProtein domains of the type "Transmembrane".
UniProt Cytoplasmic DomainsProtein domains with the comment "Cytoplasmic".
UniProt Polypeptide ChainsPolypeptide chain in mature protein after post-processing.
UniProt DomainsProtein domains, zinc finger regions and topological domains.
UniProt Disulfide BondsDisulfide bonds.
UniProt Amino Acid ModificationsGlycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid MutationsMutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure AnnotationsBeta strands, helices, coiled-coil regions and turns.
UniProt Sequence ConflictsDifferences between Genbank sequences and the UniProt sequence.
UniProt RepeatsRegions of repeated sequence motifs or repeated domains.
UniProt Other AnnotationsAll other annotations
\ \

Display Conventions and Configuration

\ \

\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name for\ the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide"\ etc.). A click on them shows the full annotation and provides a link to the UniProt/SwissProt\ record for more details. TrEMBL annotations are always shown in \ light blue, except in the Signal Peptides,\ Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. For variants, the mouse-over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

\ In the "UniProt Modifications" track, lipidation sites are highlighted in \ dark blue, glycosylation sites in \ dark green, and phosphorylation sites in \ light green.

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout \

\ \ Please refer to our\ mailing list archives\ for questions or our\ Data Access FAQ\ for more information. \

\ \

Credits

\

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Phil Berman, UCSC.\ Thanks to UniProt for making all data available for download.

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipLocCytoplCov2.bb\ color 255,150,0\ dataVersion /gbdb/$D/uniprot/version.txt\ group uniprot\ html uniprotCov2\ itemRgb off\ longLabel UniProt Cytoplasmic Domains\ priority 6\ shortLabel Cytoplasmic\ track unipCov2LocCytopl\ type bigGenePred\ urls acc="http://www.uniprot.org/uniprot/$$" hgncId="https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$"\ visibility hide\ G_bind_avg G_bind_avg bigWig DMS data for RBD Binding 1 6 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/G_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel G_bind_avg\ track G_bind_avg\ type bigWig\ visibility dense\ G_expr_avg_Expression G_expr_avg bigWig DMS data for RBD expression 1 6 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/G_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel G_expr_avg\ track G_expr_avg_Expression\ type bigWig\ visibility dense\ variantNucMutsV2_P_1 Gamma Nuc Muts bigBed 4 Gamma VOC (P.1 Brazil Nov-2020) nucleotide mutations in 78 GISAID sequences (Feb 5, 2021) 1 6 66 113 206 160 184 230 0 0 0 https://outbreak.info/situation-reports?pango=P.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantNucMuts_P.1_2021_02_05.bb\ color 66,113,206\ longLabel Gamma VOC (P.1 Brazil Nov-2020) nucleotide mutations in 78 GISAID sequences (Feb 5, 2021)\ parent variantMuts off\ priority 113\ shortLabel Gamma Nuc Muts\ subGroups variant=C_P1 mutation=NUC designation=VOC\ track variantNucMutsV2_P_1\ url https://outbreak.info/situation-reports?pango=P.1\ urlLabel P.1 Situation Report at outbreak.info\ COV2-2479Total MAB COV2-2479 total bigWig Bloom antibody escape - Total Score - COV2-2479 1 6 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2479.tot.bw\ longLabel Bloom antibody escape - Total Score - COV2-2479\ parent bloomEscTotal on\ shortLabel MAB COV2-2479 total\ track 
COV2-2479Total\ type bigWig\ visibility dense\ kimNp5pBreak Nanop NC-Brkpt 5' bigWig Nanopore Noncanonical 5' Breakpoints 1 6 240 140 0 247 197 127 0 0 0 genes 0 alwaysZero on\ bigDataUrl /gbdb/wuhCor1/bbi/kim2020/kim-scv2-drs-5p-breakpoints.bigWig\ color 240,140,0\ graphTypeDefault bar\ longLabel Nanopore Noncanonical 5' Breakpoints\ maxHeightPixels 48:48:11\ parent kimNp\ shortLabel Nanop NC-Brkpt 5'\ smoothingWindow off\ track kimNp5pBreak\ transformFunc LOG\ type bigWig\ visibility dense\ windowingFunction maximum\ IgM_Z-score-_COVID-19_patients_P33 P33 bigBed 9 P33 1 6 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgM_Z-score-_COVID-19_patients/P33.bb\ longLabel P33\ parent IgM_Z-score-_COVID-19_patients on\ shortLabel P33\ track IgM_Z-score-_COVID-19_patients_P33\ type bigBed 9\ P33 P33 bigBed 9 P33 1 6 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P33.bb\ longLabel P33\ parent IgG_Z-score-_COVID-19_patients on\ shortLabel P33\ track P33\ type bigBed 9\ pbm Antib Pept Array Antibody Proteome Peptide Binding Microarray Raw Data from Wang et al, ACS 2020, Xiaobo Yu group, NCPSB Beijing 0 7 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows intensities of a microarray spotted with short peptides\ derived from the entire proteome of SARS-CoV-2. Sera from 10 COVID-19\ patients (early stage) and 12 healthy controls were screened on the\ peptide microarray for both IgG and IgM responses. \ Note that the infections\ here were in the early stage, unlike the other microarray track shown on this genome browser.

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of peptides that were spotted on the array are highlighted. Because these peptides\ overlap, the tracks default to dense mode and the sequence is shown as labels drawn onto the\ rectangles, but only visible on high zoom levels. Put any track into pack mode to fully see all\ probe sequences.

\ \

\ The color is assigned based on the Z-Score, without any other normalization. Blue with decreasing\ intensity is assigned to the values -5 to 0, white is 0, and red colors with increasing intensity\ are used for the values 0-3.5, exactly as in the original publication figures.

\ \

\ There are also two wiggle/signal style tracks that summarize the information; they\ show the sum of Z-scores across all peptides, as one score per nucleotide.\
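A per-nucleotide sum of this kind can be sketched as follows. This is a minimal illustration, assuming each peptide is given as a 0-based, half-open genomic interval with one Z-score; the function name and the toy values are hypothetical, and the actual build commands may differ.

```python
def sum_scores_per_nucleotide(peptides, genome_length):
    """Add each peptide's Z-score to every genomic position the peptide covers.

    peptides: iterable of (start, end, z_score) with 0-based, half-open intervals.
    Returns a list holding one summed score per nucleotide.
    """
    totals = [0.0] * genome_length
    for start, end, z_score in peptides:
        if not 0 <= start <= end <= genome_length:
            raise ValueError(f"interval {start}-{end} is outside the genome")
        for position in range(start, end):
            totals[position] += z_score
    return totals


# Toy example: two overlapping peptides on a 10-nucleotide "genome".
toy_peptides = [(0, 6, 1.5), (3, 9, -0.5)]
print(sum_scores_per_nucleotide(toy_peptides, 10))
# [1.5, 1.5, 1.5, 1.0, 1.0, 1.0, -0.5, -0.5, -0.5, 0.0]
```

Positions covered by both peptides receive the sum of the two scores, which is why overlapping probes reinforce or cancel each other in the summary signal.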

\ \

Methods

\

\ Supplemental files were converted from Excel, rearranged and run through the command line script\ bigHeat to create a heatmap-like display, with multiplication factors of 2 for negative values and\ 0.285 for positive values. For better visibility, the matplotlib colormap seismic was used from 0.1\ to 0.9 and the range of values after multiplication was restricted to the limits -1 to 1 to address\ outliers. Like all tracks, the exact commands are documented in our\ makeDoc text files.\
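The scaling and clipping step can be sketched in a few lines. This is not the bigHeat source: the function names are hypothetical, the factors are taken as stated in the paragraph above, and the linear mapping of the clipped value onto the 0.1 to 0.9 portion of the colormap is an assumption about how that range is used.

```python
def scale_z(z: float, neg_factor: float = 2.0, pos_factor: float = 0.285) -> float:
    """Multiply a Z-score by a sign-dependent factor, then clip to [-1, 1]."""
    factor = neg_factor if z < 0 else pos_factor
    return max(-1.0, min(1.0, z * factor))


def colormap_position(scaled: float, low: float = 0.1, high: float = 0.9) -> float:
    """Map a value in [-1, 1] linearly onto [low, high] (assumed usage of the colormap)."""
    return low + (scaled + 1.0) / 2.0 * (high - low)


for z in (-5.0, -0.25, 0.0, 3.5):
    scaled = scale_z(z)
    print(z, round(scaled, 4), round(colormap_position(scaled), 4))
```

With matplotlib installed, the resulting position could be passed to a colormap object such as `matplotlib.colormaps["seismic"]` to obtain an RGBA color.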

\ \

Data Access

\

\ The raw data can be explored interactively with the Table Browser or combined\ with other datasets in the Data Integrator tool. For automated analysis,\ the genome annotation is stored in a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page. The tool can also be used to obtain features within a given\ range without downloading the file, for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P52.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our mailing list archives for questions, or our\ Data Access FAQ for more\ information.

\ \

References

\

\ Hongye Wang, Xian Wu, Xiaomei Zhang, Xin Hou, Te Liang, Dan Wang, Fei Teng, Jiayu Dai, Hu Duan,\ Shubin Guo, Yongzhe Li, and Xiaobo Yu\ SARS-CoV-2\ Proteome Microarray for Mapping COVID-19 Antibody Interactions at Amino Acid Resolution.\ ACS Central Science. 2020\ DOI: 10.1021/acscentsci.0c00742\

\ \ immu 0 group immu\ html pbm\ longLabel Antibody Proteome Peptide Binding Microarray Raw Data from Wang et al, ACS 2020, Xiaobo Yu group, NCPSB Beijing\ priority 7\ shortLabel Antib Pept Array\ superTrack on\ track pbm\ visibility hide\ bpmIggCovidSum Antib Pept Array Sum (IgG) bigWig Antibody Proteome Peptide Binding Microarray, Wang et al 2020 - IgG, Covid - Sum of scores per nucleotide 2 7 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows intensities of a microarray spotted with short peptides\ derived from the entire proteome of SARS-CoV-2. Sera from 10 COVID-19\ patients (early stage) and 12 healthy controls were screened on the\ peptide microarray for both IgG and IgM responses. \ Note that the infections\ here were in the early stage, unlike the other microarray track shown on this genome browser.

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of peptides that were spotted on the array are highlighted. Because these peptides\ overlap, the tracks default to dense mode and the sequence is shown as labels drawn onto the\ rectangles, but only visible on high zoom levels. Put any track into pack mode to fully see all\ probe sequences.

\ \

\ The color is assigned based on the Z-Score, without any other normalization. Blue with decreasing\ intensity is assigned to the values -5 to 0, white is 0, and red colors with increasing intensity\ are used for the values 0-3.5, exactly as in the original publication figures.

\ \

\ There are also two wiggle/signal style tracks that summarize the information; they\ show the sum of Z-scores across all peptides, as one score per nucleotide.\

\ \

Methods

\

\ Supplemental files were converted from Excel, rearranged and run through the command line script\ bigHeat to create a heatmap-like display, with multiplication factors of 2 for negative values and\ 0.285 for positive values. For better visibility, the matplotlib colormap seismic was used from 0.1\ to 0.9 and the range of values after multiplication was restricted to the limits -1 to 1 to address\ outliers. Like all tracks, the exact commands are documented in our\ makeDoc text files.\

\ \

Data Access

\

\ The raw data can be explored interactively with the Table Browser or combined\ with other datasets in the Data Integrator tool. For automated analysis,\ the genome annotation is stored in a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page. The tool can also be used to obtain features within a given\ range without downloading the file, for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P52.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our mailing list archives for questions, or our\ Data Access FAQ for more\ information.

\ \

References

\

\ Hongye Wang, Xian Wu, Xiaomei Zhang, Xin Hou, Te Liang, Dan Wang, Fei Teng, Jiayu Dai, Hu Duan,\ Shubin Guo, Yongzhe Li, and Xiaobo Yu\ SARS-CoV-2\ Proteome Microarray for Mapping COVID-19 Antibody Interactions at Amino Acid Resolution.\ ACS Central Science. 2020\ DOI: 10.1021/acscentsci.0c00742\

\ \ immu 0 autoScale on\ bigDataUrl /gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/allSum.bw\ html pbm\ longLabel Antibody Proteome Peptide Binding Microarray, Wang et al 2020 - IgG, Covid - Sum of scores per nucleotide\ maxHeightPixels 100:30:8\ parent pbm\ shortLabel Antib Pept Array Sum (IgG)\ track bpmIggCovidSum\ type bigWig\ visibility full\ bpmIgmCovidSum Antib Pept Array Sum (IgM) bigWig Antibody Proteome Peptide Binding Microarray, Wang et al 2020 - IgM, Covid - Sum of scores per nucleotide 2 7 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows intensities of a microarray spotted with short peptides\ derived from the entire proteome of SARS-CoV-2. Sera from 10 COVID-19\ patients (early stage) and 12 healthy controls were screened on the\ peptide microarray for both IgG and IgM responses. \ Note that the infections\ here were in the early stage, unlike the other microarray track shown on this genome browser.

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of peptides that were spotted on the array are highlighted. Because these peptides\ overlap, the tracks default to dense mode and the sequence is shown as labels drawn onto the\ rectangles, but only visible on high zoom levels. Put any track into pack mode to fully see all\ probe sequences.

\ \

\ The color is assigned based on the Z-Score, without any other normalization. Blue with decreasing\ intensity is assigned to the values -5 to 0, white is 0, and red colors with increasing intensity\ are used for the values 0-3.5, exactly as in the original publication figures.

\ \

\ There are also two wiggle/signal style tracks that summarize the information; they\ show the sum of Z-scores across all peptides, as one score per nucleotide.\

\ \

Methods

\

\ Supplemental files were converted from Excel, rearranged and run through the command line script\ bigHeat to create a heatmap-like display, with multiplication factors of 2 for negative values and\ 0.285 for positive values. For better visibility, the matplotlib colormap seismic was used from 0.1\ to 0.9 and the range of values after multiplication was restricted to the limits -1 to 1 to address\ outliers. Like all tracks, the exact commands are documented in our\ makeDoc text files.\

\ \

Data Access

\

\ The raw data can be explored interactively with the Table Browser or combined\ with other datasets in the Data Integrator tool. For automated analysis,\ the genome annotation is stored in a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page. The tool can also be used to obtain features within a given\ range without downloading the file, for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P52.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our mailing list archives for questions, or our\ Data Access FAQ for more\ information.

\ \

References

\

\ Hongye Wang, Xian Wu, Xiaomei Zhang, Xin Hou, Te Liang, Dan Wang, Fei Teng, Jiayu Dai, Hu Duan,\ Shubin Guo, Yongzhe Li, and Xiaobo Yu\ SARS-CoV-2\ Proteome Microarray for Mapping COVID-19 Antibody Interactions at Amino Acid Resolution.\ ACS Central Science. 2020\ DOI: 10.1021/acscentsci.0c00742\

\ \ immu 0 autoScale on\ bigDataUrl /gbdb/wuhCor1/pbm/IgM_Z-score-_COVID-19_patients/allSum.bw\ html pbm\ longLabel Antibody Proteome Peptide Binding Microarray, Wang et al 2020 - IgM, Covid - Sum of scores per nucleotide\ maxHeightPixels 100:30:8\ parent pbm\ shortLabel Antib Pept Array Sum (IgM)\ track bpmIgmCovidSum\ type bigWig\ visibility full\ igm_COVID_16 COVID 16 bigBed 9 COVID 16 1 7 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_16.bb\ longLabel COVID 16\ parent igm on\ priority 7\ shortLabel COVID 16\ track igm_COVID_16\ type bigBed 9\ igg_Ctrl_LC177 Ctrl LC177 bigBed 9 Ctrl LC177 1 7 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_LC177.bb\ longLabel Ctrl LC177\ parent igg on\ priority 7\ shortLabel Ctrl LC177\ track igg_Ctrl_LC177\ type bigBed 9\ variantAaMutsV2_B_1_617_2 Delta AA Muts bigBed 4 Delta VOC (B.1.617.2 India Oct-2020) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021) 1 7 76 143 192 165 199 223 0 0 0 https://outbreak.info/situation-reports?pango=B.1.617.2 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantAaMuts_B.1.617.2_2021_09_10.bb\ color 76,143,192\ longLabel Delta VOC (B.1.617.2 India Oct-2020) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts on\ priority 4\ shortLabel Delta AA Muts\ subGroups variant=D_B16172 mutation=AA designation=VOC\ track variantAaMutsV2_B_1_617_2\ url https://outbreak.info/situation-reports?pango=B.1.617.2\ urlLabel B.1.617.2 Situation Report at outbreak.info\ H_bind_avg H_bind_avg bigWig DMS data for RBD Binding 1 7 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/H_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel H_bind_avg\ track H_bind_avg\ type bigWig\ visibility dense\ H_expr_avg_Expression H_expr_avg bigWig DMS data for RBD expression 1 7 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/H_expr_avg.bw\ longLabel 
DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel H_expr_avg\ track H_expr_avg_Expression\ type bigWig\ visibility dense\ IgG_Z-score-_COVID-19_patients IgG Z-score- early COVID-19 patients bed 9 Proteome Peptide Microarray - IgG - early COVID-19 patients 1 7 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows intensities of a microarray spotted with short peptides\ derived from the entire proteome of SARS-CoV-2. Sera from 10 COVID-19\ patients (early stage) and 12 healthy controls were screened on the\ peptide microarray for both IgG and IgM responses. \ Note that the infections\ here were in the early stage, unlike the other microarray track shown on this genome browser.

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of peptides that were spotted on the array are highlighted. Because these peptides\ overlap, the tracks default to dense mode and the sequence is shown as labels drawn onto the\ rectangles, but only visible on high zoom levels. Put any track into pack mode to fully see all\ probe sequences.

\ \

\ The color is assigned based on the Z-Score, without any other normalization. Blue with decreasing\ intensity is assigned to the values -5 to 0, white is 0, and red colors with increasing intensity\ are used for the values 0-3.5, exactly as in the original publication figures.

\ \

\ There are also two wiggle/signal style tracks that summarize the information: they\ show the sum of Z-scores across all peptides, as one score per nucleotide.\

\ \

Methods

\

\ Supplemental files were converted from Excel, rearranged and run through the command line script\ bigHeat to create a heatmap-like display, with multiplication factors of 2 for negative values and\ 0.285 for positive values. For better visibility, colormap seismic from matplotlib was used from 0.1\ to 0.9 and the range of values after multiplication was restricted to the limits -1 to 1 to address\ outliers. Like all tracks, the exact commands are documented in our\ makeDoc text files.\
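The scaling and clipping described above can be sketched as follows. This is a minimal illustration, not the bigHeat script itself: the function names are hypothetical, the factors are taken as stated in the text, and the linear mapping onto the 0.1 to 0.9 slice of the colormap is an assumption about how that range was applied.

```python
def scale_z(z, neg_factor=2.0, pos_factor=0.285):
    """Multiply a Z-score by a sign-dependent factor, then clip to [-1, 1].

    The default factors are the ones quoted in the Methods text.
    """
    scaled = z * (neg_factor if z < 0 else pos_factor)
    return max(-1.0, min(1.0, scaled))


def colormap_position(scaled, lo=0.1, hi=0.9):
    """Map a clipped value in [-1, 1] linearly onto [lo, hi].

    Assumption: 'used from 0.1 to 0.9' means only this slice of the
    colormap is sampled, with 0 landing on the midpoint (white in seismic).
    """
    return lo + (scaled + 1.0) / 2.0 * (hi - lo)


if __name__ == "__main__":
    for z in (-5.0, 0.0, 3.5):
        print(z, colormap_position(scale_z(z)))
```

The resulting position could then be passed to a colormap lookup such as matplotlib's seismic to obtain an RGB value.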

\ \

Data Access

\

\ The raw data can be explored interactively with the Table Browser or combined\ with other datasets in the Data Integrator tool. For automated analysis,\ the genome annotation is stored in a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page. The tool can also be used to obtain features within a given\ range without downloading the file, for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P52.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our mailing list archives for questions, or our\ Data Access FAQ for more\ information.
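As a rough sketch of downstream processing, the text output of bigBedToBed for a "bigBed 9" track can be read as tab-separated BED9 records. The helper name `parse_bed9` and the sample line below are made up for illustration and are not taken from the actual data files.

```python
from typing import NamedTuple, Tuple


class Bed9(NamedTuple):
    chrom: str
    start: int          # 0-based start
    end: int            # end, exclusive
    name: str           # here: the peptide sequence used as label
    score: int
    strand: str
    thick_start: int
    thick_end: int
    rgb: Tuple[int, int, int]


def parse_bed9(line: str) -> Bed9:
    """Parse one tab-separated BED9 line into a Bed9 record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(fields)}")
    # itemRgb is normally 'r,g,b'; a bare '0' is treated as black.
    rgb_parts = fields[8].split(",")
    if len(rgb_parts) == 3:
        rgb = (int(rgb_parts[0]), int(rgb_parts[1]), int(rgb_parts[2]))
    else:
        rgb = (0, 0, 0)
    return Bed9(fields[0], int(fields[1]), int(fields[2]), fields[3],
                int(fields[4]), fields[5], int(fields[6]), int(fields[7]), rgb)


if __name__ == "__main__":
    # Hypothetical example line, not real track data.
    sample = "NC_045512v2\t265\t310\tMESLVPGFNEKTHVQ\t0\t+\t265\t310\t255,0,0"
    print(parse_bed9(sample))
```

In practice the lines would come from the standard output of the bigBedToBed command shown above, for example by piping it into a script that calls `parse_bed9` on each line.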

\ \

References

\

\ Hongye Wang, Xian Wu, Xiaomei Zhang, Xin Hou, Te Liang, Dan Wang, Fei Teng, Jiayu Dai, Hu Duan,\ Shubin Guo, Yongzhe Li, and Xiaobo Yu\ SARS-CoV-2\ Proteome Microarray for Mapping COVID-19 Antibody Interactions at Amino Acid Resolution.\ ACS Central Science. 2020\ DOI: 10.1021/acscentsci.0c00742\

\ \ immu 1 compositeTrack on\ html pbm\ itemRgb on\ labelOnFeature on\ longLabel Proteome Peptide Microarray - IgG - early COVID-19 patients\ parent pbm\ shortLabel IgG Z-score- early COVID-19 patients\ track IgG_Z-score-_COVID-19_patients\ type bed 9\ visibility dense\ IgM_Z-score-_COVID-19_patients IgM Z-score - early COVID-19 patients bed 9 Proteome Peptide Microarray - IgM - early COVID-19 patients 1 7 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows intensities of a microarray spotted with short peptides\ derived from the entire proteome of SARS-CoV-2. Sera from 10 COVID-19\ patients (early stage) and 12 healthy controls were screened on the\ peptide microarray for both IgG and IgM responses. \ Note that the infections\ here were in the early stage, unlike the other microarray track shown on this genome browser.

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of peptides that were spotted on the array are highlighted. Because these peptides\ overlap, the tracks default to dense mode and the sequence is shown as labels drawn onto the\ rectangles, but only visible on high zoom levels. Put any track into pack mode to fully see all\ probe sequences.

\ \

\ The color is assigned based on the Z-Score, without any other normalization. Blue with decreasing\ intensity is assigned to the values -5 to 0, white is 0, and red colors with increasing intensity\ are used for the values 0-3.5, exactly as in the original publication figures.

\ \

\ There are also two wiggle/signal style tracks that summarize the information: they\ show the sum of Z-scores across all peptides, as one score per nucleotide.\

\ \

Methods

\

\ Supplemental files were converted from Excel, rearranged and run through the command line script\ bigHeat to create a heatmap-like display, with multiplication factors of 2 for negative values and\ 0.285 for positive values. For better visibility, colormap seismic from matplotlib was used from 0.1\ to 0.9 and the range of values after multiplication was restricted to the limits -1 to 1 to address\ outliers. Like all tracks, the exact commands are documented in our\ makeDoc text files.\

\ \

Data Access

\

\ The raw data can be explored interactively with the Table Browser or combined\ with other datasets in the Data Integrator tool. For automated analysis,\ the genome annotation is stored in a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page. The tool can also be used to obtain features within a given\ range without downloading the file, for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P52.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our mailing list archives for questions, or our\ Data Access FAQ for more\ information.

\ \

References

\

\ Hongye Wang, Xian Wu, Xiaomei Zhang, Xin Hou, Te Liang, Dan Wang, Fei Teng, Jiayu Dai, Hu Duan,\ Shubin Guo, Yongzhe Li, and Xiaobo Yu\ SARS-CoV-2\ Proteome Microarray for Mapping COVID-19 Antibody Interactions at Amino Acid Resolution.\ ACS Central Science. 2020\ DOI: 10.1021/acscentsci.0c00742\

\ \ immu 1 compositeTrack on\ html pbm\ itemRgb on\ labelOnFeature on\ longLabel Proteome Peptide Microarray - IgM - early COVID-19 patients\ parent pbm\ shortLabel IgM Z-score - early COVID-19 patients\ track IgM_Z-score-_COVID-19_patients\ type bed 9\ visibility dense\ COV2-2499Total MAB COV2-2499 total bigWig Bloom antibody escape - Total Score - COV2-2499 1 7 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2499.tot.bw\ longLabel Bloom antibody escape - Total Score - COV2-2499\ parent bloomEscTotal on\ shortLabel MAB COV2-2499 total\ track COV2-2499Total\ type bigWig\ visibility dense\ varskip NEB VarSkip Primers bed 6 New England Biolabs (NEB) VarSkip Primers 0 7 0 0 0 127 127 127 0 0 0

Description

\ \

This track shows the \ NEB VarSkip v1 and v2 sequencing primers.\

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of primers are highlighted. For primer tracks, the\ "full" visibility mode is often more suitable than the "pack" or "squish" display modes.

\ \

Methods

\

\ Primer sequences were downloaded from the \ NEB GitHub repository and converted to bigBed.\

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/artic.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \ map 1 compositeTrack on\ group map\ longLabel New England Biolabs (NEB) VarSkip Primers\ noScoreFilter on\ priority 7\ shortLabel NEB VarSkip Primers\ track varskip\ type bed 6\ visibility hide\ IgM_Z-score-_COVID-19_patients_P4 P4 bigBed 9 P4 1 7 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgM_Z-score-_COVID-19_patients/P4.bb\ longLabel P4\ parent IgM_Z-score-_COVID-19_patients on\ shortLabel P4\ track IgM_Z-score-_COVID-19_patients_P4\ type bigBed 9\ P4 P4 bigBed 9 P4 1 7 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P4.bb\ longLabel P4\ parent IgG_Z-score-_COVID-19_patients on\ shortLabel P4\ track P4\ type bigBed 9\ igm_COVID_502 COVID 502 bigBed 9 COVID 502 1 8 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_502.bb\ longLabel COVID 502\ parent igm on\ priority 8\ shortLabel COVID 502\ track igm_COVID_502\ type bigBed 9\ igg_Ctrl_LC181 Ctrl LC181 bigBed 9 Ctrl LC181 1 8 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_LC181.bb\ longLabel Ctrl LC181\ parent igg on\ priority 8\ shortLabel Ctrl LC181\ track igg_Ctrl_LC181\ type bigBed 9\ variantNucMutsV2_B_1_617_2 Delta Nuc Muts bigBed 4 Delta VOC (B.1.617.2 India Oct-2020) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021) 1 8 76 143 192 165 199 223 0 0 0 https://outbreak.info/situation-reports?pango=B.1.617.2 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantNucMuts_B.1.617.2_2021_09_10.bb\ color 76,143,192\ longLabel Delta VOC (B.1.617.2 India Oct-2020) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts off\ priority 114\ shortLabel Delta Nuc Muts\ subGroups variant=D_B16172 mutation=NUC designation=VOC\ track variantNucMutsV2_B_1_617_2\ url https://outbreak.info/situation-reports?pango=B.1.617.2\ urlLabel B.1.617.2 Situation Report at outbreak.info\ unipCov2DisulfBond Disulf. Bonds bigGenePred UniProt Disulfide Bonds 0 8 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates. \ The data has been curated from scientific publications by the UniProt/SwissProt staff.\ The annotations are spread over multiple tracks, based on their "feature type" in UniProt:\

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
Track Name | Description
UCSC Alignment, SwissProt | Protein sequences from SwissProt mapped onto the genome. All other\ tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBL | Protein sequences from TrEMBL mapped onto the genome. All other tracks\ are (start,end) annotations mapped using this track. This track is\ hidden by default. To show it, click its checkbox on the track description\ page.
UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains | Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains | Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains | Polypeptide chain in mature protein after post-processing.
UniProt Domains | Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds | Disulfide bonds.
UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts | Differences between Genbank sequences and the UniProt sequence.
UniProt Repeats | Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations | All other annotations
\ \

Display Conventions and Configuration

\ \

\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name for\ the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide"\ etc.). A click on them shows the full annotation and provides a link to the UniProt/SwissProt\ record for more details. TrEMBL annotations are always shown in \ light blue, except in the Signal Peptides,\ Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. For variants, the mouse-over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

\ In the "UniProt Modifications" track, lipidation sites are highlighted in \ dark blue, glycosylation sites in \ dark green, and phosphorylation sites in \ light green.

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\
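To illustrate the basic idea of projecting a protein annotation onto the genome, the sketch below converts an amino-acid range into genomic coordinates. It is a deliberate simplification and not the pslMap-based pipeline described above: it assumes a single, ungapped, plus-strand coding sequence with no frameshift, and the function name and the example CDS start are assumptions made for illustration.

```python
def aa_range_to_genome(cds_start, aa_start, aa_end):
    """Convert a 1-based, inclusive amino-acid range to a 0-based,
    half-open genomic range (BED-style coordinates).

    cds_start is the 0-based genomic position of the first base of the
    first codon. Assumes an ungapped plus-strand CDS (no splicing,
    no ribosomal frameshift), which real alignments need not satisfy.
    """
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("need 1 <= aa_start <= aa_end")
    genome_start = cds_start + 3 * (aa_start - 1)
    genome_end = cds_start + 3 * aa_end
    return genome_start, genome_end


if __name__ == "__main__":
    # 21562 is assumed here as the 0-based start of a coding sequence;
    # the residue range is an arbitrary example.
    print(aa_range_to_genome(21562, 319, 541))
```

An alignment-based mapping handles the cases this arithmetic cannot, such as gaps or a coding sequence that changes reading frame.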

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout \

\ \ Please refer to our\ mailing list archives\ for questions or our\ Data Access FAQ\ for more information. \

\ \

Credits

\

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Phil Berman, UCSC.\ Thanks to UniProt for making all data available for download.

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipDisulfBondCov2.bb\ dataVersion /gbdb/$D/uniprot/version.txt\ exonNumbers off\ group uniprot\ html uniprotCov2\ itemRgb on\ longLabel UniProt Disulfide Bonds\ mouseOverField comments\ priority 8\ shortLabel Disulf. Bonds\ track unipCov2DisulfBond\ type bigGenePred\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ I_bind_avg I_bind_avg bigWig DMS data for RBD Binding 1 8 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/I_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel I_bind_avg\ track I_bind_avg\ type bigWig\ visibility dense\ I_expr_avg_Expression I_expr_avg bigWig DMS data for RBD expression 1 8 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/I_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel I_expr_avg\ track I_expr_avg_Expression\ type bigWig\ visibility dense\ COV2-2677Total MAB COV2-2677 total bigWig Bloom antibody escape - Total Score - COV2-2677 1 8 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2677.tot.bw\ longLabel Bloom antibody escape - Total Score - COV2-2677\ parent bloomEscTotal on\ shortLabel MAB COV2-2677 total\ track COV2-2677Total\ type bigWig\ visibility dense\ IgM_Z-score-_COVID-19_patients_P45 P45 bigBed 9 P45 1 8 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgM_Z-score-_COVID-19_patients/P45.bb\ longLabel P45\ parent IgM_Z-score-_COVID-19_patients on\ shortLabel P45\ track IgM_Z-score-_COVID-19_patients_P45\ type bigBed 9\ P45 P45 bigBed 9 P45 1 8 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P45.bb\ longLabel P45\ parent IgG_Z-score-_COVID-19_patients on\ shortLabel P45\ track P45\ type bigBed 9\ unipCov2Domain Protein Domains bigGenePred UniProt Domains 1 8 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates. \ The data has been curated from scientific publications by the UniProt/SwissProt staff.\ The annotations are spread over multiple tracks, based on their "feature type" in UniProt:\

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
Track Name | Description
UCSC Alignment, SwissProt | Protein sequences from SwissProt mapped onto the genome. All other\ tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBL | Protein sequences from TrEMBL mapped onto the genome. All other tracks\ are (start,end) annotations mapped using this track. This track is\ hidden by default. To show it, click its checkbox on the track description\ page.
UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains | Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains | Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains | Polypeptide chain in mature protein after post-processing.
UniProt Domains | Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds | Disulfide bonds.
UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts | Differences between Genbank sequences and the UniProt sequence.
UniProt Repeats | Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations | All other annotations
\ \

Display Conventions and Configuration

\ \

\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name for\ the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide"\ etc.). A click on them shows the full annotation and provides a link to the UniProt/SwissProt\ record for more details. TrEMBL annotations are always shown in \ light blue, except in the Signal Peptides,\ Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. For variants, the mouse-over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

\ In the "UniProt Modifications" track, lipidation sites are highlighted in \ dark blue, glycosylation sites in \ dark green, and phosphorylation sites in \ light green.

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout \

\ \ Please refer to our\ mailing list archives\ for questions or our\ Data Access FAQ\ for more information. \

\ \

Credits

\

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Phil Berman, UCSC.\ Thanks to UniProt for making all data available for download.

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipDomainCov2.bb\ dataVersion /gbdb/$D/uniprot/version.txt\ exonNumbers off\ group uniprot\ html uniprotCov2\ longLabel UniProt Domains\ mouseOverField comments\ priority 8\ shortLabel Protein Domains\ track unipCov2Domain\ type bigGenePred\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ pbmShanghai S Antib Pept Array S Protein Antibody Peptide Binding Microarray from Li et al, Cell & Mol Imm 2020, Sheng-ce Tao group, Jiao Tung Univ. 0 8 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows intensities of a microarray spotted with short peptides\ derived from the S protein. Fifty-five sera from convalescent COVID-19\ patients and 18 control sera were screened on the\ peptide microarray for both IgG and IgM responses. When comparing the microarray\ tracks, note that they are from patients at different stages of the infection.\

\ \

\ A total of 211 peptides were synthesized and conjugated to\ BSA. The conjugates along with control proteins were prepared in triplicate at\ three dilutions.

\ \

Display Conventions and Configuration

\ \

This track contains two composite tracks, for IgG and IgM. It also contains\ two signal (wiggle) tracks that show the "response frequency", roughly calculated as\ described in the paper, for IgG and IgM separately.\

\ \

\ Genomic locations of peptides that were spotted on the array are highlighted.\ Because these peptides overlap, the tracks default to dense mode and the\ sequence is shown as labels drawn onto the rectangles, but only visible on high\ zoom levels. Put any track into pack mode to fully see the sequence and also\ the triplicates. Since every peptide was spotted in three concentrations, every\ peptide is shown three times in pack mode.

\ \

\ The color is assigned based on the log of the fluorescent signal, with all negative values replaced\ by 0. The fluorescent signal was restricted to the range 0-15, scaled to 0-1.0 and\ the viridis color palette was used to assign a color intensity. This color\ palette is different from the one used in the original paper, to make the\ quantitative data easier to see.
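The normalization described above can be sketched as follows. The function name is hypothetical, and two details are assumptions because the text does not state them: the logarithm base (base 2 is used here) and the treatment of non-positive raw signals, which are mapped to 0 before clipping.

```python
import math


def normalized_intensity(signal, max_log=15.0):
    """Return a value in [0, 1] suitable for a colormap lookup.

    Steps: take the log of the signal (base 2 assumed), replace negative
    results by 0, restrict to the range 0..max_log, then scale to 0..1.
    """
    log_value = math.log2(signal) if signal > 0 else 0.0
    clipped = max(0.0, min(max_log, log_value))
    return clipped / max_log


if __name__ == "__main__":
    for raw in (-3.0, 0.5, 1.0, 1024.0, 2.0 ** 20):
        print(raw, normalized_intensity(raw))
```

The returned value could then be passed to a colormap such as matplotlib's viridis to obtain the display color.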

\ \

\ The two signal tracks show the "response frequency", as defined in the paper.\ The response frequency is the share of Covid samples at a position that exceed\ the threshold mean(x)+3*stdev(y), with x being all values at a position and y\ being the negative controls. At positions where two peptides overlap, the two\ values are summed up, which means that the result on the Genome Browser can\ exceed 1.0.\
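A minimal sketch of the response frequency at a single peptide position, following the wording above, is given below. The function name and the input values are hypothetical. Two choices are assumptions: "all values at a position" is read as Covid plus control samples, and the sample standard deviation is used.

```python
from statistics import mean, stdev


def response_frequency(covid_values, control_values):
    """Share of Covid samples above mean(x) + 3 * stdev(y).

    x: all values at this position (Covid and control samples, assumed).
    y: the negative-control values at this position.
    """
    all_values = list(covid_values) + list(control_values)
    threshold = mean(all_values) + 3 * stdev(control_values)
    hits = sum(1 for value in covid_values if value > threshold)
    return hits / len(covid_values)


if __name__ == "__main__":
    # Made-up signal values for one peptide.
    covid = [10.0, 12.0, 1.0, 1.2]
    controls = [1.0, 1.1, 0.9, 1.0]
    print(response_frequency(covid, controls))
```

Because overlapping peptides contribute to the same nucleotides, summing these per-peptide frequencies per base is what allows the displayed signal to exceed 1.0.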

\ \

Methods

\

\ Supplemental files were received from the authors, converted from Excel,\ rearranged and run through the command line script bigHeat to create a\ heatmap-like display. For better visibility,\ colormap viridis from matplotlib was used. Like all tracks, the exact commands\ are documented in our makeDoc text files.\

\ \

Data Access

\

\ The raw data can be explored interactively with the Table Browser or combined\ with other datasets in the Data Integrator tool. For automated analysis,\ the genome annotation is stored in a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page. Please refer to our mailing list archives for questions, or our\ Data Access FAQ for more\ information.

\ \

References

\

\ Li Y, Lai DY, Zhang HN, Jiang HW, Tian X, Ma ML, Qi H, Meng QF, Guo SJ, Wu Y et al.\ \ Linear epitopes of SARS-CoV-2 spike protein elicit neutralizing antibodies in COVID-19 patients.\ Cell Mol Immunol. 2020 Oct;17(10):1095-1097.\ PMID: 32895485; PMC: PMC7475724\

\ immu 0 group immu\ longLabel S Protein Antibody Peptide Binding Microarray from Li et al, Cell & Mol Imm 2020, Sheng-ce Tao group, Jiao Tung Univ.\ priority 8\ shortLabel S Antib Pept Array\ superTrack on\ track pbmShanghai\ visibility hide\ igg S-PBM IgG bed 9 S Protein Antibody Peptide Binding Microarray - IgG - Sheng-ce Tao group, Jiao Tung Univ. 1 8 0 0 0 127 127 127 0 0 0 immu 1 compositeTrack on\ itemRgb on\ labelOnFeature on\ longLabel S Protein Antibody Peptide Binding Microarray - IgG - Sheng-ce Tao group, Jiao Tung Univ.\ parent pbmShanghai\ shortLabel S-PBM IgG\ track igg\ type bed 9\ visibility dense\ igm S-PBM IgM bed 9 S Protein Antibody Peptide Binding Microarray - IgM - Sheng-ce Tao group, Jiao Tung Univ. 1 8 0 0 0 127 127 127 0 0 0 immu 1 compositeTrack on\ itemRgb on\ labelOnFeature on\ longLabel S Protein Antibody Peptide Binding Microarray - IgM - Sheng-ce Tao group, Jiao Tung Univ.\ parent pbmShanghai\ shortLabel S-PBM IgM\ track igm\ type bed 9\ visibility dense\ iggAllSum S-PBM: IgG Response Frequency bigWig S Protein Antibody Peptide Binding Microarray - IgG - Response Frequency 2 8 0 0 0 127 127 127 0 0 0 immu 0 autoScale on\ bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/allSum.bw\ longLabel S Protein Antibody Peptide Binding Microarray - IgG - Response Frequency\ maxHeightPixels 100:24:8\ parent pbmShanghai\ shortLabel S-PBM: IgG Response Frequency\ track iggAllSum\ type bigWig\ visibility full\ igmAllSum S-PBM: IgM Response Frequency bigWig S Protein Antibody Peptide Binding Microarray - IgM - Response Frequency 2 8 0 0 0 127 127 127 0 0 0 immu 0 autoScale on\ bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/allSum.bw\ longLabel S Protein Antibody Peptide Binding Microarray - IgM - Response Frequency\ maxHeightPixels 100:24:8\ parent pbmShanghai\ shortLabel S-PBM: IgM Response Frequency\ track igmAllSum\ type bigWig\ visibility full\ igm_COVID_15 COVID 15 bigBed 9 COVID 15 1 9 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl 
/gbdb/wuhCor1/pbmShanghai/igm/COVID_15.bb\ longLabel COVID 15\ parent igm on\ priority 9\ shortLabel COVID 15\ track igm_COVID_15\ type bigBed 9\ igg_Ctrl_NC96 Ctrl NC96 bigBed 9 Ctrl NC96 1 9 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_NC96.bb\ longLabel Ctrl NC96\ parent igg on\ priority 9\ shortLabel Ctrl NC96\ track igg_Ctrl_NC96\ type bigBed 9\ unipCov2Modif Glycosyl/Phosph. bigGenePred UniProt Amino Acid Glycosylation/Phosphorylation sites 0 9 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates. \ The data has been curated from scientific publications by the UniProt/SwissProt staff.\ The annotations are spread over multiple tracks, based on their "feature type" in UniProt:\

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
Track Name | Description
UCSC Alignment, SwissProt | Protein sequences from SwissProt mapped onto the genome. All other\ tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBL | Protein sequences from TrEMBL mapped onto the genome. All other tracks\ are (start,end) annotations mapped using this track. This track is\ hidden by default. To show it, click its checkbox on the track description\ page.
UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains | Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains | Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains | Polypeptide chain in mature protein after post-processing.
UniProt Domains | Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds | Disulfide bonds.
UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts | Differences between Genbank sequences and the UniProt sequence.
UniProt Repeats | Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations | All other annotations
\ \

Display Conventions and Configuration

\ \

\ Genomic locations of UniProt/SwissProt annotations are labeled with a short name for\ the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide"\ etc.). A click on them shows the full annotation and provides a link to the UniProt/SwissProt\ record for more details. TrEMBL annotations are always shown in \ light blue, except in the Signal Peptides,\ Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. For variants, the mouse-over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

\ In the "UniProt Modifications" track, lipidation sites are highlighted in \ dark blue, glycosylation sites in \ dark green, and phosphorylation sites in \ light green.

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\

\ \

Data Access

\ \

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text by our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout

Please refer to our mailing list archives for questions or our Data Access FAQ for more information.
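Once bigBedToBed has written BED text to stdout, the output can be post-processed with a few lines of Python. The sketch below assumes only the standard leading BED columns (chrom, chromStart, chromEnd, name); the function names and the sample lines are hypothetical and are not taken from the actual track data.

```python
def parse_bed_lines(lines):
    """Yield (chrom, start, end, name) tuples from tab-separated BED lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        yield chrom, start, end, name


def overlapping(features, start, end):
    """Return features overlapping the 0-based, half-open range [start, end)."""
    return [f for f in features if f[1] < end and f[2] > start]


# Made-up example lines standing in for real bigBedToBed output.
sample = [
    "NC_045512v2\t100\t160\tfeature A\n",
    "NC_045512v2\t500\t530\tfeature B\n",
]
features = list(parse_bed_lines(sample))
hits = overlapping(features, 150, 200)
```

To use it on real data, the command's output could be piped into a script that passes `sys.stdin` to `parse_bed_lines` instead of the sample list.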

\ \

Credits

\

This track was created by Maximilian Haeussler at UCSC, with help from Chris Lee, Mark Diekhans and Brian Raney, and feedback from the UniProt staff and Phil Berman, UCSC. Thanks to UniProt for making all data available for download.

\ \

References

\ \

UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5. PMID: 22102590; PMC: PMC3245120

Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat. 2004 May;23(5):464-70. PMID: 15108278

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipModifCov2.bb\ dataVersion /gbdb/$D/uniprot/version.txt\ group uniprot\ html uniprotCov2\ itemRgb on\ longLabel UniProt Amino Acid Glycosylation/Phosphorylation sites\ mouseOverField comments\ priority 9\ shortLabel Glycosyl/Phosph.\ track unipCov2Modif\ type bigGenePred\ urls uniProtId="http://www.uniprot.org/uniprot/$$#aaMod_section" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ K_bind_avg K_bind_avg bigWig DMS data for RBD Binding 1 9 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/K_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel K_bind_avg\ track K_bind_avg\ type bigWig\ visibility dense\ K_expr_avg_Expression K_expr_avg bigWig DMS data for RBD expression 1 9 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/K_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel K_expr_avg\ track K_expr_avg_Expression\ type bigWig\ visibility dense\ COV2-2832Total MAB COV2-2832 total bigWig Bloom antibody escape - Total Score - COV2-2832 1 9 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2832.tot.bw\ longLabel Bloom antibody escape - Total Score - COV2-2832\ parent bloomEscTotal on\ shortLabel MAB COV2-2832 total\ track COV2-2832Total\ type bigWig\ visibility dense\ variantAaMuts_B_1_1_529 Omicron BA.1 AA Muts bigBed 4 Omicron (BA.1 SA Nov-2021) amino acid mutations from cov-lineages.org (Nov 2021) 1 9 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/B.1.1.529_prot.bb\ color 219,40,35\ longLabel Omicron (BA.1 SA Nov-2021) amino acid mutations from cov-lineages.org (Nov 2021)\ parent variantMuts off\ priority 11\ shortLabel Omicron BA.1 AA Muts\ subGroups variant=J_BA1 mutation=AA designation=VOC\ track variantAaMuts_B_1_1_529\ url https://outbreak.info/situation-reports?pango=BA.1\ 
urlLabel BA.1 Situation Report at outbreak.info\ IgM_Z-score-_COVID-19_patients_P52 P52 bigBed 9 P52 1 9 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgM_Z-score-_COVID-19_patients/P52.bb\ longLabel P52\ parent IgM_Z-score-_COVID-19_patients on\ shortLabel P52\ track IgM_Z-score-_COVID-19_patients_P52\ type bigBed 9\ P52 P52 bigBed 9 P52 1 9 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P52.bb\ longLabel P52\ parent IgG_Z-score-_COVID-19_patients on\ shortLabel P52\ track P52\ type bigBed 9\ nextstrainFreq19A 19A bigWig Nextstrain, 19A clade: Alternate allele frequency 1 10 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples19A.bigWig\ longLabel Nextstrain, 19A clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 10\ shortLabel 19A\ subGroups view=newClades\ track nextstrainFreq19A\ type bigWig\ visibility dense\ nextstrainSamples19A 19A Mutations vcfTabix Mutations in Clade 19A Nextstrain Subset of GISAID EpiCoV TM Samples 0 10 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples19A.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain19A.nh\ longLabel Mutations in Clade 19A Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades\ priority 10\ shortLabel 19A Mutations\ subGroups view=newClades\ track nextstrainSamples19A\ galaxyEnaQ1Ay-4 AY.4 mutations bigBed 8 + Mutations (amino acid level) in AY.4 between 2021-10-05 and 2022-01-05 1 10 89 30 113 172 142 184 1 0 0

Description

\

This track represents part of the SARS-CoV-2 analysis efforts of the GalaxyProject [1]. The project aims at a fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases, using ready-to-use public infrastructure [2]. It restricts itself to data deposited by national genome surveillance projects that provide sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

\

\

Analysis is performed on public Galaxy servers using only open-source tools, orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore. It includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences, and pangolin lineage assignments. Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC Genome Browser tracks. The project web site has more information about available results data.

\ \

Display Conventions and Configuration

\ \

Track structure

\

The GalaxyProject SARS-CoV-2 mutation tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting 3 months prior to the day of the update. The end date displayed on the current-quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.
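The rolling definition of quarters can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the project's own code: the function names are hypothetical, earlier quarters are assumed to be contiguous 3-month windows ending where the next one starts, and days that do not exist in the target month are clamped to the month's last day.

```python
import calendar
from datetime import date


def months_before(d, months):
    """Return the date `months` calendar months before d, clamping the day."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def quarter_windows(update_day, latest_sample_day, n_quarters=4):
    """Return (start, end) pairs, current quarter first.

    The current quarter starts 3 months before the update day and ends at the
    collection date of the most recent analyzed sample.
    """
    windows = []
    for q in range(n_quarters):
        start = months_before(update_day, 3 * (q + 1))
        end = latest_sample_day if q == 0 else months_before(update_day, 3 * q)
        windows.append((start, end))
    return windows


# Hypothetical update day and latest sample date, for illustration only.
windows = quarter_windows(date(2022, 4, 5), date(2022, 3, 12))
```

With these example inputs, the current quarter runs from 2022-01-05 to 2022-03-12 and the three earlier quarters start on 2021-10-05, 2021-07-05 and 2021-04-05.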

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together, the tracks can be used to explore the change of dominant lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant calling artefacts should appear in lighter shades of grey.
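One way to picture frequency-based shading is to scale the within-lineage frequency to a BED-style score between 0 and 1000 and derive a grey level from it. The Python sketch below is only an assumption about how such a mapping could look: the linear scaling, the function names and the 0-255 grey scale are all hypothetical, and the browser's actual shading bins may differ.

```python
def frequency_to_score(freq):
    """Scale a frequency in [0, 1] to an integer score in [0, 1000]."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    return round(freq * 1000)


def score_to_grey(score):
    """Map a score in [0, 1000] to a grey level: 0 is black, 255 is white."""
    if not 0 <= score <= 1000:
        raise ValueError("score must be between 0 and 1000")
    return round(255 * (1 - score / 1000))


# A mutation present in every sample of the lineage is drawn black,
# one seen in almost no samples is drawn nearly white.
fixed = score_to_grey(frequency_to_score(1.0))
absent = score_to_grey(frequency_to_score(0.0))
```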

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are displayed entirely thin.
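The relation between the codon-wide feature and its thick part can be sketched in Python as follows. The function names are hypothetical, coordinates are 0-based and half-open as in BED, and the sketch assumes a simple contiguous coding sequence (it ignores, for example, the ribosomal frameshift in ORF1ab). The example uses the spike D614G substitution (A23403G in 1-based genome coordinates) with the spike coding sequence starting at 1-based position 21563.

```python
def snv_feature_coords(cds_start, pos):
    """Return (chromStart, chromEnd, thickStart, thickEnd) for an SNV.

    cds_start: 0-based genome position of the first base of the CDS
    pos:       0-based genome position of the changed base
    """
    if pos < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    codon_index = (pos - cds_start) // 3      # 0-based codon number
    chrom_start = cds_start + 3 * codon_index  # first base of that codon
    return chrom_start, chrom_start + 3, pos, pos + 1


def amino_acid_number(cds_start, pos):
    """Return the 1-based number of the amino acid encoded at pos."""
    return (pos - cds_start) // 3 + 1


# Spike D614G: 1-based 21563 and 23403 become 0-based 21562 and 23402.
coords = snv_feature_coords(21562, 23402)
residue = amino_acid_number(21562, 23402)
```

Here the feature spans the three bases of codon 614, and the thick part marks the single changed base in the middle of that codon.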

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

\ \

Methods

\

For analysis, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances. The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, and variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch. This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch. Sequencing data downloads, execution of the three types of workflows, and export of key result files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server. The bot accounts on participating Galaxy servers are checked roughly weekly for newly finished analysis histories; then

  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc. are added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
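The step that determines the most frequently observed lineages per quarter can be sketched in Python. This is a minimal illustration, not the project's actual script: the input layout (pairs of collection date and lineage), the function name `top_lineages`, and the treatment of window boundaries (start excluded, end included, so that a sample on a shared boundary day is counted once) are assumptions.

```python
from collections import Counter
from datetime import date


def top_lineages(samples, windows, n_top=5):
    """Return, per time window, the n_top most frequent lineages.

    samples: iterable of (collection_date, lineage) pairs
    windows: list of (start, end) date pairs
    """
    samples = list(samples)  # allow several passes over the data
    result = []
    for start, end in windows:
        # Assumed convention: start is exclusive, end is inclusive.
        counts = Counter(lin for day, lin in samples if start < day <= end)
        result.append([lin for lin, _ in counts.most_common(n_top)])
    return result


# Made-up samples and windows, for illustration only.
windows = [
    (date(2022, 1, 5), date(2022, 3, 12)),
    (date(2021, 10, 5), date(2022, 1, 5)),
]
samples = [
    (date(2022, 2, 1), "BA.2"),
    (date(2022, 2, 2), "BA.2"),
    (date(2022, 2, 3), "BA.1"),
    (date(2021, 11, 1), "AY.4"),
    (date(2021, 11, 2), "AY.4"),
    (date(2021, 12, 1), "AY.43"),
    (date(2022, 1, 5), "AY.4"),
]
ranking = top_lineages(samples, windows, n_top=2)
```

The mutations to display would then be collected only from samples assigned to one of the returned lineages within the corresponding window.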

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q1/00_AY.4_data.bb\ color 89,30,113\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.4 between 2021-10-05 and 2022-01-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q1_tracks on\ priority 10\ shortLabel AY.4 mutations\ spectrum on\ track galaxyEnaQ1Ay-4\ type bigBed 8 +\ galaxyEnaQ3Ay-4 AY.4 mutations bigBed 8 + Mutations (amino acid level) in AY.4 between 2021-04-05 and 2021-07-05 1 10 89 30 113 172 142 184 1 0 0


\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q3/00_AY.4_data.bb\ color 89,30,113\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.4 between 2021-04-05 and 2021-07-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q3_tracks on\ priority 10\ shortLabel AY.4 mutations\ spectrum on\ track galaxyEnaQ3Ay-4\ type bigBed 8 +\ galaxyEnaQ2Ay-4 AY.4 mutations bigBed 8 + Mutations (amino acid level) in AY.4 between 2021-07-05 and 2021-10-05 1 10 89 30 113 172 142 184 1 0 0


\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q2/00_AY.4_data.bb\ color 89,30,113\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.4 between 2021-07-05 and 2021-10-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q2_tracks on\ priority 10\ shortLabel AY.4 mutations\ spectrum on\ track galaxyEnaQ2Ay-4\ type bigBed 8 +\ galaxyEnaQ0Ba-2 BA.2 mutations bigBed 8 + Mutations (amino acid level) in BA.2 between 2022-01-05 and 2022-03-12 3 10 0 99 116 127 177 185 1 0 0

Description

\

This track represents part of the SARS-CoV-2 analysis efforts of the GalaxyProject [1]. The project aims at a fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases, using ready-to-use public infrastructure [2]. It restricts itself to data deposited by national genome surveillance projects that provide sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

\

\

Analysis is performed on public Galaxy servers using only open-source tools, orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore. It includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences, and pangolin lineage assignments. Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC Genome Browser tracks. The project web site has more information about available results data.

\ \

Display Conventions and Configuration

\ \

Track structure

\

The GalaxyProject SARS-CoV-2 mutation tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting 3 months prior to the day of the update. The end date displayed on the current-quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together, the tracks can be used to explore the change of dominant lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are displayed entirely thin.

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

\ \

Methods

\

For analysis, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances. The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, and variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch. This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch. Sequencing data downloads, execution of the three types of workflows, and export of key result files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server. The bot accounts on participating Galaxy servers are checked roughly weekly for newly finished analysis histories; then

  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc. are added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the ftp server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigbed files and track files

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular we would like to thank:

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q0/00_BA.2_data.bb\ color 0,99,116\ html galaxyEna\ longLabel Mutations (amino acid level) in BA.2 between 2022-01-05 and 2022-03-12\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q0_tracks on\ priority 10\ shortLabel BA.2 mutations\ spectrum on\ track galaxyEnaQ0Ba-2\ type bigBed 8 +\ Calu3_07hpi Calu3 7hpi bigWig Calu3 7hpi Ribo-seq and RNA-seq 0 10 0 0 0 127 127 127 0 0 0

Description

\

\ The Weizman ORFs (Open Reading Frames) track shows previously unannotated ORF\ predictions based on Ribo-Seq and RNA-seq data. It is a collection of\ tracks (super track) \ that contains not only the predicted gene models, but also\ data supporting them.

\ \

Display Conventions and Configuration

The Predicted ORFs track shows the predicted exons. All other tracks show the signal as an x-y plot with bars.

Methods

\

\ Methods from Finkel et al:

\

\ To capture the full SARS-CoV-2 coding capacity, we applied a suite of ribosome\ profiling approaches to Vero cells infected with SARS-CoV-2 for 5 and 24 hours,\ and Calu3 cells infected for 7 hours. For each time point we prepared three\ different ribosome-profiling libraries, each one in two biological replicates.\ Two Ribo-seq libraries facilitate mapping of translation initiation sites, by\ treating cells with lactimidomycin (LTM) or harringtonine (Harr), two drugs\ with distinct mechanisms that prevent 80S ribosomes at translation initiation\ sites from elongating. The third Ribo-seq library was prepared from cells\ treated with the translation elongation inhibitor cycloheximide (CHX), and\ gives a snap-shot of actively translating ribosomes across the body of the\ translated ORF. In parallel, RNA-sequencing was applied to map viral\ transcripts.

\

\ The ORF prediction was done by using two computational tools, PRICE and\ ORF-RATER, that rely on different features of ribosome profiling data, and by\ manual inspection of the data. The predictions are based on Ribo-seq libraries\ from two time points (5 and 7 hpi) of two different cell lines (Vero E6 and\ Calu3 cells), infected with separate virus isolates.

\

\ The Ribo-Seq data of the 24 hours samples do not show the expected profile of\ read distribution on viral genes and therefore were not used for the procedure\ of ORF predictions.

\

For more details see the paper in the References section below.

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.

\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

Finkel Y, Mizrahi O, Nachshon A, Weingarten-Gabbay S, Morgenstern D, Yahalom-Ronen Y, Tamir H, Achdout H, Stein D, Israeli O et al. The coding capacity of SARS-CoV-2. Nature. 2020 Sep 9. PMID: 32906143

\ \ genes 0 compositeTrack on\ group genes\ html weizmanOrfs\ longLabel Calu3 7hpi Ribo-seq and RNA-seq\ parent weizmanOrfs off\ priority 10\ shortLabel Calu3 7hpi\ track Calu3_07hpi\ type bigWig\ visibility hide\ igm_COVID_416 COVID 416 bigBed 9 COVID 416 1 10 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_416.bb\ longLabel COVID 416\ parent igm on\ priority 10\ shortLabel COVID 416\ track igm_COVID_416\ type bigBed 9\ igg_Ctrl_NC66 Ctrl NC66 bigBed 9 Ctrl NC66 1 10 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_NC66.bb\ longLabel Ctrl NC66\ parent igg on\ priority 10\ shortLabel Ctrl NC66\ track igg_Ctrl_NC66\ type bigBed 9\ Q0_tracks Galaxy ENA mutations in top lineages - current quarter bigBed 8 + Most frequent lineages of current quarter 3 10 0 0 0 127 127 127 0 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

The GalaxyProject SARS-CoV-2 mutations tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting 3 months before the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.
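The rolling quarter definition can be illustrated with a short sketch. This is not the project's code: the function names are hypothetical, and it assumes that "3 months" means calendar months, with the day of the month clamped when the target month is shorter. It also does not model the displayed end date of the current quarter, which follows the most recent sample's collection date.

```python
import calendar
from datetime import date


def months_back(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d` (day clamped to month length)."""
    idx = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def quarter_windows(update_day: date, n_quarters: int = 4):
    """Return (start, end) pairs, most recent first; index 0 is the current quarter.

    Each window is treated as half-open: start <= collection date < end.
    """
    return [
        (months_back(update_day, 3 * (k + 1)), months_back(update_day, 3 * k))
        for k in range(n_quarters)
    ]


if __name__ == "__main__":
    for start, end in quarter_windows(date(2022, 3, 15)):
        print(start, end)
```

Every window is computed directly from the update day rather than chained from the previous window, so day clamping in a short month does not shift the older quarters.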

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey or black, while newly emerging mutations or non-systematic variant calling artefacts should appear in lighter shades of grey.
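As a rough illustration of frequency-based shading, the sketch below converts a within-lineage frequency into a BED-style score from 0 to 1000 and then into a grey level. The linear mapping and both function names are assumptions made for this example; the exact shading used by the browser is not described here.

```python
def frequency_to_score(freq: float) -> int:
    """Scale a within-lineage frequency (0..1) to a BED-style score (0..1000)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    return round(freq * 1000)


def score_to_grey(score: int) -> int:
    """Map a score to an 8-bit grey level: 0 is black, 255 is white (assumed linear)."""
    if not 0 <= score <= 1000:
        raise ValueError("score must lie between 0 and 1000")
    return round(255 * (1 - score / 1000))


if __name__ == "__main__":
    for f in (1.0, 0.4, 0.05):
        s = frequency_to_score(f)
        print(f, s, score_to_grey(s))
```

Under this assumed mapping, a mutation seen in every sample of a lineage is drawn black, and a rare one is drawn close to white.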

\

Mutation features are labeled with their effect at the amino acid level. For SNVs, the feature as a whole spans the base triplet encoding the affected amino acid, while the thick part of the feature marks the precise base changed by the mutation. Deletions are rendered thick along their whole length, while insertions are drawn entirely thin.
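The SNV convention can be sketched as a coordinate calculation. The code below is an illustration, not the track-building script: it assumes 0-based, half-open BED coordinates, a known CDS start, and an uninterrupted reading frame (so it ignores cases such as the ribosomal frameshift in ORF1ab). The names `Feature` and `snv_feature` are hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom_start: int  # start of the codon (thin + thick extent)
    chrom_end: int    # end of the codon, exclusive
    thick_start: int  # the changed base
    thick_end: int    # exclusive end of the changed base
    aa_number: int    # 1-based amino acid position within the protein


def snv_feature(pos0: int, cds_start0: int) -> Feature:
    """Place an SNV at 0-based position `pos0` inside a CDS starting at `cds_start0`."""
    offset = pos0 - cds_start0
    if offset < 0:
        raise ValueError("position lies upstream of the CDS")
    codon_start = pos0 - offset % 3
    return Feature(codon_start, codon_start + 3, pos0, pos0 + 1, offset // 3 + 1)


if __name__ == "__main__":
    # Spike D614G: A23403G in 1-based coordinates; the spike CDS starts at 21563 (1-based).
    print(snv_feature(23403 - 1, 21563 - 1))
```

For the spike D614G example, the feature covers the three bases of codon 614, and the thick part is the single middle base of that codon.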

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

\ \

Methods

\

For analysis, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances. The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads (including primer trimming), variant calling, and annotation, and produces a collection of VCF files, one for each sample in the batch. This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot that gives a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.

Sequencing data downloads, execution of the three types of workflows, and export of key result files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.

The bot accounts on participating Galaxy servers are checked roughly once a week for newly finished analysis histories, then

  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc. are added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the ftp server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigbed files and track files
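The lineage-selection and mutation-extraction steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: samples are represented as hypothetical `(collection_date, lineage, mutations)` tuples, the quarter is a half-open date window, and the within-lineage frequency is taken to be the fraction of a lineage's samples in the window that carry a mutation.

```python
from collections import Counter
from datetime import date


def lineage_mutation_frequencies(samples, start, end, n_top=5):
    """Return {lineage: {mutation: frequency}} for the `n_top` most common lineages.

    `samples` is an iterable of (collection_date, lineage, mutations) tuples;
    only samples with start <= collection_date < end are considered.
    """
    in_window = [s for s in samples if start <= s[0] < end]
    lineage_sizes = Counter(lineage for _, lineage, _ in in_window)
    result = {}
    for lineage, size in lineage_sizes.most_common(n_top):
        mutation_counts = Counter()
        for _, sample_lineage, mutations in in_window:
            if sample_lineage == lineage:
                # Count each mutation at most once per sample.
                mutation_counts.update(set(mutations))
        result[lineage] = {m: c / size for m, c in mutation_counts.items()}
    return result


if __name__ == "__main__":
    demo = [
        (date(2022, 1, 10), "BA.2", ["S:D614G", "S:T19I"]),
        (date(2022, 2, 1), "BA.2", ["S:D614G"]),
        (date(2022, 2, 20), "BA.1", ["S:D614G"]),
    ]
    print(lineage_mutation_frequencies(demo, date(2021, 12, 15), date(2022, 3, 15)))
```

The mutation labels in the demo data are made up for the example; a real run would read them from the per-sample results on the FTP server.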

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular we would like to thank:

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1

\ varRep 1 allButtonPair on\ compositeTrack on\ filter.withinLineageFrequency 0.05\ filterByRange.withinLineageFrequency on\ filterLimits.withinLineageFrequency 0:2\ filterType.countries multipleListOr\ filterValues.countries EE|Estonia,GB|United Kingdom,GR|Greece,IE|Ireland,ZA|South Africa\ filterValuesDefault.countries EE,GB,GR,ZA\ html galaxyEna\ longLabel Most frequent lineages of current quarter\ parent galaxyEna\ priority 10\ shortLabel Galaxy ENA mutations in top lineages - current quarter\ track Q0_tracks\ type bigBed 8 +\ visibility pack\ L_bind_avg L_bind_avg bigWig DMS data for RBD Binding 1 10 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/L_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel L_bind_avg\ track L_bind_avg\ type bigWig\ visibility dense\ L_expr_avg_Expression L_expr_avg bigWig DMS data for RBD expression 1 10 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/L_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel L_expr_avg\ track L_expr_avg_Expression\ type bigWig\ visibility dense\ LY-CoV016Total MAB LY-CoV016 total bigWig Bloom antibody escape - Total Score - LY-CoV016 1 10 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/LY-CoV016.tot.bw\ longLabel Bloom antibody escape - Total Score - LY-CoV016\ parent bloomEscTotal on\ shortLabel MAB LY-CoV016 total\ track LY-CoV016Total\ type bigWig\ visibility dense\ problematicSitesMask Mask bigBed Problematic sites where masking is recommended for analysis 0 10 255 0 0 255 127 127 0 0 0 map 1 bigDataUrl /gbdb/wuhCor1/problematicSites/problematicSitesMask.bb\ color 255,0,0\ longLabel Problematic sites where masking is recommended for analysis\ parent problematicSites\ priority 10\ shortLabel Mask\ track problematicSitesMask\ type bigBed\ sarsCov2PhyloPubAllMinAf01 Min AF 1% vcfTabix Nucleotide Substitution Mutations with Alternate Allele Frequency >= 
1% in Public Sequences 0 10 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/sarsCov2PhyloPub/public.all.minAf.01.vcf.gz\ longLabel Nucleotide Substitution Mutations with Alternate Allele Frequency >= 1% in Public Sequences\ parent sarsCov2PhyloPub on\ priority 10\ shortLabel Min AF 1%\ track sarsCov2PhyloPubAllMinAf01\ type vcfTabix\ sarsCov2PhyloMinAf01 Min alt AF 1% vcfTabix Nucleotide Substitution Mutations with Alternate Allele Frequency >= 1% in GISAID EpiCov TM Sequences 0 10 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/sarsCov2Phylo/gisaid.minAf.01.vcf.gz\ longLabel Nucleotide Substitution Mutations with Alternate Allele Frequency >= 1% in GISAID EpiCov TM Sequences\ parent sarsCov2Phylo on\ priority 10\ shortLabel Min alt AF 1%\ track sarsCov2PhyloMinAf01\ type vcfTabix\ unipCov2Mut Mutations bigBed 12 + UniProt Amino Acid Mutations 1 10 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates. \ The data has been curated from scientific publications by the UniProt/SwissProt staff.\ The annotations are spread over multiple tracks, based on their "feature type" in UniProt:\

Track Name | Description
UCSC Alignment, SwissProt | Protein sequences from SwissProt mapped onto the genome. All other tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBL | Protein sequences from TrEMBL mapped onto the genome. All other tracks are (start,end) annotations mapped using this track. This track is hidden by default. To show it, click its checkbox on the track description page.
UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains | Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains | Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains | Polypeptide chain in mature protein after post-processing.
UniProt Domains | Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds | Disulfide bonds.
UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts | Differences between Genbank sequences and the UniProt sequence.
UniProt Repeats | Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations | All other annotations
\ \

Display Conventions and Configuration

\ \

Genomic locations of UniProt/SwissProt annotations are labeled with a short name for the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide" etc.). Clicking a feature shows the full annotation and provides a link to the UniProt/SwissProt record for more details. TrEMBL annotations are always shown in light blue, except in the Signal Peptides, Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse-over a feature to see the full UniProt annotation comment. For variants, the mouse-over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

In the "UniProt Modifications" track, lipidation sites are highlighted in dark blue, glycosylation sites in dark green, and phosphorylation sites in light green.

\ \

Methods

\ \

\ UniProt sequences were aligned to UCSC/Gencode transcript sequences first with\ BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted\ to genome positions with pslMap and filtered again. UniProt annotations were\ obtained from the UniProt XML file. The annotations were then mapped to the\ genome through the alignment using the pslMap program. This mapping approach\ draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser\ source code, the main script used to build this track can be found on \ GitHub.\

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout \
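For scripted access, the same call can be wrapped in Python. The sketch below only assembles the command shown above and parses tab-separated BED output; the function names are hypothetical, and running the command assumes that a bigBedToBed binary is installed and on the PATH.

```python
import shutil
import subprocess


def bigbed_to_bed_command(url, chrom, start, end, out="stdout"):
    """Build the bigBedToBed argument list for a range query."""
    return ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}", f"-end={end}", out]


def parse_bed(text):
    """Split BED text into rows of fields, converting start and end to int."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        fields[1], fields[2] = int(fields[1]), int(fields[2])
        rows.append(fields)
    return rows


def fetch_features(url, chrom, start, end):
    """Run bigBedToBed and return the parsed rows (requires the binary to be installed)."""
    if shutil.which("bigBedToBed") is None:
        raise RuntimeError("bigBedToBed was not found on the PATH")
    cmd = bigbed_to_bed_command(url, chrom, start, end)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_bed(result.stdout)


if __name__ == "__main__":
    url = "http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb"
    print(" ".join(bigbed_to_bed_command(url, "NC_045512v2", 0, 29903)))
```

Passing `stdout` as the output name writes the converted features to standard output, as in the command line above, so they can be captured without a temporary file.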

\ \ Please refer to our\ mailing list archives\ for questions or our\ Data Access FAQ\ for more information. \

\ \

Credits

\

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Phil Berman, UCSC.\ Thanks to UniProt for making all data available for download.

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipMutCov2.bb\ group uniprot\ html uniprotCov2\ longLabel UniProt Amino Acid Mutations\ mouseOverField comments\ priority 10\ shortLabel Mutations\ track unipCov2Mut\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#pathology_and_biotech" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$" variationId="http://www.uniprot.org/uniprot/$$"\ visibility dense\ variantNucMuts_B_1_1_529 Omicron BA.1 Nuc Muts bigBed 4 Omicron VOC (BA.1 SA Nov-2021) nucleotide mutations identifed from GISAID sequences (Nov 2021) 1 10 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/B.1.1.529_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (BA.1 SA Nov-2021) nucleotide mutations identifed from GISAID sequences (Nov 2021)\ parent variantMuts off\ priority 121\ shortLabel Omicron BA.1 Nuc Muts\ subGroups variant=J_BA1 mutation=NUC designation=VOC\ track variantNucMuts_B_1_1_529\ url https://outbreak.info/situation-reports?pango=BA.1\ urlLabel BA.1 Situation Report at outbreak.info\ IgM_Z-score-_COVID-19_patients_P6 P6 bigBed 9 P6 1 10 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgM_Z-score-_COVID-19_patients/P6.bb\ longLabel P6\ parent IgM_Z-score-_COVID-19_patients on\ shortLabel P6\ track IgM_Z-score-_COVID-19_patients_P6\ type bigBed 9\ P6 P6 bigBed 9 P6 1 10 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbm/IgG_Z-score-_COVID-19_patients/P6.bb\ longLabel P6\ parent IgG_Z-score-_COVID-19_patients on\ shortLabel P6\ track P6\ type bigBed 9\ gordon2 Protein Interact. bigBed 5 + Human Interacting Proteins from Gordon et al. (* = druggable) 0 10 1 50 32 128 152 143 1 0 0

Description

\

\ This track shows data from Gordon et al, 2020 "A\ SARS-CoV-2-Human Protein-Protein Interaction Map Reveals Drug Targets and\ Potential Drug-Repurposing".\ \

\ The authors cloned, tagged, and expressed 26\ of the mature proteins expressed by SARS-CoV-2 in 293 human cells and used\ affinity purification mass spectrometry (AP-MS) to identify human proteins that\ interact with viral proteins.

332 high-confidence interactions are reported between human and viral proteins.

Display Conventions and Configuration

\

\ On the viral genome the coordinates of the viral protein are marked and labeled with the \ name of the human interactor.

\

The BED file also includes the MIST score multiplied by 1000. A MIST score of 1 indicates a highly reproducible and specific interaction; scores are multiplied by 1000 for display purposes.

\

\ The set of interactions displayed here includes only the 332 "high confidence" \ interactions that met criteria for significance (using a MIST cutoff >= 0.7 as well \ as "a SAINTexpress BFDR <= 0.05 and an average spectral count >= 2.").
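The score scaling and the significance criteria quoted above can be written out as a small sketch. The function and parameter names are illustrative only and are not the column names of the supplementary table.

```python
def bed_score(mist: float) -> int:
    """Scale a MIST score (0..1) to the 0..1000 range used for BED scores."""
    if not 0.0 <= mist <= 1.0:
        raise ValueError("MIST score must lie between 0 and 1")
    return round(mist * 1000)


def is_high_confidence(mist: float, bfdr: float, avg_spectral_count: float) -> bool:
    """Apply the quoted cutoffs: MIST >= 0.7, SAINTexpress BFDR <= 0.05, average spectral count >= 2."""
    return mist >= 0.7 and bfdr <= 0.05 and avg_spectral_count >= 2


if __name__ == "__main__":
    print(bed_score(0.85), is_high_confidence(0.85, 0.01, 4.5))
```

Under these cutoffs, every interaction that passes has a BED score of at least 700.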

\ \

Methods

\

The tab-delimited version of supplementary table 2 was downloaded from bioRxiv. This table lists the viral protein that served as the "bait" and the UniProt identifiers of human proteins that were captured as "prey".

\

\ A version of the "UniProt Mature Protein Products (Polypeptide Chains)" track was then manually \ modified to rename ORFs to match paper nomenclature as indicated by Figure 1.

\

The table was then joined with this track, and the MIST scores were multiplied by 1000, to produce a track reporting interactions.

\ \

References

Gordon et al., "A SARS-CoV-2-Human Protein-Protein Interaction Map Reveals Drug Targets and Potential Drug-Repurposing", bioRxiv 2020

\ genes 1 bigDataUrl /gbdb/wuhCor1/bbi/gordon.bb\ color 1,50,32\ group genes\ longLabel Human Interacting Proteins from Gordon et al. (* = druggable)\ mouseOverField drug\ priority 10\ scoreMax 1000\ scoreMin 600\ shortLabel Protein Interact.\ track gordon2\ type bigBed 5 +\ urls pmid=https://www.ncbi.nlm.nih.gov/pubmed/$$\ useScore on\ Vero6_05hpi Vero6 5hpi bigWig Vero6 5hpi Ribo-seq and RNA-seq 0 10 0 0 0 127 127 127 0 0 0

Description

\

\ The Weizman ORFs (Open Reading Frames) track shows previously unannotated ORF\ predictions based on Ribo-Seq and RNA-seq data. It is a collection of\ tracks (super track) \ that contains not only the predicted gene models, but also\ data supporting them.

\ \

Display Conventions and Configuration

The Predicted ORFs track shows the predicted exons. All other tracks show the signal as an x-y plot with bars.

Methods

\

\ Methods from Finkel et al:

\

\ To capture the full SARS-CoV-2 coding capacity, we applied a suite of ribosome\ profiling approaches to Vero cells infected with SARS-CoV-2 for 5 and 24 hours,\ and Calu3 cells infected for 7 hours. For each time point we prepared three\ different ribosome-profiling libraries, each one in two biological replicates.\ Two Ribo-seq libraries facilitate mapping of translation initiation sites, by\ treating cells with lactimidomycin (LTM) or harringtonine (Harr), two drugs\ with distinct mechanisms that prevent 80S ribosomes at translation initiation\ sites from elongating. The third Ribo-seq library was prepared from cells\ treated with the translation elongation inhibitor cycloheximide (CHX), and\ gives a snap-shot of actively translating ribosomes across the body of the\ translated ORF. In parallel, RNA-sequencing was applied to map viral\ transcripts.

\

\ The ORF prediction was done by using two computational tools, PRICE and\ ORF-RATER, that rely on different features of ribosome profiling data, and by\ manual inspection of the data. The predictions are based on Ribo-seq libraries\ from two time points (5 and 7 hpi) of two different cell lines (Vero E6 and\ Calu3 cells), infected with separate virus isolates.

\

\ The Ribo-Seq data of the 24 hours samples do not show the expected profile of\ read distribution on viral genes and therefore were not used for the procedure\ of ORF predictions.

\

For more details see the paper in the References section below.

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.

\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

Finkel Y, Mizrahi O, Nachshon A, Weingarten-Gabbay S, Morgenstern D, Yahalom-Ronen Y, Tamir H, Achdout H, Stein D, Israeli O et al. The coding capacity of SARS-CoV-2. Nature. 2020 Sep 9. PMID: 32906143

\ \ genes 0 compositeTrack on\ group genes\ html weizmanOrfs\ longLabel Vero6 5hpi Ribo-seq and RNA-seq\ parent weizmanOrfs off\ priority 10\ shortLabel Vero6 5hpi\ track Vero6_05hpi\ type bigWig\ visibility hide\ igm_Ctrl_LC177 Ctrl LC177 bigBed 9 Ctrl LC177 1 11 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_LC177.bb\ longLabel Ctrl LC177\ parent igm on\ priority 11\ shortLabel Ctrl LC177\ track igm_Ctrl_LC177\ type bigBed 9\ igg_Ctrl_NC97 Ctrl NC97 bigBed 9 Ctrl NC97 1 11 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_NC97.bb\ longLabel Ctrl NC97\ parent igg on\ priority 11\ shortLabel Ctrl NC97\ track igg_Ctrl_NC97\ type bigBed 9\ M_bind_avg M_bind_avg bigWig DMS data for RBD Binding 1 11 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/M_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel M_bind_avg\ track M_bind_avg\ type bigWig\ visibility dense\ M_expr_avg_Expression M_expr_avg bigWig DMS data for RBD expression 1 11 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/M_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel M_expr_avg\ track M_expr_avg_Expression\ type bigWig\ visibility dense\ rCR3022Total MAB rCR3022 total bigWig Bloom antibody escape - Total Score - rCR3022 1 11 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/rCR3022.tot.bw\ longLabel Bloom antibody escape - Total Score - rCR3022\ parent bloomEscTotal on\ shortLabel MAB rCR3022 total\ track rCR3022Total\ type bigWig\ visibility dense\ variantAaMuts_BA_2 Omicron BA.2 AA Muts bigBed 4 Omicron BA.2 amino acid mutations from GISAID sequences (Sep 22, 2023) 1 11 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.2 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BA.2_prot.bb\ color 219,40,35\ longLabel Omicron BA.2 amino acid mutations from GISAID sequences (Sep 22, 
2023)\ parent variantMuts on\ priority 12\ shortLabel Omicron BA.2 AA Muts\ subGroups variant=K_BA2 mutation=AA designation=VOC\ track variantAaMuts_BA_2\ url https://outbreak.info/situation-reports?pango=BA.2\ urlLabel BA.2 Situation Report at outbreak.info\ unipCov2Other Other Annot. bigGenePred UniProt Other Annotations 0 11 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows protein sequence annotations from the UniProt/SwissProt database,\ mapped to genomic coordinates. \ The data has been curated from scientific publications by the UniProt/SwissProt staff.\ The annotations are spread over multiple tracks, based on their "feature type" in UniProt:\

Track Name | Description
UCSC Alignment, SwissProt | Protein sequences from SwissProt mapped onto the genome. All other tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBL | Protein sequences from TrEMBL mapped onto the genome. All other tracks are (start,end) annotations mapped using this track. This track is hidden by default. To show it, click its checkbox on the track description page.
UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains | Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains | Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains | Polypeptide chain in mature protein after post-processing.
UniProt Domains | Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds | Disulfide bonds.
UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts | Differences between Genbank sequences and the UniProt sequence.
UniProt Repeats | Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations | All other annotations
\ \

Display Conventions and Configuration

Genomic locations of UniProt/SwissProt annotations are labeled with a short name for the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide" etc.). Clicking a feature shows the full annotation and provides a link to the UniProt/SwissProt record for more details. TrEMBL annotations are always shown in light blue, except in the Signal Peptides, Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

Mouse over a feature to see the full UniProt annotation comment. For variants, the mouse-over shows the full name of the UniProt disease acronym.

The subtracks for domains related to subcellular location are sorted from outside to inside of the cell: signal peptide, extracellular, transmembrane, and cytoplasmic.

In the "UniProt Modifications" track, lipidation sites are highlighted in dark blue, glycosylation sites in dark green, and phosphorylation sites in light green.

Methods

UniProt sequences were aligned to UCSC/Gencode transcript sequences first with BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted to genome positions with pslMap and filtered again. UniProt annotations were obtained from the UniProt XML file. The annotations were then mapped to the genome through the alignment using the pslMap program. This mapping approach draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser source code, the main script used to build this track can be found on GitHub.
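The alignment filter described above can be illustrated with a small Python sketch. It is not the pslReps algorithm itself; it only mimics the two quoted thresholds (93% query coverage, score within 1% of the best hit for each query) on simplified records. The `Alignment` fields, the function name, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical, simplified alignment record (not the real PSL format)."""
    query: str        # protein accession
    target: str       # transcript identifier
    query_size: int   # length of the query sequence
    aligned: int      # number of query residues covered by the alignment
    score: float      # alignment score, higher is better


def filter_alignments(alns, min_coverage=0.93, near_best=0.01):
    """Keep alignments with enough query coverage and a near-best score."""
    by_query = defaultdict(list)
    for aln in alns:
        # First threshold: fraction of the query covered by the alignment.
        if aln.aligned / aln.query_size >= min_coverage:
            by_query[aln.query].append(aln)

    kept = []
    for hits in by_query.values():
        # Second threshold: score within `near_best` of the best remaining hit.
        best = max(h.score for h in hits)
        kept.extend(h for h in hits if h.score >= best * (1.0 - near_best))
    return kept


# Made-up example values.
alns = [
    Alignment("P0DTC2", "tx1", 1273, 1273, 1270.0),
    Alignment("P0DTC2", "tx2", 1273, 1273, 1265.0),  # within 1% of best
    Alignment("P0DTC2", "tx3", 1273, 1273, 1100.0),  # score too low
    Alignment("P0DTC2", "tx4", 1273, 900, 1272.0),   # coverage too low
]
kept_targets = [aln.target for aln in filter_alignments(alns)]
print(kept_targets)
```

Applying the coverage test before choosing the best score is a design choice of this sketch; the real tool may order or define these criteria differently.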

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text by our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout
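For scripted use, the same call can be assembled programmatically. The following Python sketch builds the argument list from the command shown above and, only if a `bigBedToBed` binary is found on the PATH, runs it. The helper names are hypothetical; the flags are the ones used in the example command.

```python
import shutil
import subprocess


def big_bed_to_bed_args(url, chrom, start, end, out="stdout"):
    """Build the argument list for a range-restricted bigBedToBed call."""
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        out,
    ]


def fetch_bed_lines(url, chrom, start, end):
    """Run bigBedToBed and return its output lines (needs the binary on PATH)."""
    if shutil.which("bigBedToBed") is None:
        raise RuntimeError("bigBedToBed not found on PATH")
    args = big_bed_to_bed_args(url, chrom, start, end)
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout.splitlines()


args = big_bed_to_bed_args(
    "http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb",
    "NC_045512v2", 0, 29903,
)
print(" ".join(args))
```

Passing the arguments as a list avoids shell quoting issues when the URL or chromosome name comes from user input.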

Please refer to our mailing list archives for questions or our Data Access FAQ for more information.

Credits

This track was created by Maximilian Haeussler at UCSC, with help from Chris Lee, Mark Diekhans and Brian Raney, and feedback from the UniProt staff and Phil Berman (UCSC). Thanks to UniProt for making all data available for download.

References

UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5. PMID: 22102590; PMC: PMC3245120

Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat. 2004 May;23(5):464-70. PMID: 15108278

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipOtherCov2.bb\ dataVersion /gbdb/$D/uniprot/version.txt\ group uniprot\ html uniprotCov2\ longLabel UniProt Other Annotations\ mouseOverField comments\ priority 11\ shortLabel Other Annot.\ track unipCov2Other\ type bigGenePred\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ unipCov2Struct Structure bigGenePred UniProt Protein Primary/Secondary Structure Annotations 0 11 0 0 0 127 127 127 0 0 0

Description

This track shows protein sequence annotations from the UniProt/SwissProt database, mapped to genomic coordinates. The data has been curated from scientific publications by the UniProt/SwissProt staff. The annotations are spread over multiple tracks, based on their "feature type" in UniProt:

Track Name | Description
UCSC Alignment, SwissProt | Protein sequences from SwissProt mapped onto the genome. All other tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBL | Protein sequences from TrEMBL mapped onto the genome. All other tracks are (start,end) annotations mapped using this track. This track is hidden by default. To show it, click its checkbox on the track description page.
UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains | Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains | Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains | Polypeptide chain in mature protein after post-processing.
UniProt Domains | Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds | Disulfide bonds.
UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts | Differences between GenBank sequences and the UniProt sequence.
UniProt Repeats | Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations | All other annotations.

Display Conventions and Configuration

Genomic locations of UniProt/SwissProt annotations are labeled with a short name for the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide" etc.). Clicking a feature shows the full annotation and provides a link to the UniProt/SwissProt record for more details. TrEMBL annotations are always shown in light blue, except in the Signal Peptides, Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

Mouse over a feature to see the full UniProt annotation comment. For variants, the mouse-over shows the full name of the UniProt disease acronym.

The subtracks for domains related to subcellular location are sorted from outside to inside of the cell: signal peptide, extracellular, transmembrane, and cytoplasmic.

In the "UniProt Modifications" track, lipidation sites are highlighted in dark blue, glycosylation sites in dark green, and phosphorylation sites in light green.

Methods

UniProt sequences were aligned to UCSC/Gencode transcript sequences first with BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted to genome positions with pslMap and filtered again. UniProt annotations were obtained from the UniProt XML file. The annotations were then mapped to the genome through the alignment using the pslMap program. This mapping approach draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser source code, the main script used to build this track can be found on GitHub.
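The idea of mapping a protein annotation to the genome through an alignment can be sketched in a few lines of Python. This is a conceptual illustration only, not how pslMap is implemented: it assumes a plus-strand alignment described by hypothetical ungapped blocks, each given as (CDS start, genome start, length) in nucleotides, and converts a 1-based inclusive amino-acid range into 0-based half-open genome intervals.

```python
def protein_to_genome(aa_start, aa_end, blocks):
    """Map a 1-based inclusive amino-acid range to genome intervals.

    blocks: list of (cds_start, genome_start, length) tuples in nucleotides,
    plus strand, sorted by cds_start (a hypothetical, simplified alignment).
    Returns 0-based half-open (start, end) genome intervals. A feature that
    crosses a block boundary is split into several intervals.
    """
    nt_start = (aa_start - 1) * 3   # first nucleotide of the first codon
    nt_end = aa_end * 3             # one past the last nucleotide
    intervals = []
    for cds_start, genome_start, length in blocks:
        lo = max(nt_start, cds_start)
        hi = min(nt_end, cds_start + length)
        if lo < hi:
            offset = genome_start - cds_start
            intervals.append((lo + offset, hi + offset))
    return intervals


# Made-up alignment: two 300-nt blocks separated by a gap on the genome.
blocks = [(0, 1000, 300), (300, 1500, 300)]
inside_one_block = protein_to_genome(1, 10, blocks)
across_blocks = protein_to_genome(95, 105, blocks)
print(inside_one_block, across_blocks)
```

A feature that falls entirely outside the aligned blocks yields an empty list, which corresponds to an annotation that cannot be placed on the genome.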

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text by our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout
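Once the annotations are in ASCII text, they can be processed with ordinary tools. The Python sketch below reads only the first four tab-separated BED columns (chromosome, start, end, name) and selects features overlapping a range. The sample rows are invented for illustration and are not real track data; actual output from this track carries additional columns that the sketch ignores.

```python
from typing import NamedTuple


class BedFeature(NamedTuple):
    chrom: str
    start: int   # 0-based
    end: int     # exclusive
    name: str


def parse_bed(text):
    """Parse the first four BED columns from tab-separated text."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        features.append(BedFeature(cols[0], int(cols[1]), int(cols[2]), cols[3]))
    return features


def overlapping(features, chrom, start, end):
    """Return features that overlap the half-open range [start, end)."""
    return [f for f in features
            if f.chrom == chrom and f.start < end and f.end > start]


# Hypothetical sample rows, not real track data.
sample = (
    "NC_045512v2\t100\t160\thelix\n"
    "NC_045512v2\t500\t530\tstrand\n"
    "NC_045512v2\t900\t960\tturn\n"
)
feats = parse_bed(sample)
hits = overlapping(feats, "NC_045512v2", 150, 510)
for f in hits:
    print(f.name, f.start, f.end)
```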

Please refer to our mailing list archives for questions or our Data Access FAQ for more information.

Credits

This track was created by Maximilian Haeussler at UCSC, with help from Chris Lee, Mark Diekhans and Brian Raney, and feedback from the UniProt staff and Phil Berman (UCSC). Thanks to UniProt for making all data available for download.

References

UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5. PMID: 22102590; PMC: PMC3245120

Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat. 2004 May;23(5):464-70. PMID: 15108278

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipStructCov2.bb\ dataVersion /gbdb/$D/uniprot/version.txt\ group uniprot\ html uniprotCov2\ longLabel UniProt Protein Primary/Secondary Structure Annotations\ mouseOverField comments\ priority 11\ shortLabel Structure\ track unipCov2Struct\ type bigGenePred\ urls uniProtId="http://www.uniprot.org/uniprot/$$#structure" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ CHX_5hr_1 Vero6 CHX 5hr 1 bigWig Vero6 CHX 5hr 1 2 11 4 90 141 129 172 198 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_chx_05hr_1.bw\ color 4,90,141\ longLabel Vero6 CHX 5hr 1\ maxHeightPixels 124:32:5\ parent Vero6_05hpi on\ priority 11\ shortLabel Vero6 CHX 5hr 1\ track CHX_5hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ igm_Ctrl_LC169 Ctrl LC169 bigBed 9 Ctrl LC169 1 12 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_LC169.bb\ longLabel Ctrl LC169\ parent igm on\ priority 12\ shortLabel Ctrl LC169\ track igm_Ctrl_LC169\ type bigBed 9\ igg_Ctrl_LC174 Ctrl LC174 bigBed 9 Ctrl LC174 1 12 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_LC174.bb\ longLabel Ctrl LC174\ parent igg on\ priority 12\ shortLabel Ctrl LC174\ track igg_Ctrl_LC174\ type bigBed 9\ REGN10933Total MAB REGN10933 total bigWig Bloom antibody escape - Total Score - REGN10933 1 12 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/REGN10933.tot.bw\ longLabel Bloom antibody escape - Total Score - REGN10933\ parent bloomEscTotal on\ shortLabel MAB REGN10933 total\ track REGN10933Total\ type bigWig\ visibility dense\ N_bind_avg N_bind_avg bigWig DMS data for RBD Binding 1 12 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/N_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel N_bind_avg\ track N_bind_avg\ type bigWig\ visibility dense\ N_expr_avg_Expression N_expr_avg bigWig DMS data for RBD 
expression 1 12 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/N_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel N_expr_avg\ track N_expr_avg_Expression\ type bigWig\ visibility dense\ variantNucMuts_BA_2 Omicron BA.2 Nuc Muts bigBed 4 Omicron VOC (BA.2) nucleotide mutations identifed from GISAID sequences (Sep 2023) 1 12 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.2 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BA.2_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (BA.2) nucleotide mutations identifed from GISAID sequences (Sep 2023)\ parent variantMuts off\ priority 122\ shortLabel Omicron BA.2 Nuc Muts\ subGroups variant=K_BA2 mutation=NUC designation=VOC\ track variantNucMuts_BA_2\ url https://outbreak.info/situation-reports?pango=BA.2\ urlLabel BA.2 Situation Report at outbreak.info\ unipCov2Repeat Repeats bigGenePred UniProt Repeats 0 12 0 0 0 127 127 127 0 0 0

Description

This track shows protein sequence annotations from the UniProt/SwissProt database, mapped to genomic coordinates. The data has been curated from scientific publications by the UniProt/SwissProt staff. The annotations are spread over multiple tracks, based on their "feature type" in UniProt:

Track Name | Description
UCSC Alignment, SwissProt | Protein sequences from SwissProt mapped onto the genome. All other tracks are (start,end) annotations mapped using this track.
UCSC Alignment, TrEMBL | Protein sequences from TrEMBL mapped onto the genome. All other tracks are (start,end) annotations mapped using this track. This track is hidden by default. To show it, click its checkbox on the track description page.
UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains | Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains | Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains | Polypeptide chain in mature protein after post-processing.
UniProt Domains | Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds | Disulfide bonds.
UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts | Differences between GenBank sequences and the UniProt sequence.
UniProt Repeats | Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations | All other annotations.

Display Conventions and Configuration

Genomic locations of UniProt/SwissProt annotations are labeled with a short name for the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide" etc.). Clicking a feature shows the full annotation and provides a link to the UniProt/SwissProt record for more details. TrEMBL annotations are always shown in light blue, except in the Signal Peptides, Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

Mouse over a feature to see the full UniProt annotation comment. For variants, the mouse-over shows the full name of the UniProt disease acronym.

The subtracks for domains related to subcellular location are sorted from outside to inside of the cell: signal peptide, extracellular, transmembrane, and cytoplasmic.

In the "UniProt Modifications" track, lipidation sites are highlighted in dark blue, glycosylation sites in dark green, and phosphorylation sites in light green.

Methods

UniProt sequences were aligned to UCSC/Gencode transcript sequences first with BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted to genome positions with pslMap and filtered again. UniProt annotations were obtained from the UniProt XML file. The annotations were then mapped to the genome through the alignment using the pslMap program. This mapping approach draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser source code, the main script used to build this track can be found on GitHub.
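As an illustration of the annotation-extraction step, the Python sketch below pulls feature types and protein coordinates out of a UniProt-style XML entry with the standard library. It is not the build script referred to above. The sample entry is invented and heavily trimmed, the accession is a placeholder, and the element layout is assumed to follow the public UniProt XML schema; locations without a numeric position are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by UniProt XML files (assumed from the public schema).
NS = {"u": "http://uniprot.org/uniprot"}

# Hypothetical, heavily trimmed entry for illustration only.
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00001</accession>
    <feature type="signal peptide">
      <location><begin position="1"/><end position="13"/></location>
    </feature>
    <feature type="glycosylation site" description="N-linked (GlcNAc...) asparagine">
      <location><position position="17"/></location>
    </feature>
  </entry>
</uniprot>"""


def extract_features(xml_text):
    """Return (accession, type, start, end) tuples, 1-based inclusive."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("u:entry", NS):
        acc = entry.findtext("u:accession", namespaces=NS)
        for feat in entry.findall("u:feature", NS):
            loc = feat.find("u:location", NS)
            pos = loc.find("u:position", NS)
            if pos is not None:
                # Single-residue feature.
                start = end = int(pos.get("position"))
            else:
                # Range feature with explicit begin and end.
                start = int(loc.find("u:begin", NS).get("position"))
                end = int(loc.find("u:end", NS).get("position"))
            rows.append((acc, feat.get("type"), start, end))
    return rows


rows = extract_features(SAMPLE_XML)
for row in rows:
    print(row)
```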

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text by our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/uniprot/unipStructCov2.bb -chrom=NC_045512v2 -start=0 -end=29903 stdout
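The bigBed file for each subtrack is named by the `bigDataUrl` setting in its track configuration; for the UniProt Repeats track the settings point at `unipRepeatCov2.bb`. The Python sketch below looks that value up by splitting a settings string into key/value pairs. The backslash-separated layout is an assumption based on how the settings appear in this dump, and the function name is hypothetical.

```python
def parse_settings(blob):
    """Split a backslash-separated settings string into a {key: value} dict."""
    settings = {}
    for chunk in blob.split("\\"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # The first word is the setting name, the rest is its value.
        key, _, value = chunk.partition(" ")
        settings[key] = value.strip()
    return settings


# Subset of the settings listed for the unipCov2Repeat track.
blob = (
    "bigDataUrl /gbdb/wuhCor1/uniprot/unipRepeatCov2.bb\\ "
    "group uniprot\\ longLabel UniProt Repeats\\ "
    "track unipCov2Repeat\\ type bigGenePred\\ visibility hide\\"
)
settings = parse_settings(blob)
print(settings["bigDataUrl"])
```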

Please refer to our mailing list archives for questions or our Data Access FAQ for more information.

Credits

This track was created by Maximilian Haeussler at UCSC, with help from Chris Lee, Mark Diekhans and Brian Raney, and feedback from the UniProt staff and Phil Berman (UCSC). Thanks to UniProt for making all data available for download.

References

UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5. PMID: 22102590; PMC: PMC3245120

Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat. 2004 May;23(5):464-70. PMID: 15108278

\ uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/unipRepeatCov2.bb\ dataVersion /gbdb/$D/uniprot/version.txt\ group uniprot\ html uniprotCov2\ longLabel UniProt Repeats\ mouseOverField comments\ priority 12\ shortLabel Repeats\ track unipCov2Repeat\ type bigGenePred\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ CHX_5hr_2 Vero6 CHX 5hr 2 bigWig Vero6 CHX 5hr 2 0 12 4 90 141 129 172 198 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_chx_05hr_2.bw\ color 4,90,141\ longLabel Vero6 CHX 5hr 2\ maxHeightPixels 124:32:5\ parent Vero6_05hpi off\ priority 12\ shortLabel Vero6 CHX 5hr 2\ track CHX_5hr_2\ type bigWig\ viewLimits 0:10\ visibility hide\ strainPhastCons44way Bat PhastCons wig 0 1 44 bat virus strains Basewise Conservation by PhastCons 1 13 70 130 70 130 70 70 0 0 0 compGeno 0 altColor 130,70,70\ autoScale off\ color 70,130,70\ configurable on\ longLabel 44 bat virus strains Basewise Conservation by PhastCons\ maxHeightPixels 100:40:11\ noInherit on\ parent strainCons44wayViewphastcons on\ priority 13\ shortLabel Bat PhastCons\ spanList 1\ subGroups view=phastcons\ track strainPhastCons44way\ type wig 0 1\ windowingFunction mean\ igm_Ctrl_LC168 Ctrl LC168 bigBed 9 Ctrl LC168 1 13 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_LC168.bb\ longLabel Ctrl LC168\ parent igm on\ priority 13\ shortLabel Ctrl LC168\ track igm_Ctrl_LC168\ type bigBed 9\ igg_Ctrl_NC64 Ctrl NC64 bigBed 9 Ctrl NC64 1 13 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_NC64.bb\ longLabel Ctrl NC64\ parent igg on\ priority 13\ shortLabel Ctrl NC64\ track igg_Ctrl_NC64\ type bigBed 9\ REGN10933-REGN10987Total MAB REGN10933-REGN10987 total bigWig Bloom antibody escape - Total Score - REGN10933-REGN10987 1 13 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/REGN10933-REGN10987.tot.bw\ longLabel Bloom 
antibody escape - Total Score - REGN10933-REGN10987\ parent bloomEscTotal on\ shortLabel MAB REGN10933-REGN10987 total\ track REGN10933-REGN10987Total\ type bigWig\ visibility dense\ variantAaMuts_BA_2_75 Omicron BA.2.75 AA Muts bigBed 4 Omicron BA.2.75 amino acid mutations from 287 GISAID sequences (Sep 22, 2023) 1 13 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.2.75 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BA.2.75_prot.bb\ color 219,40,35\ longLabel Omicron BA.2.75 amino acid mutations from 287 GISAID sequences (Sep 22, 2023)\ parent variantMuts off\ priority 16\ shortLabel Omicron BA.2.75 AA Muts\ subGroups variant=O_BA275 mutation=AA designation=VOC\ track variantAaMuts_BA_2_75\ url https://outbreak.info/situation-reports?pango=BA.2.75\ urlLabel BA.2.75 Situation Report at outbreak.info\ P_bind_avg P_bind_avg bigWig DMS data for RBD Binding 1 13 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/P_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel P_bind_avg\ track P_bind_avg\ type bigWig\ visibility dense\ P_expr_avg_Expression P_expr_avg bigWig DMS data for RBD expression 1 13 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/P_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel P_expr_avg\ track P_expr_avg_Expression\ type bigWig\ visibility dense\ strainPhastCons119way PhastCons wig 0 1 119 virus strains Basewise Conservation by PhastCons 1 13 70 130 70 130 70 70 0 0 0 compGeno 0 altColor 130,70,70\ autoScale off\ color 70,130,70\ configurable on\ longLabel 119 virus strains Basewise Conservation by PhastCons\ maxHeightPixels 100:40:11\ noInherit on\ parent strainCons119wayViewphastcons on\ priority 13\ shortLabel PhastCons\ spanList 1\ subGroups view=phastcons\ track strainPhastCons119way\ type wig 0 1\ windowingFunction mean\ Harr_5hr_1 Vero6 Harr 5hr 1 bigWig Vero6 Harr 5hr 1 2 13 179 0 0 
217 127 127 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_harr_05hr_1.bw\ color 179,0,0\ longLabel Vero6 Harr 5hr 1\ maxHeightPixels 124:32:5\ parent Vero6_05hpi on\ priority 13\ shortLabel Vero6 Harr 5hr 1\ track Harr_5hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ igm_Ctrl_LC182 Ctrl LC182 bigBed 9 Ctrl LC182 1 14 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_LC182.bb\ longLabel Ctrl LC182\ parent igm on\ priority 14\ shortLabel Ctrl LC182\ track igm_Ctrl_LC182\ type bigBed 9\ igg_Ctrl_NC63 Ctrl NC63 bigBed 9 Ctrl NC63 1 14 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_NC63.bb\ longLabel Ctrl NC63\ parent igg on\ priority 14\ shortLabel Ctrl NC63\ track igg_Ctrl_NC63\ type bigBed 9\ REGN10987Total MAB REGN10987 total bigWig Bloom antibody escape - Total Score - REGN10987 1 14 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/REGN10987.tot.bw\ longLabel Bloom antibody escape - Total Score - REGN10987\ parent bloomEscTotal on\ shortLabel MAB REGN10987 total\ track REGN10987Total\ type bigWig\ visibility dense\ variantNucMuts_BA_2_75 Omicron BA.2.75 Nuc Muts bigBed 4 Omicron VOC (BA.2.75) nucleotide mutations identifed from GISAID sequences (Sep 2023) 1 14 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.2.75 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BA.2.75_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (BA.2.75) nucleotide mutations identifed from GISAID sequences (Sep 2023)\ parent variantMuts off\ priority 126\ shortLabel Omicron BA.2.75 Nuc Muts\ subGroups variant=O_BA275 mutation=NUC designation=VOC\ track variantNucMuts_BA_2_75\ url https://outbreak.info/situation-reports?pango=BA.2.75\ urlLabel BA.2.75 Situation Report at outbreak.info\ Q_bind_avg Q_bind_avg bigWig DMS data for RBD Binding 1 14 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/Q_bind_avg.bw\ longLabel DMS data for RBD 
Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel Q_bind_avg\ track Q_bind_avg\ type bigWig\ visibility dense\ Q_expr_avg_Expression Q_expr_avg bigWig DMS data for RBD expression 1 14 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/Q_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel Q_expr_avg\ track Q_expr_avg_Expression\ type bigWig\ visibility dense\ Harr_5hr_2 Vero6 Harr 5hr 2 bigWig Vero6 Harr 5hr 2 0 14 179 0 0 217 127 127 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_harr_05hr_2.bw\ color 179,0,0\ longLabel Vero6 Harr 5hr 2\ maxHeightPixels 124:32:5\ parent Vero6_05hpi off\ priority 14\ shortLabel Vero6 Harr 5hr 2\ track Harr_5hr_2\ type bigWig\ viewLimits 0:10\ visibility hide\ igm_COVID_528 COVID 528 bigBed 9 COVID 528 1 15 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_528.bb\ longLabel COVID 528\ parent igm on\ priority 15\ shortLabel COVID 528\ track igm_COVID_528\ type bigBed 9\ igg_Ctrl_LC175 Ctrl LC175 bigBed 9 Ctrl LC175 1 15 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_LC175.bb\ longLabel Ctrl LC175\ parent igg on\ priority 15\ shortLabel Ctrl LC175\ track igg_Ctrl_LC175\ type bigBed 9\ variantAaMutsV2_C_37 Lambda AA Muts bigBed 4 Lambda VOI (C.37 Peru Mar-2020) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021) 1 15 230 128 51 242 191 153 0 0 0 https://outbreak.info/situation-reports?pango=C.37 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantAaMuts_C.37_2021_09_10.bb\ color 230,128,51\ longLabel Lambda VOI (C.37 Peru Mar-2020) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts off\ priority 5\ shortLabel Lambda AA Muts\ subGroups variant=L_C37 mutation=AA designation=VOI\ track variantAaMutsV2_C_37\ url https://outbreak.info/situation-reports?pango=C.37\ urlLabel C.37 Situation Report at outbreak.info\ R_bind_avg R_bind_avg 
bigWig DMS data for RBD Binding 1 15 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/R_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel R_bind_avg\ track R_bind_avg\ type bigWig\ visibility dense\ R_expr_avg_Expression R_expr_avg bigWig DMS data for RBD expression 1 15 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/R_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel R_expr_avg\ track R_expr_avg_Expression\ type bigWig\ visibility dense\ A_21Total Serum: A, day 021 bigWig Bloom antibody escape - Total Score - Subject A, Day 21 1 15 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/A_21.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject A, Day 21\ parent bloomEscTotal on\ shortLabel Serum: A, day 021\ track A_21Total\ type bigWig\ visibility dense\ LTM_5hr_1 Vero6 LTM 5hr 1 bigWig Vero6 LTM 5hr 1 2 15 35 139 69 145 197 162 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_ltm_05hr_1.bw\ color 35,139,69\ longLabel Vero6 LTM 5hr 1\ maxHeightPixels 124:32:5\ parent Vero6_05hpi on\ priority 15\ shortLabel Vero6 LTM 5hr 1\ track LTM_5hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ igm_Ctrl_NC65 Ctrl NC65 bigBed 9 Ctrl NC65 1 16 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_NC65.bb\ longLabel Ctrl NC65\ parent igm on\ priority 16\ shortLabel Ctrl NC65\ track igm_Ctrl_NC65\ type bigBed 9\ igg_Ctrl_NC95 Ctrl NC95 bigBed 9 Ctrl NC95 1 16 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_NC95.bb\ longLabel Ctrl NC95\ parent igg on\ priority 16\ shortLabel Ctrl NC95\ track igg_Ctrl_NC95\ type bigBed 9\ variantNucMutsV2_C_37 Lambda Nuc Muts bigBed 4 Lambda VOI (C.37 Peru Mar-2020) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021) 1 16 230 128 51 242 191 153 0 0 0 
https://outbreak.info/situation-reports?pango=C.37 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantNucMuts_C.37_2021_09_10.bb\ color 230,128,51\ longLabel Lambda VOI (C.37 Peru Mar-2020) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts off\ priority 115\ shortLabel Lambda Nuc Muts\ subGroups variant=L_C37 mutation=NUC designation=VOI\ track variantNucMutsV2_C_37\ url https://outbreak.info/situation-reports?pango=C.37\ urlLabel C.37 Situation Report at outbreak.info\ S_bind_avg S_bind_avg bigWig DMS data for RBD Binding 1 16 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/S_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel S_bind_avg\ track S_bind_avg\ type bigWig\ visibility dense\ S_expr_avg_Expression S_expr_avg bigWig DMS data for RBD expression 1 16 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/S_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel S_expr_avg\ track S_expr_avg_Expression\ type bigWig\ visibility dense\ A_45Total Serum: A, day 045 bigWig Bloom antibody escape - Total Score - Subject A, Day 45 1 16 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/A_45.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject A, Day 45\ parent bloomEscTotal on\ shortLabel Serum: A, day 045\ track A_45Total\ type bigWig\ visibility dense\ LTM_5hr_2 Vero6 LTM 5hr 2 bigWig Vero6 LTM 5hr 2 0 16 35 139 69 145 197 162 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_ltm_05hr_2.bw\ color 35,139,69\ longLabel Vero6 LTM 5hr 2\ maxHeightPixels 124:32:5\ parent Vero6_05hpi off\ priority 16\ shortLabel Vero6 LTM 5hr 2\ track LTM_5hr_2\ type bigWig\ viewLimits 0:10\ visibility hide\ igg_Ctrl_NC98 Ctrl NC98 bigBed 9 Ctrl NC98 1 17 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_NC98.bb\ longLabel Ctrl NC98\ parent igg 
on\ priority 17\ shortLabel Ctrl NC98\ track igg_Ctrl_NC98\ type bigBed 9\ igm_Ctrl_NC98 Ctrl NC98 bigBed 9 Ctrl NC98 1 17 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_NC98.bb\ longLabel Ctrl NC98\ parent igm on\ priority 17\ shortLabel Ctrl NC98\ track igm_Ctrl_NC98\ type bigBed 9\ variantAaMutsV2_B_1_621 Mu AA Muts bigBed 4 Mu VOI (B.1.621 Columbia Jan-2021) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021) 1 17 226 86 43 240 170 149 0 0 0 https://outbreak.info/situation-reports?pango=B.1.621 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantAaMuts_B.1.621_2021_09_10.bb\ color 226,86,43\ longLabel Mu VOI (B.1.621 Columbia Jan-2021) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts off\ priority 6\ shortLabel Mu AA Muts\ subGroups variant=M_B1621 mutation=AA designation=VOI\ track variantAaMutsV2_B_1_621\ url https://outbreak.info/situation-reports?pango=B.1.621\ urlLabel B.1.621 Situation Report at outbreak.info\ A_120Total Serum: A, day 120 bigWig Bloom antibody escape - Total Score - Subject A, Day 120 1 17 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/A_120.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject A, Day 120\ parent bloomEscTotal on\ shortLabel Serum: A, day 120\ track A_120Total\ type bigWig\ visibility dense\ T_bind_avg T_bind_avg bigWig DMS data for RBD Binding 1 17 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/T_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel T_bind_avg\ track T_bind_avg\ type bigWig\ visibility dense\ T_expr_avg_Expression T_expr_avg bigWig DMS data for RBD expression 1 17 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/T_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel T_expr_avg\ track T_expr_avg_Expression\ type bigWig\ visibility dense\ mRNA-seq_5hr_1 Vero6 mRNA 5hr 1 bigWig 
Vero6 mRNA 5hr 1 2 17 63 0 125 159 127 190 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/mrna_05hr_1.bw\ color 63,0,125\ longLabel Vero6 mRNA 5hr 1\ maxHeightPixels 124:32:5\ parent Vero6_05hpi on\ priority 17\ shortLabel Vero6 mRNA 5hr 1\ track mRNA-seq_5hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ igg_Ctrl_LC180 Ctrl LC180 bigBed 9 Ctrl LC180 1 18 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/Ctrl_LC180.bb\ longLabel Ctrl LC180\ parent igg on\ priority 18\ shortLabel Ctrl LC180\ track igg_Ctrl_LC180\ type bigBed 9\ igm_Ctrl_NC63 Ctrl NC63 bigBed 9 Ctrl NC63 1 18 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_NC63.bb\ longLabel Ctrl NC63\ parent igm on\ priority 18\ shortLabel Ctrl NC63\ track igm_Ctrl_NC63\ type bigBed 9\ variantNucMutsV2_B_1_621 Mu Nuc Muts bigBed 4 Mu VOI (B.1.621 Columbia Jan-2021) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021) 1 18 226 86 43 240 170 149 0 0 0 https://outbreak.info/situation-reports?pango=B.1.621 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantNucMuts_B.1.621_2021_09_10.bb\ color 226,86,43\ longLabel Mu VOI (B.1.621 Columbia Jan-2021) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts off\ priority 116\ shortLabel Mu Nuc Muts\ subGroups variant=M_B1621 mutation=NUC designation=VOI\ track variantNucMutsV2_B_1_621\ url https://outbreak.info/situation-reports?pango=B.1.621\ urlLabel B.1.621 Situation Report at outbreak.info\ B_26Total Serum: B, day 026 bigWig Bloom antibody escape - Total Score - Subject B, Day 26 1 18 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/B_26.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject B, Day 26\ parent bloomEscTotal on\ shortLabel Serum: B, day 026\ track B_26Total\ type bigWig\ visibility dense\ V_bind_avg V_bind_avg bigWig DMS data for RBD Binding 1 18 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl 
/gbdb/wuhCor1/bbi/bloom/expr/V_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel V_bind_avg\ track V_bind_avg\ type bigWig\ visibility dense\ V_expr_avg_Expression V_expr_avg bigWig DMS data for RBD expression 1 18 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/V_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel V_expr_avg\ track V_expr_avg_Expression\ type bigWig\ visibility dense\ mRNA-seq_5hr_2 Vero6 mRNA 5hr 2 bigWig Vero6 mRNA 5hr 2 0 18 63 0 125 159 127 190 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/mrna_05hr_2.bw\ color 63,0,125\ longLabel Vero6 mRNA 5hr 2\ maxHeightPixels 124:32:5\ parent Vero6_05hpi off\ priority 18\ shortLabel Vero6 mRNA 5hr 2\ track mRNA-seq_5hr_2\ type bigWig\ viewLimits 0:10\ visibility hide\ igg_COVID_607 COVID 607 bigBed 9 COVID 607 1 19 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_607.bb\ longLabel COVID 607\ parent igg on\ priority 19\ shortLabel COVID 607\ track igg_COVID_607\ type bigBed 9\ igm_Ctrl_LC180 Ctrl LC180 bigBed 9 Ctrl LC180 1 19 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_LC180.bb\ longLabel Ctrl LC180\ parent igm on\ priority 19\ shortLabel Ctrl LC180\ track igm_Ctrl_LC180\ type bigBed 9\ variantAaMuts_XBB_1_5 Omicron XBB.1.5 AA Muts bigBed 4 Omicron XBB.1.5 amino acid mutations from GISAID sequences (Sep 22, 2023) 1 19 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=XBB.1.5 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB.1.5_prot.bb\ color 219,40,35\ longLabel Omicron XBB.1.5 amino acid mutations from GISAID sequences (Sep 22, 2023)\ parent variantMuts on\ priority 19\ shortLabel Omicron XBB.1.5 AA Muts\ subGroups variant=R_XBB15 mutation=AA designation=VOI\ track variantAaMuts_XBB_1_5\ url https://outbreak.info/situation-reports?pango=XBB.1.5\ urlLabel XBB.1.5 Situation 
Report at outbreak.info\ B_113Total Serum: B, day 113 bigWig Bloom antibody escape - Total Score - Subject B, Day 113 1 19 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/B_113.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject B, Day 113\ parent bloomEscTotal on\ shortLabel Serum: B, day 113\ track B_113Total\ type bigWig\ visibility dense\ W_bind_avg W_bind_avg bigWig DMS data for RBD Binding 1 19 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/W_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel W_bind_avg\ track W_bind_avg\ type bigWig\ visibility dense\ W_expr_avg_Expression W_expr_avg bigWig DMS data for RBD expression 1 19 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/W_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel W_expr_avg\ track W_expr_avg_Expression\ type bigWig\ visibility dense\ nextstrainFreq19B 19B bigWig Nextstrain, 19B clade: Alternate allele frequency 1 20 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples19B.bigWig\ longLabel Nextstrain, 19B clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 20\ shortLabel 19B\ subGroups view=newClades\ track nextstrainFreq19B\ type bigWig\ visibility dense\ nextstrainSamples19B 19B Mutations vcfTabix Mutations in Clade 19B Nextstrain Subset of GISAID EpiCoV TM Samples 0 20 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples19B.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain19B.nh\ longLabel Mutations in Clade 19B Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades\ priority 20\ shortLabel 19B Mutations\ subGroups view=newClades\ track nextstrainSamples19B\ galaxyEnaQ1Ay-4-2 AY.4.2 mutations bigBed 8 + Mutations (amino acid level) in AY.4.2 between 2021-10-05 and 
2022-01-05 1 20 140 8 0 197 131 127 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

\ The GalaxyProject SARS-CoV-2 mutations tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic.\ The quarters are redefined with each data update, with the latest (current) quarter starting 3 months prior to the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.\
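
The quarter definition above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the helper names `months_back` and `quarter_windows` are hypothetical, and the sketch assumes that when the target month is shorter than the starting one, the day of month is clamped to its last day.

```python
import calendar
from datetime import date


def months_back(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`.

    Assumption: the day of month is clamped to the last day of the
    target month (e.g. May 31 minus 3 months gives Feb 28).
    """
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def quarter_windows(update_day: date, n_quarters: int = 4):
    """Return (start, end) pairs, most recent quarter first.

    Each boundary is computed directly from the update day so that
    day clamping in one quarter cannot shift the earlier quarters.
    """
    windows = []
    for i in range(n_quarters):
        end = months_back(update_day, 3 * i)
        start = months_back(update_day, 3 * (i + 1))
        windows.append((start, end))
    return windows
```

For a hypothetical update on 2022-04-05, `quarter_windows(date(2022, 4, 5))` yields windows starting on 2022-01-05, 2021-10-05, 2021-07-05 and 2021-04-05. The end date shown on the current quarter's track would then be replaced by the collection date of the most recent analyzed sample.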

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant-calling artefacts should appear in lighter shades of grey.
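
As a rough illustration of frequency-based shading, the sketch below converts a within-lineage frequency into a 0-1000 BED-style score and then into a grey level. It assumes the tracks rely on score-based shading (they set `spectrum on`). The function names and the linear grey ramp are hypothetical and are not the browser's actual rendering code.

```python
from typing import Tuple


def frequency_to_score(freq: float) -> int:
    """Scale a frequency in [0, 1] to an integer score in [0, 1000]."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    return round(freq * 1000)


def score_to_grey(score: int) -> Tuple[int, int, int]:
    """Illustrative linear ramp: score 1000 is black, score 0 is white."""
    if not 0 <= score <= 1000:
        raise ValueError("score must lie between 0 and 1000")
    level = 255 - round(255 * score / 1000)
    return (level, level, level)
```

With this ramp, a mutation found in every sample of a lineage gets score 1000 and is drawn black, while a mutation at low frequency gets a score near 0 and appears close to white.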

\

Mutation features are labeled with their effects at the amino acid level and, for SNV mutations, the feature as a whole will extend across the base triplet encoding the affected amino acid, while the thick part of the feature will indicate the precise base that gets changed by the mutation. For deletions, the whole feature will have a thick rendering, while insertions will be displayed all thin.\ \
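
The SNV convention can be made concrete with a small sketch that derives BED-style coordinates (0-based, half-open) for the codon and for the changed base. The function name `snv_feature` is hypothetical, and the sketch assumes a forward-strand coding sequence read in a single frame, so it ignores complications such as the ribosomal frameshift in ORF1ab.

```python
from typing import Tuple


def snv_feature(pos: int, cds_start: int) -> Tuple[int, int, int, int]:
    """Return (chromStart, chromEnd, thickStart, thickEnd) for an SNV.

    `pos` and `cds_start` are 1-based genomic coordinates; the result
    uses 0-based, half-open BED coordinates. The feature spans the
    affected codon, and the thick part covers only the changed base.
    """
    if pos < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    offset = (pos - cds_start) % 3      # position of the base within its codon
    chrom_start = pos - 1 - offset      # first base of the codon, 0-based
    chrom_end = chrom_start + 3         # codon end, exclusive
    thick_start = pos - 1               # changed base, 0-based
    thick_end = pos                     # exclusive end of the changed base
    return chrom_start, chrom_end, thick_start, thick_end
```

For example, with the spike gene starting at position 21563 and the D614G substitution at position 23403, `snv_feature(23403, 21563)` returns `(23401, 23404, 23402, 23403)`: the feature covers codon 614 and the thick part marks its middle base.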

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

\ \

Methods

\

\ For analysis, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances.\ The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch.\ This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.\ \ Sequencing data downloads, execution of the three types of workflows, and export of key results files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.\ \ The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories; then\

    \
  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc., is added to
    ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by\
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\
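
The lineage-ranking step of the track recalculation can be sketched as below. This is a minimal illustration rather than the project's implementation: the function name `top_lineages_per_quarter`, the input shape (pairs of collection date and lineage) and the half-open treatment of quarter boundaries are all assumptions.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Window = Tuple[date, date]


def top_lineages_per_quarter(
    samples: Iterable[Tuple[date, str]],
    windows: List[Window],
    n: int = 5,
) -> Dict[Window, List[str]]:
    """Return the `n` most frequent lineages for each (start, end) window.

    Assumption: a sample belongs to a window when start <= date < end.
    Samples outside every window are ignored.
    """
    counts: Dict[Window, Counter] = defaultdict(Counter)
    for collected, lineage in samples:
        for window in windows:
            start, end = window
            if start <= collected < end:
                counts[window][lineage] += 1
                break
    return {w: [lin for lin, _ in counts[w].most_common(n)] for w in windows}
```

The mutations of each returned lineage would then be extracted per quarter and written to one bigBed file per lineage and quarter.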

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:\

\ \

References

\ \

\

    \
  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q1/01_AY.4.2_data.bb\ color 140,8,0\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.4.2 between 2021-10-05 and 2022-01-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q1_tracks on\ priority 20\ shortLabel AY.4.2 mutations\ spectrum on\ track galaxyEnaQ1Ay-4-2\ type bigBed 8 +\ galaxyEnaQ2Ay-5 AY.5 mutations bigBed 8 + Mutations (amino acid level) in AY.5 between 2021-07-05 and 2021-10-05 1 20 0 28 127 127 141 191 1 0 0


\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q2/01_AY.5_data.bb\ color 0,28,127\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.5 between 2021-07-05 and 2021-10-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q2_tracks on\ priority 20\ shortLabel AY.5 mutations\ spectrum on\ track galaxyEnaQ2Ay-5\ type bigBed 8 +\ galaxyEnaQ3B-1-1-7 B.1.1.7 mutations bigBed 8 + Mutations (amino acid level) in B.1.1.7 between 2021-04-05 and 2021-07-05 1 20 162 53 130 208 154 192 1 0 0


\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q3/01_B.1.1.7_data.bb\ color 162,53,130\ html galaxyEna\ longLabel Mutations (amino acid level) in B.1.1.7 between 2021-04-05 and 2021-07-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q3_tracks on\ priority 20\ shortLabel B.1.1.7 mutations\ spectrum on\ track galaxyEnaQ3B-1-1-7\ type bigBed 8 +\ galaxyEnaQ0Ba-1-1 BA.1.1 mutations bigBed 8 + Mutations (amino acid level) in BA.1.1 between 2022-01-05 and 2022-03-06 3 20 184 133 10 219 194 132 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

\ The GalaxyProject SARS-CoV-2 mutations tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic.\ The quarters are redefined with each data update, with the latest (current) quarter starting 3 months prior to the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.\

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant-calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level and, for SNV mutations, the feature as a whole will extend across the base triplet encoding the affected amino acid, while the thick part of the feature will indicate the precise base that gets changed by the mutation. For deletions, the whole feature will have a thick rendering, while insertions will be displayed all thin.\ \

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

\ \

Methods

\

\ For analysis, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances.\ The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch.\ This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.\ \ Sequencing data downloads, execution of the three types of workflows, and export of key results files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.\ \ The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories; then\

    \
  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc., is added to
    ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by\
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular we would like to thank:

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q0/01_BA.1.1_data.bb\ color 184,133,10\ html galaxyEna\ longLabel Mutations (amino acid level) in BA.1.1 between 2022-01-05 and 2022-03-06\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q0_tracks on\ priority 20\ shortLabel BA.1.1 mutations\ spectrum on\ track galaxyEnaQ0Ba-1-1\ type bigBed 8 +\ problematicSitesCaution Caution bigBed Problematic sites where caution is recommended for analysis 0 20 240 160 0 247 207 127 0 0 0 map 1 bigDataUrl /gbdb/wuhCor1/problematicSites/problematicSitesCaution.bb\ color 240,160,0\ longLabel Problematic sites where caution is recommended for analysis\ parent problematicSites\ priority 20\ shortLabel Caution\ track problematicSitesCaution\ type bigBed\ igg_COVID_533 COVID 533 bigBed 9 COVID 533 1 20 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_533.bb\ longLabel COVID 533\ parent igg on\ priority 20\ shortLabel COVID 533\ track igg_COVID_533\ type bigBed 9\ igm_Ctrl_NC64 Ctrl NC64 bigBed 9 Ctrl NC64 1 20 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_NC64.bb\ longLabel Ctrl NC64\ parent igm on\ priority 20\ shortLabel Ctrl NC64\ track igm_Ctrl_NC64\ type bigBed 9\ Q1_tracks Galaxy ENA mutations in top lineages - a quarter ago bigBed 8 + Most frequent lineages of a quarter ago 1 20 0 0 0 127 127 127 0 0 0

Description

\

This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].

This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2]. It restricts itself to data deposited by national genome surveillance projects that provide sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

\

\

Analysis is performed on public Galaxy servers using only open-source tools, orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore. It includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences, and pangolin lineage assignments. Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks. The project web site has more information about available results data.

\ \

Display Conventions and Configuration

\ \

Track structure

\

The GalaxyProject SARS-CoV-2 mutations tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different three-month periods of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting three months before the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent sample analyzed as of the day of the update.
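The rolling quarter definition can be illustrated with a short sketch. It assumes that "3 months" means calendar months counted back from the update day and that the day of month is clamped when the target month is shorter; the function names are hypothetical and are not taken from the project's bot scripts.

```python
import calendar
from datetime import date


def months_back(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day` (day of month clamped)."""
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def quarter_windows(update_day: date, n_quarters: int = 4):
    """List (start, end) date pairs, newest quarter first.

    Assumption: each window spans three calendar months and adjacent
    windows share a boundary date.
    """
    windows = []
    for q in range(n_quarters):
        end = months_back(update_day, 3 * q)
        start = months_back(update_day, 3 * (q + 1))
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in quarter_windows(date(2022, 4, 5)):
        print(start.isoformat(), end.isoformat())
```

Note that the end date shown on the current quarter track is the collection date of the most recent analyzed sample, so it can be earlier than the `end` value computed here.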

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects each mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey or black, while newly emerging mutations or non-systematic variant-calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base changed by the mutation. Deletions are rendered thick across the whole feature, while insertions are displayed all thin.
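As an illustration of these conventions, the sketch below derives BED-style coordinates for an SNV feature: the feature spans the codon, the thick part covers the single changed base, and the score carries the within-lineage frequency. The helper name, the use of a 0 to 1000 score for shading, and the simple codon arithmetic (which ignores the ribosomal frameshift in ORF1ab) are assumptions made for this example, not a description of the project's own code.

```python
from typing import NamedTuple


class BedFeature(NamedTuple):
    chrom: str
    chrom_start: int  # 0-based, inclusive
    chrom_end: int    # 0-based, exclusive
    name: str
    score: int        # assumed 0-1000 scale used for grey shading
    strand: str
    thick_start: int
    thick_end: int


def snv_feature(chrom: str, cds_start: int, pos: int, label: str,
                lineage_freq: float) -> BedFeature:
    """Build a BED8-like record for an SNV at 1-based position `pos`.

    `cds_start` is the 1-based first base of the coding sequence that
    contains `pos` (hypothetical input supplied by the caller).
    """
    if pos < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    if not 0.0 <= lineage_freq <= 1.0:
        raise ValueError("frequency must be within [0, 1]")
    codon_index = (pos - cds_start) // 3
    codon_first = cds_start + 3 * codon_index  # 1-based first base of the codon
    return BedFeature(
        chrom=chrom,
        chrom_start=codon_first - 1,
        chrom_end=codon_first + 2,
        name=label,
        score=round(lineage_freq * 1000),
        strand="+",
        thick_start=pos - 1,
        thick_end=pos,
    )


if __name__ == "__main__":
    # Spike D614G (A23403G); the spike CDS starts at position 21563.
    print(snv_feature("NC_045512v2", 21563, 23403, "S:D614G", 0.98))
```

In this example the thin part of the feature covers the first and third bases of the codon and the thick part marks the middle base, which is the one changed by the substitution.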

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

\ \

Methods

\

For analyses, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances. The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, variant calling, and annotation, and produces a collection of VCF files, one for each sample in the batch. This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.

Sequencing data downloads, execution of the three types of workflows, and export of key result files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.

The bot accounts on participating Galaxy servers are checked roughly weekly for newly finished analysis histories, then

  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc., is added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
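The lineage-ranking step of the track recalculation can be sketched as follows. The input is assumed to be a list of (collection date, pangolin lineage) pairs already parsed from the exported results; the function name, the record layout, and the half-open date window are hypothetical choices for this example.

```python
from collections import Counter
from datetime import date


def top_lineages(samples, start: date, end: date, n: int = 5):
    """Return the `n` most frequent lineages among samples collected in [start, end).

    `samples` is an iterable of (collection_date, lineage) pairs
    (assumed layout, not the project's actual data model).
    """
    counts = Counter(
        lineage for collected, lineage in samples if start <= collected < end
    )
    return [lineage for lineage, _ in counts.most_common(n)]


if __name__ == "__main__":
    demo = (
        [(date(2022, 2, 1), "BA.1.1")] * 3
        + [(date(2022, 2, 10), "BA.2")] * 2
        + [(date(2022, 3, 1), "BA.1")]
        + [(date(2021, 12, 1), "AY.4")] * 5  # outside the window below
    )
    print(top_lineages(demo, date(2022, 1, 5), date(2022, 4, 5)))
```

Running the same function once per quarter window yields the five lineages for which per-quarter mutation tracks are then built.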

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular we would like to thank:

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1

\ varRep 1 allButtonPair on\ compositeTrack on\ filter.withinLineageFrequency 0.05\ filterByRange.withinLineageFrequency on\ filterLimits.withinLineageFrequency 0:2\ filterType.countries multipleListOr\ filterValues.countries EE|Estonia,GB|United Kingdom,GR|Greece,IE|Ireland,ZA|South Africa\ filterValuesDefault.countries EE,GB,GR,ZA\ html galaxyEna\ longLabel Most frequent lineages of a quarter ago\ parent galaxyEna\ priority 20\ shortLabel Galaxy ENA mutations in top lineages - a quarter ago\ track Q1_tracks\ type bigBed 8 +\ visibility dense\ sarsCov2PhyloPubAllMinAf001 Min AF 0.1% vcfTabix Nucleotide Substitution Mutations with Alternate Allele Frequency >= 0.1% in Public Sequences 0 20 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/sarsCov2PhyloPub/public.all.minAf.001.vcf.gz\ longLabel Nucleotide Substitution Mutations with Alternate Allele Frequency >= 0.1% in Public Sequences\ parent sarsCov2PhyloPub off\ priority 20\ shortLabel Min AF 0.1%\ track sarsCov2PhyloPubAllMinAf001\ type vcfTabix\ sarsCov2PhyloMinAf001 Min alt AF 0.1% vcfTabix Nucleotide Substitution Mutations with Alternate Allele Frequency >= 0.1% in GISAID EpiCov TM Sequences 0 20 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/sarsCov2Phylo/gisaid.minAf.001.vcf.gz\ longLabel Nucleotide Substitution Mutations with Alternate Allele Frequency >= 0.1% in GISAID EpiCov TM Sequences\ parent sarsCov2Phylo off\ priority 20\ shortLabel Min alt AF 0.1%\ track sarsCov2PhyloMinAf001\ type vcfTabix\ variantNucMuts_XBB_1_5 Omicron XBB.1.5 Nuc Muts bigBed 4 Omicron VOC (XBB.1.5) nucleotide mutations identifed from GISAID sequences (Sep 2023) 1 20 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=XBB.1.5 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB.1.5_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (XBB.1.5) nucleotide mutations identifed from GISAID sequences (Sep 2023)\ parent variantMuts off\ priority 129\ shortLabel Omicron XBB.1.5 Nuc Muts\ subGroups 
variant=R_XBB15 mutation=NUC designation=VOI\ track variantNucMuts_XBB_1_5\ url https://outbreak.info/situation-reports?pango=XBB.1.5\ urlLabel XBB.1.5 Situation Report at outbreak.info\ PhyloCSF_smooth PhyloCSF bigWig PhyloCSF 0 20 0 0 0 127 127 127 0 0 0


\

Description

\

These tracks show evolutionary protein-coding potential as determined by PhyloCSF [1] to help identify conserved, functional, protein-coding regions of genomes. PhyloCSF examines evolutionary signatures characteristic of alignments of conserved coding regions, such as the high frequencies of synonymous codon substitutions and conservative amino acid substitutions, and the low frequencies of other missense and nonsense substitutions (CSF = Codon Substitution Frequencies). PhyloCSF provides more information than conservation of the amino acid sequence, because it distinguishes the different codons that code for the same amino acid. One of PhyloCSF's main current applications is to help distinguish protein-coding and non-coding RNAs represented among novel transcript models obtained from high-throughput transcriptome sequencing. More information on PhyloCSF can be found on the PhyloCSF wiki.

\


\

The Smoothed PhyloCSF track shows the PhyloCSF score for each codon in each of 6 frames, smoothed using an HMM. Regions in which most codons have score greater than 0 are likely to be protein-coding in that frame. No score is shown when the relative branch length is less than 0.1 (see PhyloCSF Power).

\


\

The PhyloCSF Power track shows the branch length score at each codon, i.e., the ratio of the branch length of the species present in the local alignment to the total branch length of all species in the full genome alignment. It is an indication of the statistical power available to PhyloCSF. Codons with branch length score less than 0.1 have been excluded altogether (from all tracks) because PhyloCSF does not have sufficient power to get a meaningful score at these codons. Codons with branch length score greater than 0.1 but much less than 1 should be considered less certain.
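The exclusion rule can be pictured as a simple mask over per-codon values. This is a minimal sketch that assumes the scores and branch length scores are already available as parallel lists; the function name and the use of None for excluded codons are choices made for this example.

```python
def mask_low_power(scores, power, threshold=0.1):
    """Replace scores by None wherever the branch length score is below `threshold`.

    `scores` and `power` are parallel per-codon sequences (assumed inputs).
    """
    if len(scores) != len(power):
        raise ValueError("scores and power must have the same length")
    return [s if p >= threshold else None for s, p in zip(scores, power)]


if __name__ == "__main__":
    print(mask_low_power([4.2, -1.0, 7.5], [0.95, 0.05, 0.30]))
```

Codons that survive the mask with a branch length score well below 1, such as the third codon in the usage example, would still be read with more caution.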

\


\

Caveats

\ \


\

Methods

\


\

Tracks were constructed as described in Mudge et al. 2019 and Jungreis et al. 2020. In brief, PhyloCSF was run with the "fixed" strategy on every codon in every frame on each strand in the wuhCor1/SARS-CoV-2 assembly using an alignment of 44 Sarbecovirus genomes, using the PhyloCSF parameters for 29mammals with the tree replaced with a tree of the 44 Sarbecovirus genomes.

\

The scores were smoothed using a Hidden Markov Model (HMM) with 4 states, one representing coding regions and three representing non-coding regions. The emission of each codon is its PhyloCSF score. The ratio of the emission probabilities for the coding and non-coding models is computed from the PhyloCSF score, since it represents the log-likelihood ratio of the alignment under the coding and non-coding models. The three non-coding states have the same emission probabilities but different transition probabilities (they can only transition to coding) to better capture the multimodal distribution of gaps between same-frame coding exons. These transition probabilities represent the best approximation of this gap distribution as a mixture model of three exponential distributions, computed using Expectation Maximization. The HMM defines a probability that each codon is coding, based on the PhyloCSF scores of that codon and nearby codons on the same strand in the same frame, without taking into account start codons, stop codons, or potential splice sites. PhyloCSF+1 shows the log-odds that codons in frame 1 (sometimes called frame 0) on the '+' strand are in the coding state according to the HMM, and similarly for strand '-' and frames 2 and 3.
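The smoothing idea can be sketched with a small forward-backward computation over one frame. Everything numeric here is an assumption for illustration: the transition probabilities are made-up placeholders rather than the fitted values, the scores are assumed to be in decibans (so the emission ratio is 10^(score/10)), the starting distribution is uniform, and the result is reported as natural-log odds.

```python
import math


def coding_log_odds(scores, stay_coding=0.99,
                    gap_weights=(0.5, 0.3, 0.2), gap_stay=(0.5, 0.9, 0.99)):
    """Posterior log-odds of the coding state for each codon in one frame.

    State 0 is coding; states 1-3 are non-coding and may only stay put or
    return to coding. All parameter values are illustrative placeholders.
    """
    n = 4
    trans = [[0.0] * n for _ in range(n)]
    trans[0][0] = stay_coding
    for k in range(3):
        trans[0][k + 1] = (1.0 - stay_coding) * gap_weights[k]
        trans[k + 1][k + 1] = gap_stay[k]
        trans[k + 1][0] = 1.0 - gap_stay[k]
    # Emissions relative to the non-coding model: only their ratio matters.
    emis = [[10.0 ** (s / 10.0), 1.0, 1.0, 1.0] for s in scores]

    # Forward pass, rescaled at each step to avoid underflow.
    fwd, prev = [], None
    for e in emis:
        if prev is None:
            cur = [0.25 * e[j] for j in range(n)]
        else:
            cur = [e[j] * sum(prev[i] * trans[i][j] for i in range(n))
                   for j in range(n)]
        total = sum(cur)
        prev = [c / total for c in cur]
        fwd.append(prev)

    # Backward pass with the same kind of rescaling.
    bwd = [[1.0] * n for _ in scores]
    for t in range(len(scores) - 2, -1, -1):
        nxt = [sum(trans[i][j] * emis[t + 1][j] * bwd[t + 1][j] for j in range(n))
               for i in range(n)]
        total = sum(nxt)
        bwd[t] = [x / total for x in nxt]

    eps = 1e-12
    result = []
    for f, b in zip(fwd, bwd):
        post = [fi * bi for fi, bi in zip(f, b)]
        p = min(max(post[0] / sum(post), eps), 1.0 - eps)
        result.append(math.log(p / (1.0 - p)))
    return result


if __name__ == "__main__":
    demo = [8.0] * 10 + [-8.0] * 10
    print([round(x, 2) for x in coding_log_odds(demo)])
```

Giving each non-coding state its own self-transition probability is what lets the gap length between coding stretches follow a mixture of three geometric (discrete exponential) distributions, mirroring the mixture model described above.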

\ \

Data Access

\

The raw bigWig data can be explored interactively with the Table Browser, combined with other datasets in the Data Integrator tool, or downloaded directly from the download server. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.

\ \

Credits and Citations

\


\

Questions about the algorithm itself should be directed to Irwin Jungreis. If you use the PhyloCSF browser tracks, please cite Mudge et al. 2019 and Jungreis et al. 2020.

\


\

References

\

Lin MF, Jungreis I, Kellis M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics. 2011 Jul 1;27(13):i275-82. PMID: 21685081; PMC: PMC3117341

\

Mudge JM, Jungreis I, Hunt T, Gonzalez JM, Wright JC, Kay M, Davidson C, Fitzgerald S, Seal R, Tweedie S et al. Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci. Genome Res. 2019 Dec;29(12):2073-2087. PMID: 31537640; PMC: PMC6886504

\ \

Jungreis I, Sealfon R, Kellis M. Sarbecovirus comparative genomics elucidates gene content of SARS-CoV-2 and functional impact of COVID-19 pandemic mutations. bioRxiv. 2020 Jun 3. PMID: 32577641; PMC: PMC7302193

\ compGeno 0 autoScale off\ graphTypeDefault bar\ group compGeno\ longLabel PhyloCSF\ maxHeightPixels 60:30:12\ priority 20\ shortLabel PhyloCSF\ superTrack on\ track PhyloCSF_smooth\ type bigWig\ viewLimits -15:15\ yLineMark 0\ yLineOnOff on\ PhyloCSFpower PhyloCSF Power bigWig Relative branch length of local alignment, a measure of PhyloCSF statistical power 0 20 0 0 0 127 127 127 0 0 0

PhyloCSF Track Hub

\

Description

\


\

These tracks show evolutionary protein-coding potential as determined by PhyloCSF [1] to help identify conserved, functional, protein-coding regions of genomes. PhyloCSF examines evolutionary signatures characteristic of alignments of conserved coding regions, such as the high frequencies of synonymous codon substitutions and conservative amino acid substitutions, and the low frequencies of other missense and nonsense substitutions (CSF = Codon Substitution Frequencies). PhyloCSF provides more information than conservation of the amino acid sequence, because it distinguishes the different codons that code for the same amino acid. One of PhyloCSF's main current applications is to help distinguish protein-coding and non-coding RNAs represented among novel transcript models obtained from high-throughput transcriptome sequencing. More information on PhyloCSF can be found on the PhyloCSF wiki.

\


\

The Smoothed PhyloCSF track shows the PhyloCSF score for each codon in each of 6 frames, smoothed using an HMM. Regions in which most codons have score greater than 0 are likely to be protein-coding in that frame. No score is shown when the relative branch length is less than 0.1 (see PhyloCSF Power).

\


\

The PhyloCSF Power track shows the branch length score at each codon, i.e., the ratio of the branch length of the species present in the local alignment to the total branch length of all species in the full genome alignment. It is an indication of the statistical power available to PhyloCSF. Codons with branch length score less than 0.1 have been excluded altogether (from all tracks) because PhyloCSF does not have sufficient power to get a meaningful score at these codons. Codons with branch length score greater than 0.1 but much less than 1 should be considered less certain.

\


\

Caveats

\ \


\

Methods

\


\

Tracks were constructed as described in Mudge et al. 2019 [2] and Jungreis et al. 2020 [3]. In brief, PhyloCSF was run with the "fixed" strategy on every codon in every frame on each strand in the wuhCor1/SARS-CoV-2 assembly using an alignment of 44 Sarbecovirus genomes, using the PhyloCSF parameters for 29mammals with the tree replaced with a tree of the 44 Sarbecovirus genomes. The scores were smoothed using a Hidden Markov Model (HMM) with 4 states, one representing coding regions and three representing non-coding regions. The emission of each codon is its PhyloCSF score. The ratio of the emission probabilities for the coding and non-coding models is computed from the PhyloCSF score, since it represents the log-likelihood ratio of the alignment under the coding and non-coding models. The three non-coding states have the same emission probabilities but different transition probabilities (they can only transition to coding) to better capture the multimodal distribution of gaps between same-frame coding exons. These transition probabilities represent the best approximation of this gap distribution as a mixture model of three exponential distributions, computed using Expectation Maximization. The HMM defines a probability that each codon is coding, based on the PhyloCSF scores of that codon and nearby codons on the same strand in the same frame, without taking into account start codons, stop codons, or potential splice sites. PhyloCSF+0 shows the log-odds that codons in frame 0 on the '+' strand are in the coding state according to the HMM, and similarly for strand '-' and frames 1 and 2.

\


\

Credits

\


\

Questions about the algorithm itself should be directed to Irwin Jungreis.

\


\

Citing the PhyloCSF Tracks

\


\

If you use the PhyloCSF browser tracks, please cite Mudge et al. 2019 [2] and Jungreis et al. 2020 [3].

\


\

References

\


\

[1] Lin MF, Jungreis I, and Kellis M (2011). PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions. Bioinformatics 27:i275-i282 (ISMB/ECCB 2011).

\

[2] Mudge JM, Jungreis I, Hunt T, Gonzalez JM, Wright J, Kay M, Davidson C, Fitzgerald S, Seal R, Tweedie S, He L, Waterhouse RM, Li Y, Bruford E, Choudhary J, Frankish A, Kellis M (2019). Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci. Genome Research gr-246462. doi: 10.1101/gr.246462.118.

\

[3] Jungreis I, Sealfon R and Kellis M (2020). Sarbecovirus comparative genomics elucidates gene content of SARS-CoV-2 and functional impact of COVID-19 pandemic mutations. bioRxiv 2020.

\ compGeno 0 autoScale off\ bigDataUrl /gbdb/wuhCor1/bbi/phylocsf/PhyloCSFpower.bw\ color 0,0,0\ graphTypeDefault bar\ group compGeno\ longLabel Relative branch length of local alignment, a measure of PhyloCSF statistical power\ maxHeightPixels 60:30:12\ parent PhyloCSF_smooth\ shortLabel PhyloCSF Power\ track PhyloCSFpower\ type bigWig\ viewLimits 0:1\ visibility hide\ yLineMark 0.5\ yLineOnOff on\ C_32Total Serum: C, day 032 bigWig Bloom antibody escape - Total Score - Subject C, Day 32 1 20 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/C_32.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject C, Day 32\ parent bloomEscTotal on\ shortLabel Serum: C, day 032\ track C_32Total\ type bigWig\ visibility dense\ PhyloCSF_plus_1 Smoothed PhyloCSF+1 bigWig Smoothed PhyloCSF Strand + Frame 1 2 20 0 175 0 127 215 127 0 0 0 compGeno 0 bigDataUrl /gbdb/wuhCor1/bbi/phylocsf/PhyloCSF+1.bw\ color 0,175,0\ graphTypeDefault bar\ longLabel Smoothed PhyloCSF Strand + Frame 1\ parent PhyloCSF_smooth\ priority 20\ shortLabel Smoothed PhyloCSF+1\ track PhyloCSF_plus_1\ visibility full\ PhyloCSF_plus_2 Smoothed PhyloCSF+2 bigWig Smoothed PhyloCSF Strand + Frame 2 2 20 0 175 0 127 215 127 0 0 0 compGeno 0 bigDataUrl /gbdb/wuhCor1/bbi/phylocsf/PhyloCSF+2.bw\ color 0,175,0\ graphTypeDefault bar\ longLabel Smoothed PhyloCSF Strand + Frame 2\ parent PhyloCSF_smooth\ priority 20\ shortLabel Smoothed PhyloCSF+2\ track PhyloCSF_plus_2\ visibility full\ PhyloCSF_plus_3 Smoothed PhyloCSF+3 bigWig Smoothed PhyloCSF Strand + Frame 3 2 20 0 175 0 127 215 127 0 0 0 compGeno 0 bigDataUrl /gbdb/wuhCor1/bbi/phylocsf/PhyloCSF+3.bw\ color 0,175,0\ graphTypeDefault bar\ longLabel Smoothed PhyloCSF Strand + Frame 3\ parent PhyloCSF_smooth\ priority 20\ shortLabel Smoothed PhyloCSF+3\ track PhyloCSF_plus_3\ visibility full\ PhyloCSF_minus_1 Smoothed PhyloCSF-1 bigWig Smoothed PhyloCSF Strand - Frame 1 0 20 200 0 0 227 127 127 0 0 0 compGeno 0 bigDataUrl 
/gbdb/wuhCor1/bbi/phylocsf/PhyloCSF-1.bw\ color 200,0,0\ graphTypeDefault bar\ longLabel Smoothed PhyloCSF Strand - Frame 1\ parent PhyloCSF_smooth\ priority 20\ shortLabel Smoothed PhyloCSF-1\ track PhyloCSF_minus_1\ visibility hide\ PhyloCSF_minus_2 Smoothed PhyloCSF-2 bigWig Smoothed PhyloCSF Strand - Frame 2 0 20 200 0 0 227 127 127 0 0 0 compGeno 0 bigDataUrl /gbdb/wuhCor1/bbi/phylocsf/PhyloCSF-2.bw\ color 200,0,0\ graphTypeDefault bar\ longLabel Smoothed PhyloCSF Strand - Frame 2\ parent PhyloCSF_smooth\ priority 20\ shortLabel Smoothed PhyloCSF-2\ track PhyloCSF_minus_2\ visibility hide\ PhyloCSF_minus_3 Smoothed PhyloCSF-3 bigWig Smoothed PhyloCSF Strand - Frame 3 0 20 200 0 0 227 127 127 0 0 0 compGeno 0 bigDataUrl /gbdb/wuhCor1/bbi/phylocsf/PhyloCSF-3.bw\ color 200,0,0\ graphTypeDefault bar\ longLabel Smoothed PhyloCSF Strand - Frame 3\ parent PhyloCSF_smooth\ priority 20\ shortLabel Smoothed PhyloCSF-3\ track PhyloCSF_minus_3\ visibility hide\ weizmanOrfs Weizman ORFs bed New ORFs based on RNA-seq and Ribo-seq by the Weizman Institute 0 20 0 0 0 127 127 127 0 0 0

Description

\

The Weizman ORFs (Open Reading Frames) track shows previously unannotated ORF predictions based on Ribo-seq and RNA-seq data. It is a collection of tracks (super track) that contains not only the predicted gene models, but also data supporting them.

\ \

Display Conventions and Configuration

The Predicted ORFs track shows the predicted exons. All other tracks show the signal as an x-y plot with bars.

Methods

\

\ Methods from Finkel et al:

\

To capture the full SARS-CoV-2 coding capacity, we applied a suite of ribosome profiling approaches to Vero cells infected with SARS-CoV-2 for 5 and 24 hours, and Calu3 cells infected for 7 hours. For each time point we prepared three different ribosome-profiling libraries, each one in two biological replicates. Two Ribo-seq libraries facilitate mapping of translation initiation sites, by treating cells with lactimidomycin (LTM) or harringtonine (Harr), two drugs with distinct mechanisms that prevent 80S ribosomes at translation initiation sites from elongating. The third Ribo-seq library was prepared from cells treated with the translation elongation inhibitor cycloheximide (CHX), and gives a snapshot of actively translating ribosomes across the body of the translated ORF. In parallel, RNA-sequencing was applied to map viral transcripts.

\

The ORF prediction was done by using two computational tools, PRICE and ORF-RATER, that rely on different features of ribosome profiling data, and by manual inspection of the data. The predictions are based on Ribo-seq libraries from two time points (5 and 7 hpi) of two different cell lines (Vero E6 and Calu3 cells), infected with separate virus isolates.

\

The Ribo-seq data of the 24 hours samples do not show the expected profile of read distribution on viral genes and therefore were not used for the procedure of ORF predictions.

\

For more details see the paper in the References section below.

\ \

Data Access

\

The raw data can be explored interactively with the Table Browser, or combined with other datasets in the Data Integrator tool.

Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.

\ \

References

\

Finkel Y, Mizrahi O, Nachshon A, Weingarten-Gabbay S, Morgenstern D, Yahalom-Ronen Y, Tamir H, Achdout H, Stein D, Israeli O et al. The coding capacity of SARS-CoV-2. Nature. 2020 Sep 9. PMID: 32906143

\ \ genes 1 group genes\ longLabel New ORFs based on RNA-seq and Ribo-seq by the Weizman Institute\ priority 20\ shortLabel Weizman ORFs\ superTrack on\ track weizmanOrfs\ type bed\ visibility hide\ Y_bind_avg Y_bind_avg bigWig DMS data for RBD Binding 1 20 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/Y_bind_avg.bw\ longLabel DMS data for RBD Binding\ parent Starr_Bloom_bind\ priority 1\ shortLabel Y_bind_avg\ track Y_bind_avg\ type bigWig\ visibility dense\ Y_expr_avg_Expression Y_expr_avg bigWig DMS data for RBD expression 1 20 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bbi/bloom/expr/Y_expr_avg.bw\ longLabel DMS data for RBD expression\ parent Starr_Bloom\ priority 1\ shortLabel Y_expr_avg\ track Y_expr_avg_Expression\ type bigWig\ visibility dense\ CHX_7hr_1 Calu3 CHX 7hr 1 bigWig Calu3 CHX 7hr 1 2 21 4 90 141 129 172 198 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_chx_07hr_1.bw\ color 4,90,141\ longLabel Calu3 CHX 7hr 1\ maxHeightPixels 124:32:5\ parent Calu3_07hpi on\ priority 21\ shortLabel Calu3 CHX 7hr 1\ track CHX_7hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ igg_COVID_605 COVID 605 bigBed 9 COVID 605 1 21 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_605.bb\ longLabel COVID 605\ parent igg on\ priority 21\ shortLabel COVID 605\ track igg_COVID_605\ type bigBed 9\ igm_Ctrl_NC96 Ctrl NC96 bigBed 9 Ctrl NC96 1 21 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_NC96.bb\ longLabel Ctrl NC96\ parent igm on\ priority 21\ shortLabel Ctrl NC96\ track igm_Ctrl_NC96\ type bigBed 9\ variantAaMuts_XBB_1_16 Omicron XBB.1.16 AA Muts bigBed 4 Omicron XBB.1.16 amino acid mutations from GISAID sequences (Sep 22, 2023) 1 21 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=XBB.1.16 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB.1.16_prot.bb\ color 219,40,35\ longLabel Omicron XBB.1.16 amino acid 
mutations from GISAID sequences (Sep 22, 2023)\ parent variantMuts on\ priority 20\ shortLabel Omicron XBB.1.16 AA Muts\ subGroups variant=S_XBB116 mutation=AA designation=VOI\ track variantAaMuts_XBB_1_16\ url https://outbreak.info/situation-reports?pango=XBB.1.16\ urlLabel XBB.1.16 Situation Report at outbreak.info\ C_104Total Serum: C, day 104 bigWig Bloom antibody escape - Total Score - Subject C, Day 104 1 21 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/C_104.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject C, Day 104\ parent bloomEscTotal on\ shortLabel Serum: C, day 104\ track C_104Total\ type bigWig\ visibility dense\ CHX_7hr_2 Calu3 CHX 7hr 2 bigWig Calu3 CHX 7hr 2 0 22 4 90 141 129 172 198 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_chx_07hr_2.bw\ color 4,90,141\ longLabel Calu3 CHX 7hr 2\ maxHeightPixels 124:32:5\ parent Calu3_07hpi off\ priority 22\ shortLabel Calu3 CHX 7hr 2\ track CHX_7hr_2\ type bigWig\ viewLimits 0:10\ visibility hide\ igg_COVID_15 COVID 15 bigBed 9 COVID 15 1 22 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_15.bb\ longLabel COVID 15\ parent igg on\ priority 22\ shortLabel COVID 15\ track igg_COVID_15\ type bigBed 9\ igm_COVID_4372 COVID 4372 bigBed 9 COVID 4372 1 22 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_4372.bb\ longLabel COVID 4372\ parent igm on\ priority 22\ shortLabel COVID 4372\ track igm_COVID_4372\ type bigBed 9\ variantNucMuts_XBB_1_16 Omicron XBB.1.16 Nuc Muts bigBed 4 Omicron VOC (XBB.1.16) nucleotide mutations identifed from GISAID sequences (Sep 2023) 1 22 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=XBB.1.16 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB.1.16_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (XBB.1.16) nucleotide mutations identifed from GISAID sequences (Sep 2023)\ parent variantMuts off\ priority 130\ shortLabel Omicron 
XBB.1.16 Nuc Muts\ subGroups variant=S_XBB116 mutation=NUC designation=VOI\ track variantNucMuts_XBB_1_16\ url https://outbreak.info/situation-reports?pango=XBB.1.16\ urlLabel XBB.1.16 Situation Report at outbreak.info\ D_33Total Serum: D, day 033 bigWig Bloom antibody escape - Total Score - Subject D, Day 33 1 22 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/D_33.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject D, Day 33\ parent bloomEscTotal on\ shortLabel Serum: D, day 033\ track D_33Total\ type bigWig\ visibility dense\ Harr_7hr_1 Calu3 Harr 7hr 1 bigWig Calu3 Harr 7hr 1 2 23 179 0 0 217 127 127 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_harr_07hr_1.bw\ color 179,0,0\ longLabel Calu3 Harr 7hr 1\ maxHeightPixels 124:32:5\ parent Calu3_07hpi on\ priority 23\ shortLabel Calu3 Harr 7hr 1\ track Harr_7hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ igg_COVID_17 COVID 17 bigBed 9 COVID 17 1 23 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_17.bb\ longLabel COVID 17\ parent igg on\ priority 23\ shortLabel COVID 17\ track igg_COVID_17\ type bigBed 9\ igm_COVID_531 COVID 531 bigBed 9 COVID 531 1 23 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_531.bb\ longLabel COVID 531\ parent igm on\ priority 23\ shortLabel COVID 531\ track igm_COVID_531\ type bigBed 9\ variantAaMuts_EG_5_1 Omicron EG.5.1 AA Muts bigBed 4 Omicron EG.5.1 amino acid mutations from GISAID sequences (Sep 22, 2023) 1 23 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=EG.5.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/EG.5.1_prot.bb\ color 219,40,35\ longLabel Omicron EG.5.1 amino acid mutations from GISAID sequences (Sep 22, 2023)\ parent variantMuts on\ priority 24\ shortLabel Omicron EG.5.1 AA Muts\ subGroups variant=W_EG51 mutation=AA designation=VOI\ track variantAaMuts_EG_5_1\ url 
https://outbreak.info/situation-reports?pango=EG.5.1\ urlLabel EG.5.1 Situation Report at outbreak.info\ D_76Total Serum: D, day 076 bigWig Bloom antibody escape - Total Score - Subject D, Day 76 1 23 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/D_76.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject D, Day 76\ parent bloomEscTotal on\ shortLabel Serum: D, day 076\ track D_76Total\ type bigWig\ visibility dense\ Harr_7hr_2 Calu3 Harr 7hr 2 bigWig Calu3 Harr 7hr 2 0 24 179 0 0 217 127 127 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_harr_07hr_2.bw\ color 179,0,0\ longLabel Calu3 Harr 7hr 2\ maxHeightPixels 124:32:5\ parent Calu3_07hpi off\ priority 24\ shortLabel Calu3 Harr 7hr 2\ track Harr_7hr_2\ type bigWig\ viewLimits 0:10\ visibility hide\ igg_COVID_414 COVID 414 bigBed 9 COVID 414 1 24 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_414.bb\ longLabel COVID 414\ parent igg on\ priority 24\ shortLabel COVID 414\ track igg_COVID_414\ type bigBed 9\ igm_Ctrl_NC95 Ctrl NC95 bigBed 9 Ctrl NC95 1 24 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_NC95.bb\ longLabel Ctrl NC95\ parent igm on\ priority 24\ shortLabel Ctrl NC95\ track igm_Ctrl_NC95\ type bigBed 9\ variantNucMuts_EG_5_1 Omicron EG.5.1 Nuc Muts bigBed 4 Omicron VOC (EG.5.1) nucleotide mutations identifed from GISAID sequences (Sep 2023) 1 24 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=EG.5.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/EG.5.1_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (EG.5.1) nucleotide mutations identifed from GISAID sequences (Sep 2023)\ parent variantMuts off\ priority 134\ shortLabel Omicron EG.5.1 Nuc Muts\ subGroups variant=W_EG51 mutation=NUC designation=VOI\ track variantNucMuts_EG_5_1\ url https://outbreak.info/situation-reports?pango=EG.5.1\ urlLabel EG.5.1 Situation Report at outbreak.info\ E_28Total Serum: E, 
day 028 bigWig Bloom antibody escape - Total Score - Subject E, Day 28 1 24 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/E_28.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject E, Day 28\ parent bloomEscTotal on\ shortLabel Serum: E, day 028\ track E_28Total\ type bigWig\ visibility dense\ LTM_7hr_1 Calu3 LTM 7hr 1 bigWig Calu3 LTM 7hr 1 2 25 35 139 69 145 197 162 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_ltm_07hr_1.bw\ color 35,139,69\ longLabel Calu3 LTM 7hr 1\ maxHeightPixels 124:32:5\ parent Calu3_07hpi on\ priority 25\ shortLabel Calu3 LTM 7hr 1\ track LTM_7hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ igg_COVID_4372 COVID 4372 bigBed 9 COVID 4372 1 25 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_4372.bb\ longLabel COVID 4372\ parent igg on\ priority 25\ shortLabel COVID 4372\ track igg_COVID_4372\ type bigBed 9\ igm_Ctrl_NC66 Ctrl NC66 bigBed 9 Ctrl NC66 1 25 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_NC66.bb\ longLabel Ctrl NC66\ parent igm on\ priority 25\ shortLabel Ctrl NC66\ track igm_Ctrl_NC66\ type bigBed 9\ variantAaMuts_XBB_1_5_70 Omicron XBB.1.5.70 AA Muts bigBed 4 Omicron XBB.1.5.70 amino acid mutations from GISAID sequences (Jan 29, 2024) 1 25 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=XBB.1.5.70 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB.1.5.70_prot.bb\ color 219,40,35\ longLabel Omicron XBB.1.5.70 amino acid mutations from GISAID sequences (Jan 29, 2024)\ parent variantMuts off\ priority 25\ shortLabel Omicron XBB.1.5.70 AA Muts\ subGroups variant=X_XBB1570 mutation=AA designation=VOI\ track variantAaMuts_XBB_1_5_70\ url https://outbreak.info/situation-reports?pango=XBB.1.5.70\ urlLabel XBB.1.5.70 Situation Report at outbreak.info\ E_104Total Serum: E, day 104 bigWig Bloom antibody escape - Total Score - Subject E, Day 104 1 25 0 0 0 127 127 127 0 0 0 immu 0 
bigDataUrl /gbdb/wuhCor1/bloomEsc/E_104.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject E, Day 104\ parent bloomEscTotal on\ shortLabel Serum: E, day 104\ track E_104Total\ type bigWig\ visibility dense\ LTM_7hr_2 Calu3 LTM 7hr 2 bigWig Calu3 LTM 7hr 2 0 26 35 139 69 145 197 162 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_ltm_07hr_2.bw\ color 35,139,69\ longLabel Calu3 LTM 7hr 2\ maxHeightPixels 124:32:5\ parent Calu3_07hpi off\ priority 26\ shortLabel Calu3 LTM 7hr 2\ track LTM_7hr_2\ type bigWig\ viewLimits 0:10\ visibility hide\ igg_COVID_534 COVID 534 bigBed 9 COVID 534 1 26 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_534.bb\ longLabel COVID 534\ parent igg on\ priority 26\ shortLabel COVID 534\ track igg_COVID_534\ type bigBed 9\ igm_Ctrl_NC97 Ctrl NC97 bigBed 9 Ctrl NC97 1 26 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_NC97.bb\ longLabel Ctrl NC97\ parent igm on\ priority 26\ shortLabel Ctrl NC97\ track igm_Ctrl_NC97\ type bigBed 9\ variantNucMuts_XBB_1_5_70 Omicron XBB.1.5.70 Nuc Muts bigBed 4 Omicron VOC (XBB.1.5.70) nucleotide mutations identifed from GISAID sequences (Jan 2024) 1 26 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=XBB.1.5.70 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB.1.5.70_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (XBB.1.5.70) nucleotide mutations identifed from GISAID sequences (Jan 2024)\ parent variantMuts off\ priority 135\ shortLabel Omicron XBB.1.5.70 Nuc Muts\ subGroups variant=X_XBB1570 mutation=NUC designation=VOI\ track variantNucMuts_XBB_1_5_70\ url https://outbreak.info/situation-reports?pango=XBB.1.5.70\ urlLabel XBB.1.5.70 Situation Report at outbreak.info\ F_48Total Serum: F, day 048 bigWig Bloom antibody escape - Total Score - Subject F, Day 48 1 26 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/F_48.tot.bw\ longLabel Bloom antibody escape - 
Total Score - Subject F, Day 48\ parent bloomEscTotal on\ shortLabel Serum: F, day 048\ track F_48Total\ type bigWig\ visibility dense\ mRNA-seq_7hr_1 Calu3 mRNA 7hr 1 bigWig Calu3 mRNA 7hr 1 2 27 63 0 125 159 127 190 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/mrna_07hr_1.bw\ color 63,0,125\ longLabel Calu3 mRNA 7hr 1\ maxHeightPixels 124:32:5\ parent Calu3_07hpi on\ priority 27\ shortLabel Calu3 mRNA 7hr 1\ track mRNA-seq_7hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ igg_COVID_16 COVID 16 bigBed 9 COVID 16 1 27 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_16.bb\ longLabel COVID 16\ parent igg on\ priority 27\ shortLabel COVID 16\ track igg_COVID_16\ type bigBed 9\ igm_Ctrl_LC171 Ctrl LC171 bigBed 9 Ctrl LC171 1 27 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_LC171.bb\ longLabel Ctrl LC171\ parent igm on\ priority 27\ shortLabel Ctrl LC171\ track igm_Ctrl_LC171\ type bigBed 9\ variantAaMuts_HK_3 Omicron HK.3 AA Muts bigBed 4 Omicron HK.3 amino acid mutations from GISAID sequences (Jan 29, 2024) 1 27 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=HK.3 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/HK.3_prot.bb\ color 219,40,35\ longLabel Omicron HK.3 amino acid mutations from GISAID sequences (Jan 29, 2024)\ parent variantMuts off\ priority 26\ shortLabel Omicron HK.3 AA Muts\ subGroups variant=Y_HK3 mutation=AA designation=VOI\ track variantAaMuts_HK_3\ url https://outbreak.info/situation-reports?pango=HK.3\ urlLabel HK.3 Situation Report at outbreak.info\ F_115Total Serum: F, day 115 bigWig Bloom antibody escape - Total Score - Subject F, Day 115 1 27 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/F_115.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject F, Day 115\ parent bloomEscTotal on\ shortLabel Serum: F, day 115\ track F_115Total\ type bigWig\ visibility dense\ mRNA-seq_7hr_2 Calu3 mRNA 7hr 
2 bigWig Calu3 mRNA 7hr 2 0 28 63 0 125 159 127 190 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/mrna_07hr_2.bw\ color 63,0,125\ longLabel Calu3 mRNA 7hr 2\ maxHeightPixels 124:32:5\ parent Calu3_07hpi off\ priority 28\ shortLabel Calu3 mRNA 7hr 2\ track mRNA-seq_7hr_2\ type bigWig\ viewLimits 0:10\ visibility hide\ igm_COVID_523 COVID 523 bigBed 9 COVID 523 1 28 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_523.bb\ longLabel COVID 523\ parent igm on\ priority 28\ shortLabel COVID 523\ track igm_COVID_523\ type bigBed 9\ igg_COVID_608 COVID 608 bigBed 9 COVID 608 1 28 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_608.bb\ longLabel COVID 608\ parent igg on\ priority 28\ shortLabel COVID 608\ track igg_COVID_608\ type bigBed 9\ variantNucMuts_HK_3 Omicron HK.3 Nuc Muts bigBed 4 Omicron VOC (HK.3) nucleotide mutations identifed from GISAID sequences (Jan 2024) 1 28 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=HK.3 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/HK.3_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (HK.3) nucleotide mutations identifed from GISAID sequences (Jan 2024)\ parent variantMuts off\ priority 136\ shortLabel Omicron HK.3 Nuc Muts\ subGroups variant=Y_HK3 mutation=NUC designation=VOI\ track variantNucMuts_HK_3\ url https://outbreak.info/situation-reports?pango=HK.3\ urlLabel HK.3 Situation Report at outbreak.info\ G_18Total Serum: G, day 018 bigWig Bloom antibody escape - Total Score - Subject G, Day 18 1 28 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/G_18.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject G, Day 18\ parent bloomEscTotal on\ shortLabel Serum: G, day 018\ track G_18Total\ type bigWig\ visibility dense\ igg_COVID_523 COVID 523 bigBed 9 COVID 523 1 29 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_523.bb\ longLabel COVID 523\ parent igg on\ 
priority 29\ shortLabel COVID 523\ track igg_COVID_523\ type bigBed 9\ igm_COVID_527 COVID 527 bigBed 9 COVID 527 1 29 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_527.bb\ longLabel COVID 527\ parent igm on\ priority 29\ shortLabel COVID 527\ track igm_COVID_527\ type bigBed 9\ variantAaMuts_JN_1 Omicron JN.1 AA Muts bigBed 4 Omicron JN.1 amino acid mutations from GISAID sequences (Jan 29, 2024) 1 29 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=JN.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/JN.1_prot.bb\ color 219,40,35\ longLabel Omicron JN.1 amino acid mutations from GISAID sequences (Jan 29, 2024)\ parent variantMuts on\ priority 28\ shortLabel Omicron JN.1 AA Muts\ subGroups variant=ZA_JN1 mutation=AA designation=VOI\ track variantAaMuts_JN_1\ url https://outbreak.info/situation-reports?pango=JN.1\ urlLabel JN.1 Situation Report at outbreak.info\ G_94Total Serum: G, day 094 bigWig Bloom antibody escape - Total Score - Subject G, Day 94 1 29 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/G_94.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject G, Day 94\ parent bloomEscTotal on\ shortLabel Serum: G, day 094\ track G_94Total\ type bigWig\ visibility dense\ nextstrainFreq20A 20A bigWig Nextstrain, 20A clade: Alternate allele frequency 1 30 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20A.bigWig\ longLabel Nextstrain, 20A clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 30\ shortLabel 20A\ subGroups view=newClades\ track nextstrainFreq20A\ type bigWig\ visibility dense\ nextstrainSamples20A 20A Mutations vcfTabix Mutations in Clade 20A Nextstrain Subset of GISAID EpiCoV TM Samples 0 30 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20A.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain20A.nh\ longLabel Mutations in Clade 
20A Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades\ priority 30\ shortLabel 20A Mutations\ subGroups view=newClades\ track nextstrainSamples20A\ sarsCov2PhyloFull All (slow!) vcfTabix All Nucleotide Substitution Mutations in GISAID EpiCov TM Sequences (slow at whole-genome scale) 0 30 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/sarsCov2Phylo/gisaid.vcf.gz\ longLabel All Nucleotide Substitution Mutations in GISAID EpiCov TM Sequences (slow at whole-genome scale)\ parent sarsCov2Phylo off\ priority 30\ shortLabel All (slow!)\ track sarsCov2PhyloFull\ type vcfTabix\ abEscape Antibody Escape bigBed Escape from serum or monoclonal antibodies: Whelan, Bloom and Rappuoli groups 0 30 0 0 0 127 127 127 0 0 0

Description

\

\ The subtracks of this track show mutations that lead to escape from patient serum antibodies or monoclonal\ antibodies. Most of the mutations assayed were in the receptor binding domain (RBD) of the S protein. \ The data shown here were imported from the studies listed below. The\ Bloom lab papers used deep mutational scanning to measure the effect of all \ possible mutations in the Spike RBD using a yeast surface display system.

\
    \
  1. Bloom lab - patients A-K: antibodies in sera from the Hospitalized or Ambulatory Adults with Respiratory Viral\ Infections (HAARVI) cohort, described in \ Greaney et al., bioRxiv 2021.\
  2. Bloom lab - 10 antibodies: A selection of ten monoclonal antibodies, described\ in Greaney et al., Cell Host Microbe 2020.\
  3. Bloom lab - 4 treatment antibodies: Four monoclonal antibodies licensed for treatment.\ The results were described in Starr et al.,\ bioRxiv 2021.\
  4. Whelan lab - 21 antibodies: a selection screen of 21 neutralizing monoclonal\ antibodies (mAbs) against the receptor binding domain (RBD) generated 48 escape\ mutants. The results were described in \ Liu et al., bioRxiv 2020.\
  5. Rappuoli lab - serum from one patient: three mutations obtained by passaging of \ cells in neutralizing serum from a single patient, described in Andreano et al., bioRxiv 2021.\
  6. McCoy lab - mutations tested on monoclonal antibodies and patient sera, described in \ Rees-Spear et al., bioRxiv 2021.\
\

\ \

\ For the Bloom lab data, we show only a summary of the data. More detailed structural\ visualizations are available from the authors via dms-view using the following links:\ patient sera,\ 10 monoclonal antibodies,\ 4 treatment antibodies.\

\ \

Display Conventions and Configuration

\

Bloom lab data

\

Scores represent the "escape fraction" (discussed at length in the Methods \ of the paper). In the authors' words, these "represent the fraction of a given variant that escape antibody \ binding, and should in principle range from 0 to 1." \ They add: "Note that the magnitude of the measured effects of mutations on antibody escape depends on\ the antibody concentration and the flow cytometry gates applied, meaning that the\ escape fractions are comparable across sites for any given antibody, but are not precisely\ comparable among antibodies without external calibration."

\ \ \

A higher score indicates a greater level of escape.

\ \

The data, summarized by protein position, are shown as 36 subtracks, one per sample. Each subtrack shows\ the maximum score at every assayed amino acid position as shades of color\ or, in full mode, as an x-y bar plot. Blue subtracks show data from monoclonal\ antibodies, red ones from patient sera. By configuring the current track (click\ on "Antibody escape" under the image), one can display the total sum of all\ scores per amino acid.

\ \

The data are also summarized as two x-y bar plots showing the average values per amino acid,\ again in red (sera) and blue (mAbs). Finally, another summary track has one feature\ per position where the score exceeds 0.18. These features are clickable, and the details page\ shows the exact amino acid changes and their scores. \ \
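
The per-position summaries described above (maximum, total and average score, plus a feature where the score exceeds 0.18) can be illustrated with a short Python sketch. This is not the code used to build the tracks: the record layout `(position, mutation, escape_fraction)` and the function names are hypothetical, and applying the 0.18 cutoff to the per-position maximum is an assumption, since the text does not say which score is thresholded.

```python
from collections import defaultdict

ESCAPE_THRESHOLD = 0.18  # cutoff quoted in the track description


def summarize_escape(records):
    """Collapse per-mutation escape fractions to per-position summaries.

    records: iterable of (position, mutation, escape_fraction) tuples.
    This layout is hypothetical; the real input files may differ.
    """
    by_pos = defaultdict(list)
    for pos, _mutation, score in records:
        by_pos[pos].append(score)
    summary = {}
    for pos in sorted(by_pos):
        scores = by_pos[pos]
        summary[pos] = {
            "max": max(scores),                 # shown in the per-sample subtracks
            "total": sum(scores),               # optional "total" display
            "mean": sum(scores) / len(scores),  # average per amino acid
        }
    return summary


def positions_above(summary, threshold=ESCAPE_THRESHOLD):
    """Positions whose maximum score exceeds the threshold (assumed rule)."""
    return [pos for pos, s in summary.items() if s["max"] > threshold]
```

A call such as `positions_above(summarize_escape(records))` would then list the positions that receive a clickable summary feature under these assumptions.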

Whelan lab data

\

Features are labeled with the nucleotide and protein coordinates and the name of the antibody. \ Click a feature or mouse-over a feature to show these annotations.

\ \

Rappuoli lab data

\

The three mutations are labeled with the protein coordinates.

\ \

McCoy lab data

\

Features are labeled with the amino acid mutation coordinates. \ Click a feature or mouse-over a feature to show a description of the specific mutation.

\ \

Methods

\

\ Patient sera: data were downloaded from the jbloomlab GitHub file and parsed into bedGraph format.

\ \

10 Antibodies: Table S1 from Starr et al. was downloaded and parsed into bedGraph format.

\ \

4 treatment antibodies: Data were downloaded from the jbloomlab GitHub file and parsed into bedGraph format using the total and maximum values.

\ \

21 Antibodies: Table 2 from Liu et al. 2020 was copied manually and converted to bedGraph format.

\ \

For the Rappuoli lab, the mutations were manually copied from the text.
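
The bedGraph conversions mentioned above can be sketched as follows. This is an illustration only, not the script used for the track. It assumes that scores are keyed by Spike amino acid position, that the sequence is named `NC_045512v2`, and that the S gene coding sequence starts at 1-based genomic position 21563; all three are assumptions that should be checked against the assembly before reuse.

```python
CHROM = "NC_045512v2"    # assumed sequence name in the wuhCor1 assembly
SPIKE_CDS_START = 21563  # assumed 1-based genomic start of the S gene CDS


def aa_scores_to_bedgraph(scores_by_aa_pos):
    """Turn {spike_aa_position: score} into bedGraph lines.

    bedGraph uses 0-based, half-open coordinates, so each amino acid
    becomes a 3-base interval covering its codon.
    """
    lines = []
    for aa_pos in sorted(scores_by_aa_pos):
        start = (SPIKE_CDS_START - 1) + 3 * (aa_pos - 1)
        end = start + 3
        lines.append(f"{CHROM}\t{start}\t{end}\t{scores_by_aa_pos[aa_pos]}")
    return lines
```

The resulting lines could then be written to a file and converted to bigWig with the standard UCSC utilities.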

\ \ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.

\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

\ Greaney AJ, Loes AN, Crawford K, Starr T, Malone K, Chu H, Bloom JD.\ \ Comprehensive mapping of mutations to the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human serum antibodies.\ bioRxiv. 2021 Jan 04.\

\ \ Greaney AJ, Starr TN, Gilchuk P, Zost SJ, Binshtein E, Loes AN, Hilton SK, Huddleston J, Eguia R,\ Crawford KHD et al.\ \ Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody\ Recognition.\ Cell Host Microbe. 2020 Nov 19;.\ PMID: 33259788; PMC: PMC7676316\

\ \

\ Zhuoming Liu, Laura A. VanBlargan, Paul W. Rothlauf, Louis-Marie Bloyet, Rita E. Chen, Spencer Stumpf, Haiyan Zhao, John M. Errico, Elitza S. Theel, Ali H. Ellebedy, Daved H. Fremont, Michael S. Diamond, Sean P. J. Whelan.\ \ Landscape analysis of escape variants identifies SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization.\ bioRxiv. 2020.\

\ \

\ Starr TN, Greaney AJ, Addetia A, Hannon WW, Choudhary MC, Dingens AS, Li JZ, Bloom JD.\ \ Prospective mapping of viral mutations that escape antibodies used to treat COVID-19.\ bioRxiv. 2020 Dec 1;.\ PMID: 33299993; PMC: PMC7724661\

\

\ Andreano E, Piccini G, Licastro D, Casalino L, Johnson NV, Paciello I, Monego SD, Pantano E,\ Manganaro N, Manenti A et al.\ \ SARS-CoV-2 escape in vitro from a highly neutralizing COVID-19 convalescent\ plasma.\ bioRxiv. 2020 Dec 28;.\ PMID: 33398278; PMC: PMC7781313\

\ immu 1 group immu\ longLabel Escape from serum or monoclonal antibodies: Whelan, Bloom and Rappuoli groups\ priority 30\ shortLabel Antibody Escape\ superTrack on show\ track abEscape\ type bigBed\ galaxyEnaQ2Ay-6 AY.6 mutations bigBed 8 + Mutations (amino acid level) in AY.6 between 2021-07-05 and 2021-10-05 1 30 0 99 116 127 177 185 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

\ The GalaxyProject SARS-CoV-2 mutation tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic.\ The quarters are redefined with each data update, with the latest/current quarter starting 3 months prior to the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.\
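
The rolling quarter definition above can be sketched in Python. This is a minimal illustration, not the project's update script: the function names are hypothetical, and for simplicity the current quarter is closed at the update day itself rather than at the collection date of the most recent analyzed sample.

```python
import calendar
from datetime import date


def months_before(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped to the last day of the target month
    (for example, 31 May minus 3 months gives 28 or 29 February).
    """
    total = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def quarter_windows(update_day, n_quarters=4):
    """List (start, end) windows, newest first, going back from update_day."""
    windows = []
    end = update_day
    for i in range(1, n_quarters + 1):
        start = months_before(update_day, 3 * i)
        windows.append((start, end))
        end = start
    return windows
```

With an update day of 2021-10-05, the first two windows match the date ranges shown in the subtrack labels (2021-07-05 to 2021-10-05 and 2021-04-05 to 2021-07-05).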

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant-calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base that is changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are displayed entirely thin.\ \
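
The SNV display rule above maps naturally onto the first eight BED fields. The sketch below is hypothetical and is not the project's track-building code: it assumes a forward-strand gene with a known 1-based CDS start, and it assumes the grey shading is driven by a 0-1000 score derived linearly from the within-lineage frequency (the tracks set `spectrum on`, but the exact scaling is not stated in the text).

```python
def snv_feature(chrom, cds_start, nuc_pos, label, frequency):
    """Build the first eight BED fields for one SNV.

    cds_start and nuc_pos are 1-based genomic positions; the returned
    coordinates are 0-based, half-open, as in BED.
    The whole feature spans the affected codon; the thick part covers
    only the changed base.
    """
    offset = nuc_pos - cds_start
    if offset < 0:
        raise ValueError("nuc_pos lies upstream of the CDS start")
    codon_start = (cds_start - 1) + (offset // 3) * 3
    codon_end = codon_start + 3
    thick_start = nuc_pos - 1
    thick_end = nuc_pos
    # Assumed scaling: frequency in [0, 1] -> BED score in [0, 1000].
    score = round(max(0.0, min(1.0, frequency)) * 1000)
    return (chrom, codon_start, codon_end, label, score, "+",
            thick_start, thick_end)
```

For example, calling it with an assumed S gene start of 21563 and the nucleotide change at position 23403 yields a feature spanning the codon for Spike residue 614, with the thick part on the middle base.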

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

\ \

Methods

\

\ For analysis, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances.\ The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another one for ONT data), which performs QC, read mapping, post-processing of mapped reads including primer trimming, variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch.\ This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.\ \ Sequencing data downloads, execution of the three types of workflows, and export of key results files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.\ \ The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories, then\

    \
  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc. are added to
    ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by\
    1. parsing all analyzed data on the ftp server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigbed files and track files
\
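
The track recalculation steps listed above (picking the five most frequent lineages in a quarter, then collecting the mutations seen in each) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: each sample is assumed to be a dict with hypothetical `lineage` and `mutations` keys, already restricted to one quarter.

```python
from collections import Counter, defaultdict


def top_lineages(samples, n=5):
    """Return the n most frequently observed lineages among the samples."""
    counts = Counter(s["lineage"] for s in samples)
    return [lineage for lineage, _count in counts.most_common(n)]


def mutation_frequencies(samples, lineages):
    """Within-lineage frequency of every mutation, for the given lineages.

    Returns {lineage: {mutation: fraction of that lineage's samples}}.
    """
    keep = set(lineages)
    n_samples = Counter()
    seen = defaultdict(Counter)
    for s in samples:
        lineage = s["lineage"]
        if lineage not in keep:
            continue
        n_samples[lineage] += 1
        for mutation in set(s["mutations"]):  # count each mutation once per sample
            seen[lineage][mutation] += 1
    return {
        lineage: {m: c / n_samples[lineage] for m, c in muts.items()}
        for lineage, muts in seen.items()
    }
```

The resulting frequencies are the kind of per-lineage, per-quarter values that the feature shading described earlier would be based on.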

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:\

\ \

References

\ \

\

    \
  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q2/02_AY.6_data.bb\ color 0,99,116\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.6 between 2021-07-05 and 2021-10-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q2_tracks on\ priority 30\ shortLabel AY.6 mutations\ spectrum on\ track galaxyEnaQ2Ay-6\ type bigBed 8 +\ galaxyEnaQ3B-1-617-2 B.1.617.2 mutations bigBed 8 + Mutations (amino acid level) in B.1.617.2 between 2021-04-05 and 2021-07-05 1 30 89 47 13 172 151 134 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

\ The GalaxyProject SARS-CoV-2 mutation tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic.\ The quarters are redefined with each data update, with the latest/current quarter starting 3 months prior to the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.\

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant-calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base that is changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are displayed entirely thin.\ \

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

\ \

Methods

\

\ For analyses, batches of raw sequencing data get downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances.\ The data gets processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another one for ONT data), which performs QC, read mapping, mapped reads postprocessing including primer trimming, variant calling and annotation and results in a collection of VCF files, one for each sample in the batch.\ This output gets picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview over variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.\ \ Sequencing data downloads, execution of the three types of workflows, and export of key results files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.\ \ The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories, then\

  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc., is added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\
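The step of determining the top lineages per quarter could look roughly like the following Python sketch. The function name, the data layout (pairs of collection date and lineage) and the sample records are all hypothetical and made up for illustration; the treatment of samples collected exactly on a quarter boundary is likewise an assumption.

```python
from collections import Counter
from datetime import date


def top_lineages(samples, windows, n=5):
    """Return the n most frequent lineages for each quarter.

    samples: iterable of (collection_date, lineage) pairs
    windows: dict mapping a quarter label to (start, end) dates
    A sample counts for a quarter if start < collection_date <= end
    (assumed convention, so that no sample is counted twice).
    """
    result = {}
    for label, (start, end) in windows.items():
        counts = Counter(
            lineage for day, lineage in samples if start < day <= end
        )
        result[label] = [lineage for lineage, _ in counts.most_common(n)]
    return result


# Made-up sample records, for illustration only.
samples = [
    (date(2022, 2, 1), "BA.1.17"),
    (date(2022, 2, 3), "BA.1.17"),
    (date(2022, 2, 10), "BA.1"),
    (date(2021, 11, 20), "B.1.617.2"),
    (date(2021, 12, 1), "B.1.617.2"),
    (date(2021, 12, 20), "BA.1.17"),
]
windows = {
    "Q0": (date(2022, 1, 5), date(2022, 3, 1)),
    "Q1": (date(2021, 10, 5), date(2022, 1, 5)),
}
top = top_lineages(samples, windows)
print(top)
```

Because lineage assignment is redone for the whole collection at each update, such a ranking would be recomputed from scratch rather than updated incrementally.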

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q3/02_B.1.617.2_data.bb\ color 89,47,13\ html galaxyEna\ longLabel Mutations (amino acid level) in B.1.617.2 between 2021-04-05 and 2021-07-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q3_tracks on\ priority 30\ shortLabel B.1.617.2 mutations\ spectrum on\ track galaxyEnaQ3B-1-617-2\ type bigBed 8 +\ galaxyEnaQ1Ba-1-17 BA.1.17 mutations bigBed 8 + Mutations (amino acid level) in BA.1.17 between 2021-10-05 and 2022-01-05 1 30 60 60 60 157 157 157 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

The GalaxyProject SARS-CoV-2 mutation tracking effort is presented as a supertrack containing four subtracks, each representing mutation data from SARS-CoV-2 samples collected in a different 3-month period of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting 3 months before the day of the update. The end date displayed on the current-quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level and, for SNV mutations, the feature as a whole will extend across the base triplet encoding the affected amino acid, while the thick part of the feature will indicate the precise base that gets changed by the mutation. For deletions, the whole feature will have a thick rendering, while insertions will be displayed all thin.\ \

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

\ \

Methods

\

\ For analyses, batches of raw sequencing data get downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances.\ The data gets processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another one for ONT data), which performs QC, read mapping, mapped reads postprocessing including primer trimming, variant calling and annotation and results in a collection of VCF files, one for each sample in the batch.\ This output gets picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview over variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.\ \ Sequencing data downloads, execution of the three types of workflows, and export of key results files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.\ \ The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories, then\

  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc., is added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q1/02_BA.1.17_data.bb\ color 60,60,60\ html galaxyEna\ longLabel Mutations (amino acid level) in BA.1.17 between 2021-10-05 and 2022-01-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q1_tracks on\ priority 30\ shortLabel BA.1.17 mutations\ spectrum on\ track galaxyEnaQ1Ba-1-17\ type bigBed 8 +\ galaxyEnaQ0Ba-1-17 BA.1.17 mutations bigBed 8 + Mutations (amino acid level) in BA.1.17 between 2022-01-05 and 2022-03-01 3 30 60 60 60 157 157 157 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

The GalaxyProject SARS-CoV-2 mutation tracking effort is presented as a supertrack containing four subtracks, each representing mutation data from SARS-CoV-2 samples collected in a different 3-month period of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting 3 months before the day of the update. The end date displayed on the current-quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level and, for SNV mutations, the feature as a whole will extend across the base triplet encoding the affected amino acid, while the thick part of the feature will indicate the precise base that gets changed by the mutation. For deletions, the whole feature will have a thick rendering, while insertions will be displayed all thin.\ \

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

\ \

Methods

\

\ For analyses, batches of raw sequencing data get downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances.\ The data gets processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another one for ONT data), which performs QC, read mapping, mapped reads postprocessing including primer trimming, variant calling and annotation and results in a collection of VCF files, one for each sample in the batch.\ This output gets picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview over variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.\ \ Sequencing data downloads, execution of the three types of workflows, and export of key results files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.\ \ The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories, then\

  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc., is added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q0/02_BA.1.17_data.bb\ color 60,60,60\ html galaxyEna\ longLabel Mutations (amino acid level) in BA.1.17 between 2022-01-05 and 2022-03-01\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q0_tracks on\ priority 30\ shortLabel BA.1.17 mutations\ spectrum on\ track galaxyEnaQ0Ba-1-17\ type bigBed 8 +\ igg_COVID_408 COVID 408 bigBed 9 COVID 408 1 30 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_408.bb\ longLabel COVID 408\ parent igg on\ priority 30\ shortLabel COVID 408\ track igg_COVID_408\ type bigBed 9\ igm_COVID_414 COVID 414 bigBed 9 COVID 414 1 30 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_414.bb\ longLabel COVID 414\ parent igm on\ priority 30\ shortLabel COVID 414\ track igm_COVID_414\ type bigBed 9\ Q2_tracks Galaxy ENA mutations in top lineages - two quarters ago bigBed 8 + Most frequent lineages of two quarters ago 1 30 0 0 0 127 127 127 0 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

The GalaxyProject SARS-CoV-2 mutation tracking effort is presented as a supertrack containing four subtracks, each representing mutation data from SARS-CoV-2 samples collected in a different 3-month period of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting 3 months before the day of the update. The end date displayed on the current-quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level and, for SNV mutations, the feature as a whole will extend across the base triplet encoding the affected amino acid, while the thick part of the feature will indicate the precise base that gets changed by the mutation. For deletions, the whole feature will have a thick rendering, while insertions will be displayed all thin.\ \

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

\ \

Methods

\

\ For analyses, batches of raw sequencing data get downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances.\ The data gets processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another one for ONT data), which performs QC, read mapping, mapped reads postprocessing including primer trimming, variant calling and annotation and results in a collection of VCF files, one for each sample in the batch.\ This output gets picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview over variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.\ \ Sequencing data downloads, execution of the three types of workflows, and export of key results files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.\ \ The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories, then\

  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc., is added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 allButtonPair on\ compositeTrack on\ filter.withinLineageFrequency 0.05\ filterByRange.withinLineageFrequency on\ filterLimits.withinLineageFrequency 0:2\ filterType.countries multipleListOr\ filterValues.countries EE|Estonia,GB|United Kingdom,GR|Greece,IE|Ireland,ZA|South Africa\ filterValuesDefault.countries EE,GB,GR,ZA\ html galaxyEna\ longLabel Most frequent lineages of two quarters ago\ parent galaxyEna\ priority 30\ shortLabel Galaxy ENA mutations in top lineages - two quarters ago\ track Q2_tracks\ type bigBed 8 +\ visibility dense\ sarsCov2PhyloPubAllFull No Min AF (slow!) vcfTabix All Nucleotide Substitution Mutations in Public Sequences (slow at whole-genome scale) 0 30 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/sarsCov2PhyloPub/public.all.vcf.gz\ longLabel All Nucleotide Substitution Mutations in Public Sequences (slow at whole-genome scale)\ parent sarsCov2PhyloPub off\ priority 30\ shortLabel No Min AF (slow!)\ track sarsCov2PhyloPubAllFull\ type vcfTabix\ variantNucMuts_JN_1 Omicron JN.1 Nuc Muts bigBed 4 Omicron VOC (JN.1) nucleotide mutations identifed from GISAID sequences (Jan 2024) 1 30 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=JN.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/JN.1_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (JN.1) nucleotide mutations identifed from GISAID sequences (Jan 2024)\ parent variantMuts off\ priority 138\ shortLabel Omicron JN.1 Nuc Muts\ subGroups variant=ZA_JN1 mutation=NUC designation=VOI\ track variantNucMuts_JN_1\ url https://outbreak.info/situation-reports?pango=JN.1\ urlLabel JN.1 Situation Report at outbreak.info\ pdb PDB Structures bigBed 12 Protein Data Bank (PDB) Sequence Matches 0 30 0 0 0 127 127 127 0 0 0 https://www.ebi.ac.uk/pdbe/pdbe-kb/covid19/$$

Description

\ \

This track shows alignments of sequences with known protein structures in the \ Protein Data Bank (PDB).\ The PDB protein sequence has to match the genome over at least 80% of its length,\ so somewhat similar sequences, e.g. from the SARS 2003 outbreak, are also shown.

\ \

Display Conventions and Configuration

\ \

\ Genomic locations of PDB matches are labeled with the accession number. \ A click on them shows a standard feature detail page with the PDB page integrated into it. \ The protein structure is shown on the PDB page.

\ \

Methods

\

\ PDB sequences were downloaded from the \ PDB website and aligned \ with BLAT. Only alignments with a minimum identity of 80%\ that span at least 80% of the query sequence were kept.
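The two filtering thresholds can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions: the `Alignment` record and its field names are hypothetical (loosely modelled on the columns of a PSL alignment file), and identity is approximated here as matches divided by aligned bases, which is not necessarily how BLAT itself reports percent identity.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Minimal alignment record (hypothetical field names)."""
    query: str
    matches: int      # aligned positions that are identical
    mismatches: int   # aligned positions that differ
    q_size: int       # full length of the query sequence
    q_start: int      # 0-based start of the aligned region in the query
    q_end: int        # end of the aligned region in the query (exclusive)


def keep(aln: Alignment, min_identity: float = 0.8, min_coverage: float = 0.8) -> bool:
    """Keep alignments with >= 80% identity spanning >= 80% of the query."""
    aligned = aln.matches + aln.mismatches
    if aligned == 0 or aln.q_size == 0:
        return False
    identity = aln.matches / aligned
    coverage = (aln.q_end - aln.q_start) / aln.q_size
    return identity >= min_identity and coverage >= min_coverage


# Made-up alignments, for illustration only.
alignments = [
    Alignment("example_1", matches=290, mismatches=10, q_size=306, q_start=0, q_end=300),
    Alignment("example_2", matches=100, mismatches=5, q_size=400, q_start=10, q_end=115),
    Alignment("example_3", matches=200, mismatches=100, q_size=300, q_start=0, q_end=300),
]
kept = [aln.query for aln in alignments if keep(aln)]
print(kept)
```

In this toy input, the second alignment is dropped because it covers too little of the query, and the third because its identity is too low.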

\ \

References

\

\ Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE.\ \ The Protein Data Bank.\ Nucleic Acids Res. 2000 Jan 1;28(1):235-42.\ PMID: 10592235; PMC: PMC102472\

\ genes 1 bigDataUrl /gbdb/wuhCor1/bbi/pdb.bb\ exonNumbers off\ group genes\ iframeOptions width='1000' height='800' scrolling='yes' style='margin-top:10px'\ iframeUrl https://www.rcsb.org/structure/$$\ longLabel Protein Data Bank (PDB) Sequence Matches\ priority 30\ shortLabel PDB Structures\ track pdb\ type bigBed 12\ url https://www.ebi.ac.uk/pdbe/pdbe-kb/covid19/$$\ urlLabel Link to PDB Covid Portal at the EBI\ visibility hide\ H_61Total Serum: H, day 061 bigWig Bloom antibody escape - Total Score - Subject H, Day 61 1 30 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/H_61.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject H, Day 61\ parent bloomEscTotal on\ shortLabel Serum: H, day 061\ track H_61Total\ type bigWig\ visibility dense\ Vero6_24hpi Vero6 24hpi bigWig Vero6 24hpi Ribo-seq and RNA-seq 0 30 0 0 0 127 127 127 0 0 0

Description

\

\ The Weizman ORFs (Open Reading Frames) track shows previously unannotated ORF\ predictions based on Ribo-Seq and RNA-seq data. It is a collection of\ tracks (super track) \ that contains not only the predicted gene models, but also\ data supporting them.

\ \

Display Conventions and Configuration

The Predicted ORFs track shows the predicted exons. All other tracks show the signal as an x-y plot with bars.

Methods

\

\ Methods from Finkel et al:

\

\ To capture the full SARS-CoV-2 coding capacity, we applied a suite of ribosome\ profiling approaches to Vero cells infected with SARS-CoV-2 for 5 and 24 hours,\ and Calu3 cells infected for 7 hours. For each time point we prepared three\ different ribosome-profiling libraries, each one in two biological replicates.\ Two Ribo-seq libraries facilitate mapping of translation initiation sites, by\ treating cells with lactimidomycin (LTM) or harringtonine (Harr), two drugs\ with distinct mechanisms that prevent 80S ribosomes at translation initiation\ sites from elongating. The third Ribo-seq library was prepared from cells\ treated with the translation elongation inhibitor cycloheximide (CHX), and\ gives a snap-shot of actively translating ribosomes across the body of the\ translated ORF. In parallel, RNA-sequencing was applied to map viral\ transcripts.

\

\ The ORF prediction was done by using two computational tools, PRICE and\ ORF-RATER, that rely on different features of ribosome profiling data, and by\ manual inspection of the data. The predictions are based on Ribo-seq libraries\ from two time points (5 and 7 hpi) of two different cell lines (Vero E6 and\ Calu3 cells), infected with separate virus isolates.

\

The Ribo-seq data of the 24-hour samples do not show the expected profile of read distribution on viral genes and therefore were not used for the ORF prediction procedure.

\

For more details see the paper in the References section below.

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.

\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

Finkel Y, Mizrahi O, Nachshon A, Weingarten-Gabbay S, Morgenstern D, Yahalom-Ronen Y, Tamir H, Achdout H, Stein D, Israeli O et al. The coding capacity of SARS-CoV-2. Nature. 2020 Sep 9. PMID: 32906143

\ \ genes 0 compositeTrack on\ group genes\ html weizmanOrfs\ longLabel Vero6 24hpi Ribo-seq and RNA-seq\ parent weizmanOrfs off\ priority 30\ shortLabel Vero6 24hpi\ track Vero6_24hpi\ type bigWig\ visibility hide\ bloomEscSerumAvg Bloom Serum Average bigWig Bloom Lab: S RBD-mutation patient serum antibody escape - average score across serum samples (patients A-K) 1 30.1 255 0 0 255 127 127 0 0 0

Description

\

The subtracks of this track show mutations that lead to escape from patient serum antibodies or monoclonal antibodies. Most of the mutations assayed were in the receptor binding domain (RBD) of the S protein. The data shown here were imported from the studies listed below. The Bloom lab papers used deep mutational scanning to measure the effect of all possible mutations in the Spike RBD using a yeast surface display system.

\
  1. Bloom lab - patients A-K: antibodies in sera from the Hospitalized or Ambulatory Adults with Respiratory Viral Infections (HAARVI) cohort, described in Greaney et al., bioRxiv 2021.
  2. Bloom lab - 10 antibodies: a selection of ten monoclonal antibodies, described in Greaney et al., Cell Host Microbe 2020.
  3. Bloom lab - 4 treatment antibodies: four monoclonal antibodies licensed for treatment. The results were described in Starr et al., bioRxiv 2020.
  4. Whelan lab - 21 antibodies: a selection screen of 21 neutralizing monoclonal antibodies (mAbs) against the receptor binding domain (RBD) generated 48 escape mutants. The results were described in Liu et al., bioRxiv 2020.
  5. Rappuoli lab - serum from one patient: three mutations obtained by passaging the virus in neutralizing serum from a single patient, described in Andreano et al., bioRxiv 2020.
  6. McCoy lab - mutations tested on monoclonal antibodies and patient sera, described in Rees-Spear et al., bioRxiv 2021.

\ \

For the Bloom lab data, we show only a summary. More detailed structural visualizations are available from the authors via dms-view using the following links: patient sera, 10 monoclonal antibodies, 4 treatment antibodies.

\ \

Display Conventions and Configuration

\

Bloom lab data

\

Scores represent the "escape fraction" (discussed at length in the Methods of the paper), which "represent the fraction of a given variant that escape antibody binding, and should in principle range from 0 to 1." The paper adds: "Note that the magnitude of the measured effects of mutations on antibody escape depends on the antibody concentration and the flow cytometry gates applied, meaning that the escape fractions are comparable across sites for any given antibody, but are not precisely comparable among antibodies without external calibration."

\ \ \

A higher score indicates a greater level of escape.

\ \

The data summarized to protein positions are shown as 36 subtracks, one per sample, that indicate the maximum score per assayed amino acid position as shades of color or, in full mode, as an x-y bar plot. Blue subtracks show data from monoclonal antibodies, red ones from patient sera. By configuring the current track (click on "Antibody escape" under the image), one can instead display the total sum of all scores per amino acid.

\ \

The data are also summarized as two x-y bar plots of the average values per amino acid, again in red (sera) and blue (MABs). Finally, another summary track has one feature per position where the score exceeds 0.18. These features are clickable, and the details page shows the exact amino acid changes and their scores.
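The per-position summaries described above (maximum score, total score, and positions exceeding 0.18) can be illustrated with a minimal Python sketch. The record layout, the function names `summarize` and `strong_sites`, and the example values are assumptions made for illustration only; this is not the code used to build the tracks.

```python
from collections import defaultdict

THRESHOLD = 0.18  # cutoff quoted in the text for the summary track


def summarize(records):
    """Return (max_scores, total_scores), each as {sample: {site: value}}.

    `records` is an iterable of (sample, site, mutation, escape_fraction)
    tuples; this layout is a hypothetical stand-in for the real input.
    """
    by_site = defaultdict(lambda: defaultdict(list))
    for sample, site, _mutation, score in records:
        by_site[sample][site].append(score)
    max_scores = {s: {site: max(v) for site, v in sites.items()}
                  for s, sites in by_site.items()}
    total_scores = {s: {site: sum(v) for site, v in sites.items()}
                    for s, sites in by_site.items()}
    return max_scores, total_scores


def strong_sites(max_scores, threshold=THRESHOLD):
    """Count, per site, the samples whose maximum score exceeds the threshold."""
    counts = defaultdict(int)
    for sites in max_scores.values():
        for site, score in sites.items():
            if score > threshold:
                counts[site] += 1
    return dict(counts)


# Made-up example values, not taken from the studies.
records = [
    ("serum_A", 484, "E484K", 0.5),
    ("serum_A", 484, "E484Q", 0.25),
    ("mab_1", 484, "E484K", 0.125),
    ("mab_1", 417, "K417N", 0.25),
]
max_scores, total_scores = summarize(records)
strong = strong_sites(max_scores)
```

Here `max_scores` corresponds to the default per-sample display, `total_scores` to the optional total-sum display, and the per-site count in `strong` mirrors the "number of samples where found" shading of the summary track.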

Whelan lab data

\

Features are labeled with the nucleotide and protein coordinates and the name of the antibody. Click or mouse over a feature to show these annotations.

\ \

Rappuoli lab data

\

The three mutations are labeled with the protein coordinates.

\ \

McCoy lab data

\

Features are labeled with the amino acid mutation coordinates. Click or mouse over a feature to show a description of the specific mutation.

\ \

Methods

\

Patient sera: Data were downloaded from the jbloomlab GitHub file and parsed into bedGraph format.

\ \

10 antibodies: Table S1 from Starr et al. was downloaded and parsed into bedGraph format.

\ \

4 treatment antibodies: Data were downloaded from the jbloomlab GitHub file and parsed into bedGraph format using the total and maximum values.

\ \

21 antibodies: Table 2 from Liu et al. 2020 was copied manually and converted to bedGraph format.
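The conversion of per-residue scores to bedGraph lines mentioned above might look like the following minimal sketch. The sequence name `NC_045512v2`, the S coding-sequence start at position 21563 (1-based), and the helper name `site_to_bedgraph` are assumptions made for illustration; the source does not describe the actual conversion script.

```python
CHROM = "NC_045512v2"  # assumed sequence name for the wuhCor1 assembly
S_CDS_START = 21563    # assumed 1-based start of the S coding sequence


def site_to_bedgraph(site, score, chrom=CHROM, cds_start=S_CDS_START):
    """Format one Spike residue as a bedGraph line covering its codon.

    bedGraph coordinates are 0-based and half-open, so residue 1 maps to
    the interval [cds_start - 1, cds_start + 2).
    """
    start = (cds_start - 1) + 3 * (site - 1)
    end = start + 3
    return f"{chrom}\t{start}\t{end}\t{score:g}"


# Made-up score for Spike residue 484.
line = site_to_bedgraph(484, 0.5)
```

Each residue becomes a three-base interval, so a score assigned to an amino acid is drawn across its whole codon.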

\ \

For the Rappuoli lab, the mutations were manually copied from the text.

\ \ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.

\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

Greaney AJ, Loes AN, Crawford K, Starr T, Malone K, Chu H, Bloom JD. Comprehensive mapping of mutations to the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human serum antibodies. bioRxiv. 2021 Jan 04.

Greaney AJ, Starr TN, Gilchuk P, Zost SJ, Binshtein E, Loes AN, Hilton SK, Huddleston J, Eguia R, Crawford KHD et al. Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition. Cell Host Microbe. 2020 Nov 19. PMID: 33259788; PMC: PMC7676316

Liu Z, VanBlargan LA, Rothlauf PW, Bloyet LM, Chen RE, Stumpf S, Zhao H, Errico JM, Theel ES, Ellebedy AH, Fremont DH, Diamond MS, Whelan SPJ. Landscape analysis of escape variants identifies SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization. bioRxiv. 2020.

Starr TN, Greaney AJ, Addetia A, Hannon WW, Choudhary MC, Dingens AS, Li JZ, Bloom JD. Prospective mapping of viral mutations that escape antibodies used to treat COVID-19. bioRxiv. 2020 Dec 1. PMID: 33299993; PMC: PMC7724661

Andreano E, Piccini G, Licastro D, Casalino L, Johnson NV, Paciello I, Monego SD, Pantano E, Manganaro N, Manenti A et al. SARS-CoV-2 escape in vitro from a highly neutralizing COVID-19 convalescent plasma. bioRxiv. 2020 Dec 28. PMID: 33398278; PMC: PMC7781313

\ immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/serumAvg.bw\ color 255,0,0\ html abEscape\ longLabel Bloom Lab: S RBD-mutation patient serum antibody escape - average score across serum samples (patients A-K)\ maxHeightPixels 100:28:8\ parent abEscape\ priority 30.1\ shortLabel Bloom Serum Average\ track bloomEscSerumAvg\ type bigWig\ viewLimits 0:26\ visibility dense\ bloomEscMabAvg Bloom MAB Average bigWig Bloom Lab: S RBD-mutation monoclonal antibody escape - average score across all 13 MAB samples 0 30.2 0 0 255 127 127 255 0 0 0

\ immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/mabAvg.bw\ color 0,0,255\ html abEscape\ longLabel Bloom Lab: S RBD-mutation monoclonal antibody escape - average score across all 13 MAB samples\ maxHeightPixels 100:28:8\ parent abEscape\ priority 30.2\ shortLabel Bloom MAB Average\ track bloomEscMabAvg\ type bigWig\ viewLimits 0:13\ visibility hide\ bloomEscTop Bloom Strong Mutations bigBed 9 + Bloom Lab: Strong S RBD-mutation antibody escape - positions with max score > 0.18 - shading = number of samples where found 0 30.3 0 0 0 127 127 127 1 0 0

\ immu 1 bigDataUrl /gbdb/wuhCor1/bloomEsc/top.bb\ html abEscape\ longLabel Bloom Lab: Strong S RBD-mutation antibody escape - positions with max score > 0.18 - shading = number of samples where found\ mouseOverField _mouseOver\ parent abEscape\ priority 30.3\ scoreMax 15\ scoreMin 0\ shortLabel Bloom Strong Mutations\ spectrum on\ track bloomEscTop\ type bigBed 9 +\ visibility hide\ bloomEscMax Bloom Max Escape bigWig Bloom Lab: S RBD-mutation antibody escape - maximum escape score per amino acid - 13 MABs and serum from 11 patients (A-K) 0 30.7 0 0 0 127 127 127 0 0 0

\ immu 0 compositeTrack on\ html abEscape\ longLabel Bloom Lab: S RBD-mutation antibody escape - maximum escape score per amino acid - 13 MABs and serum from 11 patients (A-K)\ parent abEscape\ priority 30.7\ shortLabel Bloom Max Escape\ track bloomEscMax\ type bigWig\ viewLimits 0:1.0\ visibility hide\ A_120Max Serum: A, day 120 bigWig Bloom antibody escape - Max Score - A_120 1 30.71 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/A_120.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - A_120\ parent bloomEscMax on\ priority 30.71\ shortLabel Serum: A, day 120\ track A_120Max\ type bigWig\ visibility dense\ A_21Max Serum: A, day 021 bigWig Bloom antibody escape - Max Score - A_21 1 30.72 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/A_21.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - A_21\ parent bloomEscMax on\ priority 30.72\ shortLabel Serum: A, day 021\ track A_21Max\ type bigWig\ visibility dense\ A_45Max Serum: A, day 045 bigWig Bloom antibody escape - Max Score - A_45 1 30.73 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/A_45.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - A_45\ parent bloomEscMax on\ priority 30.73\ shortLabel Serum: A, day 045\ track A_45Max\ type bigWig\ visibility dense\ B_113Max Serum: B, day 113 bigWig Bloom antibody escape - Max Score - B_113 1 30.74 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/B_113.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - B_113\ parent bloomEscMax on\ priority 30.74\ shortLabel Serum: B, day 113\ track B_113Max\ type bigWig\ visibility dense\ B_26Max Serum: B, day 026 bigWig Bloom antibody escape - Max Score - B_26 1 30.75 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/B_26.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - B_26\ parent bloomEscMax on\ priority 30.75\ shortLabel Serum: B, day 026\ track 
B_26Max\ type bigWig\ visibility dense\ C_104Max Serum: C, day 104 bigWig Bloom antibody escape - Max Score - C_104 1 30.76 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/C_104.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - C_104\ parent bloomEscMax on\ priority 30.76\ shortLabel Serum: C, day 104\ track C_104Max\ type bigWig\ visibility dense\ C_32Max Serum: C, day 032 bigWig Bloom antibody escape - Max Score - C_32 1 30.77 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/C_32.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - C_32\ parent bloomEscMax on\ priority 30.77\ shortLabel Serum: C, day 032\ track C_32Max\ type bigWig\ visibility dense\ D_33Max Serum: D, day 033 bigWig Bloom antibody escape - Max Score - D_33 1 30.78 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/D_33.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - D_33\ parent bloomEscMax on\ priority 30.78\ shortLabel Serum: D, day 033\ track D_33Max\ type bigWig\ visibility dense\ D_76Max Serum: D, day 076 bigWig Bloom antibody escape - Max Score - D_76 1 30.79 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/D_76.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - D_76\ parent bloomEscMax on\ priority 30.79\ shortLabel Serum: D, day 076\ track D_76Max\ type bigWig\ visibility dense\ E_104Max Serum: E, day 104 bigWig Bloom antibody escape - Max Score - E_104 1 30.8 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/E_104.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - E_104\ parent bloomEscMax on\ priority 30.80\ shortLabel Serum: E, day 104\ track E_104Max\ type bigWig\ visibility dense\ E_28Max Serum: E, day 028 bigWig Bloom antibody escape - Max Score - E_28 1 30.81 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/E_28.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - E_28\ 
parent bloomEscMax on\ priority 30.81\ shortLabel Serum: E, day 028\ track E_28Max\ type bigWig\ visibility dense\ F_115Max Serum: F, day 115 bigWig Bloom antibody escape - Max Score - F_115 1 30.82 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/F_115.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - F_115\ parent bloomEscMax on\ priority 30.82\ shortLabel Serum: F, day 115\ track F_115Max\ type bigWig\ visibility dense\ F_48Max Serum: F, day 048 bigWig Bloom antibody escape - Max Score - F_48 1 30.83 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/F_48.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - F_48\ parent bloomEscMax on\ priority 30.83\ shortLabel Serum: F, day 048\ track F_48Max\ type bigWig\ visibility dense\ G_18Max Serum: G, day 018 bigWig Bloom antibody escape - Max Score - G_18 1 30.84 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/G_18.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - G_18\ parent bloomEscMax on\ priority 30.84\ shortLabel Serum: G, day 018\ track G_18Max\ type bigWig\ visibility dense\ G_94Max Serum: G, day 094 bigWig Bloom antibody escape - Max Score - G_94 1 30.85 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/G_94.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - G_94\ parent bloomEscMax on\ priority 30.85\ shortLabel Serum: G, day 094\ track G_94Max\ type bigWig\ visibility dense\ H_152Max Serum: H, day 152 bigWig Bloom antibody escape - Max Score - H_152 1 30.86 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/H_152.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - H_152\ parent bloomEscMax on\ priority 30.86\ shortLabel Serum: H, day 152\ track H_152Max\ type bigWig\ visibility dense\ H_61Max Serum: H, day 061 bigWig Bloom antibody escape - Max Score - H_61 1 30.87 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl 
/gbdb/wuhCor1/bloomEsc/H_61.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - H_61\ parent bloomEscMax on\ priority 30.87\ shortLabel Serum: H, day 061\ track H_61Max\ type bigWig\ visibility dense\ I_102Max Serum: I, day 102 bigWig Bloom antibody escape - Max Score - I_102 1 30.88 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/I_102.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - I_102\ parent bloomEscMax on\ priority 30.88\ shortLabel Serum: I, day 102\ track I_102Max\ type bigWig\ visibility dense\ I_26Max Serum: I, day 026 bigWig Bloom antibody escape - Max Score - I_26 1 30.89 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/I_26.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - I_26\ parent bloomEscMax on\ priority 30.89\ shortLabel Serum: I, day 026\ track I_26Max\ type bigWig\ visibility dense\ J_121Max Serum: J, day 121 bigWig Bloom antibody escape - Max Score - J_121 1 30.9 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/J_121.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - J_121\ parent bloomEscMax on\ priority 30.90\ shortLabel Serum: J, day 121\ track J_121Max\ type bigWig\ visibility dense\ J_15Max Serum: J, day 015 bigWig Bloom antibody escape - Max Score - J_15 1 30.91 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/J_15.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - J_15\ parent bloomEscMax on\ priority 30.91\ shortLabel Serum: J, day 015\ track J_15Max\ type bigWig\ visibility dense\ K_103Max Serum: K, day 103 bigWig Bloom antibody escape - Max Score - K_103 1 30.92 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/K_103.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - K_103\ parent bloomEscMax on\ priority 30.92\ shortLabel Serum: K, day 103\ track K_103Max\ type bigWig\ visibility dense\ K_29Max Serum: K, day 029 bigWig Bloom antibody 
escape - Max Score - K_29 1 30.93 255 0 0 255 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/K_29.max.bw\ color 255,0,0\ longLabel Bloom antibody escape - Max Score - K_29\ parent bloomEscMax on\ priority 30.93\ shortLabel Serum: K, day 029\ track K_29Max\ type bigWig\ visibility dense\ LY-CoV016Max MAB LY-CoV016 bigWig Bloom antibody escape - Max Score - MAB LY-CoV016 1 30.94 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/LY-CoV016.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB LY-CoV016\ parent bloomEscMax on\ priority 30.94\ shortLabel MAB LY-CoV016\ track LY-CoV016Max\ type bigWig\ visibility dense\ REGN10933Max MAB REGN10933 bigWig Bloom antibody escape - Max Score - MAB REGN10933 1 30.95 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/REGN10933.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB REGN10933\ parent bloomEscMax on\ priority 30.95\ shortLabel MAB REGN10933\ track REGN10933Max\ type bigWig\ visibility dense\ REGN10933-REGN10987Max MAB REGN10933-REGN10987 bigWig Bloom antibody escape - Max Score - MAB REGN10933-REGN10987 1 30.96 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/REGN10933-REGN10987.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB REGN10933-REGN10987\ parent bloomEscMax on\ priority 30.96\ shortLabel MAB REGN10933-REGN10987\ track REGN10933-REGN10987Max\ type bigWig\ visibility dense\ REGN10987Max MAB REGN10987 bigWig Bloom antibody escape - Max Score - MAB REGN10987 1 30.97 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/REGN10987.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB REGN10987\ parent bloomEscMax on\ priority 30.97\ shortLabel MAB REGN10987\ track REGN10987Max\ type bigWig\ visibility dense\ rCR3022Max MAB rCR3022 bigWig Bloom antibody escape - Max Score - MAB rCR3022 1 30.98 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl 
/gbdb/wuhCor1/bloomEsc/rCR3022.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB rCR3022\ parent bloomEscMax on\ priority 30.98\ shortLabel MAB rCR3022\ track rCR3022Max\ type bigWig\ visibility dense\ COV2-2050Max MAB COV2-2050 bigWig Bloom antibody escape - Max Score - MAB COV2-2050 1 30.99 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2050.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB COV2-2050\ parent bloomEscMax on\ priority 30.99\ shortLabel MAB COV2-2050\ track COV2-2050Max\ type bigWig\ visibility dense\ igm_COVID_11 COVID 11 bigBed 9 COVID 11 1 31 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_11.bb\ longLabel COVID 11\ parent igm on\ priority 31\ shortLabel COVID 11\ track igm_COVID_11\ type bigBed 9\ igg_COVID_410 COVID 410 bigBed 9 COVID 410 1 31 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_410.bb\ longLabel COVID 410\ parent igg on\ priority 31\ shortLabel COVID 410\ track igg_COVID_410\ type bigBed 9\ variantAaMuts_BA_2_86 Omicron BA.2.86 AA Muts bigBed 4 Omicron BA.2.86 amino acid mutations from GISAID sequences (Jan 29, 2024) 1 31 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.2.86 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BA.2.86_prot.bb\ color 219,40,35\ longLabel Omicron BA.2.86 amino acid mutations from GISAID sequences (Jan 29, 2024)\ parent variantMuts off\ priority 27\ shortLabel Omicron BA.2.86 AA Muts\ subGroups variant=Z_BA286 mutation=AA designation=VOI\ track variantAaMuts_BA_2_86\ url https://outbreak.info/situation-reports?pango=BA.2.86\ urlLabel BA.2.86 Situation Report at outbreak.info\ contacts PDB Ligand Contacts bigBed 12 + Potential contact residues in PDB structures of viral proteins 0 31 0 0 0 127 127 127 0 0 0 http://soe.ucsc.edu/~afyfe/covid/hub/t12.html?struct=$$

Description

\ \

This track shows potential contact residues for ligands, inferred from\ structures in the PDB database by Alastair Fyfe, UCSC.\

\ \

Display Conventions and Configuration

\ \

Genomic locations of contact residues are shown as thick blocks that look identical to exons. Contact residues of the same PDB structure are connected by thin intron lines.

\

\ To display the 3D structure viewer with these contact residues highlighted,\ follow the outlink at the top of the details page of any feature.\

\ \

Methods

\

PDB SEQRES protein sequences were aligned to the genome with tblastn. Ligands were determined manually, and all amino acids closer than 3.5 Angstroms to a ligand were obtained as described below. The positions of these close amino acids on the genome were determined using the tblastn alignment and are highlighted on the genome with exon blocks.

\

To find nearby amino acids, inter-atom distances were calculated with the libraries GEMMI and Clipper at a threshold of 3.5 Å. Distances between atoms whose proximity is due to crystal packing or symmetry are listed but cannot be inspected in the viewer. These contacts are flagged by a check in the "Symm?" column. They include atoms in adjacent unit cells and atoms brought into proximity by application of a symmetry transformation to the asymmetric unit (ASU), but not atoms related by non-crystallographic symmetry (NCS).

For\ each amino acid in the contact table the "Codon" column gives the codon's\ nucleotide values and position in the \ NC_045512.2 reference\ sequence. Amino acids that could not be aligned to the reference, for example\ insertions or deletions engineered to facilitate protein expression, are\ flagged by "-".
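The distance criterion described above can be illustrated with a minimal sketch. It is not the GEMMI/Clipper code used for the track: it is a brute-force check in plain Python, it ignores crystal symmetry and unit cells entirely, and the function name, input layout and coordinates are hypothetical toy values chosen only for illustration.

```python
import math

CONTACT_CUTOFF = 3.5  # Angstroms, the threshold stated in the Methods


def contact_residues(protein_atoms, ligand_atoms, cutoff=CONTACT_CUTOFF):
    """Return sorted residue numbers with any atom closer than `cutoff` to a ligand atom.

    protein_atoms: iterable of (residue_number, (x, y, z))
    ligand_atoms:  iterable of (x, y, z)
    """
    ligand_atoms = list(ligand_atoms)
    hits = set()
    for residue_number, coord in protein_atoms:
        if residue_number in hits:
            continue  # this residue already has a contact
        if any(math.dist(coord, lig) < cutoff for lig in ligand_atoms):
            hits.add(residue_number)
    return sorted(hits)


# Toy coordinates (hypothetical, not taken from any PDB entry).
protein = [
    (41, (0.0, 0.0, 0.0)),
    (41, (1.5, 0.0, 0.0)),
    (145, (3.0, 4.0, 0.0)),
    (163, (2.0, 2.0, 1.0)),
]
ligand = [(0.0, 0.0, 3.0)]

print(contact_residues(protein, ligand))  # → [41, 163]
```

A production implementation would typically use a spatial neighbor search rather than comparing every atom pair, and would also have to generate symmetry mates to find the contacts flagged in the "Symm?" column.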

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/contacts.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\
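For scripted access, the same command can be assembled programmatically. The sketch below only builds and prints the argument list; it uses the `-chrom`, `-start` and `-end` options shown above, while the helper name and the example range (21562-25384, intended to cover the S gene) are assumptions made for illustration.

```python
import shlex

CONTACTS_URL = "http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/contacts.bb"


def bigbed_region_cmd(url, chrom, start, end, out="stdout"):
    """Build the bigBedToBed argument list for one 0-based, half-open region."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}", f"-end={end}", out]


# Example range is an assumption of this sketch, not a value from the track documentation.
cmd = bigbed_region_cmd(CONTACTS_URL, "NC_045512v2", 21562, 25384)
print(shlex.join(cmd))
```

If bigBedToBed is installed and on the PATH, the list can be passed to `subprocess.run(cmd, check=True)` to execute it.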

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\

Credits

\ Track created by Alastair Fyfe, UCSC, mentored by David Haussler.\ genes 1 bigDataUrl /gbdb/wuhCor1/bbi/contacts.bb\ group genes\ longLabel Potential contact residues in PDB structures of viral proteins\ mouseOverField _mouseOver\ priority 31\ shortLabel PDB Ligand Contacts\ track contacts\ type bigBed 12 +\ url http://soe.ucsc.edu/~afyfe/covid/hub/t12.html?struct=$$\ urlLabel Link to 3D structure viewer:\ visibility hide\ H_152Total Serum: H, day 152 bigWig Bloom antibody escape - Total Score - Subject H, Day 152 1 31 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/H_152.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject H, Day 152\ parent bloomEscTotal on\ shortLabel Serum: H, day 152\ track H_152Total\ type bigWig\ visibility dense\ CHX_24hr_1 Vero6 CHX 24hr 1 bigWig Vero6 CHX 24hr 1 2 31 4 90 141 129 172 198 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_chx_24hr_1.bw\ color 4,90,141\ longLabel Vero6 CHX 24hr 1\ maxHeightPixels 124:32:5\ parent Vero6_24hpi on\ priority 31\ shortLabel Vero6 CHX 24hr 1\ track CHX_24hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ COV2-2082Max MAB COV2-2082 bigWig Bloom antibody escape - Max Score - MAB COV2-2082 1 31.01 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2082.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB COV2-2082\ parent bloomEscMax on\ priority 31.01\ shortLabel MAB COV2-2082\ track COV2-2082Max\ type bigWig\ visibility dense\ COV2-2094Max MAB COV2-2094 bigWig Bloom antibody escape - Max Score - MAB COV2-2094 1 31.02 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2094.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB COV2-2094\ parent bloomEscMax on\ priority 31.02\ shortLabel MAB COV2-2094\ track COV2-2094Max\ type bigWig\ visibility dense\ COV2-2096Max MAB COV2-2096 bigWig Bloom antibody escape - Max Score - MAB COV2-2096 1 31.03 0 0 255 127 
127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2096.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB COV2-2096\ parent bloomEscMax on\ priority 31.03\ shortLabel MAB COV2-2096\ track COV2-2096Max\ type bigWig\ visibility dense\ COV2-2165Max MAB COV2-2165 bigWig Bloom antibody escape - Max Score - MAB COV2-2165 1 31.03 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2165.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB COV2-2165\ parent bloomEscMax on\ priority 31.03\ shortLabel MAB COV2-2165\ track COV2-2165Max\ type bigWig\ visibility dense\ COV2-2479Max MAB COV2-2479 bigWig Bloom antibody escape - Max Score - MAB COV2-2479 1 31.04 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2479.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB COV2-2479\ parent bloomEscMax on\ priority 31.04\ shortLabel MAB COV2-2479\ track COV2-2479Max\ type bigWig\ visibility dense\ COV2-2499Max MAB COV2-2499 bigWig Bloom antibody escape - Max Score - MAB COV2-2499 1 31.05 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2499.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB COV2-2499\ parent bloomEscMax on\ priority 31.05\ shortLabel MAB COV2-2499\ track COV2-2499Max\ type bigWig\ visibility dense\ COV2-2677Max MAB COV2-2677 bigWig Bloom antibody escape - Max Score - MAB COV2-2677 1 31.05 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2677.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB COV2-2677\ parent bloomEscMax on\ priority 31.05\ shortLabel MAB COV2-2677\ track COV2-2677Max\ type bigWig\ visibility dense\ COV2-2832Max MAB COV2-2832 bigWig Bloom antibody escape - Max Score - MAB COV2-2832 1 31.06 0 0 255 127 127 255 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/COV2-2832.max.bw\ color 0,0,255\ longLabel Bloom antibody escape - Max Score - MAB COV2-2832\ 
parent bloomEscMax on\ priority 31.06\ shortLabel MAB COV2-2832\ track COV2-2832Max\ type bigWig\ visibility dense\ bloomEscTotal Bloom Total Escape bigWig Bloom Lab: S RBD-mutation antibody escape - total escape score per amino acid - 13 MABs and serum from 11 patients (A-K) 0 32 0 0 0 127 127 127 0 0 0 immu 0 autoScale group\ color 0,0,0\ compositeTrack on\ longLabel Bloom Lab: S RBD-mutation antibody escape - total escape score per amino acid - 13 MABs and serum from 11 patients (A-K)\ parent abEscape\ priority 32.00\ shortLabel Bloom Total Escape\ track bloomEscTotal\ type bigWig\ visibility hide\ igm_COVID_17 COVID 17 bigBed 9 COVID 17 1 32 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_17.bb\ longLabel COVID 17\ parent igm on\ priority 32\ shortLabel COVID 17\ track igm_COVID_17\ type bigBed 9\ igg_COVID_4022 COVID 4022 bigBed 9 COVID 4022 1 32 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_4022.bb\ longLabel COVID 4022\ parent igg on\ priority 32\ shortLabel COVID 4022\ track igg_COVID_4022\ type bigBed 9\ variantNucMuts_BA_2_86 Omicron BA.2.86 Nuc Muts bigBed 4 Omicron VOC (BA.2.86) nucleotide mutations identifed from GISAID sequences (Jan 2024) 1 32 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.2.86 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BA.2.86_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (BA.2.86) nucleotide mutations identifed from GISAID sequences (Jan 2024)\ parent variantMuts off\ priority 137\ shortLabel Omicron BA.2.86 Nuc Muts\ subGroups variant=Z_BA286 mutation=NUC designation=VOI\ track variantNucMuts_BA_2_86\ url https://outbreak.info/situation-reports?pango=BA.2.86\ urlLabel BA.2.86 Situation Report at outbreak.info\ I_26Total Serum: I, day 026 bigWig Bloom antibody escape - Total Score - Subject I, Day 26 1 32 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/I_26.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject 
I, Day 26\ parent bloomEscTotal on\ shortLabel Serum: I, day 026\ track I_26Total\ type bigWig\ visibility dense\ CHX_24hr_2 Vero6 CHX 24hr 2 bigWig Vero6 CHX 24hr 2 0 32 4 90 141 129 172 198 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_chx_24hr_2.bw\ color 4,90,141\ longLabel Vero6 CHX 24hr 2\ maxHeightPixels 124:32:5\ parent Vero6_24hpi on\ priority 32\ shortLabel Vero6 CHX 24hr 2\ track CHX_24hr_2\ type bigWig\ viewLimits 0:10\ visibility hide\ igg_COVID_409 COVID 409 bigBed 9 COVID 409 1 33 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_409.bb\ longLabel COVID 409\ parent igg on\ priority 33\ shortLabel COVID 409\ track igg_COVID_409\ type bigBed 9\ igm_COVID_744 COVID 744 bigBed 9 COVID 744 1 33 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_744.bb\ longLabel COVID 744\ parent igm on\ priority 33\ shortLabel COVID 744\ track igm_COVID_744\ type bigBed 9\ variantAaMutsV2_B_1_429 Epsilon AA Muts bigBed 4 Epsilon VUM (B.1.429 USA Mar-2020) amino acid mutations in 1360 GISAID sequences (Feb 5, 2021) 1 33 160 190 89 207 222 172 0 0 0 https://outbreak.info/situation-reports?pango=B.1.429 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantAaMuts_B.1.429_2021_02_05.bb\ color 160,190,89\ longLabel Epsilon VUM (B.1.429 USA Mar-2020) amino acid mutations in 1360 GISAID sequences (Feb 5, 2021)\ parent variantMuts off\ priority 7\ shortLabel Epsilon AA Muts\ subGroups variant=E_B1429 mutation=AA designation=VUM\ track variantAaMutsV2_B_1_429\ url https://outbreak.info/situation-reports?pango=B.1.429\ urlLabel B.1.429 Situation Report at outbreak.info\ I_102Total Serum: I, day 102 bigWig Bloom antibody escape - Total Score - Subject I, Day 102 1 33 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/I_102.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject I, Day 102\ parent bloomEscTotal on\ shortLabel Serum: I, day 102\ track I_102Total\ type 
bigWig\ visibility dense\ Harr_24hr_1 Vero6 Harr 24hr 1 bigWig Vero6 Harr 24hr 1 2 33 179 0 0 217 127 127 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_harr_24hr_1.bw\ color 179,0,0\ longLabel Vero6 Harr 24hr 1\ maxHeightPixels 124:32:5\ parent Vero6_24hpi on\ priority 33\ shortLabel Vero6 Harr 24hr 1\ track Harr_24hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ escape Whelan 21 Ab bigBed 9 + Whelan lab: RBD Mutations that lead to escape from 21 monoclonal antibodies (click to show mutation details) 0 33.1 0 0 0 127 127 127 0 0 0

Description

\

The subtracks of this track show mutations that lead to escape from patient serum antibodies or monoclonal antibodies. Most of the mutations assayed were in the receptor binding domain (RBD) of the S protein. The data shown here were imported from the studies listed below. The Bloom lab papers used deep mutational scanning to measure the effect of all possible mutations in the Spike RBD using a yeast surface display system.

  1. Bloom lab - patients A-K: antibodies in sera from the Hospitalized or Ambulatory Adults with Respiratory Viral Infections (HAARVI) cohort, described in Greaney et al., Biorxiv 2021.
  2. Bloom lab - 10 antibodies: A selection of ten monoclonal antibodies, described in Greaney et al, Cell Host Microbe 2020.
  3. Bloom lab - 4 treatment antibodies: Four monoclonal antibodies licensed for treatment. The results were described in Starr et al, Biorxiv 2021.
  4. Whelan lab - 21 antibodies: a selection screen of 21 neutralizing monoclonal antibodies (mAbs) against the receptor binding domain (RBD) generated 48 escape mutants. The results were described in Liu et al, Biorxiv 2020.
  5. Rappuoli lab - serum from one patient: three mutations obtained by passaging of cells in neutralizing serum from a single patient, described in Andreano et al, Biorxiv 2021.
  6. McCoy lab - mutations tested on monoclonal antibodies and patient sera, described in Rees-Spear et al, Biorxiv 2021.

\ \

For the Bloom lab data, we show only a summary. More detailed structural visualizations are available from the authors via dms-view using the following links: patient sera, 10 monoclonal antibodies, 4 treatment antibodies.

\ \

Display Conventions and Configuration

\

Bloom lab data

\

Scores represent the "escape fraction" (discussed at length in the Methods of the paper), which "represent the fraction of a given variant that escape antibody binding, and should in principle range from 0 to 1." "Note that the magnitude of the measured effects of mutations on antibody escape depends on the antibody concentration and the flow cytometry gates applied, meaning that the escape fractions are comparable across sites for any given antibody, but are not precisely comparable among antibodies without external calibration."

\ \ \

A higher score indicates a greater level of escape.

\ \

The data, summarized to protein positions, are shown as 36 subtracks, one per sample. Each subtrack indicates the maximum score per assayed amino acid position as shades of color or, in full mode, as an x-y barplot. Blue subtracks show data from monoclonal antibodies, red ones from patient sera. By configuring the current track (click on "Antibody escape" under the image), one can display the total sum of all scores per amino acid.

\ \

The data are also summarized as two x-y barplots showing the average values per amino acid, again in red (sera) and blue (MABs). Finally, another summary track has one feature per position where the score exceeds 0.18. These features are clickable, and the details page shows the exact amino acid changes and their scores.
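The per-position summaries described above (maximum score, total score, and positions above 0.18) can be sketched as follows. This is an illustration only, not the script used to build the tracks: the function names and input layout are hypothetical, the scores are invented toy numbers rather than measured escape fractions, and applying the 0.18 cutoff to the per-position maximum is an assumption, since the text does not say which score the cutoff is applied to.

```python
from collections import defaultdict


def summarize_by_site(rows):
    """Collapse per-mutation scores of one sample to per-position summaries.

    rows: iterable of (site, mutant_amino_acid, escape_fraction)
    Returns {site: (max_score, total_score)} ordered by site.
    """
    max_by_site = defaultdict(float)    # escape fractions are assumed to be >= 0
    total_by_site = defaultdict(float)
    for site, _mutant, score in rows:
        max_by_site[site] = max(max_by_site[site], score)
        total_by_site[site] += score
    return {site: (max_by_site[site], total_by_site[site]) for site in sorted(max_by_site)}


def sites_above(summary, threshold=0.18):
    """Positions whose maximum score exceeds the threshold (assumed interpretation)."""
    return [site for site, (max_score, _total) in summary.items() if max_score > threshold]


# Invented toy scores for three hypothetical positions.
rows = [(10, "K", 0.5), (10, "Q", 0.25), (11, "Y", 0.125), (12, "N", 0.0625)]
summary = summarize_by_site(rows)

print(summary[10])           # → (0.5, 0.75)
print(sites_above(summary))  # → [10]
```

The maximum corresponds to the values shown in the "Max Score" subtracks and the total to the "Total Score" subtracks; averaging across samples for the two summary barplots would be one further step over several such dictionaries.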

Whelan lab data

\

Features are labeled with the nucleotide and protein coordinates and the name of the antibody. Click or mouse over a feature to show these annotations.

\ \

Rappuoli lab data

\

The three mutations are labeled with the protein coordinates.

\ \

McCoy lab data

\

Features are labeled with the amino acid mutation coordinates. Click or mouse over a feature to show a description of the specific mutation.

\ \

Methods

\

Patient sera: Data were downloaded from the jbloomlab GitHub file and parsed into bedGraph format.

10 antibodies: Table S1 from Starr et al. was downloaded and parsed into bedGraph format.

4 treatment antibodies: Data were downloaded from the jbloomlab GitHub file and parsed into bedGraph format using the total and maximum values.

21 antibodies: Table 2 from Liu et al. 2020 was copied manually and converted to bedGraph format.

Rappuoli lab: The mutations were manually copied from the text.
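The conversion of per-position scores to bedGraph can be sketched as below. This is a minimal illustration, not the parser used for the tracks: it assumes that each score is keyed by a 1-based Spike protein position, that the S coding sequence starts at nucleotide 21563 of NC_045512.2 (a value not stated in this description), and that each position is written as a three-nucleotide interval on the browser sequence name NC_045512v2. The function name and the example score are hypothetical.

```python
S_CDS_START = 21563   # 1-based start of the S coding sequence (assumption of this sketch)
CHROM = "NC_045512v2"  # sequence name used in the bigBedToBed example


def site_to_bedgraph(site, score, cds_start=S_CDS_START, chrom=CHROM):
    """Return one bedGraph line covering the codon of a 1-based protein position.

    bedGraph intervals are 0-based and half-open: chrom, start, end, value.
    """
    if site < 1:
        raise ValueError("protein positions are 1-based")
    start0 = (cds_start - 1) + 3 * (site - 1)
    return f"{chrom}\t{start0}\t{start0 + 3}\t{score}"


# Toy score for position 501; the value is invented for illustration.
print(site_to_bedgraph(501, 0.25))  # → NC_045512v2	23062	23065	0.25
```

Lines produced this way, sorted by start position, are in the layout that the UCSC bedGraphToBigWig tool takes as input.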

\ \ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.

\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

Greaney AJ, Loes AN, Crawford K, Starr T, Malone K, Chu H, Bloom JD. Comprehensive mapping of mutations to the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human serum antibodies. bioRxiv. 2021 Jan 04.

Greaney AJ, Starr TN, Gilchuk P, Zost SJ, Binshtein E, Loes AN, Hilton SK, Huddleston J, Eguia R, Crawford KHD et al. Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition. Cell Host Microbe. 2020 Nov 19. PMID: 33259788; PMC: PMC7676316

Liu Z, VanBlargan LA, Rothlauf PW, Bloyet LM, Chen RE, Stumpf S, Zhao H, Errico JM, Theel ES, Ellebedy AH, Fremont DH, Diamond MS, Whelan SPJ. Landscape analysis of escape variants identifies SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization. bioRxiv. 2020.

Starr TN, Greaney AJ, Addetia A, Hannon WW, Choudhary MC, Dingens AS, Li JZ, Bloom JD. Prospective mapping of viral mutations that escape antibodies used to treat COVID-19. bioRxiv. 2020 Dec 1. PMID: 33299993; PMC: PMC7724661

Andreano E, Piccini G, Licastro D, Casalino L, Johnson NV, Paciello I, Monego SD, Pantano E, Manganaro N, Manenti A et al. SARS-CoV-2 escape in vitro from a highly neutralizing COVID-19 convalescent plasma. bioRxiv. 2020 Dec 28. PMID: 33398278; PMC: PMC7781313

\ immu 1 bigDataUrl /gbdb/wuhCor1/escape/escape.bb\ html abEscape\ longLabel Whelan lab: RBD Mutations that lead to escape from 21 monoclonal antibodies (click to show mutation details)\ mouseOverField _mouseOver\ noScoreFilter on\ parent abEscape\ priority 33.1\ shortLabel Whelan 21 Ab\ track escape\ type bigBed 9 +\ visibility hide\ rappuoli Rappuoli Serum Escape bigBed 4 Rappuoli lab: S Mutations that lead to escape from neutralizing antibodies from plasma of a single patient 0 33.2 0 0 0 127 127 127 0 0 0


\ immu 1 bigDataUrl /gbdb/wuhCor1/rappuoli/rappuoli.bb\ html abEscape\ longLabel Rappuoli lab: S Mutations that lead to escape from neutralizing antibodies from plasma of a single patient\ noScoreFilter on\ parent abEscape\ priority 33.2\ shortLabel Rappuoli Serum Escape\ track rappuoli\ type bigBed 4\ visibility hide\ mccoy McCoy Escape bigBed 8 + McCoy lab: S Mutation impact on neutralization by serum and mAbs 0 33.3 0 0 0 127 127 127 0 0 0

Description

\

\ The subtracks of this track show mutations that lead to escape from patient serum antibodies or monoclonal\ antibodies. Most of the mutations assayed were in the receptor binding domain (RBD) of the S protein. \ The data shown here were imported from different studies, listed below. The\ Bloom lab papers used deep mutational scanning data to measure the effect of all \ possible mutations in the Spike RBD using a yeast surface display system.

\
    \
  1. Bloom lab - patients A-K: antibodies in sera from the Hospitalized or Ambulatory Adults with Respiratory Viral\ Infections (HAARVI) cohort, described in \ Greaney et al., Biorxiv 2021.
  2. \ \
  3. Bloom lab - 10 antibodies: A selection of ten monoclonal antibodies, described\ in Greaney et al, Cell Host Microbe 2020.\
  4. \ \
  5. Bloom lab - 4 treatment antibodies: Four monoclonal antibodies licensed for treatment.\ The results were described in Starr et al,\ Biorxiv 2021.\
  6. \ \
  7. Whelan lab - 21 antibodies: a selection screen of 21 neutralizing monoclonal\ antibodies (mAbs) against the receptor binding domain (RBD) generated 48 escape\ mutants. The results were described in \ Liu et al, Biorxiv 2020.\ \
  8. Rappuoli lab - serum from one patient: three mutations obtained by passaging of \ cells in neutralizing serum from a single patient, described in Andreano et al, Biorxiv 2021.\
  9. McCoy lab - mutations tested on monoclonal antibodies and patient sera, described in \ Rees-Spear et al, Biorxiv 2021.\
\

\ \

\ For the Bloom lab data, we show just a summary of the data. Better and detailed structural\ visualizations are available from the authors via dms-view using the following links:\ patient sera,\ 10 monoclonal antibodies,\ 4 treatment antibodies.\

\ \

Display Conventions and Configuration

\

Bloom lab data

\

Scores represent the "escape fraction" (discussed at length in the Methods \ of the paper) which "represent the fraction of a given variant that escape antibody \ binding, and should in principle range from 0 to 1.". \ "Note that the magnitude of the measured effects of mutations on antibody escape depends on\ the antibody concentration and the flow cytometry gates applied, meaning that the\ escape fractions are comparable across sites for any given antibody, but are not precisely\ comparable among antibodies without external calibration."

\ \ \

A higher score indicates a greater level of escape.

\ \

The data, summarized to protein positions, are shown as 36 subtracks, one per sample. Each subtrack indicates the maximum score per assayed amino acid position as shades of color or, in full mode, as an x-y barplot. Blue subtracks show data from monoclonal antibodies, red ones from patient sera. By configuring the current track (click on "Antibody escape" under the image), one can display the total sum of all scores per amino acid.

\ \

The data is also summarized as two x-y barplots showing the average values per amino acid, again in red (sera) and blue (mAbs). Finally, another summary track has one feature per position where the score exceeds 0.18. These features are clickable, and the details page shows the exact amino acid changes and their scores. \ \
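The per-position summaries described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input layout (site, mutant amino acid, escape fraction) and the function names are hypothetical, and the sketch assumes that the 0.18 cutoff is applied to the per-position maximum, which the text does not state explicitly.

```python
from collections import defaultdict

# Cutoff quoted in the track description for the clickable summary features.
ESCAPE_THRESHOLD = 0.18


def summarize_sites(records):
    """Collapse per-mutation escape fractions to one entry per amino acid site.

    `records` is an iterable of (site, mutant_aa, escape_fraction) tuples for a
    single antibody or serum sample (hypothetical layout).
    Returns {site: (max_escape, total_escape)}.
    """
    per_site = defaultdict(list)
    for site, _mutant, escape in records:
        per_site[site].append(escape)
    return {site: (max(vals), sum(vals)) for site, vals in per_site.items()}


def sites_above_threshold(summary, threshold=ESCAPE_THRESHOLD):
    """Sorted sites whose maximum escape fraction exceeds the threshold (assumed rule)."""
    return sorted(
        site for site, (max_escape, _total) in summary.items() if max_escape > threshold
    )
```

The maximum corresponds to the default per-sample subtracks and the total to the alternative "total sum" display; as quoted above, values are comparable across sites for one antibody but not directly between antibodies.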

Whelan lab data

\

Features are labeled with the nucleotide and protein coordinates and the name of the antibody. Click or mouse over a feature to show these annotations.

\ \

Rappuoli lab data

\

The three mutations are labeled with the protein coordinates.

\ \

McCoy lab data

\

Features are labeled with the amino acid mutation coordinates. Click or mouse over a feature to show a description of the specific mutation.

\ \

Methods

\

\ Patient sera: Data was downloaded from the jbloomlab GitHub file and parsed into bedGraph format.

\ \

10 antibodies: Table S1 from Starr et al. was downloaded and parsed into bedGraph format.

\ \

4 treatment antibodies: Data was downloaded from the jbloomlab GitHub file and parsed into bedGraph format using the total and maximum values.

\ \

21 antibodies: Table 2 from Liu et al. 2020 was copied manually and converted to bedGraph format.

\ \

For the Rappuoli lab, the mutations were manually copied from the text.
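As a rough illustration of the "parsed into bedGraph format" step, the sketch below maps per-residue Spike scores onto genome coordinates. It is not the script used to build the tracks. It assumes that the S coding sequence starts at 1-based position 21563 of the reference sequence, that the sequence is named NC_045512v2 in this assembly, and that each residue is drawn across its three-base codon; the function names are hypothetical.

```python
# Assumptions (not taken from the track description): sequence name and the
# 1-based start of the S coding sequence on the reference genome.
CHROM = "NC_045512v2"
S_GENE_START_1BASED = 21563


def codon_interval(aa_pos, gene_start=S_GENE_START_1BASED):
    """Return the 0-based, half-open (start, end) of the codon for residue `aa_pos`."""
    start0 = (gene_start - 1) + (aa_pos - 1) * 3
    return start0, start0 + 3


def to_bedgraph_lines(site_scores):
    """Turn {amino acid position: score} into tab-separated bedGraph lines."""
    lines = []
    for aa_pos in sorted(site_scores):
        start0, end = codon_interval(aa_pos)
        lines.append(f"{CHROM}\t{start0}\t{end}\t{site_scores[aa_pos]}")
    return lines
```

bedGraph intervals are 0-based and half-open, so residue 484 of S maps to the interval 23011-23014 under these assumptions.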

\ \ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.

\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

\ Greaney AJ, Loes AN, Crawford K, Starr T, Malone K, Chu H, Bloom JD. Comprehensive mapping of mutations to the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human serum antibodies. bioRxiv. 2021 Jan 04.\

\ \ Greaney AJ, Starr TN, Gilchuk P, Zost SJ, Binshtein E, Loes AN, Hilton SK, Huddleston J, Eguia R, Crawford KHD et al. Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition. Cell Host Microbe. 2020 Nov 19. PMID: 33259788; PMC: PMC7676316\

\ \

\ Liu Z, VanBlargan LA, Rothlauf PW, Bloyet LM, Chen RE, Stumpf S, Zhao H, Errico JM, Theel ES, Ellebedy AH, Fremont DH, Diamond MS, Whelan SPJ. Landscape analysis of escape variants identifies SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization. bioRxiv. 2020.\

\ \

\ Starr TN, Greaney AJ, Addetia A, Hannon WW, Choudhary MC, Dingens AS, Li JZ, Bloom JD. Prospective mapping of viral mutations that escape antibodies used to treat COVID-19. bioRxiv. 2020 Dec 1. PMID: 33299993; PMC: PMC7724661\

\

\ Andreano E, Piccini G, Licastro D, Casalino L, Johnson NV, Paciello I, Monego SD, Pantano E, Manganaro N, Manenti A et al. SARS-CoV-2 escape in vitro from a highly neutralizing COVID-19 convalescent plasma. bioRxiv. 2020 Dec 28. PMID: 33398278; PMC: PMC7781313\

\ immu 1 bigDataUrl /gbdb/wuhCor1/mccoy/mccoy.bb\ html abEscape\ longLabel McCoy lab: S Mutation impact on neutralization by serum and mAbs\ mouseOverField description\ noScoreFilter on\ parent abEscape\ priority 33.3\ shortLabel McCoy Escape\ track mccoy\ type bigBed 8 +\ visibility hide\ igg_COVID_436 COVID 436 bigBed 9 COVID 436 1 34 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_436.bb\ longLabel COVID 436\ parent igg on\ priority 34\ shortLabel COVID 436\ track igg_COVID_436\ type bigBed 9\ igm_COVID_526 COVID 526 bigBed 9 COVID 526 1 34 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_526.bb\ longLabel COVID 526\ parent igm on\ priority 34\ shortLabel COVID 526\ track igm_COVID_526\ type bigBed 9\ variantNucMutsV2_B_1_429 Epsilon Nuc Muts bigBed 4 Epsilon VUM (B.1.429 USA Mar-2020) nucleotide mutations in 1360 GISAID sequences (Feb 5, 2021) 1 34 160 190 89 207 222 172 0 0 0 https://outbreak.info/situation-reports?pango=B.1.429 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantNucMuts_B.1.429_2021_02_05.bb\ color 160,190,89\ longLabel Epsilon VUM (B.1.429 USA Mar-2020) nucleotide mutations in 1360 GISAID sequences (Feb 5, 2021)\ parent variantMuts off\ priority 117\ shortLabel Epsilon Nuc Muts\ subGroups variant=E_B1429 mutation=NUC designation=VUM\ track variantNucMutsV2_B_1_429\ url https://outbreak.info/situation-reports?pango=B.1.429\ urlLabel B.1.429 Situation Report at outbreak.info\ J_15Total Serum: J, day 015 bigWig Bloom antibody escape - Total Score - Subject J, Day 15 1 34 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/J_15.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject J, Day 15\ parent bloomEscTotal on\ shortLabel Serum: J, day 015\ track J_15Total\ type bigWig\ visibility dense\ Harr_24hr_2 Vero6 Harr 24hr 2 bigWig Vero6 Harr 24hr 2 0 34 179 0 0 217 127 127 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl 
/gbdb/wuhCor1/bbi/weizmanOrfs/fp_harr_24hr_2.bw\ color 179,0,0\ longLabel Vero6 Harr 24hr 2\ maxHeightPixels 124:32:5\ parent Vero6_24hpi on\ priority 34\ shortLabel Vero6 Harr 24hr 2\ track Harr_24hr_2\ type bigWig\ viewLimits 0:10\ visibility hide\ igg_COVID_429 COVID 429 bigBed 9 COVID 429 1 35 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_429.bb\ longLabel COVID 429\ parent igg on\ priority 35\ shortLabel COVID 429\ track igg_COVID_429\ type bigBed 9\ igm_COVID_529 COVID 529 bigBed 9 COVID 529 1 35 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_529.bb\ longLabel COVID 529\ parent igm on\ priority 35\ shortLabel COVID 529\ track igm_COVID_529\ type bigBed 9\ variantAaMutsV2_B_1_525 Eta AA Muts bigBed 4 Eta VUM (B.1.525 Dec-2020) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021) 1 35 187 188 73 221 221 164 0 0 0 https://outbreak.info/situation-reports?pango=B.1.525 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantAaMuts_B.1.525_2021_09_10.bb\ color 187,188,73\ longLabel Eta VUM (B.1.525 Dec-2020) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts off\ priority 8\ shortLabel Eta AA Muts\ subGroups variant=E_B1525 mutation=AA designation=VUM\ track variantAaMutsV2_B_1_525\ url https://outbreak.info/situation-reports?pango=B.1.525\ urlLabel B.1.525 Situation Report at outbreak.info\ J_121Total Serum: J, day 121 bigWig Bloom antibody escape - Total Score - Subject J, Day 121 1 35 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/J_121.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject J, Day 121\ parent bloomEscTotal on\ shortLabel Serum: J, day 121\ track J_121Total\ type bigWig\ visibility dense\ LTM_24hr_1 Vero6 LTM 24hr 1 bigWig Vero6 LTM 24hr 1 2 35 35 139 69 145 197 162 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_ltm_24hr_1.bw\ color 35,139,69\ longLabel Vero6 LTM 24hr 1\ 
maxHeightPixels 124:32:5\ parent Vero6_24hpi on\ priority 35\ shortLabel Vero6 LTM 24hr 1\ track LTM_24hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ igm_COVID_18 COVID 18 bigBed 9 COVID 18 1 36 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_18.bb\ longLabel COVID 18\ parent igm on\ priority 36\ shortLabel COVID 18\ track igm_COVID_18\ type bigBed 9\ igg_COVID_502 COVID 502 bigBed 9 COVID 502 1 36 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_502.bb\ longLabel COVID 502\ parent igg on\ priority 36\ shortLabel COVID 502\ track igg_COVID_502\ type bigBed 9\ variantNucMutsV2_B_1_525 Eta Nuc Muts bigBed 4 Eta VUM (B.1.525 Dec-2020) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021) 1 36 187 188 73 221 221 164 0 0 0 https://outbreak.info/situation-reports?pango=B.1.525 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantNucMuts_B.1.525_2021_09_10.bb\ color 187,188,73\ longLabel Eta VUM (B.1.525 Dec-2020) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts off\ priority 118\ shortLabel Eta Nuc Muts\ subGroups variant=E_B1525 mutation=NUC designation=VUM\ track variantNucMutsV2_B_1_525\ url https://outbreak.info/situation-reports?pango=B.1.525\ urlLabel B.1.525 Situation Report at outbreak.info\ K_29Total Serum: K, day 029 bigWig Bloom antibody escape - Total Score - Subject K, Day 29 1 36 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/K_29.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject K, Day 29\ parent bloomEscTotal on\ shortLabel Serum: K, day 029\ track K_29Total\ type bigWig\ visibility dense\ LTM_24hr_2 Vero6 LTM 24hr 2 bigWig Vero6 LTM 24hr 2 0 36 35 139 69 145 197 162 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/fp_ltm_24hr_2.bw\ color 35,139,69\ longLabel Vero6 LTM 24hr 2\ maxHeightPixels 124:32:5\ parent Vero6_24hpi on\ priority 36\ shortLabel Vero6 LTM 24hr 2\ track LTM_24hr_2\ type 
bigWig\ viewLimits 0:10\ visibility hide\ igg_COVID_18 COVID 18 bigBed 9 COVID 18 1 37 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_18.bb\ longLabel COVID 18\ parent igg on\ priority 37\ shortLabel COVID 18\ track igg_COVID_18\ type bigBed 9\ igm_COVID_419 COVID 419 bigBed 9 COVID 419 1 37 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_419.bb\ longLabel COVID 419\ parent igm on\ priority 37\ shortLabel COVID 419\ track igm_COVID_419\ type bigBed 9\ variantAaMutsV2_B_1_526 Iota AA Muts bigBed 4 Iota VUM (B.1.526 USA Nov-2020) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021) 1 37 225 159 58 240 207 156 0 0 0 https://outbreak.info/situation-reports?pango=B.1.526 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantAaMuts_B.1.526_2021_09_10.bb\ color 225,159,58\ longLabel Iota VUM (B.1.526 USA Nov-2020) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts off\ priority 9\ shortLabel Iota AA Muts\ subGroups variant=I_B1526 mutation=AA designation=VUM\ track variantAaMutsV2_B_1_526\ url https://outbreak.info/situation-reports?pango=B.1.526\ urlLabel B.1.526 Situation Report at outbreak.info\ K_103Total Serum: K, day 103 bigWig Bloom antibody escape - Total Score - Subject K, Day 103 1 37 0 0 0 127 127 127 0 0 0 immu 0 bigDataUrl /gbdb/wuhCor1/bloomEsc/K_103.tot.bw\ longLabel Bloom antibody escape - Total Score - Subject K, Day 103\ parent bloomEscTotal on\ shortLabel Serum: K, day 103\ track K_103Total\ type bigWig\ visibility dense\ mRNA-seq_24hr_1 Vero6 mRNA 24hr 1 bigWig Vero6 mRNA 24hr 1 2 37 63 0 125 159 127 190 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/mrna_24hr_1.bw\ color 63,0,125\ longLabel Vero6 mRNA 24hr 1\ maxHeightPixels 124:32:5\ parent Vero6_24hpi on\ priority 37\ shortLabel Vero6 mRNA 24hr 1\ track mRNA-seq_24hr_1\ type bigWig\ viewLimits 0:10\ visibility full\ igg_COVID_4021 COVID 4021 bigBed 9 COVID 4021 1 38 0 
0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_4021.bb\ longLabel COVID 4021\ parent igg on\ priority 38\ shortLabel COVID 4021\ track igg_COVID_4021\ type bigBed 9\ igm_COVID_4021 COVID 4021 bigBed 9 COVID 4021 1 38 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_4021.bb\ longLabel COVID 4021\ parent igm on\ priority 38\ shortLabel COVID 4021\ track igm_COVID_4021\ type bigBed 9\ variantNucMutsV2_B_1_526 Iota Nuc Muts bigBed 4 Iota VUM (B.1.526 USA Nov-2020) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021) 1 38 225 159 58 240 207 156 0 0 0 https://outbreak.info/situation-reports?pango=B.1.526 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantNucMuts_B.1.526_2021_09_10.bb\ color 225,159,58\ longLabel Iota VUM (B.1.526 USA Nov-2020) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts off\ priority 119\ shortLabel Iota Nuc Muts\ subGroups variant=I_B1526 mutation=NUC designation=VUM\ track variantNucMutsV2_B_1_526\ url https://outbreak.info/situation-reports?pango=B.1.526\ urlLabel B.1.526 Situation Report at outbreak.info\ mRNA-seq_24hr_2 Vero6 mRNA 24hr 2 bigWig Vero6 mRNA 24hr 2 0 38 63 0 125 159 127 190 0 0 0 genes 0 alwaysZero on\ autoScale on\ bigDataUrl /gbdb/wuhCor1/bbi/weizmanOrfs/mrna_24hr_2.bw\ color 63,0,125\ longLabel Vero6 mRNA 24hr 2\ maxHeightPixels 124:32:5\ parent Vero6_24hpi on\ priority 38\ shortLabel Vero6 mRNA 24hr 2\ track mRNA-seq_24hr_2\ type bigWig\ viewLimits 0:10\ visibility hide\ igm_COVID_415 COVID 415 bigBed 9 COVID 415 1 39 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_415.bb\ longLabel COVID 415\ parent igm on\ priority 39\ shortLabel COVID 415\ track igm_COVID_415\ type bigBed 9\ igg_COVID_526 COVID 526 bigBed 9 COVID 526 1 39 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_526.bb\ longLabel COVID 526\ parent igg on\ priority 39\ shortLabel COVID 526\ track igg_COVID_526\ 
type bigBed 9\ variantAaMutsV2_B_1_617_1 Kappa AA Muts bigBed 4 Kappa VUM (B.1.617.1 India Oct-2020) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021) 1 39 133 186 111 194 220 183 0 0 0 https://outbreak.info/situation-reports?pango=B.1.617.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantAaMuts_B.1.617.1_2021_09_10.bb\ color 133,186,111\ longLabel Kappa VUM (B.1.617.1 India Oct-2020) amino acid mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts off\ priority 10\ shortLabel Kappa AA Muts\ subGroups variant=K_B16171 mutation=AA designation=VUM\ track variantAaMutsV2_B_1_617_1\ url https://outbreak.info/situation-reports?pango=B.1.617.1\ urlLabel B.1.617.1 Situation Report at outbreak.info\ nextstrainFreq20B 20B bigWig Nextstrain, 20B clade: Alternate allele frequency 1 40 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20B.bigWig\ longLabel Nextstrain, 20B clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 40\ shortLabel 20B\ subGroups view=newClades\ track nextstrainFreq20B\ type bigWig\ visibility dense\ nextstrainSamples20B 20B Mutations vcfTabix Mutations in Clade 20B Nextstrain Subset of GISAID EpiCoV TM Samples 0 40 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20B.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain20B.nh\ longLabel Mutations in Clade 20B Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades\ priority 40\ shortLabel 20B Mutations\ subGroups view=newClades\ track nextstrainSamples20B\ galaxyEnaQ2Ay-120 AY.120 mutations bigBed 8 + Mutations (amino acid level) in AY.120 between 2021-07-05 and 2021-10-05 1 40 184 133 10 219 194 132 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

    \
  • sample collection date
  • sequencing platform, library layout and strategy (currently, reanalysis is done for ampliconic paired-end Illumina and ONT data)
  • the primer scheme used to generate the amplicons (this information is used to trim primer sequences from the data before variant calling; reanalysis can be done for any primer scheme with publicly available primer binding site information)
  • some kind of discernible batch information (e.g. a library identifier) that can be used to form batches of samples for reanalysis and batch-level reporting
\

\

\ Analysis is performed on public Galaxy servers using only open-source tools, orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore. It includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences, and pangolin lineage assignment. Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC Genome Browser tracks. The project website has more information about the available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

\ The GalaxyProject SARS-CoV-2 mutation tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting 3 months before the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.\
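The rolling quarter definition can be illustrated with a short Python sketch. The function names are hypothetical, and two simplifications are assumed: the update day itself is used as the end of the current quarter (the track instead displays the collection date of the most recent analyzed sample), and a day of month that does not exist three months earlier is clamped to the last day of that month.

```python
import calendar
from datetime import date


def months_before(day, months):
    """Return the same day of month `months` months earlier, clamped to month end."""
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def quarter_windows(update_day, n_quarters=4):
    """Return (start, end) date pairs, most recent quarter first."""
    return [
        (months_before(update_day, 3 * (i + 1)), months_before(update_day, 3 * i))
        for i in range(n_quarters)
    ]
```

For an update on 2022-01-05, the first two windows are 2021-10-05 to 2022-01-05 and 2021-07-05 to 2021-10-05.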

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together, the tracks can be used to explore changes in the dominant lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of each mutation feature reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant-calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are displayed entirely thin.\ \
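The SNV layout described above maps naturally onto the BED fields chromStart/chromEnd (whole codon) and thickStart/thickEnd (changed base). The following Python sketch shows one way to derive these coordinates; it is an illustration, not the project's code. The function name is hypothetical, and it assumes a single uninterrupted reading frame that begins at `gene_start_1based`.

```python
def snv_feature(gene_start_1based, nuc_pos_1based):
    """Return (chromStart, chromEnd, thickStart, thickEnd) for an SNV.

    Coordinates are 0-based and half-open, as in BED. The feature spans the
    codon containing the changed base; the thick part covers only that base.
    """
    offset = nuc_pos_1based - gene_start_1based
    if offset < 0:
        raise ValueError("position lies upstream of the gene start")
    codon_start0 = (gene_start_1based - 1) + (offset // 3) * 3
    thick_start0 = nuc_pos_1based - 1
    return codon_start0, codon_start0 + 3, thick_start0, thick_start0 + 1
```

With an assumed S gene start of 21563, a substitution at genome position 23403 yields a feature spanning 23401-23404 with the thick part at 23402-23403, i.e. the second base of the codon.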

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

    \
  • the precise value of its observed frequency in the lineage and quarter
  • the intra-sample allele frequency (median and lower/upper quartile) at which the mutation has been called in the samples in which it has been detected
  • the collection date and collecting country of the sample in which the mutation was first (ever) detected in the context of the lineage. Note that for older, still circulating lineages, the collection date of that sample can be earlier than the start of the earliest quarter displayed in the genome browser (since our complete surveillance data goes back further than four quarters).
\
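The intra-sample allele frequency statistics listed above (median plus lower and upper quartile) could be computed as in the following Python sketch. The quartile convention used by the project is not stated, so the "inclusive" method of the standard library is an assumption, as is the function name.

```python
from statistics import quantiles


def af_summary(allele_frequencies):
    """Return (median, q25, q75) of the allele frequencies at which a mutation was called."""
    values = sorted(allele_frequencies)
    if not values:
        raise ValueError("no allele frequencies given")
    if len(values) == 1:
        # A single observation is its own median and quartiles.
        return values[0], values[0], values[0]
    q25, median, q75 = quantiles(values, n=4, method="inclusive")
    return median, q25, q75
```

Only samples in which the mutation was actually detected should be passed in, matching the description above.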

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

    \
  • country or combination of countries in which samples of the given lineage and collection quarter carrying the mutation have been collected. For example, you could filter all current-quarter lineage tracks to show only mutations that have been found (in their respective lineage) in the UK.
  • within-lineage frequency. By default, only mutations observed in at least 5% of the samples assigned to the given lineage in the given quarter are shown (0.05 default filter setting). You can lower or raise that threshold as you see fit. Note, however, that the underlying bigBed data of the tracks is filtered to contain only mutations above a threshold of 0.1% (i.e. a 0.001 hard filter is always in effect).
\ \
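The combined effect of the two filters can be sketched in Python as follows. The record layout (dictionaries with `frequency` and `countries` keys) and the function name are hypothetical. The sketch also assumes that a country filter matches when a mutation was seen in at least one of the selected countries, which is only one possible reading of "combination of countries".

```python
HARD_FILTER = 0.001    # mutations below this frequency are not in the bigBed data
DEFAULT_FILTER = 0.05  # default within-lineage frequency threshold


def visible_mutations(mutations, min_frequency=DEFAULT_FILTER, countries=None):
    """Return the mutation records that pass the frequency and country filters."""
    # The user-chosen threshold can never go below the hard filter.
    threshold = max(min_frequency, HARD_FILTER)
    wanted = set(countries or ())
    shown = []
    for mutation in mutations:
        if mutation["frequency"] < threshold:
            continue
        if wanted and not wanted & set(mutation["countries"]):
            continue
        shown.append(mutation)
    return shown
```

Lowering `min_frequency` to 0 therefore still hides anything rarer than 0.1%, because such mutations are absent from the underlying data.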

Methods

\

\ For analysis, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances. The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, and variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch. This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch. Sequencing data downloads, execution of the three types of workflows, and export of key result files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server. The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories. Then:\

    \
  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc., is added to
    ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\
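The track recalculation steps (top lineages per quarter, then mutations per lineage) can be sketched in Python as shown below. This is a simplified illustration and not the bot code. The sample layout (dictionaries with `lineage` and `mutations` keys for the samples of one quarter) and the function names are hypothetical.

```python
from collections import Counter


def top_lineages(samples, n=5):
    """Return the `n` most frequently observed lineages among one quarter's samples."""
    counts = Counter(sample["lineage"] for sample in samples)
    return [lineage for lineage, _count in counts.most_common(n)]


def within_lineage_frequencies(samples, lineage):
    """Return {mutation: fraction of the lineage's samples that carry it}."""
    members = [sample for sample in samples if sample["lineage"] == lineage]
    if not members:
        return {}
    carriers = Counter()
    for sample in members:
        # Count each mutation at most once per sample.
        carriers.update(set(sample["mutations"]))
    return {mutation: count / len(members) for mutation, count in carriers.items()}
```

Running both functions once per quarter window gives the per-lineage mutation frequencies that the shading and the frequency filter of the tracks are based on.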

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:\

\ \

References

\ \

\

    \
  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q2/03_AY.120_data.bb\ color 184,133,10\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.120 between 2021-07-05 and 2021-10-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q2_tracks on\ priority 40\ shortLabel AY.120 mutations\ spectrum on\ track galaxyEnaQ2Ay-120\ type bigBed 8 +\ galaxyEnaQ1Ay-43 AY.43 mutations bigBed 8 + Mutations (amino acid level) in AY.43 between 2021-10-05 and 2022-01-05 1 40 18 113 28 136 184 141 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

    \
  • sample collection date
  • sequencing platform, library layout and strategy (currently, reanalysis is done for ampliconic paired-end Illumina and ONT data)
  • the primer scheme used to generate the amplicons (this information is used to trim primer sequences from the data before variant calling; reanalysis can be done for any primer scheme with publicly available primer binding site information)
  • some kind of discernible batch information (e.g. a library identifier) that can be used to form batches of samples for reanalysis and batch-level reporting
\

\

\ Analysis is performed on public Galaxy servers using only open-source tools, orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore. It includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences, and pangolin lineage assignment. Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC Genome Browser tracks. The project website has more information about the available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

\ The GalaxyProject SARS-CoV-2 mutation tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting 3 months before the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.\

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together, the tracks can be used to explore changes in the dominant lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of each mutation feature reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant-calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are displayed entirely thin.\ \

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

    \
  • the precise value of its observed frequency in the lineage and quarter
  • the intra-sample allele frequency (median and lower/upper quartile) at which the mutation has been called in the samples in which it has been detected
  • the collection date and collecting country of the sample in which the mutation was first (ever) detected in the context of the lineage. Note that for older, still circulating lineages, the collection date of that sample can be earlier than the start of the earliest quarter displayed in the genome browser (since our complete surveillance data goes back further than four quarters).
\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

    \
  • country or combination of countries in which samples of the given lineage and collection quarter carrying the mutation have been collected. For example, you could filter all current-quarter lineage tracks to show only mutations that have been found (in their respective lineage) in the UK.
  • within-lineage frequency. By default, only mutations observed in at least 5% of the samples assigned to the given lineage in the given quarter are shown (0.05 default filter setting). You can lower or raise that threshold as you see fit. Note, however, that the underlying bigBed data of the tracks is filtered to contain only mutations above a threshold of 0.1% (i.e. a 0.001 hard filter is always in effect).
\ \

Methods

\

\ For analysis, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances. The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, and variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch. This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch. Sequencing data downloads, execution of the three types of workflows, and export of key result files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server. The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories. Then:\

    \
  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc., is added to
    ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:

\ \

References

\ \

\

    \
  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q1/03_AY.43_data.bb\ color 18,113,28\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.43 between 2021-10-05 and 2022-01-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q1_tracks on\ priority 40\ shortLabel AY.43 mutations\ spectrum on\ track galaxyEnaQ1Ay-43\ type bigBed 8 +\ galaxyEnaQ3Ay-5 AY.5 mutations bigBed 8 + Mutations (amino acid level) in AY.5 between 2021-04-05 and 2021-07-05 1 40 0 28 127 127 141 191 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

    \
  • Sample collection date
  • Sequencing platform, library layout and strategy (currently, reanalysis is done for ampliconic paired-end Illumina and ONT data)
  • The primer scheme used for the generation of amplicons (this information is used to trim primer sequences from the data before variant calling; reanalysis can be done for any primer scheme with publicly available primer binding site information)
  • Some kind of discernible batch information (e.g. a library identifier) that can be used to form batches of samples for reanalysis and batch-level reporting
\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

The GalaxyProject SARS-CoV-2 mutations tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting 3 months prior to the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.
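The rolling quarter definition can be sketched as follows. This is a minimal illustration, not the project's own code: the function names `months_before` and `quarter_windows` are hypothetical, the update day used below is an assumed example, and clamping the day of month (for update days such as the 31st) is an assumption about how month arithmetic might be handled.

```python
import calendar
from datetime import date


def months_before(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped to the length of the target month
    (an assumption; the source does not describe this edge case).
    """
    total = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def quarter_windows(update_day, n_quarters=4):
    """List (start, end) pairs, current quarter first.

    Quarter q starts 3 * (q + 1) months before the update day and
    ends 3 * q months before it.
    """
    return [
        (months_before(update_day, 3 * (q + 1)), months_before(update_day, 3 * q))
        for q in range(n_quarters)
    ]


if __name__ == "__main__":
    # Hypothetical update day, chosen for illustration only.
    for index, (start, end) in enumerate(quarter_windows(date(2022, 4, 5))):
        print(f"Q{index}: {start} to {end}")
```

Note that the end date shown on the current quarter track is the collection date of the most recent analyzed sample, so it can be earlier than the window end computed here.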

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base that is changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are displayed all thin.
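As an illustration of the SNV display convention, the sketch below derives the thin (codon) and thick (changed base) extents of a feature in BED-style 0-based, half-open coordinates. The function name `snv_feature` and the returned keys are hypothetical, and the calculation assumes a contiguous coding sequence without frameshifts or splicing; it is not taken from the project's track-building code.

```python
def snv_feature(cds_start, pos):
    """Compute codon and base extents for an SNV.

    cds_start: 1-based genomic position of the first base of the CDS.
    pos:       1-based genomic position of the changed base.
    Returns BED-style 0-based, half-open coordinates.
    """
    offset = pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    codon_index = offset // 3                      # 0-based codon number
    codon_start0 = (cds_start - 1) + 3 * codon_index
    return {
        "chromStart": codon_start0,                # whole feature: the codon
        "chromEnd": codon_start0 + 3,
        "thickStart": pos - 1,                     # thick part: the changed base
        "thickEnd": pos,
        "aa_position": codon_index + 1,            # 1-based amino acid number
    }


if __name__ == "__main__":
    # Spike CDS starts at position 21563 of NC_045512.2; A23403G encodes S:D614G.
    print(snv_feature(21563, 23403))
```

For this example the codon spans 0-based positions 23401 to 23404, and the thick part covers only the second base of the codon.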

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

    \
  • the precise value of its observed frequency in the lineage and quarter
  • the intra-sample allele frequency (median and lower/upper quartile) at which the mutation has been called in the samples in which it has been detected
  • the collection date and the collecting country of the sample in which this mutation was first (ever) detected in the context of the lineage. Note that for older, still circulating lineages, the collection date of that sample can be earlier than the start of the earliest quarter displayed in the genome browser (since our complete surveillance data goes back further than four quarters).
\
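The statistics shown on hover could be derived along the following lines. This is a hedged sketch: the function name `mutation_stats` is hypothetical, the output keys mirror the field names in the track's mouseOver setting, and the quartile method ("inclusive") is an assumption, since the source does not state which definition the project uses.

```python
from statistics import quantiles


def mutation_stats(allele_freqs, n_lineage_samples):
    """Summarize one mutation within one lineage and quarter.

    allele_freqs:      intra-sample allele frequencies, one per sample in
                       which the mutation was called.
    n_lineage_samples: number of samples assigned to the lineage in the quarter.
    """
    if not allele_freqs:
        raise ValueError("the mutation must be called in at least one sample")
    if len(allele_freqs) == 1:
        # quantiles() needs two or more data points on older Python versions.
        q25 = median = q75 = allele_freqs[0]
    else:
        q25, median, q75 = quantiles(allele_freqs, n=4, method="inclusive")
    return {
        "withinLineageFrequency": len(allele_freqs) / n_lineage_samples,
        "medianAF": median,
        "q25AF": q25,
        "q75AF": q75,
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(mutation_stats([0.2, 0.4, 0.6, 0.8, 1.0], n_lineage_samples=50))
```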

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

    \
  • country or combination of countries in which samples of the given lineage and collection quarter with the mutation have been collected. You could, for example, filter all current-quarter lineage tracks to show only mutations that have been found (in their respective lineage) in the UK.
  • within-lineage frequency. By default, only mutations that have been observed in at least 5% of the samples assigned to the given lineage in the given quarter are shown (0.05 default filter setting). You can lower or raise that threshold as you see fit. Note, however, that the underlying bigBed data of the tracks only contains mutations above a threshold of 0.1% (i.e. a 0.001 hard filter is always in effect).
\ \
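The interaction between the adjustable frequency filter, the 0.001 hard filter and the country filter can be sketched as below. The record layout (`name`, `frequency`, `countries`), the function name `filter_mutations` and the example values are all hypothetical, and treating a country selection as "found in at least one of the selected countries" is an assumption about the filter semantics.

```python
HARD_FLOOR = 0.001            # mutations below this are absent from the bigBed data
DEFAULT_MIN_FREQUENCY = 0.05  # default filter setting


def filter_mutations(mutations, min_frequency=DEFAULT_MIN_FREQUENCY, countries=None):
    """Return the mutations that pass the frequency and country filters."""
    # Lowering the user threshold below the hard floor has no further effect.
    threshold = max(min_frequency, HARD_FLOOR)
    wanted = set(countries) if countries else None
    kept = []
    for mutation in mutations:
        if mutation["frequency"] < threshold:
            continue
        if wanted is not None and not wanted & set(mutation["countries"]):
            continue
        kept.append(mutation)
    return kept


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        {"name": "mutA", "frequency": 0.98, "countries": ["United Kingdom", "Greece"]},
        {"name": "mutB", "frequency": 0.02, "countries": ["United Kingdom"]},
        {"name": "mutC", "frequency": 0.30, "countries": ["Estonia"]},
    ]
    print([m["name"] for m in filter_mutations(records)])
    print([m["name"] for m in filter_mutations(records, 0.0, ["United Kingdom"])])
```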

Methods

\

For analyses, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances. The data is processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch. This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.

Sequencing data downloads, execution of the three types of workflows, and export of key results files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.

The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories, then

    \
  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc. are added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\
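The lineage-ranking step of the track recalculation listed above can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: samples are represented as hypothetical `(collection_date, lineage)` pairs, the function name `top_lineages` is invented, and the window is treated as start-inclusive and end-exclusive.

```python
from collections import Counter
from datetime import date


def top_lineages(samples, start, end, n=5):
    """Return the n most frequent lineages among samples collected in [start, end)."""
    counts = Counter(
        lineage for collected, lineage in samples if start <= collected < end
    )
    return [lineage for lineage, _ in counts.most_common(n)]


if __name__ == "__main__":
    # Made-up sample list for illustration only.
    samples = (
        [(date(2022, 2, 1), "BA.1")] * 6
        + [(date(2022, 2, 3), "BA.1.1")] * 4
        + [(date(2022, 1, 20), "AY.43")] * 2
        + [(date(2021, 11, 2), "AY.4")] * 9   # falls outside the window below
    )
    print(top_lineages(samples, date(2022, 1, 5), date(2022, 4, 5)))
```

Running the same function once per quarter window yields the set of lineage tracks to rebuild for each quarter subtrack.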

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:

\ \

References

\ \

\

    \
  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q3/03_AY.5_data.bb\ color 0,28,127\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.5 between 2021-04-05 and 2021-07-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q3_tracks on\ priority 40\ shortLabel AY.5 mutations\ spectrum on\ track galaxyEnaQ3Ay-5\ type bigBed 8 +\ galaxyEnaQ0Ba-1 BA.1 mutations bigBed 8 + Mutations (amino acid level) in BA.1 between 2022-01-05 and 2022-03-09 3 40 162 53 130 208 154 192 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

    \
  • Sample collection date
  • Sequencing platform, library layout and strategy (currently, reanalysis is done for ampliconic paired-end Illumina and ONT data)
  • The primer scheme used for the generation of amplicons (this information is used to trim primer sequences from the data before variant calling; reanalysis can be done for any primer scheme with publicly available primer binding site information)
  • Some kind of discernible batch information (e.g. a library identifier) that can be used to form batches of samples for reanalysis and batch-level reporting
\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

The GalaxyProject SARS-CoV-2 mutations tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting 3 months prior to the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base that is changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are displayed all thin.

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

    \
  • the precise value of its observed frequency in the lineage and quarter
  • the intra-sample allele frequency (median and lower/upper quartile) at which the mutation has been called in the samples in which it has been detected
  • the collection date and the collecting country of the sample in which this mutation was first (ever) detected in the context of the lineage. Note that for older, still circulating lineages, the collection date of that sample can be earlier than the start of the earliest quarter displayed in the genome browser (since our complete surveillance data goes back further than four quarters).
\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

    \
  • country or combination of countries in which samples of the given lineage and collection quarter with the mutation have been collected. You could, for example, filter all current-quarter lineage tracks to show only mutations that have been found (in their respective lineage) in the UK.
  • within-lineage frequency. By default, only mutations that have been observed in at least 5% of the samples assigned to the given lineage in the given quarter are shown (0.05 default filter setting). You can lower or raise that threshold as you see fit. Note, however, that the underlying bigBed data of the tracks only contains mutations above a threshold of 0.1% (i.e. a 0.001 hard filter is always in effect).
\ \

Methods

\

For analyses, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances. The data is processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch. This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.

Sequencing data downloads, execution of the three types of workflows, and export of key results files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.

The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories, then

    \
  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc. are added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:

\ \

References

\ \

\

    \
  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q0/03_BA.1_data.bb\ color 162,53,130\ html galaxyEna\ longLabel Mutations (amino acid level) in BA.1 between 2022-01-05 and 2022-03-09\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q0_tracks on\ priority 40\ shortLabel BA.1 mutations\ spectrum on\ track galaxyEnaQ0Ba-1\ type bigBed 8 +\ cas13Crispr Cas13 CRISPR bigBed 5 + Cas13 CRISPR targets 0 40 0 0 0 127 127 127 1 0 0

Description

\

This track shows the in-silico design of crRNAs for Cas13 using the tool nCov2019_Guide_Design, as described in Abbott et al., 2020.

\ \

To target highly conserved regions of the SARS-CoV-2 genome, an in-silico collection of all 3,802 possible crRNAs was generated. After excluding crRNAs that are either predicted to have potential off-target binding (≤2 mismatches in the human transcriptome) or contain poly-T sequences that may prevent crRNA expression (≥4 consecutive Ts), a set of 3,203 crRNAs was obtained. These crRNAs are also able to target SARS-CoV and MERS-CoV with ≤1 mismatch.
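The poly-T exclusion rule (four or more consecutive Ts) is simple to express in code. The sketch below is illustrative only: the function name `has_poly_t` is hypothetical, sequences are assumed to be given as plain strings, and accepting U as well as T is a convenience added here, not something stated in the source.

```python
import re

# Four or more consecutive T (or U) bases.
POLY_T = re.compile(r"[TU]{4,}")


def has_poly_t(sequence):
    """Return True if the sequence contains a run of >= 4 consecutive Ts."""
    return POLY_T.search(sequence.upper()) is not None


if __name__ == "__main__":
    # Made-up candidate sequences for illustration only.
    candidates = ["ACGTTTTACGGATCCATGCAAC", "ACGTTTACGGATCCATGCAACG"]
    kept = [seq for seq in candidates if not has_poly_t(seq)]
    print(kept)
```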

\ \

\ Each crRNA has been characterized with four features:\

    \
  • Efficiency is predicted using the online tool at https://gitlab.com/sanjanalab/cas13
  • Specificity is determined by the number of off-target loci in human mRNA with ≤2 mismatches to the crRNA
  • Generality within Coronaviridae is quantified as the percentage of Coronaviridae strains targeted by the given crRNA with perfect identity
  • Generality within SARS-CoV-2 is quantified as the percentage of 1,087 SARS-CoV-2 patient genomes downloaded on March 20, 2020 that are targeted by the given crRNA with perfect identity

\ \

Method

\

To design all possible crRNAs for the three pathogenic RNA viruses (SARS-CoV-2, SARS-CoV, and MERS-CoV), the reference genomes of SARS-CoV and MERS-CoV, along with SARS-CoV-2 genomes derived from 47 patients, were first aligned by MAFFT using the --auto flag. crRNA candidates were identified by using a sliding window to extract all 22-nucleotide (nt) sequences with perfect identity among the SARS-CoV-2 genomes.
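The sliding-window step can be illustrated with a short sketch. It assumes the aligned sequences are already available as equal-length strings (for example, rows taken from an alignment), skips windows containing gaps or ambiguous bases, and reports windows in the sense of the aligned genomes. The function name `conserved_windows` is hypothetical and this is not the nCov2019_Guide_Design code.

```python
def conserved_windows(aligned, k=22):
    """Return (start, sequence) for every k-nt window identical in all rows.

    aligned: list of equal-length aligned sequences.
    start:   0-based alignment column of the window.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("aligned sequences must all have the same length")
    rows = [row.upper() for row in aligned]
    reference, others = rows[0], rows[1:]
    windows = []
    for start in range(length - k + 1):
        window = reference[start:start + k]
        if "-" in window or "N" in window:
            continue  # gaps and ambiguous bases cannot give perfect identity
        if all(row[start:start + k] == window for row in others):
            windows.append((start, window))
    return windows


if __name__ == "__main__":
    # Toy alignment with a short window size, for illustration only.
    toy = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
    print(conserved_windows(toy, k=4))
```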

\

We annotated each crRNA candidate with the number of mismatches relative to the SARS-CoV and MERS-CoV genomes, as well as the GC content. A total of 3,802 crRNA candidates were selected with a perfect match against the 47 SARS-CoV-2 genomes and with ≤1 mismatch to SARS-CoV or MERS-CoV sequences. To characterize the specificity of 22-nt crRNAs, we ensured that each crRNA does not target any sequences in the human transcriptome.
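The two per-candidate annotations mentioned above, mismatch count and GC content, could be computed as in the following sketch. The function names are hypothetical, and the mismatch count assumes the candidate and the corresponding SARS-CoV or MERS-CoV segment have already been brought to equal length by the alignment.

```python
def count_mismatches(candidate, other):
    """Number of positions at which two equal-length sequences differ."""
    if len(candidate) != len(other):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(candidate.upper(), other.upper()))


def gc_content(sequence):
    """Fraction of G and C bases in the sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in sequence.upper()) / len(sequence)


if __name__ == "__main__":
    # Made-up 22-nt sequences for illustration only.
    candidate = "ACGTACGTACGTACGTACGTAC"
    related = "ACGTACGTACGAACGTACGTAC"
    print(count_mismatches(candidate, related), round(gc_content(candidate), 3))
```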

\

We used Bowtie 1.2.2 to align crRNAs to the human transcriptome (hg38; including non-coding RNA) and removed crRNAs that mapped to the human transcriptome with ≤2 mismatches.

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/cas13Crispr.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

Credits

\

The predictions for this track were produced by Xueqiu Lin and Augustine Chemparathy in the Stanley Qi lab at Stanford University.

\ \

References

\

Abbott, Timothy R., Girija Dhamdhere, Yanxia Liu, Xueqiu Lin, Laine Goudy, Leiping Zeng, Augustine Chemparathy, et al., 2020. Development of CRISPR as a Prophylactic Strategy to Combat Novel Coronavirus and Influenza. bioRxiv

\ map 1 bigDataUrl /gbdb/wuhCor1/bbi/cas13Crispr.bb\ exonNumbers off\ filter.EfficiencyScore 0\ filterLimits.EfficiencyScore 0:700\ group map\ longLabel Cas13 CRISPR targets\ priority 40\ scoreMax 700\ shortLabel Cas13 CRISPR\ spectrum on\ track cas13Crispr\ type bigBed 5 +\ visibility hide\ igg_COVID_13 COVID 13 bigBed 9 COVID 13 1 40 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_13.bb\ longLabel COVID 13\ parent igg on\ priority 40\ shortLabel COVID 13\ track igg_COVID_13\ type bigBed 9\ igm_COVID_19 COVID 19 bigBed 9 COVID 19 1 40 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_19.bb\ longLabel COVID 19\ parent igm on\ priority 40\ shortLabel COVID 19\ track igm_COVID_19\ type bigBed 9\ Q3_tracks Galaxy ENA mutations in top lineages - three quarters ago bigBed 8 + Most frequent lineages of three quarters ago 1 40 0 0 0 127 127 127 0 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

    \
  • Sample collection date
  • Sequencing platform, library layout and strategy (currently, reanalysis is done for ampliconic paired-end Illumina and ONT data)
  • The primer scheme used for the generation of amplicons (this information is used to trim primer sequences from the data before variant calling; reanalysis can be done for any primer scheme with publicly available primer binding site information)
  • Some kind of discernible batch information (e.g. a library identifier) that can be used to form batches of samples for reanalysis and batch-level reporting
\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

The GalaxyProject SARS-CoV-2 mutations tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting 3 months prior to the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of mutation features reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base that is changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are displayed all thin.

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

    \
  • the precise value of its observed frequency in the lineage and quarter
  • the intra-sample allele frequency (median and lower/upper quartile) at which the mutation has been called in the samples in which it has been detected
  • the collection date and the collecting country of the sample in which this mutation was first (ever) detected in the context of the lineage. Note that for older, still circulating lineages, the collection date of that sample can be earlier than the start of the earliest quarter displayed in the genome browser (since our complete surveillance data goes back further than four quarters).
\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

    \
  • country or combination of countries in which samples of the given lineage and collection quarter with the mutation have been collected. You could, for example, filter all current-quarter lineage tracks to show only mutations that have been found (in their respective lineage) in the UK.
  • within-lineage frequency. By default, only mutations that have been observed in at least 5% of the samples assigned to the given lineage in the given quarter are shown (0.05 default filter setting). You can lower or raise that threshold as you see fit. Note, however, that the underlying bigBed data of the tracks only contains mutations above a threshold of 0.1% (i.e. a 0.001 hard filter is always in effect).
\ \

Methods

\

For analyses, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances. The data is processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch. This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.

Sequencing data downloads, execution of the three types of workflows, and export of key results files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.

The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories, then

  1. those histories are made publicly accessible on their server
  2. batch information (samples analyzed and their metadata, links to the histories, etc.) is added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks are recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\
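The lineage-selection and mutation-extraction steps of the track recalculation can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the project's actual script: the function names and the `(collection_date, lineage, mutations)` sample tuples are hypothetical, quarters are taken to be calendar quarters, and the current quarter is assumed to count as the first of the four.

```python
from collections import Counter, defaultdict
from datetime import date


def quarter_of(d):
    """Map a date to a (year, quarter) pair, with quarter in 1..4."""
    return (d.year, (d.month - 1) // 3 + 1)


def last_quarters(today, n=4):
    """List the n most recent calendar quarters, current quarter first."""
    year, q = quarter_of(today)
    quarters = []
    for _ in range(n):
        quarters.append((year, q))
        q -= 1
        if q == 0:
            year, q = year - 1, 4
    return quarters


def top_lineages(samples, today, n_quarters=4, top_n=5):
    """Count samples per lineage in each recent quarter; keep the top_n."""
    quarters = last_quarters(today, n_quarters)
    counts = defaultdict(Counter)
    for collected, lineage, _mutations in samples:
        q = quarter_of(collected)
        if q in quarters:
            counts[q][lineage] += 1
    return {q: [lin for lin, _ in counts[q].most_common(top_n)] for q in quarters}


def mutations_by_quarter_lineage(samples, top):
    """Collect all mutations seen per (quarter, top lineage) combination."""
    result = defaultdict(set)
    for collected, lineage, mutations in samples:
        q = quarter_of(collected)
        if lineage in top.get(q, ()):
            result[(q, lineage)].update(mutations)
    return dict(result)
```

The resulting dictionary has one entry per quarter and top lineage, which corresponds to one subtrack per combination; converting these sets into bigBed files is outside the scope of the sketch.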

\ \

Credits

The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.

The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.

For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health Matrix channel.

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:

References

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1

\ varRep 1 allButtonPair on\ compositeTrack on\ filter.withinLineageFrequency 0.05\ filterByRange.withinLineageFrequency on\ filterLimits.withinLineageFrequency 0:2\ filterType.countries multipleListOr\ filterValues.countries EE|Estonia,GB|United Kingdom,GR|Greece,IE|Ireland,ZA|South Africa\ filterValuesDefault.countries EE,GB,GR,ZA\ html galaxyEna\ longLabel Most frequent lineages of three quarters ago\ parent galaxyEna\ priority 40\ shortLabel Galaxy ENA mutations in top lineages - three quarters ago\ track Q3_tracks\ type bigBed 8 +\ visibility dense\ variantNucMutsV2_B_1_617_1 Kappa Nuc Muts bigBed 4 Kappa VUM (B.1.617.1 India Oct-2020) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021) 1 40 133 186 111 194 220 183 0 0 0 https://outbreak.info/situation-reports?pango=B.1.617.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/variantNucMuts_B.1.617.1_2021_09_10.bb\ color 133,186,111\ longLabel Kappa VUM (B.1.617.1 India Oct-2020) nucleotide mutations in 3000 GISAID sequences (Sep 10, 2021)\ parent variantMuts off\ priority 120\ shortLabel Kappa Nuc Muts\ subGroups variant=K_B16171 mutation=NUC designation=VUM\ track variantNucMutsV2_B_1_617_1\ url https://outbreak.info/situation-reports?pango=B.1.617.1\ urlLabel B.1.617.1 Situation Report at outbreak.info\ igg_COVID_509 COVID 509 bigBed 9 COVID 509 1 41 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_509.bb\ longLabel COVID 509\ parent igg on\ priority 41\ shortLabel COVID 509\ track igg_COVID_509\ type bigBed 9\ igm_COVID_534 COVID 534 bigBed 9 COVID 534 1 41 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_534.bb\ longLabel COVID 534\ parent igm on\ priority 41\ shortLabel COVID 534\ track igm_COVID_534\ type bigBed 9\ variantAaMuts_BA_4 Omicron BA.4 AA Muts bigBed 4 Omicron BA.4 amino acid mutations from 573 GISAID sequences (May 4, 2022) 1 41 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.4 varRep 1 bigDataUrl 
/gbdb/wuhCor1/strainMuts/BA.4_aa.bb\ color 219,40,35\ longLabel Omicron BA.4 amino acid mutations from 573 GISAID sequences (May 4, 2022)\ parent variantMuts off\ priority 13\ shortLabel Omicron BA.4 AA Muts\ subGroups variant=L_BA4 mutation=AA designation=VUM\ track variantAaMuts_BA_4\ url https://outbreak.info/situation-reports?pango=BA.4\ urlLabel BA.4 Situation Report at outbreak.info\ igm_COVID_14 COVID 14 bigBed 9 COVID 14 1 42 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_14.bb\ longLabel COVID 14\ parent igm on\ priority 42\ shortLabel COVID 14\ track igm_COVID_14\ type bigBed 9\ igg_COVID_19 COVID 19 bigBed 9 COVID 19 1 42 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_19.bb\ longLabel COVID 19\ parent igg on\ priority 42\ shortLabel COVID 19\ track igg_COVID_19\ type bigBed 9\ variantNucMuts_BA_4 Omicron BA.4 Nuc Muts bigBed 4 Omicron VOC (BA.4) nucleotide mutations identifed from 573 GISAID sequences (May 2022) 1 42 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.4 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BA.4_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (BA.4) nucleotide mutations identifed from 573 GISAID sequences (May 2022)\ parent variantMuts off\ priority 123\ shortLabel Omicron BA.4 Nuc Muts\ subGroups variant=L_BA4 mutation=NUC designation=VUM\ track variantNucMuts_BA_4\ url https://outbreak.info/situation-reports?pango=BA.4\ urlLabel BA.4 Situation Report at outbreak.info\ igm_COVID_406 COVID 406 bigBed 9 COVID 406 1 43 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_406.bb\ longLabel COVID 406\ parent igm on\ priority 43\ shortLabel COVID 406\ track igm_COVID_406\ type bigBed 9\ igg_COVID_744 COVID 744 bigBed 9 COVID 744 1 43 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_744.bb\ longLabel COVID 744\ parent igg on\ priority 43\ shortLabel COVID 744\ track igg_COVID_744\ type bigBed 9\ 
variantAaMuts_BA_5 Omicron BA.5 AA Muts bigBed 4 Omicron BA.5 amino acid mutations from 287 GISAID sequences (May 4, 2022) 1 43 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.5 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BA.5_aa.bb\ color 219,40,35\ longLabel Omicron BA.5 amino acid mutations from 287 GISAID sequences (May 4, 2022)\ parent variantMuts off\ priority 14\ shortLabel Omicron BA.5 AA Muts\ subGroups variant=M_BA5 mutation=AA designation=VUM\ track variantAaMuts_BA_5\ url https://outbreak.info/situation-reports?pango=BA.5\ urlLabel BA.5 Situation Report at outbreak.info\ igg_COVID_415 COVID 415 bigBed 9 COVID 415 1 44 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_415.bb\ longLabel COVID 415\ parent igg on\ priority 44\ shortLabel COVID 415\ track igg_COVID_415\ type bigBed 9\ igm_COVID_532 COVID 532 bigBed 9 COVID 532 1 44 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_532.bb\ longLabel COVID 532\ parent igm on\ priority 44\ shortLabel COVID 532\ track igm_COVID_532\ type bigBed 9\ variantNucMuts_BA_5 Omicron BA.5 Nuc Muts bigBed 4 Omicron VOC (BA.5) nucleotide mutations identifed from 287 GISAID sequences (May 2022) 1 44 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.5 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BA.5_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (BA.5) nucleotide mutations identifed from 287 GISAID sequences (May 2022)\ parent variantMuts off\ priority 124\ shortLabel Omicron BA.5 Nuc Muts\ subGroups variant=M_BA5 mutation=NUC designation=VUM\ track variantNucMuts_BA_5\ url https://outbreak.info/situation-reports?pango=BA.5\ urlLabel BA.5 Situation Report at outbreak.info\ igg_COVID_405 COVID 405 bigBed 9 COVID 405 1 45 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_405.bb\ longLabel COVID 405\ parent igg on\ priority 45\ shortLabel COVID 405\ track igg_COVID_405\ type bigBed 9\ 
igm_COVID_5333 COVID 5333 bigBed 9 COVID 5333 1 45 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_5333.bb\ longLabel COVID 5333\ parent igm on\ priority 45\ shortLabel COVID 5333\ track igm_COVID_5333\ type bigBed 9\ variantAaMuts_BA_2_12_1 Omicron BA.2.12.1 AA Muts bigBed 4 Omicron BA.2.12.1 amino acid mutations from GISAID sequences (Sep 22, 2023) 1 45 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.2.12.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BA.2.12.1_prot.bb\ color 219,40,35\ longLabel Omicron BA.2.12.1 amino acid mutations from GISAID sequences (Sep 22, 2023)\ parent variantMuts off\ priority 15\ shortLabel Omicron BA.2.12.1 AA Muts\ subGroups variant=N_BA2121 mutation=AA designation=VUM\ track variantAaMuts_BA_2_12_1\ url https://outbreak.info/situation-reports?pango=BA.2.12.1\ urlLabel BA.2.12.1 Situation Report at outbreak.info\ igg_COVID_404 COVID 404 bigBed 9 COVID 404 1 46 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_404.bb\ longLabel COVID 404\ parent igg on\ priority 46\ shortLabel COVID 404\ track igg_COVID_404\ type bigBed 9\ igm_COVID_520 COVID 520 bigBed 9 COVID 520 1 46 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_520.bb\ longLabel COVID 520\ parent igm on\ priority 46\ shortLabel COVID 520\ track igm_COVID_520\ type bigBed 9\ variantNucMuts_BA_2_12_1 Omicron BA.2.12.1 Nuc Muts bigBed 4 Omicron VOC (BA.2.12.1) nucleotide mutations identifed from GISAID sequences (Sep 2023) 1 46 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BA.2.12.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BA.2.12.1_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (BA.2.12.1) nucleotide mutations identifed from GISAID sequences (Sep 2023)\ parent variantMuts off\ priority 125\ shortLabel Omicron BA.2.12.1 Nuc Muts\ subGroups variant=N_BA2121 mutation=NUC designation=VUM\ track variantNucMuts_BA_2_12_1\ url 
https://outbreak.info/situation-reports?pango=BA.2.12.1\ urlLabel BA.2.12.1 Situation Report at outbreak.info\ igg_COVID_528 COVID 528 bigBed 9 COVID 528 1 47 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_528.bb\ longLabel COVID 528\ parent igg on\ priority 47\ shortLabel COVID 528\ track igg_COVID_528\ type bigBed 9\ igm_COVID_608 COVID 608 bigBed 9 COVID 608 1 47 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_608.bb\ longLabel COVID 608\ parent igm on\ priority 47\ shortLabel COVID 608\ track igm_COVID_608\ type bigBed 9\ variantAaMuts_BQ_1 Omicron BQ.1 AA Muts bigBed 4 Omicron BQ.1 amino acid mutations from GISAID sequences (Sep 22, 2023) 1 47 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BQ.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BQ.1_prot.bb\ color 219,40,35\ longLabel Omicron BQ.1 amino acid mutations from GISAID sequences (Sep 22, 2023)\ parent variantMuts off\ priority 17\ shortLabel Omicron BQ.1 AA Muts\ subGroups variant=P_BQ1 mutation=AA designation=VUM\ track variantAaMuts_BQ_1\ url https://outbreak.info/situation-reports?pango=BQ.1\ urlLabel BQ.1 Situation Report at outbreak.info\ igg_COVID_508 COVID 508 bigBed 9 COVID 508 1 48 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_508.bb\ longLabel COVID 508\ parent igg on\ priority 48\ shortLabel COVID 508\ track igg_COVID_508\ type bigBed 9\ igm_Ctrl_LC174 Ctrl LC174 bigBed 9 Ctrl LC174 1 48 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/Ctrl_LC174.bb\ longLabel Ctrl LC174\ parent igm on\ priority 48\ shortLabel Ctrl LC174\ track igm_Ctrl_LC174\ type bigBed 9\ variantNucMuts_BQ_1 Omicron BQ.1 Nuc Muts bigBed 4 Omicron VOC (BQ.1) nucleotide mutations identifed from GISAID sequences (Sep 2023) 1 48 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=BQ.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/BQ.1_nuc.bb\ color 219,40,35\ 
longLabel Omicron VOC (BQ.1) nucleotide mutations identifed from GISAID sequences (Sep 2023)\ parent variantMuts off\ priority 127\ shortLabel Omicron BQ.1 Nuc Muts\ subGroups variant=P_BQ1 mutation=NUC designation=VUM\ track variantNucMuts_BQ_1\ url https://outbreak.info/situation-reports?pango=BQ.1\ urlLabel BQ.1 Situation Report at outbreak.info\ igg_COVID_406 COVID 406 bigBed 9 COVID 406 1 49 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_406.bb\ longLabel COVID 406\ parent igg on\ priority 49\ shortLabel COVID 406\ track igg_COVID_406\ type bigBed 9\ igm_COVID_508 COVID 508 bigBed 9 COVID 508 1 49 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_508.bb\ longLabel COVID 508\ parent igm on\ priority 49\ shortLabel COVID 508\ track igm_COVID_508\ type bigBed 9\ variantAaMuts_XBB Omicron XBB AA Muts bigBed 4 Omicron XBB amino acid mutations from GISAID sequences (Sep 22, 2023) 1 49 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=XBB varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB_prot.bb\ color 219,40,35\ longLabel Omicron XBB amino acid mutations from GISAID sequences (Sep 22, 2023)\ parent variantMuts off\ priority 18\ shortLabel Omicron XBB AA Muts\ subGroups variant=Q_XBB mutation=AA designation=VUM\ track variantAaMuts_XBB\ url https://outbreak.info/situation-reports?pango=XBB\ urlLabel XBB Situation Report at outbreak.info\ nextstrainFreq20C 20C bigWig Nextstrain, 20C clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20C.bigWig\ longLabel Nextstrain, 20C clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 20C\ subGroups view=newClades\ track nextstrainFreq20C\ type bigWig\ visibility dense\ nextstrainSamples20C 20C Mutations vcfTabix Mutations in Clade 20C Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl 
/gbdb/wuhCor1/nextstrain/nextstrainSamples20C.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain20C.nh\ longLabel Mutations in Clade 20C Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades\ priority 50\ shortLabel 20C Mutations\ subGroups view=newClades\ track nextstrainSamples20C\ nextstrainFreq20D 20D bigWig Nextstrain, 20D clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20D.bigWig\ longLabel Nextstrain, 20D clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 20D\ subGroups view=newClades\ track nextstrainFreq20D\ type bigWig\ visibility dense\ nextstrainSamples20D 20D Mutations vcfTabix Mutations in Clade 20D Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20D.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain20D.nh\ longLabel Mutations in Clade 20D Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades\ priority 50\ shortLabel 20D Mutations\ subGroups view=newClades\ track nextstrainSamples20D\ nextstrainFreq20E_EU1 20E/EU1 bigWig Nextstrain, 20E_EU1 clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20E_EU1.bigWig\ longLabel Nextstrain, 20E_EU1 clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 20E/EU1\ subGroups view=newClades\ track nextstrainFreq20E_EU1\ type bigWig\ visibility dense\ nextstrainSamples20E_EU1 20E/EU1 Mutations vcfTabix Mutations in Clade 20E/EU1 Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20E_EU1.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile 
/gbdb/wuhCor1/nextstrain/nextstrain20E_EU1.nh\ longLabel Mutations in Clade 20E/EU1 Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades\ priority 50\ shortLabel 20E/EU1 Mutations\ subGroups view=newClades\ track nextstrainSamples20E_EU1\ nextstrainFreq20F 20F bigWig Nextstrain, 20F clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20F.bigWig\ longLabel Nextstrain, 20F clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 20F\ subGroups view=newClades\ track nextstrainFreq20F\ type bigWig\ visibility dense\ nextstrainSamples20F 20F Mutations vcfTabix Mutations in Clade 20F Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20F.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain20F.nh\ longLabel Mutations in Clade 20F Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades\ priority 50\ shortLabel 20F Mutations\ subGroups view=newClades\ track nextstrainSamples20F\ nextstrainFreq20G 20G bigWig Nextstrain, 20G clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20G.bigWig\ longLabel Nextstrain, 20G clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 20G\ subGroups view=newClades\ track nextstrainFreq20G\ type bigWig\ visibility dense\ nextstrainSamples20G 20G Mutations vcfTabix Mutations in Clade 20G Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20G.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain20G.nh\ longLabel Mutations in Clade 20G Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades\ 
priority 50\ shortLabel 20G Mutations\ subGroups view=newClades\ track nextstrainSamples20G\ nextstrainFreq20H_Beta 20H/Beta bigWig Nextstrain, 20H/501Y.V2/Beta clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20H_Beta_V2.bigWig\ longLabel Nextstrain, 20H/501Y.V2/Beta clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 20H/Beta\ subGroups view=newClades\ track nextstrainFreq20H_Beta\ type bigWig\ visibility dense\ nextstrainSamples20H_Beta 20H/Beta Mutations vcfTabix Mutations in Clade 20H/501Y.V2/Beta Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20H_Beta_V2.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain20H_Beta_V2.nh\ longLabel Mutations in Clade 20H/501Y.V2/Beta Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 20H/Beta Mutations\ subGroups view=newClades\ track nextstrainSamples20H_Beta\ nextstrainFreq20I_Alpha 20I/Alpha bigWig Nextstrain, 20I/501Y.V1/Alpha clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20I_Alpha_V1.bigWig\ longLabel Nextstrain, 20I/501Y.V1/Alpha clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 20I/Alpha\ subGroups view=newClades\ track nextstrainFreq20I_Alpha\ type bigWig\ visibility dense\ nextstrainSamples20I_Alpha 20I/Alpha Mutations vcfTabix Mutations in Clade 20I/501Y.V1/Alpha Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20I_Alpha_V1.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain20I_Alpha_V1.nh\ longLabel Mutations in Clade 20I/501Y.V1/Alpha Nextstrain Subset of GISAID 
EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 20I/Alpha Mutations\ subGroups view=newClades\ track nextstrainSamples20I_Alpha\ nextstrainFreq20J_Gamma 20J/Gamma bigWig Nextstrain, 20J/501Y.V3/Gamma clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20J_Gamma_V3.bigWig\ longLabel Nextstrain, 20J/501Y.V3/Gamma clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 20J/Gamma\ subGroups view=newClades\ track nextstrainFreq20J_Gamma\ type bigWig\ visibility dense\ nextstrainSamples20J_Gamma 20J/Gamma Mutations vcfTabix Mutations in Clade 20J/501Y.V3/Gamma Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples20J_Gamma_V3.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain20J_Gamma_V3.nh\ longLabel Mutations in Clade 20J/501Y.V3/Gamma Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 20J/Gamma Mutations\ subGroups view=newClades\ track nextstrainSamples20J_Gamma\ nextstrainFreq21A_Delta 21A/Delta bigWig Nextstrain, 21A/Delta clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21A_Delta.bigWig\ longLabel Nextstrain, 21A/Delta clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21A/Delta\ subGroups view=newClades\ track nextstrainFreq21A_Delta\ type bigWig\ visibility dense\ nextstrainSamples21A_Delta 21A/Delta Mutations vcfTabix Mutations in Clade 21A/Delta Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21A_Delta.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain21A_Delta.nh\ longLabel Mutations 
in Clade 21A/Delta Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21A/Delta Mutations\ subGroups view=newClades\ track nextstrainSamples21A_Delta\ nextstrainFreq21B_Kappa 21B/Kappa bigWig Nextstrain, 21B/Kappa clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21B_Kappa.bigWig\ longLabel Nextstrain, 21B/Kappa clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21B/Kappa\ subGroups view=newClades\ track nextstrainFreq21B_Kappa\ type bigWig\ visibility dense\ nextstrainSamples21B_Kappa 21B/Kappa Mutations vcfTabix Mutations in Clade 21B/Kappa Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21B_Kappa.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain21B_Kappa.nh\ longLabel Mutations in Clade 21B/Kappa Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21B/Kappa Mutations\ subGroups view=newClades\ track nextstrainSamples21B_Kappa\ nextstrainFreq21C_Epsilon 21C/Epsilon bigWig Nextstrain, 21C/Epsilon clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21C_Epsilon.bigWig\ longLabel Nextstrain, 21C/Epsilon clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21C/Epsilon\ subGroups view=newClades\ track nextstrainFreq21C_Epsilon\ type bigWig\ visibility dense\ nextstrainSamples21C_Epsilon 21C/Epsilon Mutations vcfTabix Mutations in Clade 21C/Epsilon Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21C_Epsilon.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile 
/gbdb/wuhCor1/nextstrain/nextstrain21C_Epsilon.nh\ longLabel Mutations in Clade 21C/Epsilon Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21C/Epsilon Mutations\ subGroups view=newClades\ track nextstrainSamples21C_Epsilon\ nextstrainFreq21D_Eta 21D/Eta bigWig Nextstrain, 21D/Eta clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21D_Eta.bigWig\ longLabel Nextstrain, 21D/Eta clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21D/Eta\ subGroups view=newClades\ track nextstrainFreq21D_Eta\ type bigWig\ visibility dense\ nextstrainSamples21D_Eta 21D/Eta Mutations vcfTabix Mutations in Clade 21D/Eta Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21D_Eta.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain21D_Eta.nh\ longLabel Mutations in Clade 21D/Eta Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21D/Eta Mutations\ subGroups view=newClades\ track nextstrainSamples21D_Eta\ nextstrainFreq21E_Theta 21E/Theta bigWig Nextstrain, 21E/Theta clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21E_Theta.bigWig\ longLabel Nextstrain, 21E/Theta clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21E/Theta\ subGroups view=newClades\ track nextstrainFreq21E_Theta\ type bigWig\ visibility dense\ nextstrainSamples21E_Theta 21E/Theta Mutations vcfTabix Mutations in Clade 21E/Theta Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21E_Theta.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile 
/gbdb/wuhCor1/nextstrain/nextstrain21E_Theta.nh\ longLabel Mutations in Clade 21E/Theta Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21E/Theta Mutations\ subGroups view=newClades\ track nextstrainSamples21E_Theta\ nextstrainFreq21F_Iota 21F/Iota bigWig Nextstrain, 21F/Iota clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21F_Iota.bigWig\ longLabel Nextstrain, 21F/Iota clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21F/Iota\ subGroups view=newClades\ track nextstrainFreq21F_Iota\ type bigWig\ visibility dense\ nextstrainSamples21F_Iota 21F/Iota Mutations vcfTabix Mutations in Clade 21F/Iota Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21F_Iota.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain21F_Iota.nh\ longLabel Mutations in Clade 21F/Iota Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21F/Iota Mutations\ subGroups view=newClades\ track nextstrainSamples21F_Iota\ nextstrainFreq21G_Lambda 21G/Lambda bigWig Nextstrain, 21G/Lambda clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21G_Lambda.bigWig\ longLabel Nextstrain, 21G/Lambda clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21G/Lambda\ subGroups view=newClades\ track nextstrainFreq21G_Lambda\ type bigWig\ visibility dense\ nextstrainSamples21G_Lambda 21G/Lambda Mutations vcfTabix Mutations in Clade 21G/Lambda Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21G_Lambda.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile 
/gbdb/wuhCor1/nextstrain/nextstrain21G_Lambda.nh\ longLabel Mutations in Clade 21G/Lambda Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21G/Lambda Mutations\ subGroups view=newClades\ track nextstrainSamples21G_Lambda\ nextstrainFreq21H_Mu 21H/Mu bigWig Nextstrain, 21H/Mu clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21H_Mu.bigWig\ longLabel Nextstrain, 21H/Mu clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21H/Mu\ subGroups view=newClades\ track nextstrainFreq21H_Mu\ type bigWig\ visibility dense\ nextstrainSamples21H_Mu 21H/Mu Mutations vcfTabix Mutations in Clade 21H/Mu Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21H_Mu.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain21H_Mu.nh\ longLabel Mutations in Clade 21H/Mu Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21H/Mu Mutations\ subGroups view=newClades\ track nextstrainSamples21H_Mu\ nextstrainFreq21I_Delta 21I/Delta bigWig Nextstrain, 21I/Delta clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21I_Delta.bigWig\ longLabel Nextstrain, 21I/Delta clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21I/Delta\ subGroups view=newClades\ track nextstrainFreq21I_Delta\ type bigWig\ visibility dense\ nextstrainSamples21I_Delta 21I/Delta Mutations vcfTabix Mutations in Clade 21I/Delta Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21I_Delta.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile 
/gbdb/wuhCor1/nextstrain/nextstrain21I_Delta.nh\ longLabel Mutations in Clade 21I/Delta Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21I/Delta Mutations\ subGroups view=newClades\ track nextstrainSamples21I_Delta\ nextstrainFreq21J_Delta 21J/Delta bigWig Nextstrain, 21J/Delta clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21J_Delta.bigWig\ longLabel Nextstrain, 21J/Delta clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21J/Delta\ subGroups view=newClades\ track nextstrainFreq21J_Delta\ type bigWig\ visibility dense\ nextstrainSamples21J_Delta 21J/Delta Mutations vcfTabix Mutations in Clade 21J/Delta Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21J_Delta.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain21J_Delta.nh\ longLabel Mutations in Clade 21J/Delta Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21J/Delta Mutations\ subGroups view=newClades\ track nextstrainSamples21J_Delta\ nextstrainFreq21K_Omicron 21K/BA.1 bigWig Nextstrain, 21K/Omicron/BA.1 clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21K_Omicron.bigWig\ longLabel Nextstrain, 21K/Omicron/BA.1 clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21K/BA.1\ subGroups view=newClades\ track nextstrainFreq21K_Omicron\ type bigWig\ visibility dense\ nextstrainSamples21K_Omicron 21K/BA.1 Mutations vcfTabix Mutations in Clade 21K/Omicron/BA.1 Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21K_Omicron.vcf.gz\ 
hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain21K_Omicron.nh\ longLabel Mutations in Clade 21K/Omicron/BA.1 Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21K/BA.1 Mutations\ subGroups view=newClades\ track nextstrainSamples21K_Omicron\ nextstrainFreq21L_Omicron 21L/BA.2 bigWig Nextstrain, 21L/Omicron/BA.2 clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21L_Omicron.bigWig\ longLabel Nextstrain, 21L/Omicron/BA.2 clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21L/BA.2\ subGroups view=newClades\ track nextstrainFreq21L_Omicron\ type bigWig\ visibility dense\ nextstrainSamples21L_Omicron 21L/BA.2 Mutations vcfTabix Mutations in Clade 21L/Omicron/BA.2 Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21L_Omicron.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain21L_Omicron.nh\ longLabel Mutations in Clade 21L/Omicron/BA.2 Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21L/BA.2 Mutations\ subGroups view=newClades\ track nextstrainSamples21L_Omicron\ nextstrainFreq21M_Omicron 21M/B.1.1.529 bigWig Nextstrain, 21M/Omicron/B.1.1.529 clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21M_Omicron.bigWig\ longLabel Nextstrain, 21M/Omicron/B.1.1.529 clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 21M/B.1.1.529\ subGroups view=newClades\ track nextstrainFreq21M_Omicron\ type bigWig\ visibility dense\ nextstrainSamples21M_Omicron 21M/B.1.1.529 Mutations vcfTabix Mutations in Clade 21M/Omicron/B.1.1.529 Nextstrain Subset of GISAID EpiCoV TM Samples 
0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples21M_Omicron.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain21M_Omicron.nh\ longLabel Mutations in Clade 21M/Omicron/B.1.1.529 Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 21M/B.1.1.529 Mutations\ subGroups view=newClades\ track nextstrainSamples21M_Omicron\ nextstrainFreq22A_Omicron 22A/BA.4 bigWig Nextstrain, 22A/Omicron/BA.4 clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22A_Omicron.bigWig\ longLabel Nextstrain, 22A/Omicron/BA.4 clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 22A/BA.4\ subGroups view=newClades\ track nextstrainFreq22A_Omicron\ type bigWig\ visibility dense\ nextstrainSamples22A_Omicron 22A/BA.4 Mutations vcfTabix Mutations in Clade 22A/Omicron/BA.4 Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22A_Omicron.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain22A_Omicron.nh\ longLabel Mutations in Clade 22A/Omicron/BA.4 Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 22A/BA.4 Mutations\ subGroups view=newClades\ track nextstrainSamples22A_Omicron\ nextstrainFreq22B_Omicron 22B/BA.5 bigWig Nextstrain, 22B/Omicron/BA.5 clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22B_Omicron.bigWig\ longLabel Nextstrain, 22B/Omicron/BA.5 clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 22B/BA.5\ subGroups view=newClades\ track nextstrainFreq22B_Omicron\ type bigWig\ visibility dense\ nextstrainSamples22B_Omicron 22B/BA.5 Mutations 
vcfTabix Mutations in Clade 22B/Omicron/BA.5 Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22B_Omicron.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain22B_Omicron.nh\ longLabel Mutations in Clade 22B/Omicron/BA.5 Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 22B/BA.5 Mutations\ subGroups view=newClades\ track nextstrainSamples22B_Omicron\ nextstrainFreq22C_Omicron 22C/BA.2.12.1 bigWig Nextstrain, 22C/Omicron/BA.2.12.1 clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22C_Omicron.bigWig\ longLabel Nextstrain, 22C/Omicron/BA.2.12.1 clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 22C/BA.2.12.1\ subGroups view=newClades\ track nextstrainFreq22C_Omicron\ type bigWig\ visibility dense\ nextstrainSamples22C_Omicron 22C/BA.2.12.1 Mutations vcfTabix Mutations in Clade 22C/Omicron/BA.2.12.1 Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22C_Omicron.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain22C_Omicron.nh\ longLabel Mutations in Clade 22C/Omicron/BA.2.12.1 Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 22C/BA.2.12.1 Mutations\ subGroups view=newClades\ track nextstrainSamples22C_Omicron\ nextstrainFreq22D_Omicron 22D/BA.2.75 bigWig Nextstrain, 22D/Omicron/BA.2.75 clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22D_Omicron.bigWig\ longLabel Nextstrain, 22D/Omicron/BA.2.75 clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 22D/BA.2.75\ 
subGroups view=newClades\ track nextstrainFreq22D_Omicron\ type bigWig\ visibility dense\ nextstrainSamples22D_Omicron 22D/BA.2.75 Mutations vcfTabix Mutations in Clade 22D/Omicron/BA.2.75 Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22D_Omicron.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain22D_Omicron.nh\ longLabel Mutations in Clade 22D/Omicron/BA.2.75 Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 22D/BA.2.75 Mutations\ subGroups view=newClades\ track nextstrainSamples22D_Omicron\ nextstrainFreq22E_Omicron 22E/BQ.1 bigWig Nextstrain, 22E/Omicron/BQ.1 clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22E_Omicron.bigWig\ longLabel Nextstrain, 22E/Omicron/BQ.1 clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 22E/BQ.1\ subGroups view=newClades\ track nextstrainFreq22E_Omicron\ type bigWig\ visibility dense\ nextstrainSamples22E_Omicron 22E/BQ.1 Mutations vcfTabix Mutations in Clade 22E/Omicron/BQ.1 Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22E_Omicron.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain22E_Omicron.nh\ longLabel Mutations in Clade 22E/Omicron/BQ.1 Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 22E/BQ.1 Mutations\ subGroups view=newClades\ track nextstrainSamples22E_Omicron\ nextstrainFreq22F_Omicron 22F/XBB bigWig Nextstrain, 22F/Omicron/XBB clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22F_Omicron.bigWig\ longLabel Nextstrain, 22F/Omicron/XBB clade: 
Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 22F/XBB\ subGroups view=newClades\ track nextstrainFreq22F_Omicron\ type bigWig\ visibility dense\ nextstrainSamples22F_Omicron 22F/XBB Mutations vcfTabix Mutations in Clade 22F/Omicron/XBB Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples22F_Omicron.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain22F_Omicron.nh\ longLabel Mutations in Clade 22F/Omicron/XBB Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 22F/XBB Mutations\ subGroups view=newClades\ track nextstrainSamples22F_Omicron\ nextstrainFreq23A_Omicron 23A/XBB.1.5 bigWig Nextstrain, 23A/Omicron/XBB.1.5 clade: Alternate allele frequency 1 50 0 0 0 127 127 127 0 0 0 varRep 0 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples23A_Omicron.bigWig\ longLabel Nextstrain, 23A/Omicron/XBB.1.5 clade: Alternate allele frequency\ parent nextstrainFreqViewNewClades\ priority 50\ shortLabel 23A/XBB.1.5\ subGroups view=newClades\ track nextstrainFreq23A_Omicron\ type bigWig\ visibility dense\ nextstrainSamples23A_Omicron 23A/XBB.1.5 Mutations vcfTabix Mutations in Clade 23A/Omicron/XBB.1.5 Nextstrain Subset of GISAID EpiCoV TM Samples 0 50 0 0 0 127 127 127 0 0 0 varRep 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainSamples23A_Omicron.vcf.gz\ hapClusterHeight 300\ hapClusterMethod treeFile /gbdb/wuhCor1/nextstrain/nextstrain23A_Omicron.nh\ longLabel Mutations in Clade 23A/Omicron/XBB.1.5 Nextstrain Subset of GISAID EpiCoV TM Samples\ parent nextstrainSamplesViewNewClades on\ priority 50\ shortLabel 23A/XBB.1.5 Mutations\ subGroups view=newClades\ track nextstrainSamples23A_Omicron\ galaxyEnaQ2Ay-122 AY.122 mutations bigBed 8 + Mutations (amino acid level) in AY.122 between 2021-07-05 and 2021-10-05 1 50 60 60 60 157 157 157 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

  • Sample collection date
  • Sequencing platform, library layout and strategy (currently reanalysis is done for ampliconic paired-end Illumina and ONT data)
  • The primer scheme used for the generation of amplicons (this information is used to trim primer sequences from the data before variant calling; reanalysis can be done for any primer scheme with publicly available primer binding site information)
  • Some kind of discernible batch information (e.g. a library identifier) that can be used to form batches of samples for reanalysis and batch-level reporting

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

\ The GalaxyProject SARS-CoV-2 mutation tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different three-month periods (quarters) of the COVID-19 pandemic.\ The quarters are redefined with each data update, with the latest (current) quarter starting three months before the day of the update. The end date displayed on the current-quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.\
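The rolling quarter definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the function names `shift_months` and `quarter_windows` are hypothetical, "three months" is assumed to mean three calendar months, and the windows end on the update day itself, whereas the end date shown on the current-quarter track is the collection date of the most recent analyzed sample.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Move a date by whole calendar months, clamping the day if needed."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def quarter_windows(update_day: date, n_quarters: int = 4):
    """Return (start, end) pairs, most recent quarter first (hypothetical helper)."""
    windows = []
    end = update_day
    for _ in range(n_quarters):
        start = shift_months(end, -3)  # each quarter starts 3 months before its end
        windows.append((start, end))
        end = start
    return windows


if __name__ == "__main__":
    # The update day is an assumed example value.
    for start, end in quarter_windows(date(2022, 4, 5)):
        print(start.isoformat(), "to", end.isoformat())
```

With the assumed update day of 2022-04-05, the three earlier windows coincide with the date ranges that appear in the track labels of this table (for example, 2021-10-05 to 2022-01-05).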

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of each mutation feature reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant-calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are rendered entirely thin.\ \
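The SNV drawing convention described above maps naturally onto BED-style coordinates, where the feature spans the codon and the thick region marks the changed base. The sketch below is a simplified illustration, not the code that builds these tracks: `snv_feature` is a hypothetical helper, coordinates are 0-based and half-open, and the coding sequence is assumed to be contiguous on the forward strand.

```python
def snv_feature(cds_start: int, snv_pos: int):
    """Return (chromStart, chromEnd, thickStart, thickEnd) for a coding SNV.

    Both arguments are 0-based genome positions. The CDS is assumed to be
    contiguous and on the forward strand (a simplification for illustration).
    """
    if snv_pos < cds_start:
        raise ValueError("SNV lies upstream of the CDS start")
    # Start of the codon that contains the changed base.
    codon_start = cds_start + ((snv_pos - cds_start) // 3) * 3
    # Whole feature = codon (3 bases); thick part = the single changed base.
    return codon_start, codon_start + 3, snv_pos, snv_pos + 1


if __name__ == "__main__":
    # Spike D614G (A23403G, 1-based), with the S coding sequence
    # starting at 1-based position 21563.
    print(snv_feature(21563 - 1, 23403 - 1))
```

For this example the thick base falls in the middle of the three-base feature, which matches a change in the second codon position.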

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

  • the precise value for its observed frequency in the lineage and quarter
  • the intra-sample allele frequency (median and lower/upper quartile) at which the mutation has been called in the samples in which it has been detected
  • the collection date and the collecting country of the sample in which this mutation was first (ever) detected in the context of the lineage. Note that for older, still circulating lineages the collection date of that sample can be older than the start of the earliest quarter displayed in the genome browser (since our complete surveillance data goes back further than four quarters).

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

  • country or combination of countries in which samples of the given lineage and collection quarter carrying the mutation have been collected. For example, you could filter all current-quarter lineage tracks to show only mutations that have been found (in their respective lineage) in the UK.
  • within-lineage frequency. By default, only mutations observed in at least 5% of the samples assigned to the given lineage in the given quarter are shown (0.05 default filter setting). You can lower or raise that threshold as you see fit. Note, however, that the underlying bigBed data of the tracks are filtered to contain only mutations above a threshold of 0.1% (i.e., a 0.001 hard filter is always in effect).
\ \
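The interplay of the two filters can be illustrated with a small Python sketch. It is not the browser's filter implementation: the record layout (`frequency`, `countries`) and the function `filter_mutations` are hypothetical, and a record is assumed to pass the country filter if any of the selected countries is among those where the mutation was seen.

```python
HARD_FLOOR = 0.001        # mutations below 0.1% are not in the bigBed data at all
DEFAULT_MIN_FREQ = 0.05   # default within-lineage frequency filter


def filter_mutations(records, min_freq=DEFAULT_MIN_FREQ, countries=None):
    """Keep records passing the frequency and (optional) country filters."""
    # Lowering min_freq below the hard floor has no further effect.
    effective = max(min_freq, HARD_FLOOR)
    wanted = set(countries) if countries else None
    kept = []
    for rec in records:
        if rec["frequency"] < effective:
            continue
        # Assumption: match if ANY selected country reported the mutation.
        if wanted is not None and not (wanted & set(rec["countries"])):
            continue
        kept.append(rec)
    return kept


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    demo = [
        {"name": "S:D614G", "frequency": 0.98, "countries": ["UK", "Estonia"]},
        {"name": "N:example", "frequency": 0.02, "countries": ["UK"]},
    ]
    print([r["name"] for r in filter_mutations(demo)])
    print([r["name"] for r in filter_mutations(demo, min_freq=0.01, countries=["UK"])])
```

Setting `min_freq` to 0 in this sketch behaves like setting it to 0.001, which mirrors the hard filter described above.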

Methods

\

\ For analyses, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances.\ The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch.\ This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.\ \ Sequencing data downloads, execution of the three types of workflows, and export of key result files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.\ \ The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories; then\

  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc., is added to
    ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\
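The lineage-ranking step of the track recalculation can be sketched as follows. This is a minimal illustration under assumptions, not the bot scripts' actual code: `top_lineages` is a hypothetical helper, samples are represented as simple (collection date, lineage) pairs, and a quarter is treated as a half-open interval that includes its start date and excludes its end date.

```python
from collections import Counter
from datetime import date


def top_lineages(samples, start, end, n=5):
    """Return the n most frequently observed lineages collected in [start, end)."""
    counts = Counter(
        lineage
        for collected, lineage in samples
        if start <= collected < end  # assumed half-open quarter boundaries
    )
    return [lineage for lineage, _ in counts.most_common(n)]


if __name__ == "__main__":
    # Made-up sample list, for illustration only.
    demo = [
        (date(2021, 8, 1), "AY.4"),
        (date(2021, 8, 3), "AY.4"),
        (date(2021, 9, 9), "AY.122"),
        (date(2021, 11, 2), "AY.4.2.2"),  # falls outside the quarter below
    ]
    print(top_lineages(demo, date(2021, 7, 5), date(2021, 10, 5)))
```

Running this once per quarter window yields the set of lineage tracks to rebuild for each quarter's subtrack.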

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:\

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q2/04_AY.122_data.bb\ color 60,60,60\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.122 between 2021-07-05 and 2021-10-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q2_tracks on\ priority 50\ shortLabel AY.122 mutations\ spectrum on\ track galaxyEnaQ2Ay-122\ type bigBed 8 +\ galaxyEnaQ1Ay-4-2-2 AY.4.2.2 mutations bigBed 8 + Mutations (amino acid level) in AY.4.2.2 between 2021-10-05 and 2022-01-05 1 50 177 64 13 216 159 134 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

  • Sample collection date
  • Sequencing platform, library layout and strategy (currently reanalysis is done for ampliconic paired-end Illumina and ONT data)
  • The primer scheme used for the generation of amplicons (this information is used to trim primer sequences from the data before variant calling; reanalysis can be done for any primer scheme with publicly available primer binding site information)
  • Some kind of discernible batch information (e.g. a library identifier) that can be used to form batches of samples for reanalysis and batch-level reporting

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

\ The GalaxyProject SARS-CoV-2 mutation tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different three-month periods (quarters) of the COVID-19 pandemic.\ The quarters are redefined with each data update, with the latest (current) quarter starting three months before the day of the update. The end date displayed on the current-quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.\

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such a search, the shading of each mutation feature reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant-calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are rendered entirely thin.\ \

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

  • the precise value for its observed frequency in the lineage and quarter
  • the intra-sample allele frequency (median and lower/upper quartile) at which the mutation has been called in the samples in which it has been detected
  • the collection date and the collecting country of the sample in which this mutation was first (ever) detected in the context of the lineage. Note that for older, still circulating lineages the collection date of that sample can be older than the start of the earliest quarter displayed in the genome browser (since our complete surveillance data goes back further than four quarters).

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

  • country or combination of countries in which samples of the given lineage and collection quarter carrying the mutation have been collected. For example, you could filter all current-quarter lineage tracks to show only mutations that have been found (in their respective lineage) in the UK.
  • within-lineage frequency. By default, only mutations observed in at least 5% of the samples assigned to the given lineage in the given quarter are shown (0.05 default filter setting). You can lower or raise that threshold as you see fit. Note, however, that the underlying bigBed data of the tracks are filtered to contain only mutations above a threshold of 0.1% (i.e., a 0.001 hard filter is always in effect).
\ \

Methods

\

\ For analyses, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances.\ The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch.\ This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.\ \ Sequencing data downloads, execution of the three types of workflows, and export of key result files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.\ \ The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories; then\

  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc., is added to
    ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:\

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q1/04_AY.4.2.2_data.bb\ color 177,64,13\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.4.2.2 between 2021-10-05 and 2022-01-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q1_tracks on\ priority 50\ shortLabel AY.4.2.2 mutations\ spectrum on\ track galaxyEnaQ1Ay-4-2-2\ type bigBed 8 +\ galaxyEnaQ3Ay-9 AY.9 mutations bigBed 8 + Mutations (amino acid level) in AY.9 between 2021-04-05 and 2021-07-05 1 50 89 30 113 172 142 184 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

  • Sample collection date
  • Sequencing platform, library layout and strategy (currently reanalysis is done for ampliconic paired-end Illumina and ONT data)
  • The primer scheme used for the generation of amplicons (this information is used to trim primer sequences from the data before variant calling; reanalysis can be done for any primer scheme with publicly available primer binding site information)
  • Some kind of discernible batch information (e.g. a library identifier) that can be used to form batches of samples for reanalysis and batch-level reporting

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

\ The GalaxyProject SARS-CoV-2 mutation tracking effort comes as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different three-month periods (quarters) of the COVID-19 pandemic.\ The quarters are redefined with each data update, with the latest (current) quarter starting three months before the day of the update. The end date displayed on the current-quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.\

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such searches, the shading of each mutation feature reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant-calling artefacts should appear in lighter shades of grey.
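As a rough illustration of frequency-based shading: UCSC tracks configured with "spectrum on" shade each feature according to its BED score (0 to 1000). The sketch below assumes, without confirmation from the track authors, that the within-lineage frequency is scaled linearly onto that score range; the function name is hypothetical.

```python
def frequency_to_score(frequency: float) -> int:
    """Map a within-lineage frequency in [0, 1] to a BED score in [0, 1000].

    Assumption: a simple linear scaling; the actual track build may differ.
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    return round(frequency * 1000)
```

Under this assumption, a mutation present in every sample of a lineage receives the maximum score and is drawn darkest, while a mutation at the 5% default display threshold receives a score of 50.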

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are displayed entirely thin.\ \
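The SNV convention can be expressed in BED coordinates (0-based, half-open). The following is a minimal sketch under stated assumptions: the function name is hypothetical, the coding sequence is assumed to be contiguous on the forward strand, and special cases such as the ribosomal frameshift in ORF1ab are ignored.

```python
def snv_feature(cds_start: int, snv_pos: int):
    """Return (chromStart, chromEnd, thickStart, thickEnd) for an SNV.

    cds_start: 0-based start of the coding sequence containing the SNV.
    snv_pos:   0-based genomic position of the changed base.
    The feature spans the whole codon; the thick part covers only the base.
    """
    if snv_pos < cds_start:
        raise ValueError("SNV lies upstream of the coding sequence")
    codon_index = (snv_pos - cds_start) // 3
    chrom_start = cds_start + 3 * codon_index
    return chrom_start, chrom_start + 3, snv_pos, snv_pos + 1


# Spike D614G (A23403G): the S gene starts at 1-based position 21563.
print(snv_feature(21562, 23402))
```

For a deletion, thickStart and thickEnd would equal chromStart and chromEnd; for an insertion, setting thickStart equal to thickEnd leaves the whole feature thin.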

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

    \
  • the precise value of its observed frequency in the lineage and quarter
  • the intra-sample allele frequency (median and lower/upper quartile) at which the mutation has been called in the samples in which it has been detected
  • the collection date and the collecting country of the sample in which this mutation was first (ever) detected in the context of the lineage. Note that for older, still circulating lineages, the collection date of that sample can be earlier than the start of the earliest quarter displayed in the genome browser (since our complete surveillance data goes back further than four quarters).
\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

    \
  • country or combination of countries in which samples of the given lineage and collection quarter carrying the mutation have been collected. For example, you could filter all current-quarter lineage tracks to show only mutations that have been found (in their respective lineage) in the UK.
  • within-lineage frequency. By default, only mutations observed in at least 5% of the samples assigned to the given lineage in the given quarter are shown (0.05 default filter setting). You can lower or raise that threshold as you see fit. Note, however, that the underlying bigBed data of the tracks is filtered to contain only mutations above a threshold of 0.1% (i.e. a 0.001 hard filter is always in effect).
\ \
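The interplay between the user-adjustable frequency threshold and the hard filter can be sketched as follows. The record layout (dictionaries with "frequency" and "countries" keys) and the function name are hypothetical, and a country filter is assumed here to match a mutation seen in any of the selected countries.

```python
HARD_FILTER = 0.001        # always in effect in the underlying bigBed data
DEFAULT_THRESHOLD = 0.05   # default display threshold


def filter_mutations(mutations, min_frequency=DEFAULT_THRESHOLD, countries=None):
    """Keep mutations that pass the frequency and (optional) country filters.

    Lowering min_frequency below HARD_FILTER has no further effect, because
    rarer mutations are not present in the track data at all.
    """
    threshold = max(min_frequency, HARD_FILTER)
    wanted = set(countries) if countries else None
    kept = []
    for mutation in mutations:
        if mutation["frequency"] < threshold:
            continue
        if wanted is not None and not wanted & set(mutation["countries"]):
            continue
        kept.append(mutation)
    return kept
```

Setting the threshold to 0 in this sketch still hides mutations below 0.001, which mirrors the behaviour described for the tracks.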

Methods

\

\ For analysis, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances.\ The data are then processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch.\ This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.\ \ Sequencing data downloads, execution of the three types of workflows, and export of key result files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.\ \ The bot accounts on participating Galaxy servers are checked roughly weekly for newly finished analysis histories, then\

    \
  1. those histories are made publicly accessible on their server
  2. batch information (the samples analyzed and their metadata, links to the histories, etc.) is added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks are recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\
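The lineage-selection step of the track recalculation can be sketched in a few lines of Python. The sample record layout ("lineage" and "collection_date" keys) and the function name are assumptions for illustration, and the quarter window is treated as half-open; this is not the project's actual code.

```python
from collections import Counter
from datetime import date


def top_lineages(samples, start: date, end: date, n: int = 5):
    """Return the n most frequently observed lineages among samples
    collected in the window start <= collection_date < end."""
    counts = Counter(
        sample["lineage"]
        for sample in samples
        if start <= sample["collection_date"] < end
    )
    return [lineage for lineage, _ in counts.most_common(n)]
```

Running this once per quarter window yields the set of lineages for which mutation data would then be extracted and written to that quarter's tracks.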

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular we would like to thank:\

\ \

References

\ \

\

    \
  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q3/04_AY.9_data.bb\ color 89,30,113\ html galaxyEna\ longLabel Mutations (amino acid level) in AY.9 between 2021-04-05 and 2021-07-05\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q3_tracks on\ priority 50\ shortLabel AY.9 mutations\ spectrum on\ track galaxyEnaQ3Ay-9\ type bigBed 8 +\ galaxyEnaQ0Ba-1-1-11 BA.1.1.11 mutations bigBed 8 + Mutations (amino acid level) in BA.1.1.11 between 2022-01-05 and 2022-03-09 3 50 89 47 13 172 151 134 1 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:\

    \
  • sample collection date
  • sequencing platform, library layout and strategy (currently, reanalysis is done for amplicon-based paired-end Illumina and ONT data)
  • the primer scheme used to generate the amplicons (this information is used to trim primer sequences from the data before variant calling; reanalysis can be done for any primer scheme with publicly available primer binding site information)
  • some form of discernible batch information (e.g. a library identifier) that can be used to form batches of samples for reanalysis and batch-level reporting
\

\

\ Analysis is performed on public Galaxy servers using only open-source tools, orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore. It includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences, and pangolin lineage assignment.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC Genome Browser tracks.\ The project web site has more information about the available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

\ The GalaxyProject SARS-CoV-2 mutation tracking effort is presented as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different 3-month periods (quarters) of the COVID-19 pandemic.\ The quarters are redefined with each data update, with the latest (current) quarter starting 3 months before the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.\

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such searches, the shading of each mutation feature reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey/black, while newly emerging mutations or non-systematic variant-calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level. For SNV mutations, the feature as a whole extends across the base triplet encoding the affected amino acid, while the thick part of the feature indicates the precise base changed by the mutation. For deletions, the whole feature is rendered thick, while insertions are displayed entirely thin.\ \

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) will reveal details of the mutation and the associated statistics, in particular:\

    \
  • the precise value of its observed frequency in the lineage and quarter
  • the intra-sample allele frequency (median and lower/upper quartile) at which the mutation has been called in the samples in which it has been detected
  • the collection date and the collecting country of the sample in which this mutation was first (ever) detected in the context of the lineage. Note that for older, still circulating lineages, the collection date of that sample can be earlier than the start of the earliest quarter displayed in the genome browser (since our complete surveillance data goes back further than four quarters).
\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by\

    \
  • country or combination of countries in which samples of the given lineage and collection quarter carrying the mutation have been collected. For example, you could filter all current-quarter lineage tracks to show only mutations that have been found (in their respective lineage) in the UK.
  • within-lineage frequency. By default, only mutations observed in at least 5% of the samples assigned to the given lineage in the given quarter are shown (0.05 default filter setting). You can lower or raise that threshold as you see fit. Note, however, that the underlying bigBed data of the tracks is filtered to contain only mutations above a threshold of 0.1% (i.e. a 0.001 hard filter is always in effect).
\ \

Methods

\

\ For analysis, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances.\ The data are then processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch.\ This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch.\ \ Sequencing data downloads, execution of the three types of workflows, and export of key result files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server.\ \ The bot accounts on participating Galaxy servers are checked roughly weekly for newly finished analysis histories, then\

    \
  1. those histories are made publicly accessible on their server
  2. batch information (the samples analyzed and their metadata, links to the histories, etc.) is added to ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks are recalculated by
    1. parsing all analyzed data on the FTP server
    2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
    3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
    4. rebuilding the bigBed files and track files
\

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular we would like to thank:\

\ \

References

\ \

\

    \
  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 bigDataUrl /gbdb/wuhCor1/galaxyEna/data_Q0/04_BA.1.1.11_data.bb\ color 89,47,13\ html galaxyEna\ longLabel Mutations (amino acid level) in BA.1.1.11 between 2022-01-05 and 2022-03-09\ mouseOver $gene:$name | Nuc change: $nucChange | $lineage Frequency (this quarter): $withinLineageFrequency | Intrasample AF (this quarter): $medianAF ($q25AF - $q75AF) | First observed (ever): $earliestDateseen ($earliestCountryseen)\ parent Q0_tracks on\ priority 50\ shortLabel BA.1.1.11 mutations\ spectrum on\ track galaxyEnaQ0Ba-1-1-11\ type bigBed 8 +\ igm_COVID_605 COVID 605 bigBed 9 COVID 605 1 50 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_605.bb\ longLabel COVID 605\ parent igm on\ priority 50\ shortLabel COVID 605\ track igm_COVID_605\ type bigBed 9\ igg_COVID_727 COVID 727 bigBed 9 COVID 727 1 50 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_727.bb\ longLabel COVID 727\ parent igg on\ priority 50\ shortLabel COVID 727\ track igg_COVID_727\ type bigBed 9\ variantNucMuts_XBB Omicron XBB Nuc Muts bigBed 4 Omicron VOC (XBB) nucleotide mutations identifed from GISAID sequences (Sep 2023) 1 50 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=XBB varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (XBB) nucleotide mutations identifed from GISAID sequences (Sep 2023)\ parent variantMuts off\ priority 128\ shortLabel Omicron XBB Nuc Muts\ subGroups variant=Q_XBB mutation=NUC designation=VUM\ track variantNucMuts_XBB\ url https://outbreak.info/situation-reports?pango=XBB\ urlLabel XBB Situation Report at outbreak.info\ igg_COVID_11 COVID 11 bigBed 9 COVID 11 1 51 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_11.bb\ longLabel COVID 11\ parent igg on\ priority 51\ shortLabel COVID 11\ track igg_COVID_11\ type bigBed 9\ igm_COVID_730 COVID 730 bigBed 9 COVID 730 1 51 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl 
/gbdb/wuhCor1/pbmShanghai/igm/COVID_730.bb\ longLabel COVID 730\ parent igm on\ priority 51\ shortLabel COVID 730\ track igm_COVID_730\ type bigBed 9\ variantAaMuts_CH_1_1 Omicron CH.1.1 AA Muts bigBed 4 Omicron CH.1.1 amino acid mutations from GISAID sequences (Sep 22, 2023) 1 51 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=CH.1.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/CH.1.1_prot.bb\ color 219,40,35\ longLabel Omicron CH.1.1 amino acid mutations from GISAID sequences (Sep 22, 2023)\ parent variantMuts off\ priority 21\ shortLabel Omicron CH.1.1 AA Muts\ subGroups variant=T_CH11 mutation=AA designation=VUM\ track variantAaMuts_CH_1_1\ url https://outbreak.info/situation-reports?pango=CH.1.1\ urlLabel CH.1.1 Situation Report at outbreak.info\ igg_COVID_12 COVID 12 bigBed 9 COVID 12 1 52 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_12.bb\ longLabel COVID 12\ parent igg on\ priority 52\ shortLabel COVID 12\ track igg_COVID_12\ type bigBed 9\ igm_COVID_727 COVID 727 bigBed 9 COVID 727 1 52 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_727.bb\ longLabel COVID 727\ parent igm on\ priority 52\ shortLabel COVID 727\ track igm_COVID_727\ type bigBed 9\ variantNucMuts_CH_1_1 Omicron CH.1.1 Nuc Muts bigBed 4 Omicron VOC (CH.1.1) nucleotide mutations identifed from GISAID sequences (Sep 2023) 1 52 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=CH.1.1 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/CH.1.1_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (CH.1.1) nucleotide mutations identifed from GISAID sequences (Sep 2023)\ parent variantMuts off\ priority 131\ shortLabel Omicron CH.1.1 Nuc Muts\ subGroups variant=T_CH11 mutation=NUC designation=VUM\ track variantNucMuts_CH_1_1\ url https://outbreak.info/situation-reports?pango=CH.1.1\ urlLabel CH.1.1 Situation Report at outbreak.info\ igg_COVID_510 COVID 510 bigBed 9 COVID 510 1 53 0 0 0 127 127 127 0 0 
0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_510.bb\ longLabel COVID 510\ parent igg on\ priority 53\ shortLabel COVID 510\ track igg_COVID_510\ type bigBed 9\ igm_COVID_533 COVID 533 bigBed 9 COVID 533 1 53 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_533.bb\ longLabel COVID 533\ parent igm on\ priority 53\ shortLabel COVID 533\ track igm_COVID_533\ type bigBed 9\ variantAaMuts_XBB_1_9 Omicron XBB.1.9 AA Muts bigBed 4 Omicron XBB.1.9 amino acid mutations from GISAID sequences (Sep 22, 2023) 1 53 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=XBB.1.9 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB.1.9_prot.bb\ color 219,40,35\ longLabel Omicron XBB.1.9 amino acid mutations from GISAID sequences (Sep 22, 2023)\ parent variantMuts off\ priority 22\ shortLabel Omicron XBB.1.9 AA Muts\ subGroups variant=U_XBB19 mutation=AA designation=VUM\ track variantAaMuts_XBB_1_9\ url https://outbreak.info/situation-reports?pango=XBB.1.9\ urlLabel XBB.1.9 Situation Report at outbreak.info\ igm_COVID_12 COVID 12 bigBed 9 COVID 12 1 54 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_12.bb\ longLabel COVID 12\ parent igm on\ priority 54\ shortLabel COVID 12\ track igm_COVID_12\ type bigBed 9\ igg_COVID_522 COVID 522 bigBed 9 COVID 522 1 54 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_522.bb\ longLabel COVID 522\ parent igg on\ priority 54\ shortLabel COVID 522\ track igg_COVID_522\ type bigBed 9\ variantNucMuts_XBB_1_9 Omicron XBB.1.9 Nuc Muts bigBed 4 Omicron VOC (XBB.1.9) nucleotide mutations identifed from GISAID sequences (Sep 2023) 1 54 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=XBB.1.9 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB.1.9_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (XBB.1.9) nucleotide mutations identifed from GISAID sequences (Sep 2023)\ parent variantMuts off\ priority 132\ shortLabel Omicron 
XBB.1.9 Nuc Muts\ subGroups variant=U_XBB19 mutation=NUC designation=VUM\ track variantNucMuts_XBB_1_9\ url https://outbreak.info/situation-reports?pango=XBB.1.9\ urlLabel XBB.1.9 Situation Report at outbreak.info\ igg_COVID_530 COVID 530 bigBed 9 COVID 530 1 55 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_530.bb\ longLabel COVID 530\ parent igg on\ priority 55\ shortLabel COVID 530\ track igg_COVID_530\ type bigBed 9\ igm_COVID_610 COVID 610 bigBed 9 COVID 610 1 55 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_610.bb\ longLabel COVID 610\ parent igm on\ priority 55\ shortLabel COVID 610\ track igm_COVID_610\ type bigBed 9\ variantAaMuts_XBB_2_3 Omicron XBB.2.3 AA Muts bigBed 4 Omicron XBB.2.3 amino acid mutations from GISAID sequences (Sep 22, 2023) 1 55 219 40 35 237 147 145 0 0 0 https://outbreak.info/situation-reports?pango=XBB.2.3 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB.2.3_prot.bb\ color 219,40,35\ longLabel Omicron XBB.2.3 amino acid mutations from GISAID sequences (Sep 22, 2023)\ parent variantMuts off\ priority 23\ shortLabel Omicron XBB.2.3 AA Muts\ subGroups variant=V_XBB23 mutation=AA designation=VUM\ track variantAaMuts_XBB_2_3\ url https://outbreak.info/situation-reports?pango=XBB.2.3\ urlLabel XBB.2.3 Situation Report at outbreak.info\ igm_COVID_4371 COVID 4371 bigBed 9 COVID 4371 1 56 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_4371.bb\ longLabel COVID 4371\ parent igm on\ priority 56\ shortLabel COVID 4371\ track igm_COVID_4371\ type bigBed 9\ igg_COVID_535 COVID 535 bigBed 9 COVID 535 1 56 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_535.bb\ longLabel COVID 535\ parent igg on\ priority 56\ shortLabel COVID 535\ track igg_COVID_535\ type bigBed 9\ variantNucMuts_XBB_2_3 Omicron XBB.2.3 Nuc Muts bigBed 4 Omicron VOC (XBB.2.3) nucleotide mutations identifed from GISAID sequences (Sep 2023) 1 56 219 40 35 237 147 
145 0 0 0 https://outbreak.info/situation-reports?pango=XBB.2.3 varRep 1 bigDataUrl /gbdb/wuhCor1/strainMuts/XBB.2.3_nuc.bb\ color 219,40,35\ longLabel Omicron VOC (XBB.2.3) nucleotide mutations identifed from GISAID sequences (Sep 2023)\ parent variantMuts off\ priority 133\ shortLabel Omicron XBB.2.3 Nuc Muts\ subGroups variant=V_XBB23 mutation=NUC designation=VUM\ track variantNucMuts_XBB_2_3\ url https://outbreak.info/situation-reports?pango=XBB.2.3\ urlLabel XBB.2.3 Situation Report at outbreak.info\ igm_COVID_407 COVID 407 bigBed 9 COVID 407 1 57 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_407.bb\ longLabel COVID 407\ parent igm on\ priority 57\ shortLabel COVID 407\ track igm_COVID_407\ type bigBed 9\ igg_COVID_432 COVID 432 bigBed 9 COVID 432 1 57 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_432.bb\ longLabel COVID 432\ parent igg on\ priority 57\ shortLabel COVID 432\ track igg_COVID_432\ type bigBed 9\ igg_COVID_531 COVID 531 bigBed 9 COVID 531 1 58 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_531.bb\ longLabel COVID 531\ parent igg on\ priority 58\ shortLabel COVID 531\ track igg_COVID_531\ type bigBed 9\ igm_COVID_607 COVID 607 bigBed 9 COVID 607 1 58 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_607.bb\ longLabel COVID 607\ parent igm on\ priority 58\ shortLabel COVID 607\ track igm_COVID_607\ type bigBed 9\ igm_COVID_510 COVID 510 bigBed 9 COVID 510 1 59 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_510.bb\ longLabel COVID 510\ parent igm on\ priority 59\ shortLabel COVID 510\ track igm_COVID_510\ type bigBed 9\ igg_COVID_529 COVID 529 bigBed 9 COVID 529 1 59 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_529.bb\ longLabel COVID 529\ parent igg on\ priority 59\ shortLabel COVID 529\ track igg_COVID_529\ type bigBed 9\ igg_COVID_4371 COVID 4371 bigBed 9 COVID 
4371 1 60 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_4371.bb\ longLabel COVID 4371\ parent igg on\ priority 60\ shortLabel COVID 4371\ track igg_COVID_4371\ type bigBed 9\ igm_COVID_5222 COVID 5222 bigBed 9 COVID 5222 1 60 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_5222.bb\ longLabel COVID 5222\ parent igm on\ priority 60\ shortLabel COVID 5222\ track igm_COVID_5222\ type bigBed 9\ igg_COVID_416 COVID 416 bigBed 9 COVID 416 1 61 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_416.bb\ longLabel COVID 416\ parent igg on\ priority 61\ shortLabel COVID 416\ track igg_COVID_416\ type bigBed 9\ igm_COVID_535 COVID 535 bigBed 9 COVID 535 1 61 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_535.bb\ longLabel COVID 535\ parent igm on\ priority 61\ shortLabel COVID 535\ track igm_COVID_535\ type bigBed 9\ igm_COVID_501 COVID 501 bigBed 9 COVID 501 1 62 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_501.bb\ longLabel COVID 501\ parent igm on\ priority 62\ shortLabel COVID 501\ track igm_COVID_501\ type bigBed 9\ igg_COVID_5222 COVID 5222 bigBed 9 COVID 5222 1 62 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_5222.bb\ longLabel COVID 5222\ parent igg on\ priority 62\ shortLabel COVID 5222\ track igg_COVID_5222\ type bigBed 9\ igm_COVID_509 COVID 509 bigBed 9 COVID 509 1 63 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_509.bb\ longLabel COVID 509\ parent igm on\ priority 63\ shortLabel COVID 509\ track igm_COVID_509\ type bigBed 9\ igg_COVID_520 COVID 520 bigBed 9 COVID 520 1 63 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_520.bb\ longLabel COVID 520\ parent igg on\ priority 63\ shortLabel COVID 520\ track igg_COVID_520\ type bigBed 9\ igm_COVID_410 COVID 410 bigBed 9 COVID 410 1 64 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl 
/gbdb/wuhCor1/pbmShanghai/igm/COVID_410.bb\ longLabel COVID 410\ parent igm on\ priority 64\ shortLabel COVID 410\ track igm_COVID_410\ type bigBed 9\ igg_COVID_5333 COVID 5333 bigBed 9 COVID 5333 1 64 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_5333.bb\ longLabel COVID 5333\ parent igg on\ priority 64\ shortLabel COVID 5333\ track igg_COVID_5333\ type bigBed 9\ igg_COVID_14 COVID 14 bigBed 9 COVID 14 1 65 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_14.bb\ longLabel COVID 14\ parent igg on\ priority 65\ shortLabel COVID 14\ track igg_COVID_14\ type bigBed 9\ igm_COVID_401 COVID 401 bigBed 9 COVID 401 1 65 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_401.bb\ longLabel COVID 401\ parent igm on\ priority 65\ shortLabel COVID 401\ track igm_COVID_401\ type bigBed 9\ igm_COVID_522 COVID 522 bigBed 9 COVID 522 1 66 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_522.bb\ longLabel COVID 522\ parent igm on\ priority 66\ shortLabel COVID 522\ track igm_COVID_522\ type bigBed 9\ igg_COVID_730 COVID 730 bigBed 9 COVID 730 1 66 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_730.bb\ longLabel COVID 730\ parent igg on\ priority 66\ shortLabel COVID 730\ track igg_COVID_730\ type bigBed 9\ igm_COVID_405 COVID 405 bigBed 9 COVID 405 1 67 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_405.bb\ longLabel COVID 405\ parent igm on\ priority 67\ shortLabel COVID 405\ track igm_COVID_405\ type bigBed 9\ igg_COVID_610 COVID 610 bigBed 9 COVID 610 1 67 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_610.bb\ longLabel COVID 610\ parent igg on\ priority 67\ shortLabel COVID 610\ track igg_COVID_610\ type bigBed 9\ igg_COVID_407 COVID 407 bigBed 9 COVID 407 1 68 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_407.bb\ longLabel COVID 407\ parent 
igg on\ priority 68\ shortLabel COVID 407\ track igg_COVID_407\ type bigBed 9\ igm_COVID_429 COVID 429 bigBed 9 COVID 429 1 68 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_429.bb\ longLabel COVID 429\ parent igm on\ priority 68\ shortLabel COVID 429\ track igm_COVID_429\ type bigBed 9\ igm_COVID_432 COVID 432 bigBed 9 COVID 432 1 69 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_432.bb\ longLabel COVID 432\ parent igm on\ priority 69\ shortLabel COVID 432\ track igm_COVID_432\ type bigBed 9\ igg_COVID_532 COVID 532 bigBed 9 COVID 532 1 69 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_532.bb\ longLabel COVID 532\ parent igg on\ priority 69\ shortLabel COVID 532\ track igg_COVID_532\ type bigBed 9\ igg_COVID_419 COVID 419 bigBed 9 COVID 419 1 70 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_419.bb\ longLabel COVID 419\ parent igg on\ priority 70\ shortLabel COVID 419\ track igg_COVID_419\ type bigBed 9\ igm_COVID_530 COVID 530 bigBed 9 COVID 530 1 70 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_530.bb\ longLabel COVID 530\ parent igm on\ priority 70\ shortLabel COVID 530\ track igm_COVID_530\ type bigBed 9\ igm_COVID_436 COVID 436 bigBed 9 COVID 436 1 71 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_436.bb\ longLabel COVID 436\ parent igm on\ priority 71\ shortLabel COVID 436\ track igm_COVID_436\ type bigBed 9\ igg_COVID_501 COVID 501 bigBed 9 COVID 501 1 71 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_501.bb\ longLabel COVID 501\ parent igg on\ priority 71\ shortLabel COVID 501\ track igg_COVID_501\ type bigBed 9\ igg_COVID_401 COVID 401 bigBed 9 COVID 401 1 72 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_401.bb\ longLabel COVID 401\ parent igg on\ priority 72\ shortLabel COVID 401\ track igg_COVID_401\ type 
bigBed 9\ igm_COVID_4022 COVID 4022 bigBed 9 COVID 4022 1 72 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_4022.bb\ longLabel COVID 4022\ parent igm on\ priority 72\ shortLabel COVID 4022\ track igm_COVID_4022\ type bigBed 9\ addgene Addgene Plasmids bigBed 6 + Addgene Plasmid Sequences alignable to the Genome 0 100 0 97 168 127 176 211 0 0 0 https://www.addgene.org/$$

Description

\

\

This track shows sequences contained in plasmids that can be ordered through\ Addgene. Many plasmids\ containing SARS-CoV-2 sequences are now available and more are currently\ undergoing quality control analysis. Some of these plasmids are described in\ preprints, published articles, and summaries. For a more detailed description\ of some of these collections, please see the links to Addgene's webpage for\ each plasmid.

Some of the following plasmids are available to industry,\ in addition to academics and nonprofits. For more information on ordering, the\ link to each plasmid's homepage has been provided.

\ \

Display Conventions and Configuration

\ \

Shown on the genome are regions that can be aligned to a plasmid. The feature label is the Addgene identifier. Clicking a feature shows a brief description and a link to the plasmid's full webpage, where it can be ordered.

\ \

Methods

\ \

GenBank files were downloaded from Addgene and parsed to extract the relevant data fields and the plasmid nucleotide sequence. The reference sequence, SARS-CoV-2 (wuhCor1 - NC_045512v2), was then downloaded from the UCSC Genome Browser, and a reference database for BLAST alignment was created using makeblastdb. The viral region of each plasmid was then aligned using blastn. Alignments exceeding an expected value of 0.00001 were considered. The aligned sequences were then converted to BED format and the relevant GenBank data fields were added. Finally, bedToBigBed was used to create the bigBed track.
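The conversion from alignments to BED might look roughly like the sketch below. It assumes blastn was run with tabular output (-outfmt 6), which the text does not state. The plasmid names are hypothetical, and the fixed BED score of 0 is an assumption. The E-value rule is read here as keeping hits at or below 0.00001, the usual BLAST convention, although the wording above is ambiguous.

```python
# Hypothetical sketch: turn blastn tabular hits (-outfmt 6) into BED6 rows.
# Column order follows the standard outfmt 6 layout:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

E_VALUE_CUTOFF = 1e-5  # 0.00001, the value quoted in the Methods text


def blast6_to_bed(lines, max_evalue=E_VALUE_CUTOFF):
    """Yield BED6 tuples (chrom, start, end, name, score, strand)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        sstart, send, evalue = int(f[8]), int(f[9]), float(f[10])
        if evalue > max_evalue:  # assumption: the cutoff is an upper bound
            continue
        # BLAST coordinates are 1-based and inclusive; minus-strand hits
        # have sstart > send. BED is 0-based and half-open.
        strand = "+" if sstart <= send else "-"
        start = min(sstart, send) - 1
        end = max(sstart, send)
        yield (subject, start, end, query, 0, strand)


# Made-up hits for three hypothetical plasmids (not real data).
hits = [
    "plasmid_A\tNC_045512v2\t99.5\t200\t1\t0\t1\t200\t21563\t21762\t1e-100\t360",
    "plasmid_B\tNC_045512v2\t98.0\t150\t3\t0\t1\t150\t29000\t28851\t2e-70\t250",
    "plasmid_C\tNC_045512v2\t80.0\t20\t4\t0\t1\t20\t100\t119\t0.5\t18",
]
bed_rows = list(blast6_to_bed(hits))
for row in bed_rows:
    print("\t".join(str(x) for x in row))
```

The resulting rows would still need sorting and the extra GenBank fields before being passed to bedToBigBed.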

\ \

Credits

\ Thanks to Nathan Mauldin and Jason Fernandes for making this track.\ \

References

\

References can be found on the specific plasmid's webpage. Each plasmid is hosted by https://www.addgene.org.\ map 1 bigDataUrl /gbdb/wuhCor1/bbi/addgene.bb\ color 0,97,168\ exonArrows off\ group map\ longLabel Addgene Plasmid Sequences alignable to the Genome\ noScoreFilter on\ shortLabel Addgene Plasmids\ skipFields addgeneLink\ track addgene\ type bigBed 6 +\ url https://www.addgene.org/$$\ urlLabel Link to the Addgene plasmid page\ visibility hide\ nextstrainFreqViewAll All Samples bigWig Nextstrain Mutations Alternate Allele Frequency 3 100 0 0 0 127 127 127 0 0 0 varRep 0 longLabel Nextstrain Mutations Alternate Allele Frequency\ parent nextstrainFreq\ shortLabel All Samples\ track nextstrainFreqViewAll\ view all\ visibility pack\ gold Assembly bed 3 + Assembly from Fragments 0 100 150 100 30 230 170 40 0 0 0

Description

\

This track shows the sequences used in the Jan. 2020 SARS-CoV-2 genome assembly.

\

\ Genome assembly procedures are covered in the NCBI\ assembly documentation.
\ NCBI also provides\ specific information about this assembly.\

\

\ The definition of this assembly is from the\ AGP file delivered with the sequence. The NCBI document\ AGP Specification describes the format of the AGP file.\

\

In dense mode, this track depicts the contigs that make up the currently viewed scaffold. Contig boundaries are distinguished by the use of alternating gold and brown coloration. Where gaps exist between contigs, spaces are shown between the gold and brown blocks. The relative order and orientation of the contigs within a scaffold are always known; therefore, a line is drawn in the graphical display to bridge the blocks.

\

\ Component types found in this track (with counts of that type in parentheses):\

  • D - draft sequence (1)

\ map 1 altColor 230,170,40\ color 150,100,30\ group map\ html gold\ longLabel Assembly from Fragments\ shortLabel Assembly\ track gold\ type bed 3 +\ visibility hide\ augustusGene AUGUSTUS genePred AUGUSTUS ab initio gene predictions v3.1 0 100 12 105 0 133 180 127 0 0 0

Description

\ \

\ This track shows ab initio predictions from the program\ AUGUSTUS (version 3.1).\ The predictions are based on the genome sequence alone.\

\ \

\ For more information on the different gene tracks, see our Genes FAQ.

\ \

Methods

\ \

\ Statistical signal models were built for splice sites, branch-point\ patterns, translation start sites, and the poly-A signal.\ Furthermore, models were built for the sequence content of\ protein-coding and non-coding regions as well as for the length distributions\ of different exon and intron types. Detailed descriptions of most of these different models\ can be found in Mario Stanke's\ dissertation.\ This track shows the most likely gene structure according to a\ Semi-Markov Conditional Random Field model.\ Alternative splicing transcripts were obtained with\ a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2\ --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2).\

\ \

The different models used by AUGUSTUS were trained on a number of different species-specific gene sets, which included 1000-2000 training gene structures. The --species option allows one to choose the species used for training the models. Different training species were used for the --species option when generating these predictions for different groups of assemblies.

Assembly Group                    Training Species
Fish                              zebrafish
Birds                             chicken
Human and all other vertebrates   human
Nematodes                         caenorhabditis
Drosophila                        fly
A. mellifera                      honeybee1
A. gambiae                        culex
S. cerevisiae                     saccharomyces

\ This table describes which training species was used for a particular group of assemblies.\ When available, the closest related training species was used.\

\ \

Credits

\ \ Thanks to the\ Stanke lab\ for providing the AUGUSTUS program. The training for the chicken version was\ done by Stefanie König and the training for the\ human and zebrafish versions was done by Mario Stanke.\ \

References

\ \

\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ \ Using native and syntenically mapped cDNA alignments to improve de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.\ PMID: 18218656\

\ \

\ Stanke M, Waack S.\ \ Gene prediction with a hidden Markov model and a new intron submodel.\ Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25.\ PMID: 14534192\

\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,105,0\ group genes\ longLabel AUGUSTUS ab initio gene predictions v3.1\ shortLabel AUGUSTUS\ track augustusGene\ type genePred\ visibility hide\ strainName44way Bat CoV multiz wigMaf 0.0 1.0 Multiz Alignment of 44 strains with bats as hosts: red=nonsyn green=syn blue=noncod yellow=noalign 3 100 0 10 100 0 90 10 0 0 0 compGeno 1 altColor 0,90,10\ color 0, 10, 100\ frames strainName44wayFrames\ group compGeno\ itemFirstCharCase noChange\ longLabel Multiz Alignment of 44 strains with bats as hosts: red=nonsyn green=syn blue=noncod yellow=noalign\ mafDot on\ mafShowSnp on\ noInherit on\ parent strainCons44wayViewalign on\ priority 100\ sGroup_CoV_2019 Bat_CoV_RaTG13 Bat_SARS_CoV_HKU3-1 Bat_SARS_CoV_HKU3-12 Bat_SARS_CoV_HKU3-7 Bat_SARS-like_CoV_bat-SL-CoVZC45 Bat_SARS-like_CoV_bat-SL-CoVZXC21 Bat_SARS_CoV_Rp3 BtRs-BetaCoV/GX2013 Bat_SARS_CoV_Rs672/2006 Bat_CoV_Anlong-103 Bat_SARS-like_CoV_WIV1 Bat_SARS-like_CoV_Rs4084 Bat_SARS-like_CoV_Rs7327 Bat_SARS-like_CoV_Rs9401 Bat_SARS-like_CoV_As6526 CoV_BtRs-BetaCoV/YN2018C CoV_BtRs-BetaCoV/YN2018B CoV_BtRs-BetaCoV/YN2018D SARS-like_CoV_WIV16 Bat_SARS-like_CoV_Rs4081 Bat_SARS-like_CoV_Rs4255 Bat_SARS-like_CoV_Rs4231 Bat_SARS-like_CoV_Rs4237 Bat_SARS-like_CoV_Rs4247 BtRs-BetaCoV/YN2013 Bat_SARS-like_CoV_Rf4092 CoV_BtRs-BetaCoV/YN2018A SARS_CoV Bat_SARS-like_CoV_YNLF_31C CoV_BtRl-BetaCoV/SC2018 Rhinolophus_affinis_CoV_LYRa11 UNVERIFIED:_SARS-related_CoV_F46 Bat_CoV_Cp/Yunnan2011 Bat_SARS_CoV_Rf1 Bat_CoV_(BtCoV/273/2005) BtRf-BetaCoV/HeB2013 Bat_CoV_Jiyuan-84 Bat_CoV_strain_16BO133 Bat_SARS_CoV_Rm1 BtRs-BetaCoV/HuB2013 Bat_CoV_Rp/Shaanxi2011 SARS-related_CoV_strain_BtKY72 Bat_CoV_BM48-31/BGR/2008\ shortLabel Bat CoV multiz\ snpTable mafSnpStrainName44way\ speciesCodonDefault wuhCor1\ speciesDefaultOn Bat_CoV_RaTG13 Bat_SARS_CoV_HKU3-1 Bat_SARS_CoV_HKU3-12 Bat_SARS_CoV_HKU3-7 Bat_SARS-like_CoV_bat-SL-CoVZC45 Bat_SARS-like_CoV_bat-SL-CoVZXC21 Bat_SARS_CoV_Rp3 
BtRs-BetaCoV/GX2013 Bat_SARS_CoV_Rs672/2006 Bat_CoV_Anlong-103 Bat_SARS-like_CoV_WIV1 Bat_SARS-like_CoV_Rs4084 Bat_SARS-like_CoV_Rs7327 Bat_SARS-like_CoV_Rs9401 Bat_SARS-like_CoV_As6526 CoV_BtRs-BetaCoV/YN2018C CoV_BtRs-BetaCoV/YN2018B CoV_BtRs-BetaCoV/YN2018D SARS-like_CoV_WIV16 Bat_SARS-like_CoV_Rs4081 Bat_SARS-like_CoV_Rs4255 Bat_SARS-like_CoV_Rs4231 Bat_SARS-like_CoV_Rs4237 Bat_SARS-like_CoV_Rs4247 BtRs-BetaCoV/YN2013 Bat_SARS-like_CoV_Rf4092 CoV_BtRs-BetaCoV/YN2018A SARS_CoV Bat_SARS-like_CoV_YNLF_31C CoV_BtRl-BetaCoV/SC2018 Rhinolophus_affinis_CoV_LYRa11 UNVERIFIED:_SARS-related_CoV_F46 Bat_CoV_Cp/Yunnan2011 Bat_SARS_CoV_Rf1 Bat_CoV_(BtCoV/273/2005) BtRf-BetaCoV/HeB2013 Bat_CoV_Jiyuan-84 Bat_CoV_strain_16BO133 Bat_SARS_CoV_Rm1 BtRs-BetaCoV/HuB2013 Bat_CoV_Rp/Shaanxi2011 SARS-related_CoV_strain_BtKY72 Bat_CoV_BM48-31/BGR/2008\ speciesGroups CoV_2019\ subGroups view=align\ track strainName44way\ treeImage phylo/wuhCor1_44way.png\ type wigMaf 0.0 1.0\ cd8escape CD8 Escape Muts bigBed 9 + T-Cell MHCI CD8+ Escape Mutations from Agerer et al. Sci Immun 2020 0 100 0 0 0 127 127 127 0 0 0

Description

\

Cytotoxic T lymphocytes (CTL) find and kill cells infected with viruses by examining viral epitopes presented by MHC class I molecules on the surface of the cell. In SARS-CoV-2 infection, patients mount CTL responses marked by higher levels of granzyme B, perforin and IFN-gamma. The rapid rate of mutation in SARS-CoV-2 may hinder the presentation of peptides by MHC-I molecules, allowing the virus to evade the T cell response.

\ \

Agerer et al. performed deep sequencing of viral\ genomes from several patients and identified mutations that were observed in\ Nucleocapsid (N), ORF1ab, Membrane (M), and Envelope (E) proteins. This track\ includes 194 mutants spanning 305 T-cell epitopes restricted to HLA-A*02:01 and\ HLA-B*40:01.\

\ \

Methods

In this study, deep sequencing was performed on virus samples from Austria. To identify mutant epitopes, MHC-I binding assays were performed. Subsequently, PBMCs collected from HLA-A*02:01- and HLA-B*40:01-typed COVID-19 patients were subjected to functional assays. A more detailed description of the methodology is given in the original article.

References

\

\ Agerer B, Koblischke M, Gudipati V, Montaño-Gutierrez LF, Smyth M, Popa A, Genger JW, Endler L,\ Florian DM, Mühlgrabner V et al.\ \ SARS-CoV-2 mutations in MHC-I-restricted epitopes evade CD8+ T cell\ responses.\ Sci Immunol. 2021 Mar 4;6(57).\ PMID: 33664060\

\ immu 1 bigDataUrl /gbdb/wuhCor1/cd8escape/cd8escape.bb\ group immu\ longLabel T-Cell MHCI CD8+ Escape Mutations from Agerer et al. Sci Immun 2020\ noScoreFilter on\ shortLabel CD8 Escape Muts\ track cd8escape\ type bigBed 9 +\ rosettaMhc CD8 RosettaMHC bigBed 9 + CD8 Epitopes predicted by NetMHC and Rosetta 0 100 0 0 0 127 127 127 0 0 0 https://rosettamhc.chemistry.ucsc.edu/?epitope=$$

Description

\ As a first step toward the development of diagnostic and therapeutic tools to fight the Coronavirus disease (COVID-19), it is important to characterize CD8+ T cell epitopes in the SARS-CoV-2 peptidome that can trigger adaptive immune responses. Here, we use RosettaMHC, a comparative modeling approach which leverages existing high-resolution X-ray structures from peptide/MHC complexes available in the Protein Data Bank, to derive physically realistic 3D models for high-affinity SARS-CoV-2 epitopes. We outline an application of our method to model 439 9mer and 279 10mer predicted epitopes displayed by the common allele HLA-A*02:01, and we make our models publicly available through an online database (https://rosettamhc.chemistry.ucsc.edu). As more detailed studies on antigen-specific T cell recognition become available, RosettaMHC models of antigens from different strains and HLA alleles can be used as a basis to understand the link between peptide/HLA complex structure and surface chemistry with immunogenicity, in the context of SARS-CoV-2 infection.\ \

This track includes 718 CD8 epitopes restricted to HLA-A*02:01 as predicted by NetMHCpan4.0 and RosettaMHC. The structural models of all 718 epitopes are available in the database (see Description). Each epitope is scored by combining the binding affinity predicted by NetMHCpan4.0 (eluted ligand) with the binding energy calculated in the Rosetta force field:

    score = (0.5 * ( ((NetMHCpan affinity - average NetMHCpan affinity) / range of NetMHCpan affinities) + ((Rosetta binding energy - average Rosetta binding energy) / range of Rosetta binding energies) ) + 1) * 500
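As an illustration of the combined score, here is a minimal Python sketch. It assumes that "range" means maximum minus minimum over all epitopes and that a zero range contributes nothing; neither is stated in the text. The input values are made up. Because each centered term lies between -1 and 1, the result stays within 0 to 1000.

```python
def combined_scores(affinities, energies):
    """Apply the combined score formula to paired per-epitope values."""

    def centered(values):
        mean = sum(values) / len(values)
        span = max(values) - min(values)  # assumption: range = max - min
        # Assumption: if all values are equal, the term contributes 0.
        return [(v - mean) / span if span else 0.0 for v in values]

    a_terms = centered(affinities)
    e_terms = centered(energies)
    return [(0.5 * (a + e) + 1) * 500 for a, e in zip(a_terms, e_terms)]


# Made-up values for three hypothetical epitopes (not real data).
example_affinity = [0.1, 0.5, 0.9]
example_energy = [-30.0, -20.0, -10.0]
scores = combined_scores(example_affinity, example_energy)
print([round(s, 1) for s in scores])
```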

\ \

Methods

Epitopes of lengths 9 and 10 from all reading frames of the SARS-CoV-2 proteome are generated and filtered using NetMHCpan4.0 (eluted ligand prediction). All epitopes predicted as strong or weak binders to HLA-A*02:01 by NetMHCpan4.0 (718 in total, using the default %Rank cut-off) are modeled using RosettaMHC, and the binding energies of all 718 epitopes to HLA-A*02:01 are calculated in Rosetta. The models, their NetMHCpan predictions and their binding energies are available through a database and in Supplementary Table 1 of Nerli and Sgourakis (2020), listed in the References section below.

Notes

\

For a full description of the methods used, refer to Nerli and Sgourakis. (2020) in the References section below.

\ \

Credits

\

Nikolaos Sgourakis (nsgourak@ucsc.edu)

\

Santrupti Nerli (snerli@ucsc.edu)

\ \

\ Data were generated and processed at UCSC. For inquiries, please contact Nikolaos Sgourakis from the Sgourakis Research Group at UCSC.\

\ \

References

\

Nerli and Sgourakis. 2020 (Manuscript submitted) (BioRxiv).

\ immu 1 bigDataUrl /gbdb/wuhCor1/bbi/rosetta.bb\ exonArrows off\ exonNumbers off\ group immu\ longLabel CD8 Epitopes predicted by NetMHC and Rosetta\ shortLabel CD8 RosettaMHC\ track rosettaMhc\ type bigBed 9 +\ url https://rosettamhc.chemistry.ucsc.edu/?epitope=$$\ urlLabel Link to Rosetta Model\ urls name="https://rosettamhc.chemistry.ucsc.edu/?epitope=$$"\ cytoBandIdeo Chromosome Band (Ideogram) bed 4 + Ideogram for Orientation 1 100 0 0 0 127 127 127 0 0 0 map 1 group map\ longLabel Ideogram for Orientation\ shortLabel Chromosome Band (Ideogram)\ track cytoBandIdeo\ type bed 4 +\ visibility dense\ igm_COVID_409 COVID 409 bigBed 9 COVID 409 1 100 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igm/COVID_409.bb\ longLabel COVID 409\ parent igm on\ priority 0\ shortLabel COVID 409\ track igm_COVID_409\ type bigBed 9\ igg_COVID_527 COVID 527 bigBed 9 COVID 527 1 100 0 0 0 127 127 127 0 0 0 immu 1 bigDataUrl /gbdb/wuhCor1/pbmShanghai/igg/COVID_527.bb\ longLabel COVID 527\ parent igg on\ priority 0\ shortLabel COVID 527\ track igg_COVID_527\ type bigBed 9\ cpgIslandSuper CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 0 100 0 100 0 128 228 128 0 0 0

Description

\ \

CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time,\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some other reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.

\ \

\ The unmasked version of the track displays potential CpG islands\ that exist in repeat regions and would otherwise not be visible\ in the repeat masked version.\

\ \

\ By default, only the masked version of the track is displayed. To view the\ unmasked version, change the visibility settings in the track controls at\ the top of this page.\

\ \

Methods

\ \

CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria:

  • GC content of 50% or greater
  • length greater than 200 bp
  • ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment
\
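The dinucleotide scoring step can be illustrated with a simplified sketch. This is not the cpg_lh implementation: it is a plain maximum-sum scan that returns only the single best segment, using the +17 and -1 scores from the text. The function name and the example sequence are hypothetical.

```python
def best_cpg_segment(seq, cg_score=17, other_score=-1):
    """Return (score, start, end) of the highest-scoring run of dinucleotides.

    Coordinates are 0-based and half-open, in bases. This is a simplified
    single-segment scan, not the actual cpg_lh algorithm.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    run_score = 0
    run_start = 0
    for i in range(len(seq) - 1):
        s = cg_score if seq[i:i + 2] == "CG" else other_score
        if run_score <= 0:
            # Start a fresh run; carrying a non-positive score cannot help.
            run_score = s
            run_start = i
        else:
            run_score += s
        if run_score > best[0]:
            best = (run_score, run_start, i + 2)
    return best


segment = best_cpg_segment("ATATCGCGCGATAT")
print(segment)
```

A real island finder would report every maximal segment and then apply the three criteria to each one.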

\

\ The entire genome sequence, masking areas included, was\ used for the construction of the track Unmasked CpG.\ The track CpG Islands is constructed on the sequence after\ all masked sequence is removed.\

\ \

The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length. The ratio of observed to expected \ CpG is calculated according to the formula (cited in \ Gardiner-Garden et al. (1987)):\ \

    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
\ \ where N = length of sequence.
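The statistics defined above, together with the three criteria listed in Methods, can be written out as a short sketch. The function and field names are hypothetical, and the example sequences are artificial.

```python
def cpg_island_stats(seq):
    """Compute the per-segment statistics described in the text."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return {
        "length": n,
        "cpg_count": cpg,
        "gc_fraction": (c + g) / n if n else 0.0,
        # Percentage CpG: CpG bases (twice the CpG count) over the length.
        "percent_cpg": 100.0 * 2 * cpg / n if n else 0.0,
        # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
        "obs_exp": cpg * n / (c * g) if c and g else 0.0,
    }


def passes_criteria(stats):
    """Apply the three thresholds: GC >= 50%, length > 200, obs/exp > 0.6."""
    return (
        stats["gc_fraction"] >= 0.5
        and stats["length"] > 200
        and stats["obs_exp"] > 0.6
    )


short_stats = cpg_island_stats("CGCGCGCG")
long_stats = cpg_island_stats("ACGT" * 60)
print(short_stats, passes_criteria(short_stats))
print(long_stats, passes_criteria(long_stats))
```

The short sequence has a high observed/expected ratio but fails the length criterion, which shows that the three tests are applied together.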

\

\ The calculation of the track data is performed by the following command sequence:\

\
twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
  | cpg_lh /dev/stdin 2> cpg_lh.err \
    | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
     | sort -k1,1 -k2,2n > cpgIsland.bed
The unmasked track data is constructed by running the twoBitToFa command with the -noMask option.

\ \

Data access

\

The CpG Islands track and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog.

\

\ The source for the cpg_lh program can be obtained from\ src/utils/cpgIslandExt/.\ The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")\

\ \

Credits

\ \

This track was generated using a modification of a program developed by G. Miklem and L. Hillier \ (unpublished).

\ \

References

\ \

\ Gardiner-Garden M, Frommer M.\ \ CpG islands in vertebrate genomes.\ J Mol Biol. 1987 Jul 20;196(2):261-82.\ PMID: 3656447\

\ regulation 1 altColor 128,228,128\ color 0,100,0\ group regulation\ html cpgIslandSuper\ longLabel CpG Islands (Islands < 300 Bases are Light Green)\ shortLabel CpG Islands\ superTrack on\ track cpgIslandSuper\ type bed 4 +\ crisprDet CRISPR Detection psl CRISPR Detection Guides 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track shows the locations of CRISPR detection guides and flanking primers. There are three studies on this topic, of which the one by Mammoth Biosciences is the most advanced. The Broad Institute's guides were validated; the NYU guides are predictions for now.

Prefix: Mammoth
Institution: Mammoth Biosciences
Overview: There are three guides in the set, two are shown on the genome. The third guide detects human RNAseP, as a sample control. Target sequence: TAATTTCTACTAAGTGTAGATAATTACTTGGGTGTGACCCT
Links: Protocol and preprint.

Prefix: Broad
Institution: Sabeti Lab, Broad Institute
Overview: Various guides are predicted, the one shown here is from the protocol.
Links: Protocol from Mar 21 2020, Guide website and preprint.

Prefix: NYU
Institution: Sanjana Lab, New York University
Overview: These are predictions by a new algorithm. Shown is the prediction with the highest score.
Links: Website

\ \ \

Methods

\ Sequences were mapped with BLAT.\ \

Credits

\ Thanks to the labs for producing this work, as well as Kiley Charbonneau for pointing us to the papers.\ map 1 group map\ longLabel CRISPR Detection Guides\ shortLabel CRISPR Detection\ track crisprDet\ type psl\ resist Drug Resistance Mutations bigBed 9 + Mutations that confer drug resistance (Anna Niewiadomska, BV-BRC) 1 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track lists amino acid mutations that are known to confer drug \ resistance against Remdesivir, Sotrovimab, and Nirmatrelvir (Paxlovid). They were\ sourced with permission from Paul\ Gordon's website at the University of Calgary and Anna Niewiadomska from \ the J. Craig Venter Institute, manually input by\ Max Haeussler.

\ \

Display Conventions

\ Mutations are highlighted on the reference genome. Click or mouse over \ the mutations to show the full description.\ \

Data Access

\

\ The data can be explored interactively with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server\ or the API.\ \

\ Please refer to our\ \ mailing list archives for questions, or our\ Data Access FAQ\ for more information.
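For automated access, a request URL for the UCSC REST API can be assembled as in the sketch below. It relies on the getData/track endpoint and its genome, track, chrom, start and end parameters. The helper name is hypothetical, and the use of "&" as a separator (rather than ";") is an assumption. The example only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu"


def track_data_url(genome, track, chrom=None, start=None, end=None):
    """Build a getData/track URL; start and end are only used with chrom."""
    params = {"genome": genome, "track": track}
    if chrom is not None:
        params["chrom"] = chrom
        if start is not None and end is not None:
            params["start"] = start
            params["end"] = end
    return f"{API_BASE}/getData/track?{urlencode(params)}"


# Whole reference sequence of the wuhCor1 assembly (29,903 bases).
url = track_data_url("wuhCor1", "resist", chrom="NC_045512v2", start=0, end=29903)
print(url)
```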

\ \

Credits

\

\ Thanks to Anna Niewiadomska from \ the BV-BRC team at the J. Craig Venter Institute for converting these to coordinates.

\ \ varRep 1 bedNameLabel Mutation\ bigDataUrl /gbdb/wuhCor1/resist/resist.bb\ group varRep\ itemRgb on\ longLabel Mutations that confer drug resistance (Anna Niewiadomska, BV-BRC)\ mouseOver $comments\ noScoreFilter on\ shortLabel Drug Resistance Mutations\ track resist\ type bigBed 9 +\ urls urls=$$\ visibility dense\ galaxyEna Galaxy ENA mutations bed GalaxyProject surveillance of SARS-CoV-2 mutations through consistent processing of public raw sequencing data 1 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track represents parts of the SARS-CoV-2 analysis efforts of the GalaxyProject [1].\ \ This project aims at fully open and transparent, high-quality reanalysis of public raw sequencing data deposited in INSDC databases on ready-to-use public infrastructure [2].\ It restricts itself to data deposited by national genome surveillance projects that are providing sufficient sample metadata (along with the submitted data or through personal communication) to allow for best-practice analysis and reporting (for examples see [3, 4, 5]).

\ \

Required metadata are:

  • Sample collection date
  • Sequencing platform, library layout and strategy (currently reanalysis is done for ampliconic paired-end Illumina and ONT data)
  • The primer scheme used for the generation of amplicons (this information is used to trim primer sequences from the data before variant calling; reanalysis can be done for any primer scheme with publicly available primer binding site information)
  • Some kind of discernible batch information (e.g. a library identifier) that can be used to form batches of samples for reanalysis and batch-level reporting
\

\

\ Analysis is performed on public Galaxy servers with only open-source tools orchestrated through public, community-developed, reproducible workflows available from WorkflowHub and Dockstore and includes mutation calling for all samples, generation of per-sample and batch-level mutation reports and plots, generation of consensus sequences and pangolin lineage assignments.\ Key results and metadata are hosted on a public FTP server provided by the Centre for Genomic Regulation and the Barcelona Supercomputing Centre and form the basis of these UCSC genome browser tracks.\ The project web site has more information about available results data.\

\ \

Display Conventions and Configuration

\ \

Track structure

\

The GalaxyProject SARS-CoV-2 mutation tracking effort is presented as a supertrack containing four subtracks that represent mutation data from SARS-CoV-2 samples collected in different three-month periods of the COVID-19 pandemic. The quarters are redefined with each data update, with the latest (current) quarter starting three months before the day of the update. The end date displayed on the current quarter track corresponds to the collection date of the most recent analyzed sample on the day of the update.

\ \

Each quarter's subtrack is, in turn, composed of separate mutation data tracks for the five most common pangolin lineages observed in the data for that quarter.

\

Together the tracks can be used to explore the change of dominating lineages (and their associated mutation patterns) over time and, for lineages dominant over multiple quarters, to search for evidence of emerging within-lineage mutations.\ \

Mutation feature display

\

To facilitate such searches, the shading of each mutation feature reflects the mutation's observed frequency among the samples of a given lineage in the given quarter. Lineage-defining mutations should therefore be displayed in dark grey or black, while newly emerging mutations or non-systematic variant calling artefacts should appear in lighter shades of grey.

\

Mutation features are labeled with their effects at the amino acid level and, for SNV mutations, the feature as a whole will extend across the base triplet encoding the affected amino acid, while the thick part of the feature will indicate the precise base that gets changed by the mutation. For deletions, the whole feature will have a thick rendering, while insertions will be displayed all thin.\ \

Mutation details

\

Hovering over any mutation feature (in dense or full display mode of the track) reveals details of the mutation and the associated statistics, in particular:

  • the precise value for its observed frequency in the lineage and quarter
  • the intra-sample allele frequency (median and lower/upper quartile) at which the mutation has been called in the samples in which it has been detected
  • the collection date and collecting country of the sample in which this mutation was first (ever) detected in the context of the lineage. Note that for older, still circulating lineages, the collection date of that sample can be older than the start of the earliest quarter displayed in the genome browser (since our complete surveillance data goes back further than four quarters).
\

\ \

Filtering Mutations

\

Mutation features displayed in each subtrack can be filtered by

  • country or combination of countries in which samples of the given lineage and collection quarter with the mutation have been collected. You could, for example, filter all current quarter lineage tracks to show only mutations that have been found (in their respective lineage) in the UK.
  • within-lineage frequency. By default, only mutations observed in at least 5% of the samples assigned to the given lineage in the given quarter are shown (0.05 default filter setting). You can lower or raise that threshold as you see fit. Note, however, that the underlying bigBed data of the tracks is filtered to contain only mutations above a threshold of 0.1% (i.e. a 0.001 hard filter is always in effect).
\ \
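The interaction of the two filters can be sketched as follows. The record layout and field names are hypothetical, the example records are made up, and the sketch assumes that a mutation passes the country filter when it was seen in any one of the selected countries.

```python
HARD_FLOOR = 0.001        # mutations below 0.1% are absent from the track data
DEFAULT_MIN_FREQ = 0.05   # default display threshold (5%)


def filter_mutations(mutations, min_freq=DEFAULT_MIN_FREQ, countries=None):
    """Keep mutations passing the frequency and (optional) country filters."""
    # Lowering the user threshold below the hard floor has no effect.
    threshold = max(min_freq, HARD_FLOOR)
    wanted = set(countries) if countries else None
    kept = []
    for m in mutations:
        if m["freq"] < threshold:
            continue
        # Assumption: any overlap with the selected countries is a match.
        if wanted is not None and not wanted & set(m["countries"]):
            continue
        kept.append(m)
    return kept


# Made-up records for one hypothetical lineage and quarter.
example = [
    {"name": "mutA", "freq": 0.98, "countries": ["United Kingdom", "Estonia"]},
    {"name": "mutB", "freq": 0.03, "countries": ["United Kingdom"]},
    {"name": "mutC", "freq": 0.002, "countries": ["Greece"]},
    {"name": "mutD", "freq": 0.0005, "countries": ["Greece"]},
]
default_view = [m["name"] for m in filter_mutations(example)]
uk_low = [m["name"] for m in filter_mutations(example, 0.01, ["United Kingdom"])]
everything = [m["name"] for m in filter_mutations(example, 0.0)]
print(default_view, uk_low, everything)
```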

Methods

\

For analyses, batches of raw sequencing data are downloaded from public databases (in particular, from the FTP server of the European Nucleotide Archive) onto one of several public Galaxy instances. The data are processed with a sequencing platform-specific variation analysis workflow (one for paired-end Illumina data, another for ONT data), which performs QC, read mapping, postprocessing of mapped reads including primer trimming, variant calling and annotation, and results in a collection of VCF files, one for each sample in the batch. This output is picked up by a reporting workflow, which generates per-sample and per-batch mutation reports and a per-batch allele-frequency plot for a quick overview of variant patterns in the batch. In parallel, the outputs of the variation analysis workflow are also used by a consensus workflow to produce a FASTA consensus sequence for every sample in the batch. Sequencing data downloads, execution of the three types of workflows, and export of key results files are orchestrated by bot scripts, which can be used together with the public workflows to set up the complete analysis system on any Galaxy server. The bot accounts on participating Galaxy servers are checked on a roughly weekly basis for newly finished analysis histories, then

  1. those histories are made publicly accessible on their server
  2. batch information, i.e., samples analyzed and their metadata, links to the histories, etc. are added to
     ftp://xfer13.crg.eu/gx-surveillance.json
  3. pangolin lineage assignment is (re)performed for the entire collection of samples ever analyzed
  4. the genome browser tracks get recalculated by
     1. parsing all analyzed data on the ftp server
     2. determining the five most frequently observed pangolin lineages for each of the last four quarters, starting from the current date
     3. extracting all mutations seen in each quarter for each of the five top lineages in that quarter
     4. rebuilding the bigbed files and track files
\
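The quarter and lineage selection in the recalculation step can be illustrated with a minimal sketch. It is not the project's actual bot code: the function names and the sample layout are hypothetical, the lineage labels are placeholders, and treating each quarter as the interval (start, end] is an assumption.

```python
import calendar
from collections import Counter
from datetime import date


def months_before(d, n):
    """Return the date n calendar months before d, clamping the day."""
    idx = d.year * 12 + (d.month - 1) - n
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def top_lineages_by_quarter(samples, update_day, n_quarters=4, top_n=5):
    """samples: list of (collection_date, lineage) pairs.

    Quarter 0 is the current one, i.e. the three months before update_day.
    Assumption: each window is (start, end], so update_day itself counts.
    """
    result = []
    for k in range(n_quarters):
        start = months_before(update_day, 3 * (k + 1))
        end = months_before(update_day, 3 * k)
        counts = Counter(lin for d, lin in samples if start < d <= end)
        top = [lin for lin, _ in counts.most_common(top_n)]
        result.append((start, end, top))
    return result


# Made-up samples with placeholder lineage labels (not real data).
example_samples = [
    (date(2022, 5, 1), "L1"),
    (date(2022, 5, 2), "L1"),
    (date(2022, 4, 1), "L2"),
    (date(2022, 1, 15), "L3"),
    (date(2020, 1, 1), "L4"),
]
quarters = top_lineages_by_quarter(example_samples, date(2022, 5, 31))
for start, end, top in quarters:
    print(start, end, top)
```

Samples older than four quarters, such as the last one here, fall outside every window and do not influence the displayed lineages.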

\ \

Credits

\

\ The analysis behind these tracks is the result of joint efforts of the Galaxy community at large, the usegalaxy.org and usegalaxy.eu teams, the IUC, and the IWC.\

\

\ The infrastructure and development work behind the project was made possible by generous support from funding agencies around the world.\

\

\ For questions regarding SARS-CoV-2 data analysis and its automation with Galaxy, please join us in the GalaxyProject Public Health matrix channel.\

\

The project would not be possible without the sequencing data provided by genome surveillance initiatives that have decided to make their data and metadata publicly available by depositing them in INSDC databases. In particular, we would like to thank:

\ \

References

\ \

\

  1. Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Chilton, J.; Coraor, N.; Coppens, F.; Eguinoa, I.; Gladman, S.; Grüning, B.; Keener, N.; Lariviere, D.; Lonie, A.; Kosakovsky Pond, S.; Maier, W.; Nekrutenko, A.; Taylor, J. & Weaver, S. (2020): No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathogens 16(8):e1008643. DOI: 10.1371/journal.ppat.1008643
  2. Maier, W.; Bray, S.; van den Beek, M.; Bouvier, D.; Coraor, N.; Miladi, M.; Singh, B.; Argila, J. R. D.; Baker, D.; Roach, N.; Gladman, S.; Coppens, F.; Martin, D. P.; Lonie, A.; Grüning, B.; Pond, S. L. K. & Nekrutenko, A. (2021): Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology 39, 1178-1179. DOI: 10.1038/s41587-021-01069-1
\

\ varRep 1 group varRep\ longLabel GalaxyProject surveillance of SARS-CoV-2 mutations through consistent processing of public raw sequencing data\ shortLabel Galaxy ENA mutations\ superTrack on\ track galaxyEna\ type bed\ visibility dense\ gap Gap bed 3 + Gap Locations 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track shows the gaps in the Jan. 2020 SARS-CoV-2 genome assembly.

\

\ Genome assembly procedures are covered in the NCBI\ assembly documentation.
\ NCBI also provides\ specific information about this assembly.\

\

\ The definition of the gaps in this assembly is from the\ AGP file delivered with the sequence. The NCBI document\ AGP Specification describes the format of the AGP file.\

\

\ Gaps are represented as black boxes in this track.\ If the relative order and orientation of the contigs on either side\ of the gap is supported by read pair data,\ it is a bridged gap and a white line is drawn\ through the black box representing the gap.\

\

This assembly has no annotated gaps.\

\ map 1 group map\ html gap\ longLabel Gap Locations\ shortLabel Gap\ track gap\ type bed 3 +\ visibility hide\ gc5BaseBw GC Percent bigWig 0 100 GC Percent in 5-Base Windows 0 100 0 0 0 128 128 128 0 0 0

Description

\

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in 5-base windows. High GC content is typically associated with\ gene-rich areas.\
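
The windowed calculation described above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the track: the function name `gc_percent_windows` is hypothetical, and it assumes consecutive, non-overlapping 5-base windows with any trailing partial window dropped.

```python
def gc_percent_windows(sequence, window=5):
    """Return the GC percentage of each consecutive, non-overlapping window.

    Assumption: a trailing window shorter than `window` is ignored.
    """
    sequence = sequence.upper()
    percents = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        gc_count = sum(1 for base in chunk if base in "GC")
        percents.append(100.0 * gc_count / window)
    return percents


# Windows: ATTAA, AGGTT, TATAC, CTTCC
print(gc_percent_windows("ATTAAAGGTTTATACCTTCC"))  # [0.0, 40.0, 20.0, 60.0]
```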

\

\ This track may be configured in a variety of ways to highlight different\ aspects of the displayed information. Click the\ "Graph configuration help"\ link for an explanation of the configuration options.\ \

Credits

\

The data and presentation of this graph were prepared by\ Hiram Clawson.\

\ \ map 0 altColor 128,128,128\ autoScale Off\ color 0,0,0\ graphTypeDefault Bar\ gridDefault OFF\ group map\ html gc5Base\ longLabel GC Percent in 5-Base Windows\ maxHeightPixels 128:36:16\ shortLabel GC Percent\ track gc5BaseBw\ type bigWig 0 100\ viewLimits 30:70\ visibility hide\ windowingFunction Mean\ genscan Genscan Genes genePred genscanPep Genscan Gene Predictions 0 100 170 100 0 212 177 127 0 0 0

Description

\ \

\ This track shows predictions from the\ Genscan program\ written by Chris Burge.\ The predictions are based on transcriptional, translational and donor/acceptor\ splicing signals as well as the length and compositional distributions of exons,\ introns and intergenic regions.\

\ \

\ For more information on the different gene tracks, see our Genes FAQ.

\ \

Display Conventions and Configuration

\ \

\ This track follows the display conventions for\ gene prediction\ tracks.\

\ \

\ The track description page offers the following filter and configuration\ options:\

    \
  • Color track by codons: Select the genomic codons option\ to color and label each codon in a zoomed-in display to facilitate validation\ and comparison of gene predictions. Go to the\ \ Coloring Gene Predictions and Annotations by Codon page for more\ information about this feature.
\

\ \

Methods

\ \

\ For a description of the Genscan program and the model that underlies it,\ refer to Burge and Karlin (1997) in the References section below.\ The splice site models used are described in more detail in Burge (1998)\ below.\

\ \

Credits

\ \ Thanks to Chris Burge for providing the Genscan program.\ \

References

\ \

\ Burge C.\ Modeling Dependencies in Pre-mRNA Splicing Signals.\ In: Salzberg S, Searls D, Kasif S, editors.\ Computational Methods in Molecular Biology.\ Amsterdam: Elsevier Science; 1998. p. 127-163.\

\ \

\ Burge C, Karlin S.\ \ Prediction of complete gene structures in human genomic DNA.\ J. Mol. Biol. 1997 Apr 25;268(1):78-94.\ PMID: 9149143\

\ genes 1 color 170,100,0\ group genes\ longLabel Genscan Gene Predictions\ shortLabel Genscan Genes\ track genscan\ type genePred genscanPep\ visibility hide\ multiz7way Human CoV wigMaf 0.0 1.0 Multiz Alignment and Conservation of 7 Strains of human coronavirus 0 100 0 10 100 0 90 10 0 0 0

Description

\

\ This track shows multiple alignments of 7 human coronavirus sequences,\ aligned to the SARS-CoV-2 NCBI reference sequence\ NC_045512.2,\ genome assembly GCF_009858895.2_ASM985889v3.\ \ The multiple alignments were generated using Multiz and\ other tools in the UCSC/Penn State Bioinformatics\ comparative genomics alignment pipeline.\

\ \

\ In the track display, the sequences are labeled using common names.\ Note the table below to relate these common names to the NCBI assembly\ accession identifier.\

\ \

Display Conventions and Configuration

\

\ Pairwise alignments of each species to the SARS-CoV-2 genome are\ displayed as a series of colored blocks indicating the functional effect of polymorphisms (in pack\ mode), or as a wiggle (in full mode) that indicates alignment quality.\ In dense display mode, percent identity of the whole alignments is shown in grayscale using\ darker values to indicate higher levels of identity.\

\ In pack mode, regions that align with 100% identity are not shown. Where identity is below 100%,\ blocks in one of four colors are drawn.\

    \
  • Red blocks are\ drawn when a polymorphism in a coding region results in a change in the amino\ acid that is generated.
  • Green blocks are\ drawn when a polymorphism in a coding region results in no change to the amino\ acid that is generated.
  • Blue blocks are\ drawn when a polymorphism is outside a coding region.
  • Pale yellow blocks\ are drawn when there are no aligning bases to that region in the reference\ genome.
\
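
The four-color rule listed above can be expressed as a small Python sketch. It is a simplification under stated assumptions, not the browser's implementation: the function `block_color` is hypothetical, it compares whole codons made only of A, C, G and T, and it uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def block_color(ref_codon, aligned_codon, in_coding_region=True):
    """Return the pack-mode block color, or None when nothing is drawn."""
    if aligned_codon is None:
        return "pale yellow"  # no aligning bases in this region
    ref_codon, aligned_codon = ref_codon.upper(), aligned_codon.upper()
    if ref_codon == aligned_codon:
        return None  # 100% identity is not shown
    if not in_coding_region:
        return "blue"  # polymorphism outside a coding region
    if CODON_TABLE[ref_codon] == CODON_TABLE[aligned_codon]:
        return "green"  # same amino acid
    return "red"  # amino acid change


print(block_color("GAT", "GAC"))  # green (Asp -> Asp)
print(block_color("GAT", "GGT"))  # red (Asp -> Gly)
```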

\ Checkboxes on the track configuration page allow selection of the\ species to include in the pairwise display.\ Configuration buttons are available to select all of the species\ (+), deselect all of the species (-), or\ use the default settings (Reset to defaults).\

\ For text nucleotide alignments, click on\ the alignment tracks. To view detailed information about the alignments at a specific\ position, zoom to a small region or click the 'base' button to see amino acid alignments.

\ \

Base Level

\

\ When zoomed-in to the base-level display, the track shows the amino acid\ composition of each alignment.\ The numbers and symbols on the Gaps\ line indicate the lengths of gaps in the SARS-CoV-2 sequence at those\ alignment positions relative to the longest non-SARS-CoV-2 sequence.\ If there is sufficient space in the display, the size of the gap is shown.\ If the space is insufficient and the gap size is a multiple of 3, a\ "*" is displayed; other gap sizes are indicated by "+".
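
The labeling rule for the Gaps line can be summarized with a short Python sketch. The function name `gap_label` and the boolean `fits_in_display` are hypothetical stand-ins for the browser's own layout logic.

```python
def gap_label(gap_size, fits_in_display):
    """Return the text shown on the Gaps line for one gap."""
    if gap_size <= 0:
        return ""  # no gap at this position
    if fits_in_display:
        return str(gap_size)  # enough room: show the actual size
    # Not enough room: "*" marks a multiple of 3, "+" any other size
    return "*" if gap_size % 3 == 0 else "+"


print(gap_label(6, True))   # 6
print(gap_label(6, False))  # *
print(gap_label(4, False))  # +
```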

\

\ Codon translation can be turned off in base-level display mode if desired. \ You can select the species for translation from the pull-down menu in the Codon\ Translation configuration section at the top of the page. Then, select one of\ the following modes:\

    \
  • \ No codon translation: The gene annotation is not used; the bases are\ displayed without translation.\
  • \ Use default species reading frames for translation: The annotations from\ the genome displayed in the Default species to establish reading frame\ pull-down menu are used to translate all the aligned species present in the\ alignment.\
  • \ Use reading frames for species if available, otherwise no translation:\ Codon translation is performed only for those species where the region is\ annotated as protein coding.\
  • Use reading frames for species if available, otherwise use default species:\ Codon translation is done on those species that are annotated as being protein\ coding over the aligned region using species-specific annotation; the remaining\ species are translated using the default species annotation.\

\ \

Methods

\

\ Pairwise alignments with the reference sequence were generated for\ each sequence using LASTZ version 1.04.03.\ Parameters used for each LASTZ alignment:\

\
# hsp_threshold      = 3000\
# gapped_threshold   = 3000 = L\
# x_drop             = 910\
# y_drop             = 9400 = Y\
# gap_open_penalty   = 400\
# gap_extend_penalty = 30\
#        A    C    G    T\
#   A   91 -114  -31 -123\
#   C -114  100 -125  -31\
#   G  -31 -125  100 -114\
#   T -123  -31 -114   91\
# seed=1110100110010101111 w/2 transitions\
# step=1\
\ Pairwise alignments were then linked into chains using a dynamic programming\ algorithm that finds maximally scoring chains of gapless subsections\ of the alignments organized in a kd-tree. Parameters used in\ the chaining (axtChain) step:\
\
-minScore=1000 -linearGap=loose

\

\ High-scoring chains were then placed along the genome, with\ gaps filled by lower-scoring chains, to produce an alignment net.\
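
To make the LASTZ scoring parameters listed above more concrete, the following Python sketch scores an already-aligned pair of rows with the substitution matrix and gap penalties from the parameter block. It is an illustration only: the function `alignment_score` is hypothetical, it does not perform any alignment, and it assumes that a gap of length n costs gap_open_penalty + n * gap_extend_penalty.

```python
# Substitution scores copied from the LASTZ parameter block above
SCORES = {
    "A": {"A": 91, "C": -114, "G": -31, "T": -123},
    "C": {"A": -114, "C": 100, "G": -125, "T": -31},
    "G": {"A": -31, "C": -125, "G": 100, "T": -114},
    "T": {"A": -123, "C": -31, "G": -114, "T": 91},
}
GAP_OPEN = 400
GAP_EXTEND = 30


def alignment_score(ref_row, query_row):
    """Score two equal-length alignment rows; '-' marks a gap position."""
    if len(ref_row) != len(query_row):
        raise ValueError("alignment rows must have the same length")
    score = 0
    gap_in = None  # which row the currently open gap is in, if any
    for ref_base, query_base in zip(ref_row.upper(), query_row.upper()):
        if ref_base == "-":
            current = "ref"
        elif query_base == "-":
            current = "query"
        else:
            current = None
        if current is None:
            score += SCORES[ref_base][query_base]
        else:
            if current != gap_in:
                score -= GAP_OPEN  # assumed: charged once per gap
            score -= GAP_EXTEND    # assumed: charged for every gap column
        gap_in = current
    return score


print(alignment_score("ACGT", "ACGT"))  # 382
print(alignment_score("ACGT", "AC-T"))  # -148
```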

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \
count  sample date  accession    phylogenetic distance  descriptive name
1      2019-12-30   NC_045512.2  0.000000               SARS-CoV-2 (2019)
2      2003-04      NC_004718.3  0.885159               SARS-CoV-1 (Tor2)
3      2012-06-13   NC_019843.3  2.434930               MERS Middle East respiratory syndrome CoV
4      2004-03      NC_006213.1  2.589639               Human CoV OC43 strain ATCC VR-759
5      2004-04      NC_006577.2  2.649716               Human CoV HKU1
6      2000-09      NC_002645.1  2.983896               Human CoV 229E
7      2004-03      NC_005831.2  3.009141               Human Coronavirus NL63
\

\ The multiple alignment was constructed from the resulting\ pairwise alignments progressively aligned using\ MultiZ/autoMZ.\ The phylogenetic tree was calculated from 31-mer frequency similarity,\ applying neighbor joining to the resulting distance matrix with the\ PHYLIP toolset command\ neighbor. The reference sequence NC_045512v2 is at the\ top of the tree:\

\
((((SARS_CoV_2 SARS_CoV_1) MERS) (OC43 HKU1)) (CoV229E NL63))\
\ Framing tables from the genes were constructed to enable\ visualization of codons in the multiple alignment display.
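
The 31-mer frequency comparison mentioned above can be sketched as follows. This is a hedged illustration, not the procedure actually used: the text does not state which distance measure was applied, so the Euclidean distance between k-mer frequency profiles is an assumption, as are the function names. The resulting matrix would still need to be passed to a neighbor-joining program such as PHYLIP neighbor, which is not done here.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(sequence, k=31):
    """Return the relative frequency of every k-mer in the sequence."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def frequency_distance(freq_a, freq_b):
    """Euclidean distance between two k-mer profiles (assumed measure)."""
    kmers = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(x, 0.0) - freq_b.get(x, 0.0)) ** 2
                    for x in kmers))


def distance_matrix(sequences, k=31):
    """Return (names, matrix) of pairwise distances for a name -> sequence dict."""
    names = sorted(sequences)
    freqs = {name: kmer_frequencies(sequences[name], k) for name in names}
    matrix = [[frequency_distance(freqs[a], freqs[b]) for b in names]
              for a in names]
    return names, matrix


# Toy example with k=3; real genomes would use k=31
names, matrix = distance_matrix(
    {"a": "ACGTACGT", "b": "ACGTACGT", "c": "TTTTTTTT"}, k=3)
print(names)
print(matrix[0][1])  # 0.0, identical sequences
```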

\

Data Access

\

\ Downloads for data in this track are available:\

\ \ \

Credits

\

This track was created using the following programs:\

    \
  • Alignment tools: LASTZ (formerly Blastz) and MultiZ by Minmei Hou, \ Scott Schwartz, Robert Harris, and Webb Miller of the \ Penn State Bioinformatics Group\
  • Conservation scoring: phastCons, phyloP, phyloFit, tree_doctor, msa_view and\ other programs in PHAST by\ Adam Siepel at Cold Spring Harbor Laboratory (original development\ done at the Haussler lab at UCSC).\
  • Chaining and Netting: axtChain, chainNet by Jim Kent at UCSC\
  • MAF Annotation tools: mafAddIRows by Brian Raney, UCSC; mafAddQRows\ by Richard Burhans, Penn State; genePredToMafFrames by Mark Diekhans, UCSC\
  • Tree image generator: phyloPng by Galt Barber, UCSC\
  • Conservation track display: Kate Rosenbloom, Hiram Clawson (wiggle\ display), and Brian Raney (gap annotation and codon framing) at UCSC\
\

\ \

References

\ \

\ Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M,\ Fullah M, Dudas G et al.\ Genomic surveillance elucidates Ebola virus origin and transmission\ during the 2014 outbreak.\ Science 2014 Sep 12;345(6202):1369-72.\ PMID: 25214632;\ Supplemental Materials and Methods\

\ \

Phylo-HMMs, phastCons, and phyloP:

\

\ Felsenstein J, Churchill GA.\ A Hidden Markov Model approach to\ variation among sites in rate of evolution.\ Mol Biol Evol. 1996 Jan;13(1):93-104.\ PMID: 8583911\

\ \

\ Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A.\ \ Detection of nonneutral substitution rates on mammalian phylogenies.\ Genome Res. 2010 Jan;20(1):110-21.\ PMID: 19858363; PMC: PMC2798823\

\ \

\ Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K,\ Clawson H, Spieth J, Hillier LW, Richards S, et al.\ Evolutionarily conserved elements in vertebrate, insect, worm,\ and yeast genomes.\ Genome Res. 2005 Aug;15(8):1034-50.\ PMID: 16024819; PMC: PMC1182216\

\ \

\ Siepel A, Haussler D.\ Phylogenetic Hidden Markov Models.\ In: Nielsen R, editor. Statistical Methods in Molecular Evolution.\ New York: Springer; 2005. pp. 325-351.\

\ \

\ Yang Z.\ A space-time process model for the evolution of DNA\ sequences.\ Genetics. 1995 Feb;139(2):993-1005.\ PMID: 7713447; PMC: PMC1206396\

\ \

Chain/Net:

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron:\ duplication, deletion, and rearrangement in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.\ PMID: 14500911; PMC: PMC208784\

\ \

Multiz:

\

\ Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM,\ Baertsch R, Rosenbloom K, Clawson H, Green ED, et al.\ Aligning multiple genomic sequences with the threaded blockset aligner.\ Genome Res. 2004 Apr;14(4):708-15.\ PMID: 15060014; PMC: PMC383317\

\ \

LASTZ (formerly Blastz):

\

\ Chiaromonte F, Yap VB, Miller W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput. 2002:115-26.\ PMID: 11928468\

\ \

\ Harris RS.\ Improved pairwise alignment of genomic DNA.\ Ph.D. Thesis. Pennsylvania State University, USA. 2007.\

\ \

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-mouse alignments with BLASTZ.\ Genome Res. 2003 Jan;13(1):103-7.\ PMID: 12529312; PMC: PMC430961\

\ compGeno 1 altColor 0,90,10\ color 0, 10, 100\ frames multiz7wayFrames\ group compGeno\ html cons7way\ itemFirstCharCase noChange\ longLabel Multiz Alignment and Conservation of 7 Strains of human coronavirus\ mafDot on\ mafShowSnp on\ sGroup_CoV_2019 SARS_CoV_1 MERS OC43 HKU1 NL63 CoV229E\ shortLabel Human CoV\ snpMode on\ snpTable mafSnp7way\ speciesCodonDefault wuhCor1\ speciesDefaultOn SARS_CoV_1 MERS OC43 HKU1 NL63 CoV229E\ speciesGroups CoV_2019\ track multiz7way\ treeImage phylo/wuhCor1_7way.png\ type wigMaf 0.0 1.0\ visibility hide\ icshape icSHAPE RNA Struct bigWig icSHAPE RNA Structure 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track shows normalized icSHAPE reactivity data of in vivo and in vitro SARS-CoV-2 experiments.\

\ \

Methods

\

\ icSHAPE scores were provided by Qiangfeng Cliff Zhang and converted to bigWig.\ The first five nucleotides of the genome and the last thirty nucleotides have\ no data due to technical defects (on the 5' and 3' ends of the virus) or insufficient \ sequencing depth. The icSHAPE score is distributed between 0 and 1.\

\ \

Data Access

\

\ You can download the bigWig file underlying this track (icshape/*.bw) from our\ Download Server. The data can be explored interactively with the\ Table Browser\ or the Data Integrator. The data can be\ accessed from scripts through our API.\

\
http://api.genome.ucsc.edu/getData/track?genome=wuhCor1;track=icshapeInvitro;chrom=NC_045512v2;start=100;end=7875\
\ Command-line extraction can be accomplished using an example like the following command:\
\
bigWigToWig -udcDir=. -chrom=NC_045512v2 -start=100 -end=7875 https://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/icshape/invivo.bigWig myOutput.wig\
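
The API request shown above can also be issued from a script. The sketch below builds the same semicolon-separated query in Python and fetches the JSON response with the standard library. The helper names `track_url` and `fetch_track` are hypothetical, the https form of the endpoint is assumed, and the structure of the returned JSON is not described here, so inspect it before relying on particular keys.

```python
import json
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(genome, track, chrom, start, end):
    """Build a getData/track URL using the ';' separators shown above."""
    params = [("genome", genome), ("track", track), ("chrom", chrom),
              ("start", start), ("end", end)]
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)


def fetch_track(genome, track, chrom, start, end, timeout=30):
    """Download and decode the JSON response (requires network access)."""
    url = track_url(genome, track, chrom, start, end)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(track_url("wuhCor1", "icshapeInvitro", "NC_045512v2", 100, 7875))
```

Calling `fetch_track("wuhCor1", "icshapeInvitro", "NC_045512v2", 100, 7875)` performs the actual request.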

\

\ Please refer to our mailing list archives for questions, or our\ Data Access FAQ for more\ information.

\ \

References

\

Lei Sun, Pan Li, Xiaohui Ju, Jian Rao, Wenze Huang, Shaojun Zhang, Tuanlin Xiong, Kui Xu, Xiaolin Zhou, Lili Ren, Qiang Ding, Jianwei Wang, Qiangfeng Cliff Zhang\ \ "In vivo structural characterization of the whole SARS-CoV-2 RNA genome identifies host cell target proteins vulnerable to re-purposed drugs", Biorxiv July 8 2020

\ rna 0 compositeTrack on\ group rna\ longLabel icSHAPE RNA Structure\ shortLabel icSHAPE RNA Struct\ track icshape\ type bigWig\ visibility hide\ epitopes IEDB Predictions bigBed 9 + IEDB-Predicted Epitopes from Grifoni et al 2020 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This composite track indicates the immune epitope predictions\ for B cells, CD4 T-cells and CD8 T-cells, using these software packages:\ B cells = BepiPred 2.0, CD4 = IEDB Tepitool, CD8 = NetMHCpan4.0EL

\

\ The color range for the markers is from dark blue (strong prediction)\ to dark red (weak prediction). Black is used for markers with no\ calculated prediction.

\

\ From the publication:\ \ Candidate targets for immune responses to 2019-Novel Coronavirus (nCoV):\ sequence homology- and bioinformatic-based predictions,\ full reference below.\

\

The prediction of epitopes for CD8 T-cells was run for the following HLA alleles, as they have a worldwide population\ frequency > 6%

HLA allele   Frequency in worldwide population (%)
HLA-A*01:01  16.2
HLA-A*02:01  25.2
HLA-A*03:01  15.4
HLA-A*11:01  12.9
HLA-A*23:01  6.4
HLA-A*24:02  16.8
HLA-B*07:02  13.3
HLA-B*08:01  11.5
HLA-B*35:01  6.5
HLA-B*40:01  10.3
HLA-B*44:02  9.2
HLA-B*44:03  7.6
\

\

Summary

\

\ We identified potential targets for immune responses to 2019-nCoV\ and provide essential information for understanding human immune responses\ to this virus and evaluation of diagnostic and vaccine candidates.\

\

Abstract

\

\ Effective countermeasures against the recent emergence and rapid expansion\ of the 2019-Novel Coronavirus (2019-nCoV) require the development of data\ and tools to understand and monitor viral spread and immune responses.\ However, little information about the targets of immune responses to\ 2019-nCoV is available. We used the Immune Epitope Database and Analysis\ Resource (IEDB) to catalog available data related to other\ coronaviruses, including SARS-CoV, which has high sequence similarity to\ 2019-nCoV, and is the best-characterized coronavirus in terms of epitope\ responses. We identified multiple specific regions in 2019-nCoV that have\ high homology to SARS virus. Parallel bioinformatic predictions identified a\ priori potential B and T cell epitopes for 2019-nCoV. The independent\ identification of the same regions using two approaches reflects the high\ probability that these regions are targets for immune recognition of 2019-nCoV.\

\ \

Credits

\ Data collected by Arkal Arjun Rao for the\ Sgourakis Research Group, U.C. Santa Cruz\ \

References

\

\ Grifoni A, Sidney J, Zhang Y, Scheuermann RH, Peters B, Sette A.\ \ Candidate targets for immune responses to 2019-Novel Coronavirus (nCoV):\ sequence homology- and bioinformatic-based predictions,\ BioRxiv 2020 (doi: https://doi.org/10.1101/2020.02.12.946087)\ immu 1 compositeTrack on\ group immu\ itemRgb on\ longLabel IEDB-Predicted Epitopes from Grifoni et al 2020\ shortLabel IEDB Predictions\ track epitopes\ type bigBed 9 +\ visibility hide\ ucscToINSDC INSDC bed 4 Accession at INSDC - International Nucleotide Sequence Database Collaboration 0 100 0 0 0 127 127 127 0 0 0 https://www.ncbi.nlm.nih.gov/nuccore/$$

Description

\

\ This track associates UCSC Genome Browser chromosome names to accession\ names from the International Nucleotide Sequence Database Collaboration (INSDC).\

\ \

\ The data were downloaded from the NCBI assembly database.\

\ \

Credits

\

The data for this track was prepared by\ Hiram Clawson.\ \ map 1 group map\ longLabel Accession at INSDC - International Nucleotide Sequence Database Collaboration\ shortLabel INSDC\ track ucscToINSDC\ type bed 4\ url https://www.ncbi.nlm.nih.gov/nuccore/$$\ urlLabel INSDC link:\ visibility hide\ microdel Microdeletions bed 6 Microdeletions in GISAID sequences 0 100 0 0 0 127 127 127 1 0 0

Description

\

This track shows deletions that have been found in the sequences uploaded to the GISAID database as of June 6, 2020.\ Three confidence levels of deletion calls are shown:

\
    \
  • deletions found in at least 1 \ GISAID sequence
  • deletions found in at least 2 GISAID\ sequences
  • deletions found in at least 2 GISAID sequences that\ were validated with raw reads.
\ \

Methods

\

\ We accessed all GISAID SARS-CoV-2 sequences on June 6, 2020. We filtered to\ high coverage reads encompassing the entire SARS-CoV-2 genome (>=29000 bps),\ leaving 12,403 sequences.\ We aligned the reads using MAFFT.

\ \

Verification

\

We validated several deletions with the raw reads from NCBI's SRA Run browser. \ Additionally, NYU Langone Health provided us with the aligned reads for many of \ their sequences.

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, combined with other datasets in the\ Data Integrator tool, \ or downloaded directly as "microdel.txt.gz" from\ the download server.\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

Credits

\

\ We thank all of the labs that submitted their sequences to the GISAID database. \ The full acknowledgement table can be found at \ https://github.com/briannachrisman/SARS-CoV-2_Microdeletions/blob/master/acknowledgments.pdf.\ We thank the public health laboratories VIDRL and MDU-PHL at The Peter Doherty Institute for \ Infection and Immunity for providing over 1000 high quality raw reads to NCBI. \ We also thank Matthew T Maurano, Matija Snuderl, and \ Adriana Heguy of the NYU Langone SARS-CoV2 Sequencing Team for providing many of their raw reads.\

\ \ \

References

\

Chrisman, Brianna Sierra, Kelley Paskov, Nate Stockham, Kevin Tabatabaei, Jae-Yoon Jung, Peter Washington, Maya Varma, Min Woo Sun, Sepideh Maleki, and Dennis P. Wall. "Indels in SARS-CoV-2 occur at template-switching hotspots." BioData Mining 14, no. 1 (2021): 1-16. https://doi.org/10.1186/s13040-021-00251-0\

\ varRep 1 group varRep\ longLabel Microdeletions in GISAID sequences\ shortLabel Microdeletions\ spectrum on\ track microdel\ type bed 6\ strainName119way Multiz Align wigMaf 0.0 1.0 Multiz Alignment of 119 strains: red=nonsyn green=syn blue=noncod yellow=noalign 3 100 0 10 100 0 90 10 0 0 0 compGeno 1 altColor 0,90,10\ color 0, 10, 100\ frames strainName119wayFrames\ group compGeno\ itemFirstCharCase noChange\ longLabel Multiz Alignment of 119 strains: red=nonsyn green=syn blue=noncod yellow=noalign\ mafDot on\ mafShowSnp on\ noInherit on\ parent strainCons119wayViewalign on\ priority 100\ sGroup_Other_CoV_sequences_from_vertebrate_hosts Bat_CoV_TG13 Pangolin-CoV-2020_MP789 Bat_CoV_BM48-31/BGR/2008 Wencheng_Sm_shrew_CoV_Xingguo-101 Bat_CoV_HKU4-1 Beluga_Whale_CoV_SW1 Wigeon_CoV_HKU20 Avian_infectious_bronchitis_virus BetaCoV_HKU24_strain_HKU24-R05005I Turkey_CoV BetaCoV_ErinaceusCoV/VMC/2012-174/GER/2012 Bat_Hp-betaCoV/Zhejiang2013 BetaCoV_England_1 MERS_Middle_East_respiratory_syndrome_CoV Rabbit_CoV_HKU14 Night-heron_CoV_HKU19 Bovine_CoV Bat_CoV_PREDICT/PDF-2180 Bat_CoV_HKU5-1 Human_CoV_OC43_strain_ATCC_VR-759 Common-moorhen_CoV_HKU21 Bulbul_CoV_HKU11-934 Human_CoV_HKU1 Bat_CoV_HKU9-1 Rat_CoV_Parker BtMr-AlphaCoV/SAX2011 Mouse_hepatitis_virus_strain_MHV-A59_C12_mutant Magpie-robin_CoV_HKU18 Rousettus_bat_CoV_HKU10 Rousettus_bat_CoV_GCCDC1_356 Sparrow_CoV_HKU17 Porcine_CoV_HKU15_strain_HKU15-155 Munia_CoV_HKU13-3514 Human_CoV_229E White-eye_CoV_HKU16 Coronavirus_AcCoV-JC34 Camel_alphaCoV_camel/Riyadh/Ry141/2015 Human_Coronavirus_NL63 Mink_CoV_strain_WD1127 Ferret_CoV_FRCoV-NL-2010 Porcine_epidemic_diarrhea_virus Scotophilus_bat_CoV_512 NL63-related_bat_CoV_strain_BtKYNL63-9a Bat_CoV_HKU8 Bat_CoV_CDPHE15/USA/2006 Thrush_CoV_HKU12-600 BtRf-AlphaCoV/HuB2013 Lucheng_Rn_rat_CoV_Lucheng-19 Bat_CoV_HKU2 BtNv-AlphaCoV/SC2013 Swine_enteric_CoV_strain_Italy/213306/2009 Transmissible_gastroenteritis_virus Bat_CoV_1A Feline_infectious_peritonitis_virus BtRf-AlphaCoV/YN2012 
SARS_CoV\ sGroup_SARS-CoV-2_sequences SARS-CoV-2/CA6/human/2020/USA WIV04 BetaCoV/Wuhan/IPBCAMS-WH-04/2019 2019-nCoV_WHU01 BetaCoV/Wuhan/WH-03/2019 WIV06 BetaCoV/Wuhan/YS8011/2020 BetaCoV/Hangzhou/HZ-1/2020_20cov-1L 2019-nCoV/USA-CA8/2020 BetaCoV/Wuhan/WH19008/2019 BetaCoV/Wuhan/WH19005/2019 BetaCoV/Wuhan/IPBCAMS-WH-02/2019 SARS0CoV-2/61-TW/human/2020/NPL SARS-CoV-2/WH-09/human/2020/CHN 2019-nCoV/USA-WI1/2020 2019-nCoV_HKU-SZ-002a_2020 2019-nCoV/USA-WA1/2020 Taiwan/NTU01/2020 BetaCoV/Wuhan/WH-04/2019 2019-nCoV/Japan/TY/WK-521/2020 2019-nCoV/Japan/TY/WK-501/2020 2019-nCoV/Japan/TY/WK-012/2020 2019-nCoV_HKU-SZ-005b_2020 2019-nCoV/USA-CA7/2020 SARS-CoV-2/IQTC04/human/2020/CHN SARS-CoV-2/105/human/2020/CHN SARS-CoV-2/233/human/2020/CHN SARS-CoV-2/Hu/DP/Kng/19-027 SARS-CoV-2/Hu/DP/Kng/19-020 SARS-CoV-2/SP02/human/2020/BRA 2019-nCoV/USA-AZ1/2020 SARS-CoV-2/Yunnan-01/human/2020/CHN BetaCoV/Finland/1/2020 2019-nCoV/USA-IL1/2020 BetaCoV/Korea/SNU01/2020 SARS-CoV-2/01/human/2020/SWE 2019-nCoV/USA-MA1/2020 SARS-CoV-2/WA2/human/2020/USA SARS-CoV-2/IL2/human/2020/USA 2019-nCoV/USA-CA3/2020 2019-nCoV/USA-CA2/2020 BetaCoV/Australia/VIC01/2020 2019-nCoV/USA-TX1/2020 SARS-CoV-2/IQTC03/human/2020/CHN SARS-CoV-2/IQTC02/human/2020/CHN BetaCoV/Wuhan/IPBCAMS-WH-01/2019 WIV07 2019-nCoV/USA-CA5/2020 2019-nCoV/USA-CA1/2020 BetaCoV/Wuhan/WH-01/2019 WIV05 2019-nCoV/Japan/KY/V-029/2020 BetaCoV/Japan/AI/I-004/2020 BetaCov/Taiwan/NTU02/2020 BetaCoV/Wuhan/IPBCAMS-WH-03/2019 2019-nCoV/USA-CA9/2020 BetaCoV/Wuhan/IPBCAMS-WH-05/2020 BetaCoV/Wuhan/WH19004/2020 WIV02 SARS-CoV-2/IQTC01/human/2020/CHN BetaCoV/Wuhan/WH-02/2019 BetaCoV/Wuhan/WH19002/2019\ shortLabel Multiz Align\ snpTable mafSnpStrainName119way\ speciesCodonDefault wuhCor1\ speciesDefaultOff 2019-nCoV_WHU01 WIV04 BetaCoV/Wuhan/IPBCAMS-WH-04/2019 BetaCoV/Wuhan/WH-03/2019 WIV06 BetaCoV/Hangzhou/HZ-1/2020_20cov-1L BetaCoV/Wuhan/YS8011/2020 BetaCoV/Wuhan/IPBCAMS-WH-05/2020 2019-nCoV/USA-CA8/2020 2019-nCoV/USA-CA9/2020 
BetaCoV/Wuhan/WH19008/2019 BetaCoV/Wuhan/IPBCAMS-WH-03/2019 BetaCoV/Wuhan/WH19005/2019 BetaCoV/Wuhan/IPBCAMS-WH-02/2019 SARS0CoV-2/61-TW/human/2020/NPL SARS-CoV-2/WH-09/human/2020/CHN 2019-nCoV/USA-WI1/2020 2019-nCoV/USA-CA5/2020 2019-nCoV/USA-CA2/2020 BetaCoV/Wuhan/WH19004/2020 BetaCov/Taiwan/NTU02/2020 SARS-CoV-2/Hu/DP/Kng/19-020 SARS-CoV-2/Hu/DP/Kng/19-027 WIV07 WIV05 SARS-CoV-2/CA6/human/2020/USA BetaCoV/Wuhan/WH-01/2019 WIV02 2019-nCoV/USA-CA3/2020 SARS-CoV-2/IQTC01/human/2020/CHN BetaCoV/Japan/AI/I-004/2020 2019-nCoV/Japan/KY/V-029/2020 BetaCoV/Wuhan/IPBCAMS-WH-01/2019 SARS-CoV-2/SP02/human/2020/BRA BetaCoV/Australia/VIC01/2020 SARS-CoV-2/Yunnan-01/human/2020/CHN SARS-CoV-2/IQTC04/human/2020/CHN BetaCoV/Finland/1/2020 2019-nCoV/USA-CA7/2020 SARS-CoV-2/105/human/2020/CHN SARS-CoV-2/233/human/2020/CHN 2019-nCoV_HKU-SZ-005b_2020 SARS-CoV-2/WA2/human/2020/USA 2019-nCoV/USA-MA1/2020 2019-nCoV/Japan/TY/WK-521/2020 2019-nCoV/Japan/TY/WK-012/2020 2019-nCoV/Japan/TY/WK-501/2020 2019-nCoV_HKU-SZ-002a_2020 2019-nCoV/USA-WA1/2020 2019-nCoV/USA-AZ1/2020 2019-nCoV/USA-TX1/2020 SARS-CoV-2/IQTC02/human/2020/CHN SARS-CoV-2/IQTC03/human/2020/CHN BetaCoV/Wuhan/WH-04/2019 Taiwan/NTU01/2020 2019-nCoV/USA-CA1/2020 SARS-CoV-2/IL2/human/2020/USA SARS-CoV-2/01/human/2020/SWE 2019-nCoV/USA-IL1/2020 BetaCoV/Korea/SNU01/2020 BetaCoV/Wuhan/WH-02/2019 BetaCoV/Wuhan/WH19002/2019 Bat_CoV_TG13\ speciesDefaultOn Bat_CoV_TG13 Pangolin-CoV-2020_MP789 Bat_CoV_BM48-31/BGR/2008 Wencheng_Sm_shrew_CoV_Xingguo-101 Bat_CoV_HKU4-1 Beluga_Whale_CoV_SW1 Wigeon_CoV_HKU20 Avian_infectious_bronchitis_virus BetaCoV_HKU24_strain_HKU24-R05005I Turkey_CoV BetaCoV_ErinaceusCoV/VMC/2012-174/GER/2012 Bat_Hp-betaCoV/Zhejiang2013 BetaCoV_England_1 MERS_Middle_East_respiratory_syndrome_CoV Rabbit_CoV_HKU14 Night-heron_CoV_HKU19 Bovine_CoV Bat_CoV_PREDICT/PDF-2180 Bat_CoV_HKU5-1 Human_CoV_OC43_strain_ATCC_VR-759 Common-moorhen_CoV_HKU21 Bulbul_CoV_HKU11-934 Human_CoV_HKU1 Bat_CoV_HKU9-1 Rat_CoV_Parker 
BtMr-AlphaCoV/SAX2011 Mouse_hepatitis_virus_strain_MHV-A59_C12_mutant Magpie-robin_CoV_HKU18 Rousettus_bat_CoV_HKU10 Rousettus_bat_CoV_GCCDC1_356 Sparrow_CoV_HKU17 Porcine_CoV_HKU15_strain_HKU15-155 Munia_CoV_HKU13-3514 Human_CoV_229E White-eye_CoV_HKU16 Coronavirus_AcCoV-JC34 Camel_alphaCoV_camel/Riyadh/Ry141/2015 Human_Coronavirus_NL63 Mink_CoV_strain_WD1127 Ferret_CoV_FRCoV-NL-2010 Porcine_epidemic_diarrhea_virus Scotophilus_bat_CoV_512 NL63-related_bat_CoV_strain_BtKYNL63-9a Bat_CoV_HKU8 Bat_CoV_CDPHE15/USA/2006 Thrush_CoV_HKU12-600 BtRf-AlphaCoV/HuB2013 Lucheng_Rn_rat_CoV_Lucheng-19 Bat_CoV_HKU2 BtNv-AlphaCoV/SC2013 Swine_enteric_CoV_strain_Italy/213306/2009 Transmissible_gastroenteritis_virus Bat_CoV_1A Feline_infectious_peritonitis_virus BtRf-AlphaCoV/YN2012 SARS_CoV\ speciesGroups SARS-CoV-2_sequences Other_CoV_sequences_from_vertebrate_hosts\ subGroups view=align\ track strainName119way\ treeImage phylo/wuhCor1_119way.png\ type wigMaf 0.0 1.0\ nextstrainFreq Nextstrain Frequency bigWig Nextstrain Mutations Alternate Allele Frequency 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ Nextstrain.org displays\ data about mutations that have occurred in the current 2019/2020 outbreak of SARS-CoV-2.\ Nextstrain has a powerful user interface for viewing the time stamped phylogenetic tree\ that it infers from the patterns of mutations in sequences worldwide.\ Nextstrain maintains an ongoing pipeline that continuously obtains SARS-CoV-2 genome sequences\ and metadata from GISAID,\ aligns them against the reference genome\ (NC_045512.2),\ and infers a phylogenetic tree.\

\

\ This track shows the alternate allele frequency of each mutation reported by Nextstrain as a\ bar graph with the height indicating the frequency.\ (The Nextstrain Mutations track offers a more detailed display of the mutations,\ breaking up the vertical bars according to the order of virus genome samples in the\ phylogenetic tree.)\
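
As a rough illustration of the quantity plotted in this track, the following Python sketch computes an alternate allele frequency at one position from per-sample base calls. The function name is hypothetical, and excluding samples without an A, C, G or T call from the denominator is an assumption; the text does not say how Nextstrain or UCSC treat missing calls.

```python
def alternate_allele_frequency(sample_alleles, reference_allele):
    """Fraction of called samples whose base differs from the reference.

    Assumption: calls other than A, C, G or T (e.g. N, gaps) are ignored.
    """
    valid = {"A", "C", "G", "T"}
    called = [a.upper() for a in sample_alleles if a and a.upper() in valid]
    if not called:
        return 0.0
    alternate = sum(1 for a in called if a != reference_allele.upper())
    return alternate / len(called)


# Five called samples, two of which carry T instead of the reference C
print(alternate_allele_frequency(["C", "C", "T", "T", "N", "C"], "C"))  # 0.4
```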

\ \

Methods

\

Nextstrain downloads SARS-CoV-2 genomes from\ GISAID\ as they are submitted by labs worldwide.\ The sequences are processed by an\ automated pipeline\ and annotations are written to a data file.\ UCSC downloads that file and extracts annotations for display.

\ \

Data Access

\

\ You can download the bigWig file underlying this track (nextstrainSamples*.bigWig) from our\ Download Server. The data can be explored interactively with the\ Table Browser\ or the Data Integrator. The data can be\ accessed from scripts through our API.\

\
http://api.genome.ucsc.edu/getData/track?genome=wuhCor1;track=nextstrainFreqB4;chrom=NC_045512v2;start=100;end=7875\
\ Command-line extraction can be accomplished using an example like the following command:\
\
bigWigToWig -udcDir=. -chrom=NC_045512v2 -start=100 -end=7875 http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/nextstrain/nextstrainSamples.bigWig myOutput.wig\

\

\ Please refer to our mailing list archives for questions, or our\ Data Access FAQ for more\ information.

\ \

Data usage policy

\ The data presented here is intended to rapidly disseminate analysis of\ important pathogens. Unpublished data is included with permission of the data\ generators, and does not impact their right to publish. Please contact the\ respective authors (available via the\ Nextstrain metadata.tsv file)\ if you intend to carry out further research using their data.\ Derived data, such as phylogenies, can be downloaded from\ nextstrain.org\ (see "DOWNLOAD DATA" link at bottom of page) -\ please contact the relevant authors where appropriate.\ \

Credits

\

Thanks to\ nextstrain.org for\ sharing its analysis of genomes collected by\ GISAID EpiCoV TM,\ and to researchers worldwide for sharing their SARS-CoV-2 genome sequences.\

\ \

References

\

\ Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher\ RA.\ \ Nextstrain: real-time tracking of pathogen evolution.\ Bioinformatics. 2018 Dec 1;34(23):4121-4123.\ PMID: 29790939; PMC: PMC6247931\

\

\ Sagulenko P, Puller V, Neher RA.\ \ TreeTime: Maximum-likelihood phylodynamic analysis.\ Virus Evol. 2018 Jan;4(1):vex042.\ PMID: 29340210; PMC: PMC5758920\

\ varRep 0 autoScale off\ compositeTrack on\ group varRep\ longLabel Nextstrain Mutations Alternate Allele Frequency\ maxHeightPixels 100:40:8\ pennantIcon Updated red ../goldenPath/newsarch.html#071720 "Now updated daily"\ shortLabel Nextstrain Frequency\ subGroup1 view Views all=All_Samples newClades=Year-Letter_Clades\ track nextstrainFreq\ type bigWig\ viewLimits 0:1.0\ viewLimitsMax 0:1.0\ visibility hide\ nextstrainGene Nextstrain Genes bigBed 4 Genes annotated by nextstrain.org/ncov 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ Nextstrain.org displays\ data about mutations that occur in the current 2019/2020 outbreak.\ Nextstrain has a powerful user interface for viewing the evolutionary tree\ that it infers from the patterns of mutations in sequences worldwide, but\ does not offer a detailed plot of mutations along the genome, so we have processed\ their genome annotations into genome browser tracks.\

\

\ Use this track in conjunction with the track\ Nextstrain Mutations (protein-coding mutations use gene names from this track).\

\ \

Methods

\

Nextstrain downloads SARS-CoV-2 genomes from\ GISAID\ as they are submitted by labs worldwide.\ The sequences are processed by an\ automated pipeline\ and annotations are written to a data file.\ UCSC downloads that file and extracts annotations for display.

\ \

Data Access

\

\ You can download the bigBed file underlying this track (nextstrainGene) from our \ Download Server. The data can be explored interactively with \ the Table Browser\ or the Data Integrator. The data can be\ accessed from scripts through our API.

\ \

Credits

\

Thanks to\ nextstrain.org for\ sharing its analysis of genomes collected by\ GISAID,\ and to researchers worldwide for sharing their SARS-CoV-2 genome sequences.\

\ \

References

\

\ Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher\ RA.\ \ Nextstrain: real-time tracking of pathogen evolution.\ Bioinformatics. 2018 Dec 1;34(23):4121-4123.\ PMID: 29790939; PMC: PMC6247931\

\ genes 1 bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainGene.bb\ group genes\ longLabel Genes annotated by nextstrain.org/ncov\ shortLabel Nextstrain Genes\ track nextstrainGene\ type bigBed 4\ visibility hide\ nextstrainParsimony Nextstrain Parsimony bigWig Parsimony Scores for Nextstrain Mutations and Phylogenetic Tree 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ Nextstrain.org displays\ data about single nucleotide mutation alleles in the SARS-CoV-2 RNA and protein sequences that have\ occurred in different samples of the virus during the current 2019/2020 outbreak.\ Nextstrain has a powerful user interface for viewing the time stamped phylogenetic tree\ that it infers from the patterns of mutations in sequences worldwide.\ Nextstrain maintains an ongoing pipeline that continuously obtains SARS-CoV-2 genome sequences\ and metadata from\ GISAID,\ aligns them against the reference genome\ (NC_045512.2),\ and infers a phylogenetic tree.\

\

\ A parsimony score\ can be computed for each mutation as the minimum number of nucleotide changes along branches\ of the tree that would lead to the observed sample genotypes at the leaves of the tree.\ For example, if there is a branch for which all leaves have a mutation, and no other leaves of\ the tree have the mutation, then the mutation presumably occurred once on that branch and the\ parsimony score would be one. However, when a mutation appears on leaves belonging to several\ branches whose other leaves do not have the mutation, then the mutation would need to occur\ on multiple branches in the tree, increasing the parsimony score. Mutations with a parsimony\ score that is relatively high, especially when compared to alternate allele count (the number \ of samples/leaves with the mutation), may be of interest when identifying systematic errors\ and/or sites of recurrent mutations.\
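As a rough illustration of the definition above, the following sketch computes the parsimony score of a single site with the Fitch small-parsimony algorithm on a rooted binary tree. The nested-tuple tree encoding, the sample names s1 to s8, and the function name are assumptions made for this example; it is not the code UCSC uses to build the track.

```python
def fitch_score(tree, alleles):
    """Return (candidate_states, score) for one site.

    tree:    a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    alleles: dict mapping leaf name -> observed nucleotide at this site
    """
    if isinstance(tree, str):
        # A leaf contributes its observed allele and no changes.
        return {alleles[tree]}, 0
    left_states, left_score = fitch_score(tree[0], alleles)
    right_states, right_score = fitch_score(tree[1], alleles)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_score + right_score
    # Children disagree: one nucleotide change is required on a branch below.
    return left_states | right_states, left_score + right_score + 1


# Hypothetical eight-sample tree.
tree = ((("s1", "s2"), ("s3", "s4")), (("s5", "s6"), ("s7", "s8")))

# Mutation confined to one branch (s1 and s2 only).
single_origin = {name: "C" for name in ("s3", "s4", "s5", "s6", "s7", "s8")}
single_origin.update({"s1": "T", "s2": "T"})

# Same alternate allele count (2), but on leaves of two different branches.
recurrent = {name: "C" for name in ("s2", "s3", "s4", "s6", "s7", "s8")}
recurrent.update({"s1": "T", "s5": "T"})

single_origin_score = fitch_score(tree, single_origin)[1]
recurrent_score = fitch_score(tree, recurrent)[1]
print(single_origin_score, recurrent_score)
```

Both toy sites have an alternate allele count of two, but the second needs two independent changes, which is the kind of pattern the text describes as potentially interesting.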

\

This track shows the parsimony score of each single-nucleotide substitution reported by Nextstrain as a bar graph, with the height indicating the score. (The Nextstrain Mutations track displays the phylogenetic tree and sample genotypes from which the parsimony scores were generated.)

\ \ \

Methods

\

Nextstrain downloads SARS-CoV-2 genomes from GISAID EpiCoV™ as they are submitted by labs worldwide. The sequences are processed by an automated pipeline, and annotations are written to a data file that UCSC downloads and from which it extracts annotations for display. UCSC computes parsimony scores using the phylogenetic tree and mutations extracted from Nextstrain.

\ \

Data Access

\

\ You can download the bigWig file underlying this track (nextstrainParsimony.bw) from our\ Download Server. The data can be explored interactively with the \ Table Browser\ or the Data Integrator. The data can be\ accessed from scripts through our API.\
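For script access, a request URL for the UCSC REST API can be assembled as sketched below. The helper name and the 0-1000 window are illustrative assumptions; the endpoint and parameter names (genome, track, chrom, start, end) follow the public API documentation as I understand it, so check them against the current API help page before relying on this.

```python
from urllib.parse import urlencode

# Base endpoint of the UCSC Genome Browser REST API for track data.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_data_url(track, genome="wuhCor1", chrom="NC_045512v2",
                   start=None, end=None):
    """Build a getData/track URL; start/end are optional 0-based coordinates."""
    params = {"genome": genome, "track": track, "chrom": chrom}
    if start is not None and end is not None:
        params["start"] = start
        params["end"] = end
    return API_BASE + "?" + urlencode(params)


# Hypothetical query: parsimony scores for the first 1000 bases.
url = track_data_url("nextstrainParsimony", start=0, end=1000)
print(url)
```

The resulting URL can be fetched with any HTTP client; the API is expected to return JSON.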

\

\ Nextstrain.org\ offers phylogenetic trees and metadata files:\ scroll to the bottom of the page and click "DOWNLOAD DATA",\ and a dialog with download options appears.\

\ \

Credits

\

This work is made possible by the open sharing of genetic data by research\ groups from all over the world. We gratefully acknowledge their contributions.\ Special thanks to\ nextstrain.org for\ sharing its analysis of genomes collected by\ GISAID.\

\ \

References

\

\ Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher\ RA.\ \ Nextstrain: real-time tracking of pathogen evolution.\ Bioinformatics. 2018 Dec 1;34(23):4121-4123.\ PMID: 29790939; PMC: PMC6247931\

\ varRep 0 autoScale off\ bigDataUrl /gbdb/wuhCor1/nextstrain/nextstrainParsimony.bw\ group varRep\ longLabel Parsimony Scores for Nextstrain Mutations and Phylogenetic Tree\ maxHeightPixels 100:40:8\ pennantIcon Updated red ../goldenPath/newsarch.html#071720 "Now updated daily"\ shortLabel Nextstrain Parsimony\ track nextstrainParsimony\ type bigWig\ viewLimits 0:60\ visibility hide\ phyloGenes PhyloCSF Genes bigGenePred PhyloCSF Genes - Curated conserved genes 0 100 0 0 0 127 127 127 0 0 0

Description

\

These tracks show curated SARS-CoV-2 protein-coding genes conserved within the Sarbecovirus subgenus as determined using PhyloCSF [1], FRESCo [2], and other comparative genomics methods, consistent with experimental evidence in SARS-CoV-2. Ambiguous gene names were resolved according to the recommendations in [3]. For a complete description of the evidence, see [4].

  • The PhyloCSF Genes track shows the conserved protein-coding genes, namely ORF1a, ORF1ab, S, ORF3a, ORF3c, E, M, ORF6, ORF7a, ORF7b, ORF8, N, and ORF9b.

    Notes:

    • ORF3c is a 41-codon ORF overlapping ORF3a in a different frame with coordinates 25457-25582; it has also been referred to as ORF3h, ORF3a*, and 3a.iORF1.
    • ORF9b is a 97-codon ORF overlapping N in a different frame with coordinates 28284-28577; it has also been referred to as ORF9a.

  • The PhyloCSF Rejected Genes track shows other proposed gene candidates that show neither the signature of conserved protein-coding genes nor persuasive experimental evidence of function [4], and are thus unlikely to be actual protein-coding genes, namely ORF2b, ORF3d, ORF3d-2, ORF3b, ORF9c, and ORF10.

    Notes:

    • ORF2b is a 39-codon ORF with coordinates 21744-21860 overlapping the spike protein in a different frame; it has also been referred to as S.iORF1.
    • ORF3d is a 57-codon ORF with coordinates 25524-25697 overlapping ORF3a in a different frame; it has also been referred to as ORF3b.
    • ORF3d-2 is a 33-codon ORF with coordinates 25596-25697 that is a subset of ORF3d starting at a downstream in-frame AUG codon; it has also been referred to as 3a.iORF2.
    • ORF3b is the 22-codon ortholog of the 5' end of SARS-CoV ORF3b with coordinates 25814-25882, ending at an in-frame stop codon that is not present in SARS-CoV.
    • ORF9c is a 73-codon ORF overlapping N in a different frame with coordinates 28734-28955; it has also been referred to as ORF9b and ORF14.

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/phyloGenes/PhyloCSFgenes.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \ \

Methods

\

See [4]. Note that the data was updated in June 2021: ORF14 was renamed to ORF9c, ORF2b and ORF3d-2 were added.

\ \

Credits

\

Questions should be directed to Irwin Jungreis.

\

If you use the SARS-CoV-2 PhyloCSF Genes Track Hub, please cite Jungreis et al. 2021 [4].

\

References

\ \

[1] Lin MF, Jungreis I, and Kellis M (2011). PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions. Bioinformatics 27:i275-i282 (ISMB/ECCB 2011).

\ \

[2] Sealfon RS, Lin MF, Jungreis I, Wolf MY, Kellis M, Sabeti PC (2015). FRESCo: finding regions of excess synonymous constraint in diverse viruses. Genome Biol. doi: 10.1186/s13059-015-0603-7.

\ \

[3] Jungreis, I., Nelson, C. W., Ardern, Z., Finkel, Y., Krogan, N. J., Sato, K., ... & Kellis, M. (2021). Conflicting and ambiguous names of overlapping ORFs in the SARS-CoV-2 genome: A homology-based resolution. Virology 558, 145-151. doi.org/10.1016/j.virol.2021.02.013

\ \

[4] Jungreis I, Sealfon R, Kellis M (2021). SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes. Nature Communications 12(1), 1-20. doi:10.1038/s41467-021-22905-7

\ genes 1 baseColorDefault genomicCodons\ compositeTrack on\ exonNumbers on\ group genes\ itemRgb on\ longLabel PhyloCSF Genes - Curated conserved genes\ noScoreFilter on\ shortLabel PhyloCSF Genes\ track phyloGenes\ type bigGenePred\ poranHla1 Poran HLA I bigBed 9 + RECON HLA-I epitopes 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track shows putative epitopes for CD4+ and CD8+ T cells whose HLA binding properties cover over 99% of the US, European, and Asian populations, for both HLA-I and HLA-II. The track includes 11,776 CD8 epitopes restricted to HLA-I as predicted by RECON. All epitopes are scored using a combined coverage score reported for USA, EUR, and API. Specifically, score = (USA_coverage + EUR_coverage + API_coverage) * 1000 / 3.
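The score formula can be worked through with a few lines of Python. The sketch assumes that each coverage value is a fraction between 0 and 1, so that the result lands on the 0-1000 scale used for BED scores; the input values are made up for illustration.

```python
def combined_coverage_score(usa_coverage, eur_coverage, api_coverage):
    """Average the three population coverages and scale to 0-1000.

    Assumption: each coverage is a fraction in the range 0..1.
    """
    return (usa_coverage + eur_coverage + api_coverage) * 1000 / 3


# Hypothetical epitope covering 100% (USA), 50% (EUR), and 0% (API).
example_score = combined_coverage_score(1.0, 0.5, 0.0)
print(example_score)
```

An epitope with full coverage in all three populations would reach the maximum score of 1000 under this assumption.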

\ For more details, see \ here.\

\ \

Display Conventions and Configuration

\

Genomic locations of epitopes are labeled with a unique ID. Mousing over an item shows the protein name and restrictions to HLA-I. Clicking an item shows a standard feature detail page with both the ID and the mouse-over information.

\ \

Methods

\

For a full description of the methods used, \ refer here.

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\ \

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/poranHla/CD8-hla1.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

Asaf Poran, Dewi Harjanto, Matthew Malloy, Michael S. Rooney, Lakshmi Srinivasan, \ Richard B. Gaynor. Sequence-based \ prediction of vaccine targets for inducing T cell responses to SARS-CoV-2 \ utilizing the bioinformatics predictor RECON. bioRxiv 2020.04.06.027805

\ immu 1 bigDataUrl /gbdb/wuhCor1/bbi/poranHla/CD8-hla1.bb\ group immu\ html poranHla1\ longLabel RECON HLA-I epitopes\ mouseOverField name2\ noScoreFilter on\ shortLabel Poran HLA I\ track poranHla1\ type bigBed 9 +\ visibility hide\ poranHla2 Poran HLA II bigBed 9 + RECON HLA-II epitopes 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track shows putative epitopes for CD4+ and CD8+ T cells whose HLA binding properties cover over 99% of the US, European, and Asian populations, for both HLA-I and HLA-II. The track includes 11,776 CD8 epitopes restricted to HLA-I as predicted by RECON. All epitopes are scored using a combined coverage score reported for USA, EUR, and API. Specifically, score = (USA_coverage + EUR_coverage + API_coverage) * 1000 / 3.

\ For more details, see \ here.\

\ \

Display Conventions and Configuration

\

Genomic locations of epitopes are labeled with a unique ID. Mousing over an item shows the protein name and restrictions to HLA-I. Clicking an item shows a standard feature detail page with both the ID and the mouse-over information.

\ \

Methods

\

For a full description of the methods used, \ refer here.

\ \

Data Access

\

\ The raw data can be explored interactively with the\ Table Browser, or combined with other datasets in the\ Data Integrator tool.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server.

\ \

\ Annotations can\ be converted from binary to ASCII text by our command-line tool bigBedToBed.\ Instructions for downloading this command can be found on our\ utilities page.\ The tool can also be used to obtain features within a given range without downloading the file,\ for example:

\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/poranHla/CD8-hla1.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout\ \

\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information.

\ \

References

\

Asaf Poran, Dewi Harjanto, Matthew Malloy, Michael S. Rooney, Lakshmi Srinivasan, \ Richard B. Gaynor. Sequence-based \ prediction of vaccine targets for inducing T cell responses to SARS-CoV-2 \ utilizing the bioinformatics predictor RECON. bioRxiv 2020.04.06.027805

\ immu 1 bigDataUrl /gbdb/wuhCor1/bbi/poranHla/CD8-hla2.bb\ group immu\ html poranHla1\ longLabel RECON HLA-II epitopes\ mouseOverField name2\ noScoreFilter on\ shortLabel Poran HLA II\ track poranHla2\ type bigBed 9 +\ visibility hide\ potPathoIndel Pot. pathogenic indels bigBed 4 Potential pathogenic insertions and deletions from Gussow et al, PNAS 2020 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track shows genomic features that differentiate SARS-CoV-2 and the viruses behind the two previous\ deadly coronavirus outbreaks, SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV),\ from less pathogenic coronaviruses. These features include:

  • Enhancement of the nuclear localization signals in the nucleocapsid protein
  • Inserts in the spike glycoprotein that appear to be associated with the high case fatality rate of these coronaviruses
  • Inserts in the spike glycoprotein that appear to be associated with the host switch from animals to humans

Methods

\

\ We searched for specific regions, within an alignment of 944 human coronaviruses, that contributed the\ most to the separation between coronaviruses with a high case fatality rate and those with a low case fatality\ rate, using a combination of comparative genomics and machine learning techniques. For the zoonotic insertions,\ we scanned an alignment of human and nonhuman coronaviruses to find regions in which over 50% of the strains in\ the alignment differed from the human strain, and for which the differing strains were explicitly the most distant\ from human. We identified only one such location, across all three high case fatality rate virus groups.\

\ \

References

\

Ayal B. Gussow*, Noam Auslander*, Guilhem Faure, Yuri I. Wolf, Feng Zhang, Eugene V. Koonin.\ \ Genomic determinants of pathogenicity in SARS-CoV-2 and other human coronaviruses.\ \ Proceedings of the National Academy of Sciences Jun 2020, 117 (26) 15193-15199; DOI: 10.1073/pnas.2008176117.

\ varRep 1 bigDataUrl /gbdb/wuhCor1/bbi/potPathoIndel.bb\ group varRep\ longLabel Potential pathogenic insertions and deletions from Gussow et al, PNAS 2020\ shortLabel Pot. pathogenic indels\ track potPathoIndel\ type bigBed 4\ resistPred Predicted Resistance bigBed 9 + Mutations that may confer drug resistance - from Coronavirus3d.org (label: count of GISAID sequences with mutation, Feb 2022) 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track lists amino acid positions that are close to the binding sites of certain COVID-19 drug target proteins and are therefore potentially relevant for viral drug resistance. The protein products are the S, NSP3, and NSP5 proteins. The data is derived from 3D Protein Data Bank (PDB) X-ray crystal structures and was imported from Coronavirus3d.org, a project funded by NIAID, NIGMS and UCR. For more information, please visit that site and look through the 9 protein-drug structures listed.

\ \

Display Conventions

\ The display shows amino acids within 5 Angstroms of an inhibitor binding site on the\ S, NSP3, and NSP5 proteins. The label of the annotations \ is the number of GISAID sequences with at least one mutation\ at this position as of March 3, 2022. Click or mouse-over \ the mutations to show the full list of nucleotide changes and the \ frequency of each.\ \
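The 5 Angstrom criterion amounts to a simple distance filter over atom coordinates. The sketch below shows the idea on made-up coordinates; the data layout (a dict of residue number to atom positions), the residue numbers, and the function name are assumptions for illustration and do not describe how Coronavirus3d.org derived its tables.

```python
import math


def residues_near_ligand(residue_atoms, ligand_atoms, cutoff=5.0):
    """Return residue IDs having at least one atom within `cutoff` Angstroms
    of any ligand atom.

    residue_atoms: dict residue_id -> list of (x, y, z) coordinates
    ligand_atoms:  list of (x, y, z) coordinates of the inhibitor
    """
    near = []
    for residue_id, atoms in residue_atoms.items():
        if any(math.dist(atom, lig) <= cutoff
               for atom in atoms for lig in ligand_atoms):
            near.append(residue_id)
    return sorted(near)


# Toy coordinates in Angstroms (hypothetical, not taken from a PDB entry).
ligand = [(0.0, 0.0, 0.0)]
residues = {
    41: [(3.0, 0.0, 0.0), (9.0, 0.0, 0.0)],   # one atom 3.0 A away
    145: [(0.0, 3.0, 3.0)],                   # about 4.2 A away
    189: [(6.0, 0.0, 0.0)],                   # 6.0 A away, outside the cutoff
}
close_residues = residues_near_ligand(residues, ligand)
print(close_residues)
```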

Data Access

\

\ You can see the original data\ as tables on Coronavirus3d.org.\ The data can be explored interactively at the Genome Browser with the\ Table Browser or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in\ a bigBed file that can be downloaded from\ the download server\ or the API.\

\

\ Please refer to our\ \ mailing list archives for questions, or our\ Data Access FAQ\ for more information.

\ \

Credits

\

\ Thanks to Anna Niewiadomska from \ the BV-BRC team at the J. Craig Venter Institute for finding the dataset and testing\ the track. Thanks to the Godzik Lab for making available the data in .tsv format\ on the coronavirus3d.org website.

\ \ varRep 1 bedNameLabel Mutation\ bigDataUrl /gbdb/wuhCor1/resistPred/resistPred.bb\ group varRep\ itemRgb on\ longLabel Mutations that may confer drug resistance - from Coronavirus3d.org (label: count of GISAID sequences with mutation, Feb 2022)\ mouseOver $changeFreq\ shortLabel Predicted Resistance\ track resistPred\ type bigBed 9 +\ problematicSites Problematic Sites bigBed Problematic sites where masking or caution are recommended for analysis 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ Attempts to infer phylogenetic relationships, sites under selection, or evidence of recombination\ from SARS-CoV-2 genome sequences can be led astray by sequencing errors, contamination, and\ hypermutable sites. In order to make reliable inferences, it is important to identify probable\ errors and susceptible sites within the genome sequences, carefully consider how those might\ affect the specific analysis one is about to perform, and perhaps exclude problematic sites from\ analysis.\

\

\ This track shows locations in the SARS-CoV-2 genome that have been identified as problematic for\ analysis for various reasons. They have been collected in the github repository\ https://github.com/W-L/ProblematicSites_SARS-CoV2/.\ Locations have been separated into two subtracks and colored corresponding to levels of severity:\

    \
  • Mask: Problems are expected to affect most types\ of analysis, so it is recommended to mask out these sites before analysis.\
  • Caution: Some types of analysis may be\ affected while other types may not; caution is recommended.\
\
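Masking a site usually means replacing the base at that position with an ambiguity character before running the analysis. A minimal sketch, assuming 1-based positions (as in the POS column of a VCF file) and using a short made-up sequence and positions rather than real entries from the problematic sites list:

```python
def mask_sites(sequence, positions, mask_char="N"):
    """Replace the bases at the given 1-based positions with mask_char.

    Positions outside the sequence are ignored.
    """
    bases = list(sequence)
    for pos in positions:
        if 1 <= pos <= len(bases):
            bases[pos - 1] = mask_char
    return "".join(bases)


# Hypothetical 10-base sequence with three sites to mask.
masked = mask_sites("ACGTACGTAC", [1, 2, 10])
print(masked)
```

In practice the positions would be read from the Mask subset of the VCF file; sites in the Caution subset call for a case-by-case decision instead.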

\

\ Locations are labeled with the following terms to indicate the type of potential problem:\

    \
  • ambiguous: Sites which show an excess of ambiguous basecalls relative to the \ number of alternative alleles, often emerging from a single country or sequencing laboratory\
  • amended: Previous sequencing errors which now appear to have been fixed in the\ latest versions of the GISAID sequences, at least in sequences from some of the sequencing\ laboratories\
  • highly_ambiguous: Sites with a very high proportion of ambiguous characters,\ relative to the number of alternative alleles\
  • highly_homoplasic: Positions which are extremely homoplasic; it is not always clear whether these are hypermutable sites or sequencing artefacts
  • homoplasic: Homoplasic sites, with many mutation events needed to explain a\ relatively small alternative allele count\
  • interspecific_contamination: Cases (only one instance as of July 2020) in which\ the known sequencing issue is due to contamination from genetic material that does not have\ SARS-CoV-2 origin\
  • nanopore_adapter: Cases in which the known sequencing issue is due to the adapter\ sequences in nanopore reads\
  • narrow_src: Mutations which are found in sequences from only a few sequencing labs \ (usually two or three), possibly as a consequence of the same artefact reproduced independently\
  • neighbour_linked: Proximal mutations displaying near perfect linkage\
  • seq_end: Alignment ends are affected by low coverage and high error rates\ (masking recommended, but might be more stringent than necessary)\
  • single_src: Only observed in samples from a single laboratory\
\ \

\ \

Methods

\

\ Multiple groups applied various methods (De Maio, Walker et al.;\ De Maio, Gozashti et al.; Turakhia et al.) to identify sites that\ were homoplasic, likely contaminated, likely sequencing error and/or observed in multiple\ virus lineages by only one or a few laboratories. They contributed their observations\ and recommendations to the github repository\ https://github.com/W-L/ProblematicSites_SARS-CoV2/.\ UCSC downloaded the collection, split the sites into Mask and Caution subsets depending\ on the recommended action and reformatted the data for display in the Genome Browser.\

\ \

Data Access

\

\ The original data file was downloaded from github:\ https://raw.githubusercontent.com/W-L/ProblematicSites_SARS-CoV2/master/problematic_sites_sarsCov2.vcf.\ You can download the bigBed files underlying this track (problematicSites*.bb) from our\ Download Server. The data can be explored interactively with the \ Table Browser\ or the Data Integrator. The data can be\ accessed from scripts through our API.\

\ \ \

References

\ \

\ De Maio N, Walker C, Borges R, Weilguny L, Slodkowicz G, Goldman N.\ Issues with SARS-CoV-2 sequencing data.\ virological.org. 2020 May 5.\

\ \

\ De Maio N, Gozashti L, Turakhia Y, Walker C, Lanfear R, Corbett-Detig R, Goldman N.\ Updated analysis with data from 12th June 2020.\ virological.org. 2020 July 14.\

\ \

\ Turakhia Y, Thornlow B, Gozashti L, Hinrichs AS, Fernandes JD, Haussler D, and Corbett-Detig R.\ Stability of SARS-CoV-2 Phylogenies.\ bioRxiv. 2020 June 9.\

\ map 1 compositeTrack on\ dataVersion v8 (2021-10-27)\ group map\ longLabel Problematic sites where masking or caution are recommended for analysis\ shortLabel Problematic Sites\ skipEmptyFields on\ track problematicSites\ type bigBed\ unipProtease Protease Cleavage bigBed Protease Cleavage Sites 0 100 50 50 20 152 152 137 0 0 0

Description

\

\

This track shows annotated protease sites for human and viral proteases.\

\

Display Conventions and Configuration

\

\

Each track highlights two amino acids (not the full recognition motif). The protease cut happens in the middle of the annotation.\ \ \

Methods

\

These sites are taken directly from the UniProt "Other Annotations" track and reprocessed to be more human-readable in the browser. "Other Annotations" marked as "site" were labeled with the protease that recognizes them.

References

\

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

uniprot 1 bigDataUrl /gbdb/wuhCor1/uniprot/protease_sitesCoV2.bb\ color 50,50,20\ dataVersion Manually created based on April 2020 UniProt release\ exonNumbers off\ group uniprot\ longLabel Protease Cleavage Sites\ shortLabel Protease Cleavage\ track unipProtease\ type bigBed\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ rnaStructRangan Rangan RNA bed 6 + Rangan et al. RNA predictions 0 100 20 90 0 137 172 127 0 0 0

Description

\

\ This track shows the locations of RNA structure predictions reported in\ \ Rangan et al. bioRxiv 2020\ as well as recognizable matches to Rfam annotations found at \ \ https://rfam.xfam.org/search?q=coronavirus.

\ \

Display Conventions and Configuration

\

\ At zoomed out positions, RNA structure is indicated by dark (stem) and light (loops \ and single strands) green color bars. Zooming in to base level displays the prediction \ in \ "extended dot-bracket" notation.
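To show how a dot-bracket string encodes a structure, the sketch below converts one into a list of paired positions. It assumes the common convention that dots are unpaired bases and that each bracket type ((), [], {}, <>) nests independently, which is how extended notation can represent pseudoknots; other symbols sometimes used in extended notation (such as letter pairs) are treated as unpaired here. The example string is made up.

```python
def base_pairs(structure):
    """Return sorted (i, j) 0-based index pairs from a dot-bracket string."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}  # one stack per bracket type
    pairs = []
    for index, char in enumerate(structure):
        if char in openers:
            stacks[char].append(index)
        elif char in closers:
            stack = stacks[closers[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {index}")
            pairs.append((stack.pop(), index))
        # Any other character (for example ".") is an unpaired base.
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


# Hypothetical stem with a second, crossing stem (a pseudoknot).
pairs = base_pairs("((..[[..))..]]")
print(pairs)
```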

\ \

Credits

\

\ This track was created by UCSC undergraduate students Justin Sim and Alinne Gonzalez Armenta \ with the indispensable assistance of Brian Raney.

\ \

References

\

\ Rangan Ramya, Zheludev Ivan N., Das Rhiju, 2020.\ \ RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses.\ bioRxiv

\ \ rna 1 color 20,90,0\ exonArrows off\ group rna\ longLabel Rangan et al. RNA predictions\ noScoreFilter .\ shortLabel Rangan RNA\ track rnaStructRangan\ type bed 6 +\ visibility hide\ recomb Recomb. Breakpoints bigBed 6 + Recombination Breakpoints from Thurakia et al 2022 1 100 0 0 0 127 127 127 0 0 0

Description

\

This track shows recombination breakpoints inferred by the RIPPLES software from a phylogenetic tree of 1.6 million SARS-CoV-2 sequences, described by Turakhia et al., Nature 2022.

\ \

The track is in "density" mode by default, showing the density of recombinant sequences per nucleotide. Deactivating the "Density plot" checkbox on the configuration page shows all individual recombination events.

\ \ \

Methods

From Turakhia et al., Nature 2022: "We developed a new method for detecting recombination in pandemic-scale phylogenies, Recombination Inference using Phylogenetic PLacEmentS, RIPPLES. Because recombination violates the central assumption of many phylogenetic methods, that is, that a single evolutionary history is shared across the genome, recombinant lineages arising from diverse genomes will often be found on 'long branches', which result from accommodating the divergent evolutionary histories of the two parental haplotypes. Note that as long as recombination is relatively uncommon, phylogenetic inference is expected to remain accurate even when branch lengths are artifactually expanded. RIPPLES exploits that signal by first identifying long branches on a comprehensive SARS-CoV-2 mutation-annotated tree. RIPPLES then exhaustively breaks the potential recombinant sequence into distinct segments and replaces each onto a global phylogeny using maximum parsimony. RIPPLES reports the two parental nodes-hereafter termed donor and acceptor-that result in the highest parsimony score improvement relative to the original placement on the global phylogeny. Our approach therefore leverages phylogenetic signals for each parental lineage and the spatial correlation of markers along the genome. We establish significance using a null model conditioned on the inferred site-specific rates of de novo mutation."

Data Access

\

You can download the bigBed file underlying this track (ripples_breakpoints.bb) from our Download Server. The data can be explored interactively with the Table Browser or the Data Integrator. The data can also be accessed from scripts through our API.

\ \

Credits

\

\ Thanks to Bryan Thornlow for sharing the data.\

References

\

\ Turakhia Y, Thornlow B, Hinrichs A, McBroome J, Ayala N, Ye C, Smith K, De Maio N, Haussler D,\ Lanfear R et al.\ \ Pandemic-scale phylogenomics reveals the SARS-CoV-2 recombination landscape.\ Nature. 2022 Sep;609(7929):994-997.\ PMID: 35952714; PMC: PMC9519458\

\ \ \ varRep 1 bigDataUrl /gbdb/wuhCor1/recomb/ripples_breakpoints.bb\ doWiggle on\ group varRep\ longLabel Recombination Breakpoints from Thurakia et al 2022\ maxHeightPixels 100:40:8\ priority 100\ shortLabel Recomb. Breakpoints\ track recomb\ type bigBed 6 +\ visibility dense\ ucscToRefSeq RefSeq Acc bed 4 RefSeq Accession 0 100 0 0 0 127 127 127 0 0 0 https://www.ncbi.nlm.nih.gov/nuccore/$$

Description

\

\ This track associates UCSC Genome Browser chromosome names to accession\ identifiers from the NCBI Reference Sequence Database (RefSeq).\

\ \

\ The data were downloaded from the NCBI assembly database.\

\ \

Credits

\

The data for this track was prepared by\ Hiram Clawson.\ map 1 group map\ longLabel RefSeq Accession\ shortLabel RefSeq Acc\ track ucscToRefSeq\ type bed 4\ url https://www.ncbi.nlm.nih.gov/nuccore/$$\ urlLabel RefSeq accession:\ visibility hide\ rfam Rfam bed 5 Rfam families 0 100 98 23 0 176 139 127 0 0 0 https://rfam.org/family/$$

Rfam families

\

\ The SARS-CoV-2 genome was annotated using the structured RNA families from the Rfam database including the following Coronavirus-specific families:

\ \ \

Methods

\

\ The annotations were generated using the \ Infernal cmsearch program and the Rfam covariance \ models (release 14.2). The cmsearch output was manually edited to remove the lower-scoring \ Betacoronavirus-5UTR and Betacoronavirus-3UTR families that belong to the same Rfam clans \ as the Sarbecovirus-5UTR and Sarbecovirus-3UTR families (CL00116 and CL00117).

\ The alignments and the secondary structure were produced using LocARNA and refined based \ on the latest literature.

\ \

Credits

\

\ The curated Sarbecovirus alignments were provided by Kevin Lamkiewicz \ and Manja Marz \ (Friedrich Schiller University Jena). Eric Nawrocki \ (NCBI) revised the existing Rfam entries (RF00164, RF00165, and RF00507). We also thank \ \ Ramakanth Madhugiri (Justus Liebig University Giessen) for reviewing the Coronavirus \ UTR alignments. The track was prepared by the Rfam team.\

\ This work is part of the BBSRC \ funded project to expand the coverage of viral RNAs in Rfam.

\ \

References

\ \ rna 1 color 98,23,0\ group rna\ longLabel Rfam families\ shortLabel Rfam\ track rfam\ type bed 5\ url https://rfam.org/family/$$\ visibility hide\ primers RT-PCR Primers psl RT-PCR Detection Kit Primer Sets 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the locations of those primers in detection kits that match the\ reference sequence. The primers were copied from \ \ a spreadsheet created by the project\ \ OpenCovid19.\ The initial version of the track used the FASTA file from Design Flaws in COVID-19 Primers from Multiple International Labs.\

\ \

Most are RT-qPCR primer sets; sequencing primers have the prefix Seq1- or Seq2-. Each RT-qPCR set consists of one forward primer, one reverse primer, and one internal probe, as indicated by the names.

\ \

\ As expected, the three control primers were not found at all: US-CDC-Control_RP-P, US-CDC-Control_RP-R, US-CDC-Control_RP-F.\

\ \

Here is a quick overview of the origin of the primers. Please see the website and spreadsheet linked above for more details:

Prefix | Institution | Overview | Manual
CN-CDC- | China CDC | |
NIID- | Nat. Inst. of Infect. Dis., Japan | |
WH- | National Inst. of Health (Thailand) | |
HKU- | The University of Hong Kong | Detects N gene and Orf1b. Not specific for SARS-CoV-2, but other Sarbecovirus species are not in circulation | WHO Peiris Protocol
US-CDC- | CDC, USA | Three reactions, target: N gene. One primer/probe set detects all betacoronaviruses, two sets are specific for SARS-CoV-2. All 3 assays must be positive | Instructions
EU-Drosten | Drosten Lab, Charite Berlin, Germany | Set 1: run E and RdRp primers (designed for SARS-CoV, SARS-CoV-2, and bat-associated betacoronaviruses); if Set 1 is positive, use the SARS-CoV-2-specific detection primer | Instructions
\ \

\ The sequences and the identifiers for these primers were obtained from the following sources, among others:\ \

\ More details can be found in the spreadsheet linked above.\ \ \

Methods

\ The primers were mapped with the following command:
blat ../../wuhCor1.2bit primers.fa stdout -stepSize=3 -tileSize=6 -minScore=10 -oneOff=1 -noHead -fine | pslReps stdin stdout /dev/null -minNearTopSize=10 -minCover=0.8 -nohead > primers.psl\ \ \
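The `-minCover=0.8` filter in the command above keeps only alignments that cover most of each primer. The sketch below computes a comparable coverage fraction from a PSL line using the standard 21-column PSL layout; the exact calculation inside pslReps may differ, and the primer name and alignment values are invented for the example.

```python
def psl_query_coverage(psl_line):
    """Return (query_name, fraction of the query aligned) for one PSL line.

    Uses PSL columns 1-3 (matches, misMatches, repMatches), column 10
    (qName), and column 11 (qSize).
    """
    fields = psl_line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = (int(value) for value in fields[0:3])
    query_name = fields[9]
    query_size = int(fields[10])
    return query_name, (matches + mismatches + rep_matches) / query_size


def make_psl_line(matches, query_name, query_size, target_start):
    """Build a hypothetical single-block, gap-free PSL line for testing."""
    fields = [matches, 0, 0, 0, 0, 0, 0, 0, "+", query_name, query_size,
              0, matches, "NC_045512v2", 29903, target_start,
              target_start + matches, 1, f"{matches},", "0,",
              f"{target_start},"]
    return "\t".join(str(field) for field in fields)


full_hit = psl_query_coverage(make_psl_line(20, "example_primer_F", 20, 100))
partial_hit = psl_query_coverage(make_psl_line(15, "example_primer_R", 20, 300))
print(full_hit, partial_hit)
```

Under this approximation the second alignment covers 75% of its primer and would fall below a 0.8 threshold.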

Data Access

\

\ You can download the PSL file underlying this track (primers) from our\ Download Server.\ The data can be explored interactively with the Table Browser\ or the Data Integrator. The data can also be\ accessed from scripts through our API.

\ \

Credits

\

\ This data annotation track was made by Maximilian Haeussler, with assistance from Daniel Schmelter.\ Fasta data collected by Tomer Altman (Biome Bioinformatics), prepared by Jason Fernandes (UCSC), updated by Darach Miller (Stanford) and the OpenCovid19 project.

\ map 1 group map\ longLabel RT-PCR Detection Kit Primer Sets\ shortLabel RT-PCR Primers\ track primers\ type psl\ simpleRepeat Simple Repeats bed 4 + Simple Tandem Repeats by TRF 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays simple tandem repeats (possibly imperfect repeats) located\ by Tandem Repeats\ Finder (TRF) which is specialized for this purpose. These repeats can\ occur within coding regions of genes and may be quite\ polymorphic. Repeat expansions are sometimes associated with specific\ diseases.

\ \

Methods

\

\ For more information about the TRF program, see Benson (1999).\

\ \

Credits

\

\ TRF was written by \ Gary Benson.

\ \

References

\ \

\ Benson G.\ \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.\ PMID: 9862982; PMC: PMC148217\

\ varRep 1 group varRep\ longLabel Simple Tandem Repeats by TRF\ shortLabel Simple Repeats\ track simpleRepeat\ type bed 4 +\ visibility hide\ spikeMuts Spike Mutations bigBed 9 + Spike protein mutations from community annotation (Feb 2021) 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ The SARS-CoV-2 spike protein, which binds the virus to host cells, is a key target of vaccine \ development to combat COVID-19, and mutations in this protein have potential to change infectivity \ and response to disease treatments as well as vaccine efficacy.\ As of February 2021 more than a dozen mutations in this protein have been detected via sequencing \ worldwide, and specific mutations have been identified to be associated with\ viruses that are significantly more transmissible.

\

This track presents amino acid mutations identified in the SARS-CoV-2 spike protein, based on the community annotation at CoVariants.org, supplemented by the Variants of SARS-CoV-2 Wikipedia page.

\ \

\ Mutations in this track (with nicknames): H69-, D80Y, S98F, A222V, N439K (Nick), L452R, Y453F, S477N, E484 (Eeek), N501 (Nelly), D614G (Doug), A626S, P681H (Pooh), A701B, V1122

\

\ Information provided for each mutation, if available, includes:\

  • Date first sequenced
  • Notes of clinical significance
  • Notes regarding geographic incidence
  • Link to charts and tables presenting geographic distribution
  • Link to Nextstrain build
  • Links to relevant publications
\

\ \ \

Display Conventions

\

The track items are colored as follows:
Red: Identified as strong antibody escape by Bloom Lab RBD-mutation screen
Purple: ACE2 receptor binding region (RBD)
Blue: Other

The mutation name (e.g. N501), as well as alternative identifiers (e.g. N501Y, 501Y) or the nickname if present, can be typed into the browser position box to navigate to the mutation position and highlight the mutation in the browser window.

\ \

Release Notes

\

This track was updated to include the L452R mutation in the B.1.429 variant (first identified in California), as displayed in the Variants of Concern track. The color scheme was also changed in this track update.

\ \

Data Access

\

\ The raw data can be explored interactively with the Table Browser, or Data Integrator.\ For automated analysis, the genome annotation can be downloaded from the\ downloads server.\ Data files for earlier versions of this track can be downloaded from our \ archive download server.\ Please refer to our mailing list archives\ for questions, or our Data Access FAQ for more information.\

\ \

Credits

\

\ Thanks to Emma B. Hodcroft, Institute of Social and Preventive Medicine, University of Bern, Switzerland\ for leading and maintaining the community annotation resource on which this track is largely based.

\ \

References

\

\ CoVariants: SARS-CoV-2 Mutations and Variants of Interest\ Emma B. Hodcroft, Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland\

\

\ Variants of SARS-CoV-2, Wikipedia\

\ \ \ \ \ varRep 1 bedNameLabel Mutation\ bigDataUrl /gbdb/wuhCor1/spikeMuts/spikeMuts.Feb21.bb\ group varRep\ html spikeMuts.Feb21\ itemRgb on\ longLabel Spike protein mutations from community annotation (Feb 2021)\ mouseOver $name $clinicalNotes\ noScoreFilter on\ searchIndex name\ searchTrix /gbdb/wuhCor1/spikeMuts/spikeMutsSearch.Feb21.ix\ shortLabel Spike Mutations\ track spikeMuts\ type bigBed 9 +\ urls geographyChartsUrl="$$" nextstrainBuildUrl="$$"\ targets T-React. Epitopes bigBed T-cell reactive epitopes in patients and donors 0 100 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows data from Braun et al, 2020, \ "Presence of SARS-CoV-2 reactive T cells in COVID-19 patients and healthy donors".

\

The authors stimulated PBMCs with two sets (M1 and M2) of overlapping SARS-CoV peptide pools. Importantly, these are SARS-CoV peptides that have experimental support as putative MHC-II epitopes for SARS-CoV, but note that not all of them match the SARS-CoV-2 sequence exactly.

Using these peptide sets, the authors "demonstrate the presence of S-reactive CD4+ T cells in 83% of COVID-19 patients, as well as in 34% of SARS-CoV-2 seronegative healthy donors (HD), albeit at lower frequencies."

\ \

Display Conventions and Configuration

\

\ Two tracks are available:\

\
  • M1_peptides: The more N-terminal peptide library
  • M2 peptides: The more C-terminal peptide library
\ \

\ The annotated interval represents the alignment of the peptide to the viral\ genome. The sequence displayed in the name is the SARS-CoV peptide sequence (the actual \ sequence that was used in the paper, not necessarily\ identical to the SARS-CoV-2 peptide).

\ \

Methods

\

\ Table S1 contains the peptide sequences (SARS-CoV sequence) used. This table was \ downloaded and tblastn was used to align the identified SARS-CoV peptides to the SARS-CoV-2 genome.

\

A small number of peptides were not found or produced erroneous hits. For these, the coordinates were identified by manually running BLAT on the SARS-CoV-2 sequence from the alignments reported in Fig S1.

\ \

References

\ Braun et al, 2020. \ "Presence of SARS-CoV-2 reactive T cells in COVID-19 patients and healthy donors"\ immu 1 compositeTrack on\ group immu\ longLabel T-cell reactive epitopes in patients and donors\ shortLabel T-React. Epitopes\ track targets\ type bigBed\ visibility hide\ vaccines Vaccines bigPsl COVID Vaccines BioNTech/Pfizer BNT-162b2 and Moderna mRNA-1273 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the alignment of three different mRNA vaccine sequences\ to the SARS-CoV-2 genome:\

    \
  1. The BioNTech/Pfizer BNT-162b2 sequence as published by the World Health Organization\
  2. The reconstructed BioNTech/Pfizer BNT-162b2 RNA as sequenced by the\ Andrew Fire lab, Stanford University School of Medicine\
  3. The Moderna mRNA-1273 sequence as sequenced by the\ Andrew Fire lab, Stanford University School of Medicine\
\

\

Note that the actual vaccines are synthesized with N1-methyl-pseudouridine \ (Ψ) in place of uridine. See paper by Hubert in References for\ a discussion.\

\ \

Display Conventions and Configuration

\

\ The psl output from blat was converted to a bigPsl\ format file for display in this track. Depending upon the size of the\ section of the genome in display, the track will draw black where\ nucleotides are identical between vaccine sequence and the SARS-CoV-2\ sequence. Red lines indicate differences in nucleotides. At viewpoints\ with smaller sections of the genome in view, setting the\ Color track by codons or bases: to different mRNA bases\ will show the nucleotides in the vaccine that are different than the\ SARS-CoV-2 sequence.\

\

Methods

\

The mRNA sequences were obtained from the MS Word documents mentioned in the references below, and the Andrew Fire lab GitHub repository supplied the FASTA sequencing results for the BioNTech/Pfizer BNT-162b2 and Moderna mRNA-1273 samples.

\

The PSL alignment file was obtained via the UCSC Genome Browser blat service with parameters -t=dnax -q=rnax and filtered to keep only alignments with scores above 1000, which removes the polyA match:

\
  gfClient -maxIntron=10 -t=dnax -q=rnax <host> <port> \
     /gbdb/wuhCor1 threeVaccines.fa stdout \
        | pslFilter -minScore=1000 stdin wuhCor1.vaccines.psl

  pslScore wuhCor1.vaccines.psl

  #tName          tStart  tEnd    qName:qStart-qEnd       score   percentIdent
  NC_045512v2     21559   25384   ModernaMrna1273:54-3879  1419    68.60
  NC_045512v2     21559   25384   ReconstructedBNT162b2:51-3876 1701    72.30
  NC_045512v2     21559   25384   WHO_BNT162b2:51-3876     1701    72.30

  faCount threeVaccines.fa | tawk '{print $1,"1.."$2+1}' \
     | head -4 | tail -3 > threeVaccines.cds
  pslToBigPsl -cds=threeVaccines.cds -fa=threeVaccines.fa wuhCor1.vaccines.psl stdout \
     | sort -k1,1 -k2,2n > wuhCor1.vaccines.bigPsl

  bedToBigBed -type=bed12+13 -tab -as=$HOME/kent/src/hg/lib/bigPsl.as \
    wuhCor1.vaccines.bigPsl wuhCor1.chrom.sizes wuhCor1.vaccines.bb
\
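As a rough illustration of the score filter, the pslScore rows listed above can be read back in Python and checked against the 1000-point threshold. This is only a sketch: the row text is copied from the output above, while the record type and function names are hypothetical helpers, not part of the kent tools.

```python
from typing import NamedTuple

# Rows copied from the pslScore output shown above (header line omitted).
PSL_SCORE_ROWS = """\
NC_045512v2     21559   25384   ModernaMrna1273:54-3879  1419    68.60
NC_045512v2     21559   25384   ReconstructedBNT162b2:51-3876 1701    72.30
NC_045512v2     21559   25384   WHO_BNT162b2:51-3876     1701    72.30
"""

MIN_SCORE = 1000  # same threshold as pslFilter -minScore=1000


class ScoredHit(NamedTuple):
    t_name: str
    t_start: int
    t_end: int
    q_name: str
    q_start: int
    q_end: int
    score: int
    percent_ident: float


def parse_psl_score_row(row: str) -> ScoredHit:
    """Parse one whitespace-separated pslScore output row."""
    t_name, t_start, t_end, query, score, ident = row.split()
    q_name, q_range = query.rsplit(":", 1)
    q_start, q_end = (int(x) for x in q_range.split("-"))
    return ScoredHit(t_name, int(t_start), int(t_end), q_name,
                     q_start, q_end, int(score), float(ident))


hits = [parse_psl_score_row(r) for r in PSL_SCORE_ROWS.splitlines() if r.strip()]
passing = [h for h in hits if h.score >= MIN_SCORE]

for h in passing:
    print(h.q_name, h.score, h.t_end - h.t_start)
```

All three alignments pass the threshold and cover the same target interval (25384 - 21559 = 3825 bases).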

\ \

Data Access

\

\ The fasta file sequences and psl alignment file can be obtained from\ our download server at:\ https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/vaccines/.\

\ \

The bigPsl alignment file used for the display of this track in the genome browser can be accessed from https://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/wuhCor1.vaccines.bb. The file can be read with the kent command line tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here.

\ \

\ The protein encoded by the three sequences has two AA substitutions\ compared to the SARS-CoV-2 S glycoprotein. Variations: S:K986P and S:V987P\ in the vaccine sequence. See also:\ The tiny tweak behind COVID-19 vaccines.\

\
>BNT162b2\
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFD\
NPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVY\
SSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQT\
LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRV\
QPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSF\
VIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPC\
NGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL\
PFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGS\
NVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI\
SVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGF\
NFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAG\
TITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALN\
TLVKQLSSNFGAISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRV\
DFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT\
FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDL\
QELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYTZZ\
\
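The two substitutions can be located with a short Python sketch that compares equal-length protein fragments and reports differences in the S:K986P style used above. The vaccine fragment is residues 981-991 of the BNT162b2 sequence listed above; the reference fragment is an assumption, reconstructed by reverting the two stated substitutions. The function name is hypothetical.

```python
def list_substitutions(reference: str, variant: str,
                       first_position: int = 1, gene: str = "S") -> list[str]:
    """Report amino acid differences between two aligned, equal-length fragments.

    first_position is the 1-based protein coordinate of the first residue.
    """
    if len(reference) != len(variant):
        raise ValueError("fragments must be aligned and of equal length")
    substitutions = []
    for offset, (ref_aa, var_aa) in enumerate(zip(reference, variant)):
        if ref_aa != var_aa:
            substitutions.append(f"{gene}:{ref_aa}{first_position + offset}{var_aa}")
    return substitutions


# Residues 981-991 of the vaccine protein sequence shown above.
VACCINE_FRAGMENT = "LSRLDPPEAEV"
# Assumed reference fragment: the same residues with P986 and P987
# reverted to K and V, as described in the text.
REFERENCE_FRAGMENT = "LSRLDKVEAEV"

found = list_substitutions(REFERENCE_FRAGMENT, VACCINE_FRAGMENT, first_position=981)
print(found)
```

Passing the coordinate of the first residue keeps the reported positions in full-length protein numbering even when only a fragment is compared.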

\ \

References

\

\ Dae Eun Jeong, Matthew McCoy, Karen Artiles, Orkan Ilbay, Andrew Fire, Kari Nadeau, Helen Park, Brooke Betts, Scott Boyd, Ramona Hoh, and Massa Shoura\ Assemblies of putative SARS-CoV2-spike-encoding mRNA sequences for vaccines BNT-162b2 and mRNA-1273\ obtained from github\

\ \

\ Bert Hubert\ Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine\ 25 Dec 2020\

\ \

Wikipedia: Pfizer-BioNTech COVID-19 vaccine

\ \

\ World Health Organization MedNet\ Messenger RNA encoding the full-length SARS-CoV-2 spike glycoprotein Sept. 2020 document 11889\

\ \

\ Cyril Le Nouën, Peter L. Collins, and Ursula J. Buchholz\ Attenuation of Human Respiratory Viruses by Synonymous Genome Recoding Frontiers in Immunology 2019; 10: 1250. PMID: 31231383\

\ \

\ Ryan Cross\ The tiny tweak behind COVID-19 vaccines,\ Chemical & Engineering News 29 September 2020 Vol 98, issue 38\

\ \

Credits

\

\ Thank you to the Andrew Fire lab, Stanford University School of Medicine\ for providing the sequencing data of these vaccines.\

\ \ \

The presentation of this track was prepared by Hiram Clawson (hclawson@ucsc.edu).\

\ immu 1 baseColorDefault diffCodons\ baseColorUseCds table given\ baseColorUseSequence lfExtra\ bigDataUrl /gbdb/wuhCor1/bbi/wuhCor1.vaccines.bb\ color 0,0,0\ group immu\ html vaccines\ indelDoubleInsert on\ indelQueryInsert on\ longLabel COVID Vaccines BioNTech/Pfizer BNT-162b2 and Moderna mRNA-1273\ pslSequence no\ shortLabel Vaccines\ showCdsAllScales .\ showCdsMaxZoom 30000.0\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 30000.0\ track vaccines\ type bigPsl\ iedb Validated epitopes from IEDB bigBed Validated epitopes from IEDB 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows epitope sequences displayed by various class I MHC alleles as\ annotated by National Institute for Allergy and Infectious Diseases (NIAID) Immune Epitope Database (IEDB). Only the epitopes\ with positive assays are displayed on this track. These epitopes were validated using many different methods. Click through to the IEDB page for each individual epitope to see how that epitope was validated.\

\ \

\ EpitopeID is a clickable link to each epitope on the IEDB site where details about\ assays, literature and HLA restriction are provided.

\ \

References

\

\ Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, Damle R, Sette A, Peters B.\ \ The immune epitope database 2.0.\ Nucleic Acids Res. 2010 Jan;38(Database issue):D854-62.\ PMID: 19906713; PMC: PMC2808938\

\ immu 1 bigDataUrl /gbdb/wuhCor1/iedb/iedb.bb\ group immu\ longLabel Validated epitopes from IEDB\ shortLabel Validated epitopes from IEDB\ track iedb\ type bigBed\ urls epitopeID="http://www.iedb.org/epitope/$$"\ visibility hide\ variantMuts Variants of Concern bigBed 4 Mutations in Variants of Concern (VOC), Interest (VOI), or Under Monitoring (VUM) (configure to show more lineages) 1 100 0 0 0 127 127 127 0 0 0

Description

\

This track displays amino acid and nucleotide mutations in SARS-CoV-2 variants as defined in December 2021 by the World Health Organization (WHO). Note that the Centers for Disease Control (CDC) classification of SARS-CoV-2 variants differs slightly from that of the WHO.

\ \

\ Mutations in this track were identified from viral sequences from \ GISAID.\ Variant incidence and geographic distribution information is available from links to the \ Outbreak.info web resource\ on the mutation details pages.

\ \

  • Variants of Concern (VOC) have evidence for increased transmissibility, virulence, and/or decreased diagnostic, therapeutic, or vaccine efficacy.
  • Variants of Interest (VOI) contain mutations suspected or confirmed to cause a change in transmissibility, virulence, or diagnostic / therapeutic / vaccine efficacy, plus evidence of significant community transmission, a cluster of cases, or detection in multiple countries.
  • Variants under Monitoring (VUM) include variants with unclear epidemiological impact. This track includes only the four VUMs which were previously identified as Variants of Interest, now reclassified at this lower level of concern.
\

\ \

\ The related track\ B.1.1.7 in USA\ displays a phylogenetic tree of the first B.1.1.7 (Alpha) variant sequences collected \ in the United States.\

\ \

BV-BRC has a similar list of variants of concern and their mutations, but has added \ representative sequences.\

\ \ \

Display Conventions

\

\ Track colors are based on\ Nextstrain.org\ clade coloring:\

\ \ \ \

The Greek-letter names assigned by the World Health Organization (WHO) are listed in this table, along with lineage and clade designations:

WHO label | Pangolin lineage | Nextstrain clade | GISAID clade | First detected | Date VOC/VOI | Type
Alpha | B.1.1.7 | 20I (V1) | GRY | Sep 2020, United Kingdom | 18-Dec-2020 | former VOC
Beta | B.1.351 | 20H (V2) | GH/501Y.V2 | May 2020, South Africa | 18-Dec-2020 | former VOC
Gamma | P.1 | 20J (V3) | GR/501Y.V3 | Nov 2020, Brazil | 11-Jan-2021 | former VOC
Delta | B.1.617.2 | 21A | GK/478K.V1 | Oct 2020, India | 11-May-2021 | former VOC
Omicron | B.1.1.529/BA.1 | 21K | GR/484A | Nov 2021, Multiple countries | 26-Nov-2021 | former VOC
Omicron | BA.2 | 21L | GRA | Nov 2021, Multiple countries | 13-Dec-2021 | former VOC
Omicron | BA.4 | 22A | GRA | Jan 2022 | 18-May-2022 | former VUM
Omicron | BA.5 | 22B | GRA | Jan 2022 | 18-May-2022 | former VUM
Omicron | BA.2.12.1 | 22C | GRA | Dec 2021 | 18-May-2022 | former VUM
Omicron | BA.2.75 | 22D | GRA | May 2022, India | 29-Jul-2022 | former VOC
Omicron | BQ.1 | 22E | GRA | Feb 2022 | 12-Oct-2022 | former VUM
Omicron | XBB | 22F | GRA | Aug 2022 | 12-Oct-2022 | VUM as of 29 January 2024
Omicron | XBB.1.5 | 23A | GRA | Oct 2022 | 11-Jan-2023 VUM, 15-Mar-2023 VOI | VOI as of 29 January 2024
Omicron | XBB.1.16 | 23B | GRA | Jan 2023 | 22-Mar-2023 VUM, 17-Apr-2023 VOI | VOI as of 29 January 2024
Omicron | CH.1.1 | 23C | GRA | July 2022 | 9-Feb-2023 | former VUM
Omicron | XBB.1.9 | 23D | GRA | Dec 2022 | 30-Mar-2023 | XBB.1.9.1 VUM as of 29 January 2024
Omicron | XBB.2.3 | 23E | GRA | Sep 2022 | 17-May-2023 | VUM as of 29 January 2024
Omicron | EG.5.1 | 23F | GRA | Feb 2023 | 9-Aug-2023 | EG.5 VOI as of 29 January 2024
Omicron | XBB.1.5.70 | 23G | GRA | Mar 2023 | n/a | n/a
Omicron | HK.3 | 23H | GRA | June 2023 | n/a | n/a
Omicron | BA.2.86 | 23I | GRA | July 2023 | 21-Nov-2023 | VOI
Omicron | JN.1 | 23I | GRA | Aug 2023 | 18-Dec-2023 | VOI
Lambda | C.37 | 21G | GR/452Q.V1 | Dec 2020, Peru | 14-Jun-2021 | former VOI
Mu | B.1.621 | 21H | GH | Jan 2021, Colombia | 30-Aug-2021 | former VOI
Epsilon | B.1.429 | 21C | GH/452R.V1 | Mar 2020, USA | 06-Jul-2021 | former VUM
Eta | B.1.525 | 21D | G/484K.V3 | Dec 2020 | 20-Sep-2021 | former VUM
Iota | B.1.526 | 21F | GH/253G.V1 | Nov 2020, USA | 20-Sep-2021 | former VUM
Kappa | B.1.617.1 | 21B | G/452R.V3 | Oct 2020, India | 20-Sep-2021 | former VUM
\

\

\ Mutations in the amino acid track are named with the format: \

\
        [Reference amino acid][1-based coordinate in peptide][Alternate amino acid]. E.g., L452R\
\

\ Mutations in the nucleotide track are named with the format: \

\
        [Reference nucleotide][1-based coordinate in genome][Alternate nucleotide]. E.g., T22918G\
\ Insertions and deletions in both tracks are named: \
\
        [del/ins]_[1-based genomic coordinate of first affected nucleotide].  E.g., del_21991\
\
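The naming formats above can be parsed with a few regular expressions, sketched here in Python. Because a name such as T22918G would also match the amino acid pattern (T and G are valid residue letters), the sketch takes the track type as an explicit argument. The function and type names are hypothetical, and accepting "*" as a stop codon symbol in amino acid names is an assumption.

```python
import re
from typing import NamedTuple, Optional


class Mutation(NamedTuple):
    kind: str            # "substitution", "del" or "ins"
    position: int        # 1-based coordinate
    ref: Optional[str]   # None for insertions and deletions
    alt: Optional[str]   # None for insertions and deletions


_INDEL = re.compile(r"^(del|ins)_(\d+)$")
_AA_SUB = re.compile(r"^([A-Z*])(\d+)([A-Z*])$")
_NUC_SUB = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation_name(name: str, track: str) -> Mutation:
    """Parse a mutation name from the amino acid ("aa") or nucleotide ("nuc") track."""
    indel = _INDEL.match(name)
    if indel:
        return Mutation(indel.group(1), int(indel.group(2)), None, None)
    patterns = {"aa": _AA_SUB, "nuc": _NUC_SUB}
    if track not in patterns:
        raise ValueError(f"unknown track type: {track!r}")
    sub = patterns[track].match(name)
    if sub is None:
        raise ValueError(f"unrecognized {track} mutation name: {name!r}")
    return Mutation("substitution", int(sub.group(2)), sub.group(1), sub.group(3))


print(parse_mutation_name("L452R", "aa"))
print(parse_mutation_name("T22918G", "nuc"))
print(parse_mutation_name("del_21991", "nuc"))
```

For insertions and deletions the position is the genomic coordinate of the first affected nucleotide in both tracks, so the parsed record carries no reference or alternate residue.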

\ \

Methods

\

\ For each virus variant, SARS-CoV-2 genome sequences containing all characteristic\ mutations of the lineage were downloaded from \ GISAID\ using the lineage search feature\ (restricting to complete, high-coverage genomes, and restricting to earliest sample\ collection dates when there were too many results for the download limit of 10,000\ sequences per query).\

\ \

Sequences were aligned to the SARS-CoV-2 reference genome using the global_profile_alignment.sh script from the sarscov2phylo repository. Single-nucleotide substitutions were extracted from the alignment using the UCSC tool faToVcf (available on the UCSC download server or from bioconda; also requires the SARS-CoV-2 reference sequence). Single-nucleotide substitutions present at a frequency of at least 0.95 (.70 for Delta, .80 for Omicron) were retained; all others were discarded.

\

\ For indel detection, the \ Minimap2 \ suite of tools was used as follows:\

\
\
        minimap2 --cs [Reference Sequence] [Set of Unaligned Sequences] | paftools.js call -L 10000 -\
\

\ Indels present at a frequency of at least 0.85 (.50 for Delta, .70 for Omicron) were retained.\ Less stringent cutoffs were applied to Delta and Omicron\ variant sequences due to low quality of early sequences.
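The frequency cutoffs for substitutions and indels can be summarized in a small Python sketch. The cutoff values are the ones stated in this section; keying them by WHO label, the function names, and the example counts are assumptions for illustration, and this is not the lineageVariants.py implementation.

```python
# Cutoffs stated in the Methods text; all other variants use the defaults.
DEFAULT_CUTOFFS = {"substitution": 0.95, "indel": 0.85}
VARIANT_CUTOFFS = {
    "Delta": {"substitution": 0.70, "indel": 0.50},
    "Omicron": {"substitution": 0.80, "indel": 0.70},
}


def frequency_cutoff(who_label: str, kind: str) -> float:
    """Return the minimum frequency for a mutation of this kind to be retained."""
    return VARIANT_CUTOFFS.get(who_label, DEFAULT_CUTOFFS)[kind]


def retained_mutations(counts: dict[str, int], n_sequences: int,
                       who_label: str, kind: str) -> list[str]:
    """Keep mutations whose frequency among the sequences meets the cutoff."""
    cutoff = frequency_cutoff(who_label, kind)
    return sorted(name for name, count in counts.items()
                  if count / n_sequences >= cutoff)


# Hypothetical counts: number of sequences (out of 1000) carrying each substitution.
example_counts = {"T22918G": 960, "A100G": 720}

print(retained_mutations(example_counts, 1000, "Alpha", "substitution"))
print(retained_mutations(example_counts, 1000, "Delta", "substitution"))
```

With these example counts, the substitution seen in 72% of sequences is kept under the relaxed Delta cutoff but dropped under the default 0.95 cutoff.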

\

\ The results were then combined and formatted by\ lineageVariants.py. \ The entire pipeline was run using\ \ lineageVariants.sh.\

\ \

Data Access

\

\ You can download the bigBed data files for this track from the\ UCSC Download Server.\ The data can be explored interactively with the\ Table Browser\ or the Data Integrator. The data can be\ accessed from scripts through our API.\ For complete genome Fasta sequences of variants of concern, please visit the following \ third-party page:\

\ \ \

Release Notes

\

\ Version 2 of this track adds one new Variant of Concern (Delta), two new Variants of \ Interest (Lambda, Mu), and three named variants previously VOI, now designated as less \ concerning Variants under Monitoring (Eta, Iota, Kappa). \ The track labels of all variants have been updated to include WHO labels. Track colors \ reflect Nextstrain conventions at the time of track data update (September 10, 2021).\

\

\ Omicron BA.1 was added December 2, 2021 (called B.1.1.529 at the time of discovery).\

\

\ Omicron lineages BA.4 and BA.5 were added in May 2022.\

\

\ Omicron lineages BA.2, BA.2.12.1, BA.2.75, BQ.1, XBB, XBB.1.5, XBB.1.16, CH.1.1, XBB.1.9, XBB.2.3,\ and EG.5.1 were added in September 2023.\

\

\ Omicron lineages XBB.1.5.70, HK.3, BA.2.86 and JN.1 were added in January 2024.\

\ \

Credits

\

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. We thank Rob Lanfear at the Australian National University for developing and maintaining the sarscov2phylo web resource. We also thank the Su, Wu, and Andersen labs at Scripps Research for creating the Outbreak.info resource. The lineageVariants scripts were developed and run at UCSC by Nick Keener, Kate Rosenbloom and Angie Hinrichs.

\ \

References

\

\ Rambaut A, Holmes EC, O'Toole Á, Hill V, McCrone JT, Ruis C, du Plessis L, Pybus OG.\ \ A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology.\ Nat Microbiol. 2020 Nov;5(11):1403-1407.\ PMID: 32669681\

\

\ Rambaut A, Loman N, Pybus O, Barclay W, Barrett J,\ Carabelli A, Connor T, Peacock T, Robertson DL, Volz E, et al.\ Preliminary genomic characterization of an emergent SARS-CoV-2 lineage in the UK\ defined by a novel set of spike mutations.\ Virological. 2020 Dec 18.\

\ \

\ Volz E, Mishra S, Chand M, Barrett JC, Johnson E,\ Geidelberg L, Hinsley WR, Laydon DJ, Dabrera G, O'Toole Á, et al.\ \ Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking \ epidemiological and genetic data.\ Virological. 2020 Dec 31.

\ \

Tegally et al, December 21, 2020. Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa. medRxiv preprint.

Zhang et al, January 20, 2021. Emergence of a novel SARS-CoV-2 strain in Southern California, USA. medRxiv preprint.

\ \

\ Voloch et al, December 26, 2020. \ \ Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro, Brazil medRxiv preprint.\

\ \

\ Lanfear, Rob (2020). A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo DOI:\ 10.5281/zenodo.3958883\

\ \

\ Li, Heng\ \ Minimap2: pairwise alignment for nucleotide sequences.\ Bioinformatics. 2018 Sep 15;34(18):3094-3100.\ PMID: 29750242; PMC: PMC6137996\

\ \

\ Gangavarapu, Karthik; Alkuzweny, Manar; Cano, Marco; Haag, Emily; Latif, Alaa Abdel; \ Mullen, Julia L.; Rush, Benjamin; Tsueng, Ginger; Zhou, Jerry; Andersen, Kristian G.; \ Wu, Chunlei; Su, Andrew I.; Hughes, Laura D. \ Outbreak.info\

\ varRep 1 allButtonPair on\ bedNameLabel Mutation\ compositeTrack on\ darkerLabels on\ dataVersion Sequences downloaded September 10, 2021, update on Dec 2, 2021, May 4, 2022, September 22, 2023, and January 29, 2024\ group varRep\ html variantMuts\ longLabel Mutations in Variants of Concern (VOC), Interest (VOI), or Under Monitoring (VUM) (configure to show more lineages)\ noScoreFilter on\ pennantIcon New red ../goldenPath/newsarch.html#021224 "Updated Feb. 12, 2024"\ shortLabel Variants of Concern\ sortOrder designation=+ variant=+ mutation=+\ subGroup1 mutation Mutations AA=Amino_Acid NUC=Nucleotide\ subGroup2 variant Variants A_B117=Alpha B_B1351=Beta C_P1=Gamma D_B16172=Delta L_C37=Lambda M_B1621=Mu E_B1429=Epsilon E_B1525=Eta I_B1526=Iota K_B16171=Kappa J_BA1=Omicron_BA.1 K_BA2=Omicron_BA.2 L_BA4=Omicron_BA.4 M_BA5=Omicron_BA.5 N_BA2121=Omicron_BA.2.12.1 O_BA275=Omicron_BA.2.75 P_BQ1=Omicron_BQ.1 Q_XBB=Omicron_XBB R_XBB15=Omicron_XBB.1.5 S_XBB116=Omicron_XBB.1.16 T_CH11=Omicron_CH.1.1 U_XBB19=Omicron_XBB.1.9 V_XBB23=Omicron_XBB.2.3 W_EG51=Omicron_EG.5.1 X_XBB1570=Omicron_XBB.1.5.70 Y_HK3=Omicron_HK.3 Z_BA286=Omicron_BA.2.86 ZA_JN1=Omicron_JN.1\ subGroup3 designation Designation VOC=VOC VOI=VOI VUM=VUM\ track variantMuts\ type bigBed 4\ visibility dense\ windowmaskerSdust WM + SDust bed 3 Genomic Intervals Masked by WindowMasker + SDust 0 100 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track depicts masked sequence as determined by\ WindowMasker. The\ WindowMasker tool is included in the NCBI C++ toolkit. The source code\ for the entire toolkit is available from the NCBI\ \ FTP site.\

\ \

Methods

\ \

\ To create this track, WindowMasker was run with the following parameters:\

\
windowmasker -mk_counts true -input wuhCor1.fa -output wm_counts\
windowmasker -ustat wm_counts -sdust true -input wuhCor1.fa -output repeats.bed\
\ The repeats.bed (BED3) file was loaded into the "windowmaskerSdust" table for\ this track.\
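A BED3 file such as repeats.bed can be summarized with a short Python sketch that merges overlapping intervals and reports the masked fraction of the genome. The example intervals are hypothetical and do not come from the actual WindowMasker output; the genome size of 29,903 bases is the length of the NC_045512v2 reference sequence.

```python
GENOME_SIZE = 29903  # length of NC_045512v2


def parse_bed3(text: str) -> list[tuple[int, int]]:
    """Read (start, end) pairs from BED3 text; coordinates are 0-based, half-open."""
    intervals = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        _chrom, start, end = line.split()[:3]
        intervals.append((int(start), int(end)))
    return intervals


def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent intervals so that no base is counted twice."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def masked_base_count(intervals: list[tuple[int, int]]) -> int:
    """Total number of bases covered by the intervals."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Hypothetical BED3 content for illustration only.
EXAMPLE_BED = "NC_045512v2\t100\t150\nNC_045512v2\t140\t200\nNC_045512v2\t29870\t29903\n"

masked_bases = masked_base_count(parse_bed3(EXAMPLE_BED))
print(masked_bases, round(masked_bases / GENOME_SIZE, 4))
```

Merging before summing matters when intervals overlap; in the example the first two intervals share ten bases.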

\ \

References

\ \

\ Morgulis A, Gertz EM, Schäffer AA, Agarwala R.\ WindowMasker: window-based masker for sequenced genomes.\ Bioinformatics. 2006 Jan 15;22(2):134-41.\ PMID: 16287941\

\ varRep 1 group varRep\ longLabel Genomic Intervals Masked by WindowMasker + SDust\ shortLabel WM + SDust\ track windowmaskerSdust\ type bed 3\ visibility hide\ nextstrainFreqViewNewClades Year-Letter Clades bigWig Nextstrain Mutations Alternate Allele Frequency 0 100 0 0 0 127 127 127 0 0 0 varRep 0 longLabel Nextstrain Mutations Alternate Allele Frequency\ parent nextstrainFreq\ shortLabel Year-Letter Clades\ track nextstrainFreqViewNewClades\ view newClades\ visibility hide\