This track shows single nucleotide variants (SNVs) from the Rhesus Macaque Genome Consortium that were sequenced and identified by Jeff Rogers' lab at BCM-HGSC.
In "dense" mode, a vertical line is drawn at the position of each variant. In "pack" mode, since these variants have been phased, the display shows a clustering of haplotypes in the viewed range, sorted by similarity of alleles weighted by proximity to a central variant. The clustering view can highlight local patterns of linkage.
In the clustering display, each sample's phased diploid genotype is split into two independent haplotypes. Each haplotype is placed in a horizontal row of pixels; when the number of haplotypes exceeds the number of vertical pixels for the track, multiple haplotypes fall in the same pixel row and pixels are averaged across haplotypes.
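The splitting and row-averaging steps described above can be sketched in Python. This is a minimal illustration, not the browser's actual rendering code: the function names `split_haplotypes` and `average_into_rows`, the `0|1`-style genotype strings, and the even assignment of haplotypes to pixel rows are all assumptions made for the example.

```python
def split_haplotypes(genotypes):
    """Split phased diploid genotypes into independent haplotypes.

    genotypes: dict mapping sample name -> list of phased genotype strings
    such as "0|1", one per variant (hypothetical input layout).
    Returns a list of haplotypes, two per sample, each a list of allele indexes.
    """
    haplotypes = []
    for sample, calls in genotypes.items():
        first, second = [], []
        for call in calls:
            a, b = call.split("|")  # "|" marks a phased genotype
            first.append(int(a))
            second.append(int(b))
        haplotypes.append(first)
        haplotypes.append(second)
    return haplotypes


def average_into_rows(haplotypes, n_rows):
    """Average the non-reference fraction per variant within each pixel row.

    When there are more haplotypes than rows, several haplotypes share a row.
    Assumption: haplotypes are spread evenly over rows in their given order.
    """
    if not haplotypes or n_rows <= 0:
        return []
    n_haps = len(haplotypes)
    n_rows = min(n_rows, n_haps)  # never more rows than haplotypes
    n_vars = len(haplotypes[0])
    sums = [[0.0] * n_vars for _ in range(n_rows)]
    counts = [0] * n_rows
    for i, hap in enumerate(haplotypes):
        row = i * n_rows // n_haps  # pixel row this haplotype falls into
        counts[row] += 1
        for j, allele in enumerate(hap):
            if allele != 0:  # any non-reference allele counts as 1
                sums[row][j] += 1.0
    return [[s / counts[r] for s in sums[r]] for r in range(n_rows)]
```

In this sketch a value of 0.0 corresponds to a row in which every haplotype carries the reference allele at that variant and 1.0 to a row in which none does; intermediate values arise only when multiple haplotypes share a pixel row.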
Each variant is a vertical bar with white (invisible) representing the reference allele and black representing the non-reference allele(s). Tick marks are drawn at the top and bottom of each variant's vertical bar to make the bar more visible when most alleles are reference alleles. The vertical bar for the central variant used in clustering is outlined in purple. In order to avoid long compute times, the range of alleles used in clustering may be limited; alleles used in clustering have purple tick marks at the top and bottom.
The clustering tree is displayed to the left of the main image. It does not represent relatedness of individuals; it simply shows the arrangement of local haplotypes by similarity. When a rightmost branch is purple, it means that all haplotypes in that branch are identical, at least within the range of variants used in clustering.
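To make the idea of "similarity of alleles weighted by proximity to a central variant" concrete, here is a small Python sketch. The exact weighting the browser uses is not given in this description, so the linear falloff below is purely an assumption, as are the function names `weighted_distance` and `identical_groups`. The second function illustrates the condition under which haplotypes would share a purple branch: being identical across the variants used in clustering.

```python
from collections import defaultdict


def weighted_distance(hap_a, hap_b, center):
    """Distance between two haplotypes over the variants used in clustering.

    Each mismatch contributes a weight that shrinks with distance (in variant
    index) from the central variant. Assumed weighting: linear falloff.
    `center` must be a valid index into the haplotypes.
    """
    n = len(hap_a)
    total = 0.0
    for i in range(n):
        if hap_a[i] != hap_b[i]:
            total += 1.0 - abs(i - center) / n  # 1.0 at the central variant
    return total


def identical_groups(haplotypes):
    """Return index groups of haplotypes that are identical at every variant."""
    groups = defaultdict(list)
    for idx, hap in enumerate(haplotypes):
        groups[tuple(hap)].append(idx)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Under this assumed weighting, a mismatch at the central variant separates two haplotypes more than a mismatch near the edge of the range, so haplotypes that agree around the center tend to end up adjacent in the ordering.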
All SNV calls are relative to the reference rhesus macaque genome (Mmul_10/rheMac10). The functional consequences of the SNVs were predicted using gene models from the Ensembl release 98 merged Ensembl and RefSeq dataset, which also includes annotations based on PacBio iso-seq (available here).
Whole-genome sequencing was performed over an eight-year period. As the technology improved, the sequencing platforms used for this dataset progressed from Illumina HiSeq 2000 to HiSeq Rapid 2500, HiSeq X, and NovaSeq, generating 2 × 100 bp or 2 × 150 bp paired-end reads, as is typical for each platform. All underlying sequence data have been deposited into the SRA (BioProject ID: PRJNA251548).
Reads were aligned with BWA-MEM 0.7.12-r1039 (Li and Durbin, 2009; Li, 2013) to the reference genome (Mmul_10/rheMac10), which also included the mitochondrial genome (NC_005943.1) and had the pseudoautosomal region of chromosome Y masked. To identify reads potentially originating from a single fragment of DNA and mark them in the BAM files, we used Picard MarkDuplicates version 1.105.
SNVs were called using the Genome Analysis Toolkit (GATK) version 4.1.2.0 (McKenna et al., 2010), and a VCF file was generated. The hard filters suggested by the developers of GATK (https://software.broadinstitute.org/gatk/documentation/article?id=11097) were applied to the SNVs, and all failing SNVs were removed. We then used GATK VariantAnnotator to add the AlleleBalance annotation to the SNVs. SNVs whose allelic balance for heterozygous calls, ABHet = ref/(ref+alt), was below 0.2 or above 0.8 were removed.
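The allelic balance filter can be illustrated with a short Python sketch. It only restates the formula ABHet = ref/(ref+alt) and the 0.2 and 0.8 cutoffs given above; it is not the GATK implementation. The function names, the use of raw reference and alternate read counts as inputs, and the decision to keep sites with no reads from heterozygous calls are assumptions made for the example.

```python
def ab_het(ref_reads, alt_reads):
    """Allelic balance for heterozygous calls: ref / (ref + alt).

    Returns None when there are no reads to compute a ratio from.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return None
    return ref_reads / total


def passes_ab_filter(ref_reads, alt_reads, low=0.2, high=0.8):
    """Return True if the SNV is kept, False if it would be removed.

    An SNV is removed when ABHet < low or ABHet > high.
    Assumption: a site with no ABHet value is kept, since neither
    removal condition applies to it.
    """
    ab = ab_het(ref_reads, alt_reads)
    if ab is None:
        return True
    return not (ab < low or ab > high)
```

For example, a site with 10 reference and 10 alternate reads has ABHet of 0.5 and is kept, while a site with 1 reference and 9 alternate reads has ABHet of 0.1 and is removed.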
The Variant Effect Predictor software from Ensembl (McLaren et al., 2010) was used to predict the functional consequence of SNVs queried against Ensembl release 98 rhesus macaque gene models based on Ensembl and RefSeq gene predictions and including PacBio iso-seq data.
Definitions of consequence types can be found in the VEP documentation.
Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013. Preprint at http://arxiv.org/pdf/1303.3997v2.pdf
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754-60. PMID: 19451168; PMC: PMC2705234
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010 Sep;20(9):1297-303. PMID: 20644199; PMC: PMC2928508
McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010 Aug 15;26(16):2069-70. PMID: 20562413; PMC: PMC2916720