This track shows, for each amino acid position in the protein-coding regions of the reference sequence ASM1346v1/USA300_FPR3757, a count of the different amino acids found at that position in the non-reference strains.
The counts are derived from the 369-way alignment. The 369-way MAF file is processed with the mafGene command to extract the protein sequence of each strain that was aligned to the coding regions of the reference sequence. The ncbiGene track defines the location of the coding regions. The amino-acid alignments are extracted as follows:
For each chromosome in the reference sequence:
    mafGene -chrom=<chrName> -exons staAur2 369way.maf ncbiGene species.list <chrName>.exonAA.fa
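The per-chromosome loop above could be driven by a small wrapper such as the following. This is only a sketch: the helper names `build_mafgene_cmd` and `run_all` are hypothetical, the argument order is copied from the command line shown above, and the chromosome list must be supplied by the caller (only `NC_007793v1` appears in the example output on this page). By default it prints the commands rather than running them.

```python
import subprocess


def build_mafgene_cmd(chrom, db="staAur2", maf="369way.maf",
                      gene_table="ncbiGene", species_list="species.list"):
    """Return the mafGene argument list for one reference chromosome.

    The argument order mirrors the command shown in the text; the default
    file and table names are the ones used there.
    """
    return [
        "mafGene",
        f"-chrom={chrom}",
        "-exons",
        db,
        maf,
        gene_table,
        species_list,
        f"{chrom}.exonAA.fa",  # one output FASTA file per chromosome
    ]


def run_all(chroms, dry_run=True):
    """Print (dry_run=True) or execute the mafGene command per chromosome."""
    for chrom in chroms:
        cmd = build_mafgene_cmd(chrom)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Chromosome name taken from the example output; extend as needed.
    run_all(["NC_007793v1"])
```

Keeping the command construction separate from its execution makes it easy to inspect the exact command lines before running them on the full alignment.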
Typical output in the exonAA.fa file is as follows:
>SAUSA300_RS15440_staAur2_1_1 21 0 0 NC_007793v1:1994975-1995037+
LYFKNSSLNYIPTTFGGEPFK
>SAUSA300_RS15440_Sa_USA300_SUR12_1_1 21 0 0 NZ_CP014407v1:1999012-1999023+
L--KNSSLNYIPTTFGGEPFK
>SAUSA300_RS15440_Sa_USA300_TCH1516_1_1 21 0 0 NC_010079v1:1995723-1995734+
L--KNSSLNYIPTTFGGEPFK
>SAUSA300_RS15440_Sa_JE2_1_1 21 0 0 NZ_CP020619v1:1996662-1996717+
LYFKNSSLNYIPTTFGGE---
>SAUSA300_RS15440_Sa_C2406_1_1 21 0 0 NZ_CP019590v1:2001635-2001646+
LYFKNSSLNYIPTTFGGEPFK
. . . etc . . .

This FASTA file is scanned with a Perl script to count the amino acids at each location. For each AA location on the reference species, 23 different counts are obtained: one count for each of the 20 amino acids, plus three extra counts for the categories:
Click on any item in the display to see details of the strains carrying the various amino acids at that position.
Within the current window, the track shows a row for each amino acid that occurs as a non-reference residue in at least one of the aligned strains. The number of rows displayed therefore changes as the size of the window changes.
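The tally behind this display can be illustrated with a short sketch. It is not the Perl script used to build the track. It assumes that the first record in the FASTA block is the reference, that all rows have equal length, and that gaps (`-`) are tallied as their own symbol; how the actual script treats gaps is not stated here, so the `include_gaps` switch is purely illustrative. The function names are hypothetical, and the input is the example output shown earlier on this page.

```python
from collections import Counter

EXAMPLE = """\
>SAUSA300_RS15440_staAur2_1_1 21 0 0 NC_007793v1:1994975-1995037+
LYFKNSSLNYIPTTFGGEPFK
>SAUSA300_RS15440_Sa_USA300_SUR12_1_1 21 0 0 NZ_CP014407v1:1999012-1999023+
L--KNSSLNYIPTTFGGEPFK
>SAUSA300_RS15440_Sa_USA300_TCH1516_1_1 21 0 0 NC_010079v1:1995723-1995734+
L--KNSSLNYIPTTFGGEPFK
>SAUSA300_RS15440_Sa_JE2_1_1 21 0 0 NZ_CP020619v1:1996662-1996717+
LYFKNSSLNYIPTTFGGE---
>SAUSA300_RS15440_Sa_C2406_1_1 21 0 0 NZ_CP019590v1:2001635-2001646+
LYFKNSSLNYIPTTFGGEPFK
"""


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def column_counts(records):
    """Tally, per column, the symbols seen in the non-reference rows.

    Assumption: records[0] is the reference sequence.
    """
    ref = records[0][1]
    counts = [Counter() for _ in ref]
    for _, seq in records[1:]:
        for i, symbol in enumerate(seq):
            counts[i][symbol] += 1
    return ref, counts


def variable_columns(ref, counts, include_gaps=False):
    """List (index, reference symbol, differing counts) for columns where
    at least one strain differs from the reference."""
    result = []
    for i, (ref_aa, tally) in enumerate(zip(ref, counts)):
        diff = {s: n for s, n in tally.items()
                if s != ref_aa and (include_gaps or s != "-")}
        if diff:
            result.append((i, ref_aa, diff))
    return result


if __name__ == "__main__":
    ref, counts = column_counts(parse_fasta(EXAMPLE))
    for index, ref_aa, diff in variable_columns(ref, counts, include_gaps=True):
        print(index + 1, ref_aa, diff)
```

In this five-strain excerpt every difference from the reference is a gap, so no column is reported unless `include_gaps=True` is passed.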
Amino acids are shown, one per row, in order of the biochemical nature of the side chain:
In all display modes, darker colors indicate a higher level of occurrence of the amino acid residue in the strain set. If there are no actual non-reference amino acids to display (i.e., excluding the NUL row), the reference amino acid is shown on a white background.
A mouseover gives the number and percentage of non-reference amino acids in the set. For example, in the polC gene at the default location, the serine at AA17 shows "68% (247/362)".
In dense and pack modes, the amino-acid names appear to the left of each data row in both three-letter and one-letter codes, and items within the display are labeled with the one-letter code.
In full mode, the individual amino-acid names and their biochemical group (hydrophobic, neutral, positive, negative, etc.) are displayed above each amino-acid row.
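As a check on the arithmetic, the mouseover text can be reproduced from the two counts. The function name `mouseover_label` is hypothetical, and rounding to the nearest whole percent is an assumption; the browser's exact rounding rule is not stated here.

```python
def mouseover_label(count, total):
    """Format a count and total as 'NN% (count/total)'.

    Assumption: the percentage is rounded to the nearest whole number.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    percent = round(100 * count / total)
    return f"{percent}% ({count}/{total})"


if __name__ == "__main__":
    # 247 / 362 is about 0.682, which rounds to 68%.
    print(mouseover_label(247, 362))
```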
In squish mode, only a thin colored bar is shown for each amino acid.