cpgIslandExt CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Expression and Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page. Methods CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater; length greater than 200 bp; and a ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment. The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG.
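The scoring and filtering steps described in the Methods can be sketched in Python as follows. This is only an illustrative sketch, not the cpg_lh program: the function names (best_segment, island_stats, passes_criteria) are hypothetical, the search returns only the single highest-scoring segment rather than all maximal segments, and the observed/expected ratio uses the Gardiner-Garden and Frommer formula quoted in this section (Number of CpG * N / (Number of C * Number of G)).

```python
from typing import Dict, Tuple

CG_SCORE = 17      # score for a CG dinucleotide, as stated in the Methods
OTHER_SCORE = -1   # score for every other dinucleotide


def best_segment(seq: str) -> Tuple[int, int, int]:
    """Return (start, end, score) of the highest-scoring run of dinucleotides.

    Coordinates are 0-based with an exclusive end. This is a standard
    maximum-scoring-run scan, used here as an assumption about what
    "maximally scoring segment" means; (0, 0, 0) means no positive run.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    run_start, run_score = 0, 0
    for i in range(len(seq) - 1):
        if run_score <= 0:
            # A non-positive prefix can never help, so restart the run here.
            run_start, run_score = i, 0
        run_score += CG_SCORE if seq[i:i + 2] == "CG" else OTHER_SCORE
        if run_score > best[2]:
            best = (run_start, i + 2, run_score)
    return best


def island_stats(segment: str) -> Dict[str, float]:
    """Compute length, GC fraction, CpG count and Obs/Exp CpG for a segment."""
    seg = segment.upper()
    n = len(seg)
    c, g = seg.count("C"), seg.count("G")
    cpg = seg.count("CG")  # CG cannot overlap itself, so count() is exact
    return {
        "length": n,
        "gc_fraction": (c + g) / n if n else 0.0,
        "cpg_count": cpg,
        "obs_exp": cpg * n / (c * g) if c and g else 0.0,
    }


def passes_criteria(segment: str) -> bool:
    """Apply the three criteria listed in the Methods text."""
    s = island_stats(segment)
    return s["gc_fraction"] >= 0.5 and s["length"] > 200 and s["obs_exp"] > 0.6
```

For example, best_segment("AAAACGCGCGCGCGAAAA") picks out the CG-rich core at positions 4 to 14, but that 10-base segment would still be rejected by passes_criteria because it is not longer than 200 bp.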
The track CpG Islands is constructed on the sequence after all masked sequence is removed. The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden and Frommer (1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence. The calculation of the track data is performed by the following command sequence:
twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout | cpg_lh /dev/stdin 2> cpg_lh.err | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' | sort -k1,1 -k2,2n > cpgIsland.bed
The unmasked track data is constructed from the output of twoBitToFa run with the -noMask option. Data access CpG islands and their associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file") Credits This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished). References Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447 cpgIslandSuper CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Expression and Regulation rmsk RepeatMasker Repeating Elements by RepeatMasker Variation and Repeats Description This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences. The program outputs a detailed annotation of the repeats that are present in the query sequence (represented by this track), as well as a modified version of the query sequence in which all the annotated repeats have been masked (generally available on the Downloads page).
RepeatMasker uses the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka (2000) in the References section below. Some newer assemblies have been made with Dfam, not Repbase. You can find the details for how we make our database data here in our "makeDb/doc/" directory. When analyzing the data tables of this track, keep in mind that Repbase is not the same as the RepeatMasker sequence database and that the repeat names in the RepeatMasker output are not the same as the sequence names in the RepeatMasker database. Concretely, you can find a name such as "L1PA4" in the RepeatMasker output and this track, but there is not necessarily a single sequence "L1PA4" in the RepeatMasker database. This is because RepeatMasker creates annotations by joining together matches to partial pieces of the database, so there is no 1:1 relationship between its sequence database and the annotations. To learn more, you can read the RepeatMasker paper or its source code, or reach out to the RepeatMasker authors, your local expert on transposable elements or us. Display Conventions and Configuration In full display mode, this track displays up to ten different classes of repeats: Short interspersed nuclear elements (SINE), which include ALUs; Long interspersed nuclear elements (LINE); Long terminal repeat elements (LTR), which include retroposons; DNA repeat elements (DNA); Simple repeats (micro-satellites); Low complexity repeats; Satellite repeats; RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA); Other repeats, which includes class RC (Rolling Circle); and Unknown. The level of color shading in the graphical display reflects the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading. A "?" at the end of the "Family" or "Class" (for example, DNA?) signifies that the curator was unsure of the classification.
At some point in the future, either the "?" will be removed or the classification will be changed. Methods Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information. Credits Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track. References Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. https://www.repeatmasker.org/. 1996-2010. Repbase Update is described in: Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. PMID: 10973072 For a discussion of repeats in mammalian genomes, see: Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. PMID: 10607616 Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. PMID: 8994846 robustPeaks TSS peaks FANTOM5 DPI peak, robust set Expression and Regulation Description The FANTOM5 track shows mapped transcription start sites (TSS) and their usage in primary cells, cell lines, and tissues to produce a comprehensive overview of gene expression across the human body by using single molecule sequencing. Display Conventions and Configuration Items in this track are colored according to their strand orientation. Blue indicates alignment to the negative strand, and red indicates alignment to the positive strand. Methods Protocol Individual biological states are profiled by HeliScopeCAGE, which is a variation of the CAGE (Cap Analysis Gene Expression) protocol based on a single molecule sequencer. 
The standard protocol, which requires 5 µg of total RNA as starting material, is referred to as hCAGE, and an optimized version for a lower quantity (~100 ng) is referred to as LQhCAGE (Kanamori-Katayama et al. 2011). hCAGE LQhCAGE Samples Transcription start sites (TSSs) were mapped, and their usage in human, mouse, dog, rat, macaque and chicken primary cells, cell lines, and tissues was measured to produce a comprehensive overview of mammalian gene expression across the human body. The 5′ ends of the mapped CAGE reads are counted at single base-pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates; these counts represent TSS activity in the sample. Individual samples shown in "TSS activity" tracks are grouped as follows: Primary cell, Tissue, Cell Line, Time course, Fractionation. TSS peaks and enhancers TSS (CAGE) peaks across the panel of the biological states (samples) are identified by DPI (decomposition based peak identification, Forrest et al. 2014), where each of the peaks consists of neighboring and related TSSs. The peaks are used as anchors to define promoters and units of promoter-level expression analysis. Two subsets of the peaks are defined based on evidence of read counts, depending on the scope of subsequent analyses, and the first subset (referred to as the robust set of peaks, thresholded for expression analysis) is shown as TSS peaks. The summary tracks consist of the TSS (CAGE) peaks, the enhancers, and summary profiles of TSS activities (total and maximum values). The summary track consists of the following tracks: TSS (CAGE) peaks (the robust peaks) and TSS summary profiles (total counts and TPM (tags per million) in all the samples; maximum counts and TPM among the samples). TSS activity The 5′ ends of the mapped CAGE reads are counted at single base-pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates; these counts represent TSS activity in the sample.
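The CTSS counting step can be illustrated with a short Python sketch. It assumes reads are given as (chrom, start, end, strand) tuples in 0-based, half-open coordinates, which is a BED-like convention chosen here for illustration and not stated in the text. The function names are hypothetical, and the TPM step is plain tags-per-million scaling; the FANTOM5 pipeline may apply further normalization that is not described here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (chrom, start, end, strand); 0-based, half-open coordinates (assumed)
Read = Tuple[str, int, int, str]
# (chrom, position of the read's 5' end, strand)
Site = Tuple[str, int, str]


def ctss_counts(reads: Iterable[Read]) -> Counter:
    """Count mapped reads by the genomic position of their 5' end."""
    counts: Counter = Counter()
    for chrom, start, end, strand in reads:
        # On the plus strand the 5' end is the leftmost base;
        # on the minus strand it is the rightmost base.
        five_prime = start if strand == "+" else end - 1
        counts[(chrom, five_prime, strand)] += 1
    return counts


def to_tpm(counts: Counter) -> Dict[Site, float]:
    """Scale raw counts to tags per million of the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {site: n * 1_000_000 / total for site, n in counts.items()}


reads = [
    ("chr1", 100, 127, "+"),
    ("chr1", 100, 130, "+"),
    ("chr1", 50, 101, "-"),
    ("chr1", 200, 230, "-"),
]
counts = ctss_counts(reads)
tpm = to_tpm(counts)
```

Keeping the strand in the key matters because the forward and reverse signals are reported as separate tracks.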
The read counts tracks indicate raw counts of CAGE reads, and the TPM tracks indicate normalized counts as TPM (tags per million). Categories of individual samples - Cell Line hCAGE - Cell Line LQhCAGE - Fractionation hCAGE - Primary cell hCAGE - Primary cell LQhCAGE - Time course hCAGE - Tissue hCAGE Data Access FANTOM5 data can be explored interactively with the Table Browser and cross-referenced with the Data Integrator. For programmatic access, the track can be accessed using the Genome Browser's REST API. FANTOM5 annotations can be downloaded from the Genome Browser's download server as a bigBed file. This compressed binary format can be remotely queried through command line utilities. Please note that some of the download files can be quite large. The FANTOM5 reprocessed data can be found and downloaded on the FANTOM website. Credits Thanks to Shuhei Noguchi, the FANTOM5 consortium, the Large Scale Data Managing Unit and Preventive Medicine and Applied Genomics Unit, the Center for Integrative Medical Sciences (IMS), and RIKEN for providing this data and its analysis. References Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C, Suzuki T et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014 Mar 27;507(7493):455-461. PMID: 24670763; PMC: PMC5215096 Arner E, Daub CO, Vitting-Seerup K, Andersson R, Lilje B, Drablos F, Lennartsson A, Ronnerblad M, Hrydziuszko O, Vitezic M et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science. 2015 Feb 27;347(6225):1010-4. PMID: 25678556; PMC: PMC4681433 FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M et al. A promoter-level mammalian expression atlas. Nature. 2014 Mar 27;507(7493):462-70.
PMID: 24670764; PMC: PMC4529748 Kanamori-Katayama M, Itoh M, Kawaji H, Lassmann T, Katayama S, Kojima M, Bertin N, Kaiho A, Ninomiya N, Daub CO et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res. 2011 Jul;21(7):1150-9. PMID: 21596820; PMC: PMC3129257 Lizio M, Harshbarger J, Shimoji H, Severin J, Kasukawa T, Sahin S, Abugessaisa I, Fukuda S, Hori F, Ishikawa-Kato S et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 2015 Jan 5;16(1):22. PMID: 25723102; PMC: PMC4310165 fantom5 FANTOM5 FANTOM5 Expression and Regulation Total_counts_multiwig Total counts Total counts of CAGE reads Expression and Regulation TotalCounts_Rev Total counts of CAGE reads (rev) Total counts of CAGE reads reverse Expression and Regulation TotalCounts_Fwd Total counts of CAGE reads (fwd) Total counts of CAGE reads forward Expression and Regulation Max_counts_multiwig Max counts Max counts of CAGE reads Expression and Regulation MaxCounts_Rev Max counts of CAGE reads (rev) Max counts of CAGE reads reverse Expression and Regulation MaxCounts_Fwd Max counts of CAGE reads (fwd) Max counts of CAGE reads forward Expression and Regulation refSeqComposite NCBI RefSeq RefSeq genes from NCBI Genes and Gene Predictions Description The NCBI RefSeq Genes composite track shows chicken protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). All subtracks use coordinates provided by RefSeq, except for the UCSC RefSeq track, which UCSC produces by realigning the RefSeq RNAs to the genome. This realignment may result in occasional differences between the annotation coordinates provided by UCSC and NCBI. For RNA-seq analysis, we advise using NCBI aligned tables like RefSeq All or RefSeq Curated. See the Methods section for more details about how the different tracks were created. Please visit NCBI's Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records. For more information on the different gene tracks, see our Genes FAQ.
Display Conventions and Configuration This track is a composite track that contains several different data sets. To show only a selected set of subtracks, uncheck the boxes next to the tracks that you wish to hide. Note: Not all subtracks are available on all assemblies. The possible subtracks include: RefSeq aligned annotations and UCSC alignment of RefSeq annotations RefSeq All – all curated and predicted annotations provided by RefSeq. RefSeq Curated – subset of RefSeq All that includes only those annotations whose accessions begin with NM, NR, NP or YP. (NP and YP are used only for protein-coding genes on the mitochondrion; YP is used for human only.) They were manually curated, based on publications describing transcripts and manual reviews of evidence which includes EST and full-length cDNA alignments, protein sequences, splice sites and any other evidence available in databases or the scientific literature. The resulting sequences can differ from the genome; they exist independently of a particular human genome build and so must be aligned to the genome to create a track. The "RefSeq Curated" track is NCBI's mapping of these transcripts to the genome. Another alignment track exists for these, the "UCSC RefSeq" track (see below). RefSeq Predicted – subset of RefSeq All that includes those annotations whose accessions begin with XM or XR. They were predicted based on protein, cDNA, EST and RNA-seq alignments to the genome assembly by the NCBI Gnomon prediction software. RefSeq Other – all other annotations produced by the RefSeq group that do not fit the requirements for inclusion in the RefSeq Curated or the RefSeq Predicted tracks. Examples are untranscribed pseudogenes or gene clusters, such as HOX or protocadherin alpha. They were manually curated from publications or databases but are not typical transcribed genes. 
RefSeq Alignments – alignments of RefSeq RNAs to the chicken genome provided by the RefSeq group, following the display conventions for PSL tracks. RefSeq Diffs – alignment differences between the chicken reference genome(s) and RefSeq curated transcripts. (Track not currently available for every assembly.) UCSC RefSeq – annotations generated from UCSC's realignment of RNAs with NM and NR accessions to the chicken genome. This track was previously known as the "RefSeq Genes" track. RefSeq Select (subset, only on hg38) – Subset of RefSeq Curated, transcripts marked as part of the RefSeq Select dataset. A single Select transcript is chosen as representative for each protein-coding gene. See NCBI RefSeq Select. RefSeq HGMD (subset) – Subset of RefSeq Curated, transcripts annotated by the Human Gene Mutation Database. This track is only available on the human genomes hg19 and hg38. It is the most restricted RefSeq subset, targeting clinical diagnostics. NCBI Orthologs – Orthologous genes were identified by NCBI's Eukaryotic Genome Annotation Pipeline for the NCBI Gene dataset using a combination of protein sequence similarity and local synteny analysis. Orthology is determined between the genome being annotated and a reference genome, such as human or zebrafish, and pairs of orthologs are grouped together. Transitive relationships are inferred within each group, for example, zebrafish <-> human <-> mouse. For more information on how NCBI calculates orthologs, see the details provided here. This track is available for the following assemblies: hg38, mm39, danRer11, canFam6, and bosTau9. The RefSeq All, RefSeq Curated, RefSeq Predicted, and UCSC RefSeq tracks follow the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), or reviewed (dark), as defined by RefSeq. 
Color Level of review Reviewed: the RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information. Provisional: the RefSeq record has not yet been subject to individual review. The initial sequence-to-gene association has been established by outside collaborators or NCBI staff. Predicted: the RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted. The item labels and codon display properties for features within this track can be configured through the check-box controls at the top of the track description page. To adjust the settings for an individual subtrack, click the wrench icon next to the track name in the subtrack list . Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name or OMIM identifier instead of the gene name, show all or a subset of these labels including the gene name, OMIM identifier and accession names, or turn off the label completely. Codon coloring: This track has an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. The RefSeq Diffs track contains five different types of inconsistency between the reference genome sequence and the RefSeq transcript sequences. The five types of differences are as follows: mismatch – aligned but mismatching bases, plus HGVS g. to show the genomic change required to match the transcript and HGVS c./n. to show the transcript change required to match the genome. 
short gap – genomic gaps that are too small to be introns (arbitrary cutoff of < 45 bp), most likely insertions/deletion variants or errors, with HGVS g. and c./n. showing differences. shift gap – shortGap items whose placement could be shifted left and/or right on the genome due to repetitive sequence, with HGVS c./n. position range of ambiguous region in transcript. Here, thin and thick lines are used -- the thin line shows the span of the repetitive sequence, and the thick line shows the rightmost shifted gap. double gap – genomic gaps that are long enough to be introns but that skip over transcript sequence (invisible in default setting), with HGVS c./n. deletion. skipped – sequence at the beginning or end of a transcript that is not aligned to the genome (invisible in default setting), with HGVS c./n. deletion HGVS Terminology (Human Genome Variation Society): g. = genomic sequence ; c. = coding DNA sequence ; n. = non-coding RNA reference sequence. When reporting HGVS with RefSeq sequences, to make sure that results from research articles can be mapped to the genome unambiguously, please specify the RefSeq annotation release displayed on the transcript's Genome Browser details page and also the RefSeq transcript ID with version (e.g. NM_012309.4 not NM_012309). Methods Tracks contained in the RefSeq annotation and RefSeq RNA alignment tracks were created at UCSC using data from the NCBI RefSeq project. Data files were downloaded from RefSeq in GFF file format and converted to the genePred and PSL table formats for display in the Genome Browser. Information about the NCBI annotation pipeline can be found here. The RefSeq Diffs track is generated by UCSC using NCBI's RefSeq RNA alignments. The UCSC RefSeq Genes track is constructed using the same methods as previous RefSeq Genes tracks. RefSeq RNAs were aligned against the chicken genome using BLAT. Those with an alignment of less than 15% were discarded. 
When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. The NCBI Orthologs track was generated using the latest NCBI files (gene2accession and gene_orthologs). NCBI chromosome identifiers were mapped to UCSC-compatible IDs using species-specific chromosome alias files, and genes were filtered to include only those located on valid NCBI chromosomes. A custom Python script processed the ortholog relationships and created bed files for each species. The bed files were then converted to BigBed format, with indexing for search functionality. The procedure is documented in the makeDoc from our GitHub repository. Data Access The raw data for these tracks can be accessed in multiple ways. It can be explored interactively using the REST API, Table Browser or Data Integrator. The tables can also be accessed programmatically through our public MySQL server or downloaded from our downloads server for local processing. The previous track versions are available in the archives of our downloads server. You can also access any RefSeq table entries in JSON format through our JSON API. The data in the RefSeq Other, RefSeq Diffs, and NCBI Orthologs tracks are organized in bigBed file format; more information about accessing the information in this bigBed file can be found below. The other subtracks are associated with database tables as follows: genePred format: RefSeq All - ncbiRefSeq RefSeq Curated - ncbiRefSeqCurated RefSeq Predicted - ncbiRefSeqPredicted UCSC RefSeq - refGene PSL format: RefSeq Alignments - ncbiRefSeqPsl The first column of each of these tables is "bin". This column is designed to speed up access for display in the Genome Browser, but can be safely ignored in downstream analysis. You can read more about the bin indexing system here. 
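As a minimal sketch of ignoring the "bin" column in downstream analysis, the Python snippet below assumes a tab-separated dump of one of the tables listed above, with "bin" as the first field. The function name and the sample row are hypothetical and only illustrate the idea; they are not real table contents.

```python
def drop_bin_column(lines):
    """Yield each tab-separated row as a list of fields, without the leading 'bin' field."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        yield fields[1:]  # everything after the first column (bin)


if __name__ == "__main__":
    # Hypothetical row: bin, name, chrom, strand, txStart, txEnd (placeholder values).
    sample_rows = ["585\texampleTx\tchr1\t+\t100\t200\n"]
    for fields in drop_bin_column(sample_rows):
        print("\t".join(fields))
```

The remaining fields keep their original order, so any column positions documented for the table simply shift down by one.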
The annotations in the RefSeqOther, RefSeqDiffs, and NCBI Orthologs tracks are stored in bigBed files, which can be obtained from our downloads server here, ncbiRefSeqOther.bb, ncbiRefSeqDiffs.bb, and ncbiOrtho.bb. Individual regions or the whole set of genome-wide annotations can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system from the utilities directory linked below. For example, to extract only annotations in a given region, you could use the following command: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/galGal5/ncbiRefSeq/ncbiRefSeqOther.bb -chrom=chr16 -start=34990190 -end=36727467 stdout You can download a GTF format version of the RefSeq All table from the GTF downloads directory. The genePred format tracks can also be converted to GTF format using the genePredToGtf utility, available from the utilities directory on the UCSC downloads server. The utility can be run from the command line like so: genePredToGtf galGal5 ncbiRefSeqPredicted ncbiRefSeqPredicted.gtf Note that using genePredToGtf in this manner accesses our public MySQL server, and you therefore must set up your hg.conf as described on the MySQL page linked near the beginning of the Data Access section. A file containing the RNA sequences in FASTA format for all items in the RefSeq All, RefSeq Curated, and RefSeq Predicted tracks can be found on our downloads server here. Please refer to our mailing list archives for questions. Previous versions of the ncbiRefSeq set of tracks can be found on our archive download server. Credits This track was produced at UCSC from data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. 
RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 refGene UCSC RefSeq UCSC annotations of RefSeq RNAs (NM_* and NR_*) Genes and Gene Predictions Description The RefSeq Genes track shows known chicken protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly. Please visit the Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records. For more information on the different gene tracks, see our Genes FAQ. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark). The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely. Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. 
If you wish to see only the coding genes, click this box. Methods RefSeq RNAs were aligned against the chicken genome using BLAT. Those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. 
PMID: 15608248; PMC: PMC539979 ncbiRefSeqGenomicDiff RefSeq Diffs Differences between NCBI RefSeq Transcripts and the Reference Genome Genes and Gene Predictions ncbiRefSeqPsl RefSeq Alignments RefSeq Alignments of RNAs Genes and Gene Predictions ncbiRefSeqOther RefSeq Other NCBI RefSeq Other Annotations (not NM_*, NR_*, XM_*, XR_*, NP_* or YP_*) Genes and Gene Predictions ncbiRefSeqPredicted RefSeq Predicted NCBI RefSeq genes, predicted subset (XM_* or XR_*) Genes and Gene Predictions ncbiRefSeqCurated RefSeq Curated NCBI RefSeq genes, curated subset (NM_*, NR_*, NP_* or YP_*) Genes and Gene Predictions ncbiRefSeq RefSeq All NCBI RefSeq genes, curated and predicted (NM_*, XM_*, NR_*, XR_*, NP_*, YP_*) Genes and Gene Predictions TSS_activity_read_counts TSS activity(read counts) TSS activity per sample(read counts) Expression and Regulation Description The FANTOM5 track shows mapped transcription start sites (TSS) and their usage in primary cells, cell lines, and tissues to produce a comprehensive overview of gene expression across the human body by using single molecule sequencing. Display Conventions and Configuration Items in this track are colored according to their strand orientation. Blue indicates alignment to the negative strand, and red indicates alignment to the positive strand. Methods Protocol Individual biological states are profiled by HeliScopeCAGE, which is a variation of the CAGE (Cap Analysis Gene Expression) protocol based on a single molecule sequencer. The standard protocol requiring 5 µg of total RNA as a starting material is referred to as hCAGE, and an optimized version for a lower quantity (~ 100 ng) is referred to as LQhCAGE (Kanamori-Katayama et al. 2011). hCAGE LQhCAGE Samples Transcription start sites (TSSs) and their usage were mapped in human, mouse, dog, rat, macaque and chicken primary cells, cell lines, and tissues to produce a comprehensive overview of mammalian gene expression across the human body. 
The 5′ ends of the mapped CAGE reads are counted at single-base-pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates; these counts represent TSS activity in the sample. Individual samples shown in "TSS activity" tracks are grouped as follows: Primary cell, Tissue, Cell Line, Time course, Fractionation. TSS peaks and enhancers TSS (CAGE) peaks across the panel of the biological states (samples) are identified by DPI (decomposition based peak identification, Forrest et al. 2014), where each of the peaks consists of neighboring and related TSSs. The peaks are used as anchors to define promoters and units of promoter-level expression analysis. Two subsets of the peaks are defined based on evidence of read counts, depending on scopes of subsequent analyses, and the first subset (referred to as the robust set of peaks, thresholded for expression analysis) is shown as TSS peaks. The summary tracks consist of the TSS (CAGE) peaks, the enhancers, and summary profiles of TSS activities (total and maximum values). The summary track consists of the following tracks. TSS (CAGE) peaks the robust peaks TSS summary profiles Total counts and TPM (tags per million) in all the samples Maximum counts and TPM among the samples TSS activity The 5′ ends of the mapped CAGE reads are counted at single-base-pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates; these counts represent TSS activity in the sample. The read counts tracks indicate raw counts of CAGE reads, and the TPM tracks indicate normalized counts as TPM (tags per million). Categories of individual samples - Cell Line hCAGE - Cell Line LQhCAGE - fractionation hCAGE - Primary cell hCAGE - Primary cell LQhCAGE - Time course hCAGE - Tissue hCAGE Data Access FANTOM5 data can be explored interactively with the Table Browser and cross-referenced with the Data Integrator. For programmatic access, the track can be accessed using the Genome Browser's REST API. 
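As a hedged illustration of programmatic access, the Python sketch below only builds a request URL for the REST API's getData/track endpoint; it does not contact the server. The assembly name galGal5 and the choice of subtrack are assumptions made for the example, and whether a given track is served under these names should be checked against the API's track listing. The helper name track_data_url is hypothetical.

```python
from urllib.parse import urlencode

# Public UCSC Genome Browser REST API host.
API_BASE = "https://api.genome.ucsc.edu"


def track_data_url(genome, track, chrom=None, start=None, end=None):
    """Build a getData/track URL; chrom, start and end are optional region filters."""
    params = {"genome": genome, "track": track}
    if chrom is not None:
        params["chrom"] = chrom
        # A start/end window is only meaningful together with a chromosome.
        if start is not None and end is not None:
            params["start"] = start
            params["end"] = end
    return f"{API_BASE}/getData/track?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed assembly and subtrack names, used for illustration only.
    print(track_data_url(
        "galGal5",
        "ChickenWingBudsDay03HH20_CNhs14212_ctss_fwd",
        chrom="chr1", start=0, end=100000,
    ))
```

The resulting URL can be fetched with any HTTP client; the API returns JSON.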
FANTOM5 annotations can be downloaded from the Genome Browser's download server as a bigBed file. This compressed binary format can be remotely queried through command line utilities. Please note that some of the download files can be quite large. The FANTOM5 reprocessed data can be found and downloaded on the FANTOM website. Credits Thanks to Shuhei Noguchi, the FANTOM5 consortium, the Large Scale Data Managing Unit and Preventive Medicine and Applied Genomics Unit, the Center for Integrative Medical Sciences (IMS), and RIKEN for providing this data and its analysis. References Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C, Suzuki T et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014 Mar 27;507(7493):455-461. PMID: 24670763; PMC: PMC5215096 Arner E, Daub CO, Vitting-Seerup K, Andersson R, Lilje B, Drablos F, Lennartsson A, Ronnerblad M, Hrydziuszko O, Vitezic M et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science. 2015 Feb 27;347(6225):1010-4. PMID: 25678556; PMC: PMC4681433 FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M et al. A promoter-level mammalian expression atlas. Nature. 2014 Mar 27;507(7493):462-70. PMID: 24670764; PMC: PMC4529748 Kanamori-Katayama M, Itoh M, Kawaji H, Lassmann T, Katayama S, Kojima M, Bertin N, Kaiho A, Ninomiya N, Daub CO et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res. 2011 Jul;21(7):1150-9. PMID: 21596820; PMC: PMC3129257 Lizio M, Harshbarger J, Shimoji H, Severin J, Kasukawa T, Sahin S, Abugessaisa I, Fukuda S, Hori F, Ishikawa-Kato S et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 2015 Jan 5;16(1):22. 
PMID: 25723102; PMC: PMC4310165 ChickenWingBudsDay03HH20_CNhs14212_ctss_rev WingBuds_day03- chicken wing buds, day03 (HH20)_CNhs14212_10232-103I7_reverse Expression and Regulation ChickenWingBudsDay03HH20_CNhs14212_ctss_fwd WingBuds_day03+ chicken wing buds, day03 (HH20)_CNhs14212_10232-103I7_forward Expression and Regulation ChickenLegBudsDay03HH20_CNhs14213_ctss_rev LegBuds_day03- chicken leg buds, day03 (HH20)_CNhs14213_10233-103I8_reverse Expression and Regulation ChickenLegBudsDay03HH20_CNhs14213_ctss_fwd LegBuds_day03+ chicken leg buds, day03 (HH20)_CNhs14213_10233-103I8_forward Expression and Regulation ChickenEmbryoExtraembryonicTissueDay15HH41_CNhs14211_ctss_rev Extraembryonic_day15- chicken embryo, extraembryonic tissue, day15 (HH41)_CNhs14211_10231-103I6_reverse Expression and Regulation ChickenEmbryoExtraembryonicTissueDay15HH41_CNhs14211_ctss_fwd Extraembryonic_day15+ chicken embryo, extraembryonic tissue, day15 (HH41)_CNhs14211_10231-103I6_forward Expression and Regulation ChickenEmbryoExtraembryonicTissueDay07HH32_CNhs14210_ctss_rev Extraembryonic_day07- chicken embryo, extraembryonic tissue, day07 (HH32)_CNhs14210_10230-103I5_reverse Expression and Regulation ChickenEmbryoExtraembryonicTissueDay07HH32_CNhs14210_ctss_fwd Extraembryonic_day07+ chicken embryo, extraembryonic tissue, day07 (HH32)_CNhs14210_10230-103I5_forward Expression and Regulation ChickenEmbryoWholeBodyDay20HH45_CNhs14207_ctss_rev EmbryoWholeBody_day20- chicken embryo whole body, day20 (HH45)_CNhs14207_10229-103I4_reverse Expression and Regulation ChickenEmbryoWholeBodyDay20HH45_CNhs14207_ctss_fwd EmbryoWholeBody_day20+ chicken embryo whole body, day20 (HH45)_CNhs14207_10229-103I4_forward Expression and Regulation ChickenEmbryoWholeBodyDay15HH41_CNhs14206_ctss_rev EmbryoWholeBody_day15- chicken embryo whole body, day15 (HH41)_CNhs14206_10228-103I3_reverse Expression and Regulation ChickenEmbryoWholeBodyDay15HH41_CNhs14206_ctss_fwd EmbryoWholeBody_day15+ chicken embryo whole body, 
day15 (HH41)_CNhs14206_10228-103I3_forward Expression and Regulation ChickenEmbryoWholeBodyDay10HH37_CNhs10485_ctss_rev EmbryoWholeBody_day10- chicken embryo whole body, day10 (HH37)_CNhs10485_10004-101A9_reverse Expression and Regulation ChickenEmbryoWholeBodyDay10HH37_CNhs10485_ctss_fwd EmbryoWholeBody_day10+ chicken embryo whole body, day10 (HH37)_CNhs10485_10004-101A9_forward Expression and Regulation ChickenEmbryoWholeBodyDay07HH32_CNhs14205_ctss_rev EmbryoWholeBody_day07- chicken embryo whole body, day07 (HH32)_CNhs14205_10227-103I2_reverse Expression and Regulation ChickenEmbryoWholeBodyDay07HH32_CNhs14205_ctss_fwd EmbryoWholeBody_day07+ chicken embryo whole body, day07 (HH32)_CNhs14205_10227-103I2_forward Expression and Regulation ChickenEmbryoWholeBodyDay05HH2627_CNhs13981_ctss_rev EmbryoWholeBody_day05- chicken embryo whole body, day05 (HH26-27)_CNhs13981_10226-103I1_reverse Expression and Regulation ChickenEmbryoWholeBodyDay05HH2527_CNhs14208_ctss_rev EmbryoWholeBody_day05- chicken embryo whole body, day05 (HH25-27)_CNhs14208_10234-103I9_reverse Expression and Regulation ChickenEmbryoWholeBodyDay05HH2627_CNhs13981_ctss_fwd EmbryoWholeBody_day05+ chicken embryo whole body, day05 (HH26-27)_CNhs13981_10226-103I1_forward Expression and Regulation ChickenEmbryoWholeBodyDay05HH2527_CNhs14208_ctss_fwd EmbryoWholeBody_day05+ chicken embryo whole body, day05 (HH25-27)_CNhs14208_10234-103I9_forward Expression and Regulation ChickenEmbryoWholeBodyDay04HH23_CNhs14204_ctss_rev EmbryoWholeBody_day04- chicken embryo whole body, day04 (HH23)_CNhs14204_10225-103H9_reverse Expression and Regulation ChickenEmbryoWholeBodyDay04HH23_CNhs14204_ctss_fwd EmbryoWholeBody_day04+ chicken embryo whole body, day04 (HH23)_CNhs14204_10225-103H9_forward Expression and Regulation ChickenEmbryoWholeBodyDay03HH19_CNhs14203_ctss_rev EmbryoWholeBody_day03- chicken embryo whole body, day03 (HH19)_CNhs14203_10223-103H7_reverse Expression and Regulation 
ChickenEmbryoWholeBodyDay03HH20_CNhs13980_ctss_rev EmbryoWholeBody_day03- chicken embryo whole body, day03 (HH20)_CNhs13980_10224-103H8_reverse Expression and Regulation ChickenEmbryoWholeBodyDay03HH19_CNhs14203_ctss_fwd EmbryoWholeBody_day03+ chicken embryo whole body, day03 (HH19)_CNhs14203_10223-103H7_forward Expression and Regulation ChickenEmbryoWholeBodyDay03HH20_CNhs13980_ctss_fwd EmbryoWholeBody_day03+ chicken embryo whole body, day03 (HH20)_CNhs13980_10224-103H8_forward Expression and Regulation ChickenEmbryoWholeBodyDay02_5HH17_CNhs13979_ctss_rev EmbryoWholeBody_day02_5- chicken embryo whole body, day02_5 (HH17)_CNhs13979_10222-103H6_reverse Expression and Regulation ChickenEmbryoWholeBodyDay02_5HH17_CNhs13979_ctss_fwd EmbryoWholeBody_day02_5+ chicken embryo whole body, day02_5 (HH17)_CNhs13979_10222-103H6_forward Expression and Regulation ChickenEmbryoWholeBodyDay02HH13_CNhs13978_ctss_rev EmbryoWholeBody_day02- chicken embryo whole body, day02 (HH13)_CNhs13978_10220-103H4_reverse Expression and Regulation ChickenEmbryoWholeBodyDay02HH14_CNhs14202_ctss_rev EmbryoWholeBody_day02- chicken embryo whole body, day02 (HH14)_CNhs14202_10221-103H5_reverse Expression and Regulation ChickenEmbryoWholeBodyDay02HH13_CNhs13978_ctss_fwd EmbryoWholeBody_day02+ chicken embryo whole body, day02 (HH13)_CNhs13978_10220-103H4_forward Expression and Regulation ChickenEmbryoWholeBodyDay02HH14_CNhs14202_ctss_fwd EmbryoWholeBody_day02+ chicken embryo whole body, day02 (HH14)_CNhs14202_10221-103H5_forward Expression and Regulation ChickenEmbryoWholeBodyDay01_5HH1112_CNhs14201_ctss_rev EmbryoWholeBody_day01_5- chicken embryo whole body, day01_5 (HH11-12)_CNhs14201_10219-103H3_reverse Expression and Regulation ChickenEmbryoWholeBodyDay01_5HH10_CNhs13977_ctss_rev EmbryoWholeBody_day01_5- chicken embryo whole body, day01_5 (HH10)_CNhs13977_10218-103H2_reverse Expression and Regulation ChickenEmbryoWholeBodyDay01_5HH1112_CNhs14201_ctss_fwd EmbryoWholeBody_day01_5+ chicken embryo whole 
body, day01_5 (HH11-12)_CNhs14201_10219-103H3_forward Expression and Regulation ChickenEmbryoWholeBodyDay01_5HH10_CNhs13977_ctss_fwd EmbryoWholeBody_day01_5+ chicken embryo whole body, day01_5 (HH10)_CNhs13977_10218-103H2_forward Expression and Regulation ChickenEmbryoWholeBodyDay01HH723Somites_CNhs14200_ctss_rev EmbryoWholeBody_day01- chicken embryo whole body, day01 (HH7 2-3 somites)_CNhs14200_10217-103H1_reverse Expression and Regulation ChickenEmbryoWholeBodyDay01HH713Somites_CNhs13976_ctss_rev EmbryoWholeBody_day01- chicken embryo whole body, day01 (HH7 1-3 somites)_CNhs13976_10216-103G9_reverse Expression and Regulation ChickenEmbryoWholeBodyDay01HH723Somites_CNhs14200_ctss_fwd EmbryoWholeBody_day01+ chicken embryo whole body, day01 (HH7 2-3 somites)_CNhs14200_10217-103H1_forward Expression and Regulation ChickenEmbryoWholeBodyDay01HH713Somites_CNhs13976_ctss_fwd EmbryoWholeBody_day01+ chicken embryo whole body, day01 (HH7 1-3 somites)_CNhs13976_10216-103G9_forward Expression and Regulation ChickenEmbryoWholeBody22hrHH56_CNhs12736_ctss_rev EmbryoWholeBody_22hr- chicken embryo whole body, 22hr (HH5-6)_CNhs12736_10215-103G8_reverse Expression and Regulation ChickenEmbryoWholeBody22hrHH56_CNhs12736_ctss_fwd EmbryoWholeBody_22hr+ chicken embryo whole body, 22hr (HH5-6)_CNhs12736_10215-103G8_forward Expression and Regulation ChickenEmbryoWholeBody18hrHH4_CNhs13030_ctss_rev EmbryoWholeBody_18hr- chicken embryo whole body, 18hr (HH4)_CNhs13030_10214-103G7_reverse Expression and Regulation ChickenEmbryoWholeBody18hrHH4_CNhs13030_ctss_fwd EmbryoWholeBody_18hr+ chicken embryo whole body, 18hr (HH4)_CNhs13030_10214-103G7_forward Expression and Regulation ChickenEmbryoWholeBody14hrHH3_CNhs13029_ctss_rev EmbryoWholeBody_14hr- chicken embryo whole body, 14hr (HH3)_CNhs13029_10213-103G6_reverse Expression and Regulation ChickenEmbryoWholeBody14hrHH3_CNhs13029_ctss_fwd EmbryoWholeBody_14hr+ chicken embryo whole body, 14hr (HH3)_CNhs13029_10213-103G6_forward Expression and 
Regulation ChickenEmbryoWholeBody06hrHH12_CNhs13028_ctss_rev EmbryoWholeBody_06hr- chicken embryo whole body, 06hr (HH1-2)_CNhs13028_10212-103G5_reverse Expression and Regulation ChickenEmbryoWholeBody06hrHH12_CNhs13028_ctss_fwd EmbryoWholeBody_06hr+ chicken embryo whole body, 06hr (HH1-2)_CNhs13028_10212-103G5_forward Expression and Regulation ChickenEmbryoWholeBody05hr30minHH1_CNhs13027_ctss_rev EmbryoWholeBody_05hr30min- chicken embryo whole body, 05hr30min (HH1)_CNhs13027_10211-103G4_reverse Expression and Regulation ChickenEmbryoWholeBody05hr30minHH1_CNhs13027_ctss_fwd EmbryoWholeBody_05hr30min+ chicken embryo whole body, 05hr30min (HH1)_CNhs13027_10211-103G4_forward Expression and Regulation ChickenEmbryoWholeBody01hr30minHH1_CNhs12735_ctss_rev EmbryoWholeBody_01hr30min- chicken embryo whole body, 01hr30min (HH1)_CNhs12735_10210-103G3_reverse Expression and Regulation ChickenEmbryoWholeBody01hr30minHH1_CNhs12735_ctss_fwd EmbryoWholeBody_01hr30min+ chicken embryo whole body, 01hr30min (HH1)_CNhs12735_10210-103G3_forward Expression and Regulation MesenchymalStemCellsBoneMarrowDerivedDonor1_CNhs11292_ctss_rev MscBoneMarrowD1- Mesenchymal stem cells - bone marrow derived, donor1_CNhs11292_11294-117A7_reverse Expression and Regulation MesenchymalStemCellsBoneMarrowDerivedDonor1_CNhs11292_ctss_fwd MscBoneMarrowD1+ Mesenchymal stem cells - bone marrow derived, donor1_CNhs11292_11294-117A7_forward Expression and Regulation HepatocytesDonor2_CNhs11306_ctss_rev HepatocytesD2- hepatocytes, donor2_CNhs11306_11337-117F5_reverse Expression and Regulation HepatocytesDonor2_CNhs11306_ctss_fwd HepatocytesD2+ hepatocytes, donor2_CNhs11306_11337-117F5_forward Expression and Regulation HepatocytesDonor1_CNhs11922_ctss_rev HepatocytesD1- hepatocytes, donor1_CNhs11922_11260-116F9_reverse Expression and Regulation HepatocytesDonor1_CNhs11922_ctss_fwd HepatocytesD1+ hepatocytes, donor1_CNhs11922_11260-116F9_forward Expression and Regulation 
AorticSmoothMuscleCellsDonor3_CNhs11314_ctss_rev AorticSmcD3- Aortic Smooth Muscle cells, donor3_CNhs11314_11447-118I7_reverse Expression and Regulation AorticSmoothMuscleCellsDonor3_CNhs11314_ctss_fwd AorticSmcD3+ Aortic Smooth Muscle cells, donor3_CNhs11314_11447-118I7_forward Expression and Regulation AorticSmoothMuscleCellsDonor2_CNhs11299_ctss_rev AorticSmcD2- Aortic Smooth Muscle cells, donor2_CNhs11299_11375-118A7_reverse Expression and Regulation AorticSmoothMuscleCellsDonor2_CNhs11299_ctss_fwd AorticSmcD2+ Aortic Smooth Muscle cells, donor2_CNhs11299_11375-118A7_forward Expression and Regulation AorticSmoothMuscleCellsDonor1_CNhs11296_ctss_rev AorticSmcD1- Aortic Smooth Muscle cells, donor1_CNhs11296_11298-117B2_reverse Expression and Regulation AorticSmoothMuscleCellsDonor1_CNhs11296_ctss_fwd AorticSmcD1+ Aortic Smooth Muscle cells, donor1_CNhs11296_11298-117B2_forward Expression and Regulation cpgIslandExtUnmasked Unmasked CpG CpG Islands on All Sequence (Islands < 300 Bases are Light Green) Expression and Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. 
CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole.

The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page.

Methods

CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria:

 - GC content of 50% or greater
 - length greater than 200 bp
 - ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment

The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG. The track CpG Islands is constructed on the sequence after all masked sequence is removed.

The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)):

    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)

where N = length of sequence.

The calculation of the track data is performed by the following command sequence:

    twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
      | cpg_lh /dev/stdin 2> cpg_lh.err \
      | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
      | sort -k1,1 -k2,2n > cpgIsland.bed

The unmasked track data is constructed from twoBitToFa -noMask output for the twoBitToFa command.
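The per-island statistics and the three acceptance criteria described in Methods above can be illustrated with a minimal Python sketch. This is not the cpg_lh program; the function names `cpg_island_stats` and `passes_island_criteria` are hypothetical, and the sketch only evaluates a segment that has already been chosen (it does not perform the +17/-1 segment scoring).

```python
def cpg_island_stats(seq):
    """Compute the statistics described above for one candidate segment.

    Illustrative helper (hypothetical name), not part of cpg_lh.
    """
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    cpg = s.count("CG")
    return {
        "length": n,
        "cpg_count": cpg,
        # Fraction of bases that are G or C.
        "gc_fraction": (c + g) / n if n else 0.0,
        # Percentage CpG: CpG bases (twice the CpG count) relative to the length.
        "percent_cpg": 100.0 * 2 * cpg / n if n else 0.0,
        # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G).
        "obs_exp": cpg * n / (c * g) if c and g else 0.0,
    }


def passes_island_criteria(stats):
    """Apply the three criteria listed in Methods to a stats dictionary."""
    return (
        stats["gc_fraction"] >= 0.5   # GC content of 50% or greater
        and stats["length"] > 200     # length greater than 200 bp
        and stats["obs_exp"] > 0.6    # observed/expected CpG ratio greater than 0.6
    )


if __name__ == "__main__":
    stats = cpg_island_stats("CG" * 150)
    print(stats, passes_island_criteria(stats))
```

The zero-division guards are a design choice of this sketch: a segment with no C or no G is given an Obs/Exp ratio of 0 so that it simply fails the criteria.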
Data access

The CpG islands track and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog.

The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")

Credits

This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished).

References

Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447

TSS_activity_TPM TSS activity(TPM) TSS activity per sample(TPM) Expression and Regulation

Description

The FANTOM5 track shows mapped transcription start sites (TSS) and their usage in primary cells, cell lines, and tissues to produce a comprehensive overview of gene expression across the human body by using single molecule sequencing.

Display Conventions and Configuration

Items in this track are colored according to their strand orientation. Blue indicates alignment to the negative strand, and red indicates alignment to the positive strand.

Methods

Protocol

Individual biological states are profiled by HeliScopeCAGE, which is a variation of the CAGE (Cap Analysis Gene Expression) protocol based on a single molecule sequencer. The standard protocol requiring 5 µg of total RNA as a starting material is referred to as hCAGE, and an optimized version for a lower quantity (~ 100 ng) is referred to as LQhCAGE (Kanamori-Katayama et al. 2011).

 - hCAGE
 - LQhCAGE

Samples

Transcription start sites (TSSs) were mapped, and their usage in human, mouse, dog, rat, macaque and chicken primary cells, cell lines, and tissues was profiled, to produce a comprehensive overview of mammalian gene expression across the human body.
The 5′ ends of the mapped CAGE reads are counted at single-base-pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates; these counts represent TSS activities in the sample. Individual samples shown in "TSS activity" tracks are grouped as below.

 - Primary cell
 - Tissue
 - Cell Line
 - Time course
 - Fractionation

TSS peaks and enhancers

TSS (CAGE) peaks across the panel of the biological states (samples) are identified by DPI (decomposition based peak identification, Forrest et al. 2014), where each of the peaks consists of neighboring and related TSSs. The peaks are used as anchors to define promoters and units of promoter-level expression analysis. Two subsets of the peaks are defined based on evidence of read counts, depending on the scope of subsequent analyses; the first subset (referred to as the robust set of peaks, thresholded for expression analysis) is shown as TSS peaks. The summary tracks consist of the TSS (CAGE) peaks, the enhancers, and summary profiles of TSS activities (total and maximum values). The summary track consists of the following tracks.

 - TSS (CAGE) peaks
    - the robust peaks
 - TSS summary profiles
    - Total counts and TPM (tags per million) in all the samples
    - Maximum counts and TPM among the samples

TSS activity

The 5′ ends of the mapped CAGE reads are counted at single-base-pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates; these counts represent TSS activities in the sample. The read counts tracks indicate raw counts of CAGE reads, and the TPM tracks indicate normalized counts as TPM (tags per million).

Categories of individual samples

 - Cell Line hCAGE
 - Cell Line LQhCAGE
 - fractionation hCAGE
 - Primary cell hCAGE
 - Primary cell LQhCAGE
 - Time course hCAGE
 - Tissue hCAGE

Data Access

FANTOM5 data can be explored interactively with the Table Browser and cross-referenced with the Data Integrator. For programmatic access, the track can be accessed using the Genome Browser's REST API.
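As a side note on the difference between the read counts tracks and the TPM tracks described under TSS activity above, the following Python sketch shows the simplest form of tags-per-million scaling. It assumes the denominator is the total number of CAGE tags counted in the sample; the source text does not state the exact denominator or any further normalization used by FANTOM5, and the function name `counts_to_tpm` is hypothetical.

```python
def counts_to_tpm(ctss_counts):
    """Scale raw CTSS read counts to tags per million (TPM).

    ctss_counts maps a (chrom, position, strand) key to a raw read count.
    Assumption: the denominator is the sum of all counts in the sample.
    """
    total = sum(ctss_counts.values())
    if total == 0:
        # No reads at all: every position has zero activity.
        return {key: 0.0 for key in ctss_counts}
    return {key: count * 1_000_000 / total for key, count in ctss_counts.items()}


if __name__ == "__main__":
    # Hypothetical toy sample with two CTSS positions.
    sample = {("chr1", 100, "+"): 3, ("chr1", 200, "-"): 1}
    for key, tpm in counts_to_tpm(sample).items():
        print(key, tpm)
```

Scaling by the sample total makes positions comparable between samples sequenced to different depths, which is why the per-sample tracks are offered in both raw and TPM form.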
FANTOM5 annotations can be downloaded from the Genome Browser's download server as a bigBed file. This compressed binary format can be remotely queried through command line utilities. Please note that some of the download files can be quite large.

The FANTOM5 reprocessed data can be found and downloaded on the FANTOM website.

Credits

Thanks to Shuhei Noguchi, the FANTOM5 consortium, the Large Scale Data Managing Unit and Preventive Medicine and Applied Genomics Unit, the Center for Integrative Medical Sciences (IMS), and RIKEN for providing this data and its analysis.

References

Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C, Suzuki T et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014 Mar 27;507(7493):455-461. PMID: 24670763; PMC: PMC5215096

Arner E, Daub CO, Vitting-Seerup K, Andersson R, Lilje B, Drablos F, Lennartsson A, Ronnerblad M, Hrydziuszko O, Vitezic M et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science. 2015 Feb 27;347(6225):1010-4. PMID: 25678556; PMC: PMC4681433

FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M et al. A promoter-level mammalian expression atlas. Nature. 2014 Mar 27;507(7493):462-70. PMID: 24670764; PMC: PMC4529748

Kanamori-Katayama M, Itoh M, Kawaji H, Lassmann T, Katayama S, Kojima M, Bertin N, Kaiho A, Ninomiya N, Daub CO et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res. 2011 Jul;21(7):1150-9. PMID: 21596820; PMC: PMC3129257

Lizio M, Harshbarger J, Shimoji H, Severin J, Kasukawa T, Sahin S, Abugessaisa I, Fukuda S, Hori F, Ishikawa-Kato S et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 2015 Jan 5;16(1):22.
PMID: 25723102; PMC: PMC4310165 ChickenWingBudsDay03HH20_CNhs14212_tpm_rev WingBuds_day03- chicken wing buds, day03 (HH20)_CNhs14212_10232-103I7_reverse Expression and Regulation ChickenWingBudsDay03HH20_CNhs14212_tpm_fwd WingBuds_day03+ chicken wing buds, day03 (HH20)_CNhs14212_10232-103I7_forward Expression and Regulation ChickenLegBudsDay03HH20_CNhs14213_tpm_rev LegBuds_day03- chicken leg buds, day03 (HH20)_CNhs14213_10233-103I8_reverse Expression and Regulation ChickenLegBudsDay03HH20_CNhs14213_tpm_fwd LegBuds_day03+ chicken leg buds, day03 (HH20)_CNhs14213_10233-103I8_forward Expression and Regulation ChickenEmbryoExtraembryonicTissueDay15HH41_CNhs14211_tpm_rev Extraembryonic_day15- chicken embryo, extraembryonic tissue, day15 (HH41)_CNhs14211_10231-103I6_reverse Expression and Regulation ChickenEmbryoExtraembryonicTissueDay15HH41_CNhs14211_tpm_fwd Extraembryonic_day15+ chicken embryo, extraembryonic tissue, day15 (HH41)_CNhs14211_10231-103I6_forward Expression and Regulation ChickenEmbryoExtraembryonicTissueDay07HH32_CNhs14210_tpm_rev Extraembryonic_day07- chicken embryo, extraembryonic tissue, day07 (HH32)_CNhs14210_10230-103I5_reverse Expression and Regulation ChickenEmbryoExtraembryonicTissueDay07HH32_CNhs14210_tpm_fwd Extraembryonic_day07+ chicken embryo, extraembryonic tissue, day07 (HH32)_CNhs14210_10230-103I5_forward Expression and Regulation ChickenEmbryoWholeBodyDay20HH45_CNhs14207_tpm_rev EmbryoWholeBody_day20- chicken embryo whole body, day20 (HH45)_CNhs14207_10229-103I4_reverse Expression and Regulation ChickenEmbryoWholeBodyDay20HH45_CNhs14207_tpm_fwd EmbryoWholeBody_day20+ chicken embryo whole body, day20 (HH45)_CNhs14207_10229-103I4_forward Expression and Regulation ChickenEmbryoWholeBodyDay15HH41_CNhs14206_tpm_rev EmbryoWholeBody_day15- chicken embryo whole body, day15 (HH41)_CNhs14206_10228-103I3_reverse Expression and Regulation ChickenEmbryoWholeBodyDay15HH41_CNhs14206_tpm_fwd EmbryoWholeBody_day15+ chicken embryo whole body, day15 
(HH41)_CNhs14206_10228-103I3_forward Expression and Regulation ChickenEmbryoWholeBodyDay10HH37_CNhs10485_tpm_rev EmbryoWholeBody_day10- chicken embryo whole body, day10 (HH37)_CNhs10485_10004-101A9_reverse Expression and Regulation ChickenEmbryoWholeBodyDay10HH37_CNhs10485_tpm_fwd EmbryoWholeBody_day10+ chicken embryo whole body, day10 (HH37)_CNhs10485_10004-101A9_forward Expression and Regulation ChickenEmbryoWholeBodyDay07HH32_CNhs14205_tpm_rev EmbryoWholeBody_day07- chicken embryo whole body, day07 (HH32)_CNhs14205_10227-103I2_reverse Expression and Regulation ChickenEmbryoWholeBodyDay07HH32_CNhs14205_tpm_fwd EmbryoWholeBody_day07+ chicken embryo whole body, day07 (HH32)_CNhs14205_10227-103I2_forward Expression and Regulation ChickenEmbryoWholeBodyDay05HH2527_CNhs14208_tpm_rev EmbryoWholeBody_day05- chicken embryo whole body, day05 (HH25-27)_CNhs14208_10234-103I9_reverse Expression and Regulation ChickenEmbryoWholeBodyDay05HH2627_CNhs13981_tpm_rev EmbryoWholeBody_day05- chicken embryo whole body, day05 (HH26-27)_CNhs13981_10226-103I1_reverse Expression and Regulation ChickenEmbryoWholeBodyDay05HH2527_CNhs14208_tpm_fwd EmbryoWholeBody_day05+ chicken embryo whole body, day05 (HH25-27)_CNhs14208_10234-103I9_forward Expression and Regulation ChickenEmbryoWholeBodyDay05HH2627_CNhs13981_tpm_fwd EmbryoWholeBody_day05+ chicken embryo whole body, day05 (HH26-27)_CNhs13981_10226-103I1_forward Expression and Regulation ChickenEmbryoWholeBodyDay04HH23_CNhs14204_tpm_rev EmbryoWholeBody_day04- chicken embryo whole body, day04 (HH23)_CNhs14204_10225-103H9_reverse Expression and Regulation ChickenEmbryoWholeBodyDay04HH23_CNhs14204_tpm_fwd EmbryoWholeBody_day04+ chicken embryo whole body, day04 (HH23)_CNhs14204_10225-103H9_forward Expression and Regulation ChickenEmbryoWholeBodyDay03HH19_CNhs14203_tpm_rev EmbryoWholeBody_day03- chicken embryo whole body, day03 (HH19)_CNhs14203_10223-103H7_reverse Expression and Regulation ChickenEmbryoWholeBodyDay03HH20_CNhs13980_tpm_rev 
EmbryoWholeBody_day03- chicken embryo whole body, day03 (HH20)_CNhs13980_10224-103H8_reverse Expression and Regulation ChickenEmbryoWholeBodyDay03HH20_CNhs13980_tpm_fwd EmbryoWholeBody_day03+ chicken embryo whole body, day03 (HH20)_CNhs13980_10224-103H8_forward Expression and Regulation ChickenEmbryoWholeBodyDay03HH19_CNhs14203_tpm_fwd EmbryoWholeBody_day03+ chicken embryo whole body, day03 (HH19)_CNhs14203_10223-103H7_forward Expression and Regulation ChickenEmbryoWholeBodyDay02_5HH17_CNhs13979_tpm_rev EmbryoWholeBody_day02_5- chicken embryo whole body, day02_5 (HH17)_CNhs13979_10222-103H6_reverse Expression and Regulation ChickenEmbryoWholeBodyDay02_5HH17_CNhs13979_tpm_fwd EmbryoWholeBody_day02_5+ chicken embryo whole body, day02_5 (HH17)_CNhs13979_10222-103H6_forward Expression and Regulation ChickenEmbryoWholeBodyDay02HH14_CNhs14202_tpm_rev EmbryoWholeBody_day02- chicken embryo whole body, day02 (HH14)_CNhs14202_10221-103H5_reverse Expression and Regulation ChickenEmbryoWholeBodyDay02HH13_CNhs13978_tpm_rev EmbryoWholeBody_day02- chicken embryo whole body, day02 (HH13)_CNhs13978_10220-103H4_reverse Expression and Regulation ChickenEmbryoWholeBodyDay02HH14_CNhs14202_tpm_fwd EmbryoWholeBody_day02+ chicken embryo whole body, day02 (HH14)_CNhs14202_10221-103H5_forward Expression and Regulation ChickenEmbryoWholeBodyDay02HH13_CNhs13978_tpm_fwd EmbryoWholeBody_day02+ chicken embryo whole body, day02 (HH13)_CNhs13978_10220-103H4_forward Expression and Regulation ChickenEmbryoWholeBodyDay01_5HH1112_CNhs14201_tpm_rev EmbryoWholeBody_day01_5- chicken embryo whole body, day01_5 (HH11-12)_CNhs14201_10219-103H3_reverse Expression and Regulation ChickenEmbryoWholeBodyDay01_5HH10_CNhs13977_tpm_rev EmbryoWholeBody_day01_5- chicken embryo whole body, day01_5 (HH10)_CNhs13977_10218-103H2_reverse Expression and Regulation ChickenEmbryoWholeBodyDay01_5HH1112_CNhs14201_tpm_fwd EmbryoWholeBody_day01_5+ chicken embryo whole body, day01_5 (HH11-12)_CNhs14201_10219-103H3_forward 
Expression and Regulation ChickenEmbryoWholeBodyDay01_5HH10_CNhs13977_tpm_fwd EmbryoWholeBody_day01_5+ chicken embryo whole body, day01_5 (HH10)_CNhs13977_10218-103H2_forward Expression and Regulation ChickenEmbryoWholeBodyDay01HH713Somites_CNhs13976_tpm_rev EmbryoWholeBody_day01- chicken embryo whole body, day01 (HH7 1-3 somites)_CNhs13976_10216-103G9_reverse Expression and Regulation ChickenEmbryoWholeBodyDay01HH723Somites_CNhs14200_tpm_rev EmbryoWholeBody_day01- chicken embryo whole body, day01 (HH7 2-3 somites)_CNhs14200_10217-103H1_reverse Expression and Regulation ChickenEmbryoWholeBodyDay01HH713Somites_CNhs13976_tpm_fwd EmbryoWholeBody_day01+ chicken embryo whole body, day01 (HH7 1-3 somites)_CNhs13976_10216-103G9_forward Expression and Regulation ChickenEmbryoWholeBodyDay01HH723Somites_CNhs14200_tpm_fwd EmbryoWholeBody_day01+ chicken embryo whole body, day01 (HH7 2-3 somites)_CNhs14200_10217-103H1_forward Expression and Regulation ChickenEmbryoWholeBody22hrHH56_CNhs12736_tpm_rev EmbryoWholeBody_22hr- chicken embryo whole body, 22hr (HH5-6)_CNhs12736_10215-103G8_reverse Expression and Regulation ChickenEmbryoWholeBody22hrHH56_CNhs12736_tpm_fwd EmbryoWholeBody_22hr+ chicken embryo whole body, 22hr (HH5-6)_CNhs12736_10215-103G8_forward Expression and Regulation ChickenEmbryoWholeBody18hrHH4_CNhs13030_tpm_rev EmbryoWholeBody_18hr- chicken embryo whole body, 18hr (HH4)_CNhs13030_10214-103G7_reverse Expression and Regulation ChickenEmbryoWholeBody18hrHH4_CNhs13030_tpm_fwd EmbryoWholeBody_18hr+ chicken embryo whole body, 18hr (HH4)_CNhs13030_10214-103G7_forward Expression and Regulation ChickenEmbryoWholeBody14hrHH3_CNhs13029_tpm_rev EmbryoWholeBody_14hr- chicken embryo whole body, 14hr (HH3)_CNhs13029_10213-103G6_reverse Expression and Regulation ChickenEmbryoWholeBody14hrHH3_CNhs13029_tpm_fwd EmbryoWholeBody_14hr+ chicken embryo whole body, 14hr (HH3)_CNhs13029_10213-103G6_forward Expression and Regulation ChickenEmbryoWholeBody06hrHH12_CNhs13028_tpm_rev 
EmbryoWholeBody_06hr- chicken embryo whole body, 06hr (HH1-2)_CNhs13028_10212-103G5_reverse Expression and Regulation ChickenEmbryoWholeBody06hrHH12_CNhs13028_tpm_fwd EmbryoWholeBody_06hr+ chicken embryo whole body, 06hr (HH1-2)_CNhs13028_10212-103G5_forward Expression and Regulation ChickenEmbryoWholeBody05hr30minHH1_CNhs13027_tpm_rev EmbryoWholeBody_05hr30min- chicken embryo whole body, 05hr30min (HH1)_CNhs13027_10211-103G4_reverse Expression and Regulation ChickenEmbryoWholeBody05hr30minHH1_CNhs13027_tpm_fwd EmbryoWholeBody_05hr30min+ chicken embryo whole body, 05hr30min (HH1)_CNhs13027_10211-103G4_forward Expression and Regulation ChickenEmbryoWholeBody01hr30minHH1_CNhs12735_tpm_rev EmbryoWholeBody_01hr30min- chicken embryo whole body, 01hr30min (HH1)_CNhs12735_10210-103G3_reverse Expression and Regulation ChickenEmbryoWholeBody01hr30minHH1_CNhs12735_tpm_fwd EmbryoWholeBody_01hr30min+ chicken embryo whole body, 01hr30min (HH1)_CNhs12735_10210-103G3_forward Expression and Regulation MesenchymalStemCellsBoneMarrowDerivedDonor1_CNhs11292_tpm_rev MscBoneMarrowD1- Mesenchymal stem cells - bone marrow derived, donor1_CNhs11292_11294-117A7_reverse Expression and Regulation MesenchymalStemCellsBoneMarrowDerivedDonor1_CNhs11292_tpm_fwd MscBoneMarrowD1+ Mesenchymal stem cells - bone marrow derived, donor1_CNhs11292_11294-117A7_forward Expression and Regulation AorticSmoothMuscleCellsDonor3_CNhs11314_tpm_rev AorticSmcD3- Aortic Smooth Muscle cells, donor3_CNhs11314_11447-118I7_reverse Expression and Regulation AorticSmoothMuscleCellsDonor3_CNhs11314_tpm_fwd AorticSmcD3+ Aortic Smooth Muscle cells, donor3_CNhs11314_11447-118I7_forward Expression and Regulation AorticSmoothMuscleCellsDonor2_CNhs11299_tpm_rev AorticSmcD2- Aortic Smooth Muscle cells, donor2_CNhs11299_11375-118A7_reverse Expression and Regulation AorticSmoothMuscleCellsDonor2_CNhs11299_tpm_fwd AorticSmcD2+ Aortic Smooth Muscle cells, donor2_CNhs11299_11375-118A7_forward Expression and Regulation 
AorticSmoothMuscleCellsDonor1_CNhs11296_tpm_rev AorticSmcD1- Aortic Smooth Muscle cells, donor1_CNhs11296_11298-117B2_reverse Expression and Regulation
AorticSmoothMuscleCellsDonor1_CNhs11296_tpm_fwd AorticSmcD1+ Aortic Smooth Muscle cells, donor1_CNhs11296_11298-117B2_forward Expression and Regulation

transMapEnsemblV5 TransMap Ensembl TransMap Ensembl and GENCODE Mappings Version 5 Genes and Gene Predictions

Description

This track contains GENCODE or Ensembl alignments produced by the TransMap cross-species alignment algorithm from other vertebrate species in the UCSC Genome Browser. For human and mouse, GENCODE serves as the Ensembl source; for other Ensembl sources, only those with full gene builds are used. Ensembl projection gene annotations are not used as sources. For closer evolutionary distances, the alignments are created using syntenically filtered BLASTZ alignment chains, resulting in a prediction of the orthologous genes in chicken.

Display Conventions and Configuration

This track follows the display conventions for PSL alignment tracks.

This track may also be configured to display codon coloring, a feature that allows the user to quickly compare cDNAs against the genomic sequence. For more information about this option, click here. Several types of alignment gap may also be colored; for more information, click here.

Methods

Source transcript alignments were obtained from vertebrate organisms in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes, were used as available.

For all vertebrate assemblies that had BLASTZ alignment chains and nets to the chicken (galGal5) genome, a subset of the alignment chains was selected as follows: For organisms whose branch distance was no more than 0.5 (as computed by phyloFit, see Conservation track description for details), syntenic filtering was used.
Reciprocal best nets were used if available; otherwise, nets were selected with the netfilter -syn command. The chains corresponding to the selected nets were used for mapping. For more distant species, where the determination of synteny is difficult, the full set of chains was used for mapping. This allows for more genes to map at the expense of some mapping to paralogous regions. The post-alignment filtering step removes some of the duplications. The pslMap program was used to do a base-level projection of the source transcript alignments via the selected chains to the chicken genome, resulting in pairwise alignments of the source transcripts to the genome. The resulting alignments were filtered with pslCDnaFilter with a global near-best criteria of 0.5% in finished genomes (human and mouse) and 1.0% in other genomes. Alignments where less than 20% of the transcript mapped were discarded. To ensure unique identifiers for each alignment, cDNA and gene accessions were made unique by appending a suffix for each location in the source genome and again for each mapped location in the destination genome. The format is: accession.version-srcUniq.destUniq Where srcUniq is a number added to make each source alignment unique, and destUniq is added to give the subsequent TransMap alignments unique identifiers. For example, in the cow genome, there are two alignments of mRNA BC149621.1. These are assigned the identifiers BC149621.1-1 and BC149621.1-2. When these are mapped to the human genome, BC149621.1-1 maps to a single location and is given the identifier BC149621.1-1.1. However, BC149621.1-2 maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note that multiple TransMap mappings are usually the result of tandem duplications, where both chains are identified as syntenic. Data Access The raw data for these tracks can be accessed interactively through the Table Browser or the Data Integrator. 
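The identifier format described in Methods above (accession.version-srcUniq.destUniq) can be split into its parts with a short Python sketch. The pattern below is inferred only from the BC149621.1 examples in the text; `TransMapId` and `parse_transmap_id` are hypothetical names and are not part of any UCSC tool.

```python
import re
from typing import NamedTuple, Optional


class TransMapId(NamedTuple):
    """Parts of a TransMap identifier (illustrative structure)."""
    accession: str
    version: int
    src_uniq: int
    dest_uniq: Optional[int]  # None for a source alignment that is not yet mapped


# accession.version-srcUniq, optionally followed by .destUniq
_ID_RE = re.compile(r"^(?P<acc>.+?)\.(?P<ver>\d+)-(?P<src>\d+)(?:\.(?P<dest>\d+))?$")


def parse_transmap_id(identifier):
    """Split an identifier such as 'BC149621.1-2.2' into its components."""
    match = _ID_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a TransMap-style identifier: {identifier!r}")
    dest = match.group("dest")
    return TransMapId(
        accession=match.group("acc"),
        version=int(match.group("ver")),
        src_uniq=int(match.group("src")),
        dest_uniq=int(dest) if dest is not None else None,
    )


if __name__ == "__main__":
    for ident in ("BC149621.1-1", "BC149621.1-1.1", "BC149621.1-2.2"):
        print(ident, parse_transmap_id(ident))
```

Grouping parsed identifiers by accession, version and srcUniq is one way to find source alignments that map to more than one destination location, such as the BC149621.1-2 example.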
For automated analysis, the annotations are stored in bigPsl files (containing a number of extra columns) and can be downloaded from our download server, or queried using our API. For more information on accessing track data see our Track Data Access FAQ. The files are associated with these tracks in the following way:

 - TransMap Ensembl - galGal5.ensembl.transMapV5.bigPsl
 - TransMap RefGene - galGal5.refseq.transMapV5.bigPsl
 - TransMap RNA - galGal5.rna.transMapV5.bigPsl
 - TransMap ESTs - galGal5.est.transMapV5.bigPsl

Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:

    bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/galGal5/transMap/V5/galGal5.refseq.transMapV5.bigPsl -chrom=chr6 -start=0 -end=1000000 stdout

Credits

This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data submitted to the international public sequence databases by scientists worldwide and annotations produced by the RefSeq, Ensembl, and GENCODE annotations projects.

References

Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S, Lau C et al. Targeted discovery of novel human exons by comparative genomics. Genome Res. 2007 Dec;17(12):1763-73. PMID: 17989246; PMC: PMC2099585

Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656

Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol. 2007 Dec;3(12):e247.
PMID: 18085818; PMC: PMC2134963 transMapV5 TransMap V5 TransMap Alignments Version 5 Genes and Gene Predictions Description These tracks contain cDNA and gene alignments produced by the TransMap cross-species alignment algorithm from other vertebrate species in the UCSC Genome Browser. For closer evolutionary distances, the alignments are created using syntenically filtered LASTZ or BLASTZ alignment chains, resulting in a prediction of the orthologous genes in chicken. For more distant organisms, reciprocal best alignments are used. TransMap maps genes and related annotations in one species to another using synteny-filtered pairwise genome alignments (chains and nets) to determine the most likely orthologs. For example, for the mRNA TransMap track on the human assembly, more than 400,000 mRNAs from 25 vertebrate species were aligned at high stringency to the native assembly using BLAT. The alignments were then mapped to the human assembly using the chain and net alignments produced using BLASTZ, which has higher sensitivity than BLAT for diverged organisms. Compared to translated BLAT, TransMap finds fewer paralogs and aligns more UTR bases. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare cDNAs against the genomic sequence. For more information about this option, click here. Several types of alignment gap may also be colored; for more information, click here. Methods Source transcript alignments were obtained from vertebrate organisms in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes, were used as available. 
For all vertebrate assemblies that had BLASTZ alignment chains and nets to the chicken (galGal5) genome, a subset of the alignment chains were selected as follows: For organisms whose branch distance was no more than 0.5 (as computed by phyloFit, see Conservation track description for details), syntenic filtering was used. Reciprocal best nets were used if available; otherwise, nets were selected with the netfilter -syn command. The chains corresponding to the selected nets were used for mapping. For more distant species, where the determination of synteny is difficult, the full set of chains was used for mapping. This allows for more genes to map at the expense of some mapping to paralogous regions. The post-alignment filtering step removes some of the duplications. The pslMap program was used to do a base-level projection of the source transcript alignments via the selected chains to the chicken genome, resulting in pairwise alignments of the source transcripts to the genome. The resulting alignments were filtered with pslCDnaFilter with a global near-best criteria of 0.5% in finished genomes (human and mouse) and 1.0% in other genomes. Alignments where less than 20% of the transcript mapped were discarded. To ensure unique identifiers for each alignment, cDNA and gene accessions were made unique by appending a suffix for each location in the source genome and again for each mapped location in the destination genome. The format is: accession.version-srcUniq.destUniq Where srcUniq is a number added to make each source alignment unique, and destUniq is added to give the subsequent TransMap alignments unique identifiers. For example, in the cow genome, there are two alignments of mRNA BC149621.1. These are assigned the identifiers BC149621.1-1 and BC149621.1-2. When these are mapped to the human genome, BC149621.1-1 maps to a single location and is given the identifier BC149621.1-1.1. 
However, BC149621.1-2 maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note that multiple TransMap mappings are usually the result of tandem duplications, where both chains are identified as syntenic. Data Access The raw data for these tracks can be accessed interactively through the Table Browser or the Data Integrator. For automated analysis, the annotations are stored in bigPsl files (containing a number of extra columns) and can be downloaded from our download server, or queried using our API. For more information on accessing track data see our Track Data Access FAQ. The files are associated with these tracks in the following way: TransMap Ensembl - galGal5.ensembl.transMapV5.bigPsl TransMap RefGene - galGal5.refseq.transMapV5.bigPsl TransMap RNA - galGal5.rna.transMapV5.bigPsl TransMap ESTs - galGal5.est.transMapV5.bigPsl Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/galGal5/transMap/V5/galGal5.refseq.transMapV5.bigPsl -chrom=chr6 -start=0 -end=1000000 stdout Credits This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data submitted to the international public sequence databases by scientists worldwide and annotations produced by the RefSeq, Ensembl, and GENCODE annotations projects. References Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S, Lau C et al. Targeted discovery of novel human exons by comparative genomics. Genome Res. 2007 Dec;17(12):1763-73. PMID: 17989246; PMC: PMC2099585 Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. 
Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656 Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol. 2007 Dec;3(12):e247. PMID: 18085818; PMC: PMC2134963 transMapRefSeqV5 TransMap RefGene TransMap RefSeq Gene Mappings Version 5 Genes and Gene Predictions Description This track contains RefSeq Gene alignments produced by the TransMap cross-species alignment algorithm from other vertebrate species in the UCSC Genome Browser. For closer evolutionary distances, the alignments are created using syntenically filtered BLASTZ alignment chains, resulting in a prediction of the orthologous genes in chicken. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare cDNAs against the genomic sequence. For more information about this option, click here. Several types of alignment gap may also be colored; for more information, click here. Methods Source transcript alignments were obtained from vertebrate organisms in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes, were used as available. For all vertebrate assemblies that had BLASTZ alignment chains and nets to the chicken (galGal5) genome, a subset of the alignment chains were selected as follows: For organisms whose branch distance was no more than 0.5 (as computed by phyloFit, see Conservation track description for details), syntenic filtering was used. Reciprocal best nets were used if available; otherwise, nets were selected with the netfilter -syn command. The chains corresponding to the selected nets were used for mapping. For more distant species, where the determination of synteny is difficult, the full set of chains was used for mapping. 
This allows for more genes to map at the expense of some mapping to paralogous regions. The post-alignment filtering step removes some of the duplications. The pslMap program was used to do a base-level projection of the source transcript alignments via the selected chains to the chicken genome, resulting in pairwise alignments of the source transcripts to the genome. The resulting alignments were filtered with pslCDnaFilter with a global near-best criteria of 0.5% in finished genomes (human and mouse) and 1.0% in other genomes. Alignments where less than 20% of the transcript mapped were discarded. To ensure unique identifiers for each alignment, cDNA and gene accessions were made unique by appending a suffix for each location in the source genome and again for each mapped location in the destination genome. The format is: accession.version-srcUniq.destUniq Where srcUniq is a number added to make each source alignment unique, and destUniq is added to give the subsequent TransMap alignments unique identifiers. For example, in the cow genome, there are two alignments of mRNA BC149621.1. These are assigned the identifiers BC149621.1-1 and BC149621.1-2. When these are mapped to the human genome, BC149621.1-1 maps to a single location and is given the identifier BC149621.1-1.1. However, BC149621.1-2 maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note that multiple TransMap mappings are usually the result of tandem duplications, where both chains are identified as syntenic. Data Access The raw data for these tracks can be accessed interactively through the Table Browser or the Data Integrator. For automated analysis, the annotations are stored in bigPsl files (containing a number of extra columns) and can be downloaded from our download server, or queried using our API. For more information on accessing track data see our Track Data Access FAQ. 
The files are associated with these tracks in the following way: TransMap Ensembl - galGal5.ensembl.transMapV4.bigPsl TransMap RefGene - galGal5.refseq.transMapV4.bigPsl TransMap RNA - galGal5.rna.transMapV4.bigPsl TransMap ESTs - galGal5.est.transMapV4.bigPsl Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/galGal5/transMap/V4/galGal5.refseq.transMapV4.bigPsl -chrom=chr6 -start=0 -end=1000000 stdout Credits This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data submitted to the international public sequence databases by scientists worldwide and annotations produced by the RefSeq, Ensembl, and GENCODE annotations projects. References Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S, Lau C et al. Targeted discovery of novel human exons by comparative genomics. Genome Res. 2007 Dec;17(12):1763-73. PMID: 17989246; PMC: PMC2099585 Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656 Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol. 2007 Dec;3(12):e247. PMID: 18085818; PMC: PMC2134963 transMapRnaV5 TransMap RNA TransMap GenBank RNA Mappings Version 5 Genes and Gene Predictions Description This track contains GenBank mRNA alignments produced by the TransMap cross-species alignment algorithm from other vertebrate species in the UCSC Genome Browser. 
For closer evolutionary distances, the alignments are created using syntenically filtered BLASTZ alignment chains, resulting in a prediction of the orthologous genes in chicken. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare cDNAs against the genomic sequence. For more information about this option, click here. Several types of alignment gap may also be colored; for more information, click here. Methods Source transcript alignments were obtained from vertebrate organisms in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes, were used as available. For all vertebrate assemblies that had BLASTZ alignment chains and nets to the chicken (galGal5) genome, a subset of the alignment chains were selected as follows: For organisms whose branch distance was no more than 0.5 (as computed by phyloFit, see Conservation track description for details), syntenic filtering was used. Reciprocal best nets were used if available; otherwise, nets were selected with the netfilter -syn command. The chains corresponding to the selected nets were used for mapping. For more distant species, where the determination of synteny is difficult, the full set of chains was used for mapping. This allows for more genes to map at the expense of some mapping to paralogous regions. The post-alignment filtering step removes some of the duplications. The pslMap program was used to do a base-level projection of the source transcript alignments via the selected chains to the chicken genome, resulting in pairwise alignments of the source transcripts to the genome. The resulting alignments were filtered with pslCDnaFilter with a global near-best criterion of 0.5% in finished genomes (human and mouse) and 1.0% in other genomes.
Alignments where less than 20% of the transcript mapped were discarded. To ensure unique identifiers for each alignment, cDNA and gene accessions were made unique by appending a suffix for each location in the source genome and again for each mapped location in the destination genome. The format is: accession.version-srcUniq.destUniq Where srcUniq is a number added to make each source alignment unique, and destUniq is added to give the subsequent TransMap alignments unique identifiers. For example, in the cow genome, there are two alignments of mRNA BC149621.1. These are assigned the identifiers BC149621.1-1 and BC149621.1-2. When these are mapped to the human genome, BC149621.1-1 maps to a single location and is given the identifier BC149621.1-1.1. However, BC149621.1-2 maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note that multiple TransMap mappings are usually the result of tandem duplications, where both chains are identified as syntenic. Data Access The raw data for these tracks can be accessed interactively through the Table Browser or the Data Integrator. For automated analysis, the annotations are stored in bigPsl files (containing a number of extra columns) and can be downloaded from our download server, or queried using our API. For more information on accessing track data see our Track Data Access FAQ. The files are associated with these tracks in the following way: TransMap Ensembl - galGal5.ensembl.transMapV4.bigPsl TransMap RefGene - galGal5.refseq.transMapV4.bigPsl TransMap RNA - galGal5.rna.transMapV4.bigPsl TransMap ESTs - galGal5.est.transMapV4.bigPsl Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. 
The tool can also be used to obtain only features within a given range, for example: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/galGal5/transMap/V4/galGal5.refseq.transMapV4.bigPsl -chrom=chr6 -start=0 -end=1000000 stdout Credits This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data submitted to the international public sequence databases by scientists worldwide and annotations produced by the RefSeq, Ensembl, and GENCODE annotations projects. References Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S, Lau C et al. Targeted discovery of novel human exons by comparative genomics. Genome Res. 2007 Dec;17(12):1763-73. PMID: 17989246; PMC: PMC2099585 Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656 Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol. 2007 Dec;3(12):e247. PMID: 18085818; PMC: PMC2134963 transMapEstV5 TransMap ESTs TransMap EST Mappings Version 5 Genes and Gene Predictions Description This track contains GenBank spliced EST alignments produced by the TransMap cross-species alignment algorithm from other vertebrate species in the UCSC Genome Browser. For closer evolutionary distances, the alignments are created using syntenically filtered BLASTZ alignment chains, resulting in a prediction of the orthologous genes in chicken. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare cDNAs against the genomic sequence. For more information about this option, click here. Several types of alignment gap may also be colored; for more information, click here. 
Methods Source transcript alignments were obtained from vertebrate organisms in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes, were used as available. For all vertebrate assemblies that had BLASTZ alignment chains and nets to the chicken (galGal5) genome, a subset of the alignment chains were selected as follows: For organisms whose branch distance was no more than 0.5 (as computed by phyloFit, see Conservation track description for details), syntenic filtering was used. Reciprocal best nets were used if available; otherwise, nets were selected with the netfilter -syn command. The chains corresponding to the selected nets were used for mapping. For more distant species, where the determination of synteny is difficult, the full set of chains was used for mapping. This allows for more genes to map at the expense of some mapping to paralogous regions. The post-alignment filtering step removes some of the duplications. The pslMap program was used to do a base-level projection of the source transcript alignments via the selected chains to the chicken genome, resulting in pairwise alignments of the source transcripts to the genome. The resulting alignments were filtered with pslCDnaFilter with a global near-best criterion of 0.5% in finished genomes (human and mouse) and 1.0% in other genomes. Alignments where less than 20% of the transcript mapped were discarded. To ensure unique identifiers for each alignment, cDNA and gene accessions were made unique by appending a suffix for each location in the source genome and again for each mapped location in the destination genome. The format is: accession.version-srcUniq.destUniq Where srcUniq is a number added to make each source alignment unique, and destUniq is added to give the subsequent TransMap alignments unique identifiers. For example, in the cow genome, there are two alignments of mRNA BC149621.1.
These are assigned the identifiers BC149621.1-1 and BC149621.1-2. When these are mapped to the human genome, BC149621.1-1 maps to a single location and is given the identifier BC149621.1-1.1. However, BC149621.1-2 maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note that multiple TransMap mappings are usually the result of tandem duplications, where both chains are identified as syntenic. Data Access The raw data for these tracks can be accessed interactively through the Table Browser or the Data Integrator. For automated analysis, the annotations are stored in bigPsl files (containing a number of extra columns) and can be downloaded from our download server, or queried using our API. For more information on accessing track data see our Track Data Access FAQ. The files are associated with these tracks in the following way: TransMap Ensembl - galGal5.ensembl.transMapV4.bigPsl TransMap RefGene - galGal5.refseq.transMapV4.bigPsl TransMap RNA - galGal5.rna.transMapV4.bigPsl TransMap ESTs - galGal5.est.transMapV4.bigPsl Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/galGal5/transMap/V4/galGal5.refseq.transMapV4.bigPsl -chrom=chr6 -start=0 -end=1000000 stdout Credits This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data submitted to the international public sequence databases by scientists worldwide and annotations produced by the RefSeq, Ensembl, and GENCODE annotations projects. References Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S, Lau C et al. Targeted discovery of novel human exons by comparative genomics. Genome Res. 
2007 Dec;17(12):1763-73. PMID: 17989246; PMC: PMC2099585 Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656 Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol. 2007 Dec;3(12):e247. PMID: 18085818; PMC: PMC2134963 snp147 All SNPs(147) Simple Nucleotide Polymorphisms (dbSNP 147) Variation and Repeats Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 147, available from ftp.ncbi.nih.gov/snp. A subset of the items in this track are contained in Mult. SNPs(147), which are SNPs that have been mapped to multiple locations in the reference genome assembly. The default maximum weight for this track is 1, so unless the setting is changed in the track controls, SNPs that map to multiple genomic locations will be omitted from display. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. 
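The width rules in the preceding paragraph can be illustrated with a short Python sketch. It assumes 0-based, half-open start and end coordinates in which an insertion spans zero reference bases; the function name and the returned labels are hypothetical and do not correspond to the browser's own code.

```python
def display_shape(chrom_start: int, chrom_end: int) -> str:
    """Classify how a variant is drawn at base-level resolution.

    Assumes 0-based, half-open coordinates (an assumption of this sketch),
    so the reference span is chrom_end - chrom_start.
    """
    span = chrom_end - chrom_start
    if span < 0:
        raise ValueError("chrom_end must not be smaller than chrom_start")
    if span == 0:
        # Insertion: nothing in the reference, drawn between two bases.
        return "tick between two bases"
    if span == 1:
        # Single nucleotide polymorphism: one reference base wide.
        return "single-base block"
    # Multiple nucleotide variant: two or more reference bases wide.
    return "multi-base block"
```

For example, an item with start 100 and end 100 is classified as a tick between two bases, while one with start 100 and end 103 is classified as a multi-base block.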
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. 
Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: 
downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. 
ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. 
Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants for a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK).
Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. 
While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/chicken_9031/database/organism_data/ for galGal5. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/chicken_9031/rs_fasta/ for galGal5. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b147_SNPContigLoc.bcp.gz and b147_ContigInfo.bcp.gz. b147_SNPMapInfo.bcp.gz provided the alignment weights. Functional classification was obtained from b147_SNPContigLocusId.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.
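The 'in-del' reclassification rule from the Insertions/Deletions section above can be sketched in Python as follows. This is an illustration only, not the code used to build the track: the function name is hypothetical, and the sketch assumes that an empty allele is written as "-" and that "other observed alleles" means every observed allele except the reference allele.

```python
from typing import Sequence


def refine_indel_class(ref_allele: str, observed: Sequence[str]) -> str:
    """Return 'insertion', 'deletion', or the unchanged 'in-del'."""

    def allele_length(allele: str) -> int:
        # Assumption of this sketch: '-' denotes an empty (zero-length) allele.
        return 0 if allele == "-" else len(allele)

    ref_len = allele_length(ref_allele)
    other_lengths = [allele_length(a) for a in observed if a != ref_allele]
    if not other_lengths:
        return "in-del"  # nothing to compare against
    if all(ref_len < n for n in other_lengths):
        return "insertion"  # reference shorter than every other allele
    if all(ref_len > n for n in other_lengths):
        return "deletion"  # reference longer than every other allele
    return "in-del"  # mixed lengths: keep dbSNP's class
```

For instance, a reference allele "-" with observed alleles "-" and "A" becomes an insertion, whereas a reference allele "A" with observed alleles "-", "A" and "AT" stays 'in-del' because the other alleles are both shorter and longer than the reference.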
Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server (snp147*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 gold Assembly Assembly from Fragments Mapping and Sequencing Description This track shows the sequences used in the Dec 2015 chicken genome assembly. Genome assembly procedures are covered in the NCBI assembly documentation. NCBI also provides specific information about this assembly. The definition of this assembly is from the AGP file delivered with the sequence. The NCBI document AGP Specification describes the format of the AGP file. In dense mode, this track depicts the contigs that make up the currently viewed scaffold. Contig boundaries are distinguished by the use of alternating gold and brown coloration. Where gaps exist between contigs, spaces are shown between the gold and brown blocks. The relative order and orientation of the contigs within a scaffold is always known; therefore, a line is drawn in the graphical display to bridge the blocks. Component types found in this track (with counts of that type in parentheses): W - whole genome shotgun (24,806) F - finished sequence (773) A - active finishing (6) O - one other sequence (chrM/NC_001323.1) augustusGene AUGUSTUS AUGUSTUS ab initio gene predictions v3.1 Genes and Gene Predictions Description This track shows ab initio predictions from the program AUGUSTUS (version 3.1). The predictions are based on the genome sequence alone. For more information on the different gene tracks, see our Genes FAQ. 
Methods Statistical signal models were built for splice sites, branch-point patterns, translation start sites, and the poly-A signal. Furthermore, models were built for the sequence content of protein-coding and non-coding regions as well as for the length distributions of different exon and intron types. Detailed descriptions of most of these different models can be found in Mario Stanke's dissertation. This track shows the most likely gene structure according to a Semi-Markov Conditional Random Field model. Alternative splicing transcripts were obtained with a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2 --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2). The different models used by Augustus were trained on a number of different species-specific gene sets, which included 1000-2000 training gene structures. The --species option allows one to choose the species used for training the models. Different training species were used for the --species option when generating these predictions for different groups of assemblies.

Assembly Group                    Training Species
Fish                              zebrafish
Birds                             chicken
Human and all other vertebrates   human
Nematodes                         caenorhabditis
Drosophila                        fly
A. mellifera                      honeybee1
A. gambiae                        culex
S. cerevisiae                     saccharomyces

This table describes which training species was used for a particular group of assemblies. When available, the closest related training species was used. Credits Thanks to the Stanke lab for providing the AUGUSTUS program. The training for the chicken version was done by Stefanie König and the training for the human and zebrafish versions was done by Mario Stanke. References Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656 Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25.
PMID: 14534192 est Chicken ESTs Chicken ESTs Including Unspliced mRNA and EST Description This track shows alignments between chicken expressed sequence tags (ESTs) in GenBank and the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. 
If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, chicken ESTs from GenBank were aligned against the genome using blat. 
Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 mrna Chicken mRNAs Chicken mRNAs from GenBank mRNA and EST Description The mRNA track shows alignments between chicken mRNAs in GenBank and the genome. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the mRNA display. For example, to apply the filter to all mRNAs expressed in a specific organ, type the name of the organ in the tissue box. 
To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. If "include" is selected, the browser will display only those mRNAs that match the filter criteria. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare mRNAs against the genomic sequence. For more information about this option, go to the Codon and Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods GenBank chicken mRNAs were aligned against the genome using the blat program. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits The mRNA track was produced at UCSC from mRNA sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. 
PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 cytoBandIdeo Chromosome Band (Ideogram) Ideogram for Orientation Mapping and Sequencing ensGene Ensembl Genes Ensembl Genes Genes and Gene Predictions Description These gene predictions were generated by Ensembl. For more information on the different gene tracks, see our Genes FAQ. Methods For a description of the methods used in Ensembl gene predictions, please refer to Hubbard et al. (2002), also listed in the References section below. Data access Ensembl Gene data can be explored interactively using the Table Browser or the Data Integrator. For local downloads, the genePred format files for galGal5 are available in our downloads directory as ensGene.txt.gz or in our genes download directory in GTF format. For programmatic access, the data can be queried from the REST API or directly from our public MySQL servers. Instructions on this method are available on our MySQL help page and on our blog. Previous versions of this track can be found on our archive download server. Credits We would like to thank Ensembl for providing these gene annotations. For more information, please see Ensembl's genome annotation page. References Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. The Ensembl genome database project. Nucleic Acids Res. 2002 Jan 1;30(1):38-41. PMID: 11752248; PMC: PMC99161 evaSnpContainer EVA SNP Short Genetic Variants from European Variant Archive Variation and Repeats Description These tracks contain mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) for the chicken galGal5 genome. 
The dbSNP database at NCBI no longer hosts non-human variants. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode. Searching, details, and filtering Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. 
Likewise, multiple HGVS.p terms may be shown for each rsID, separated by spaces, describing all possible AA changes. Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms, the gene models used were the NCBI RefSeq curated when available, if not, then ensembl genes, or finally, UCSC mappings of RefSeq if neither of the previous models was possible. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence Ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution — A single nucleotide in the reference is replaced by another, alternate allele. deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion — One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. delins — Similar to a tandem repeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. 
sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet. Methods Data were downloaded from the European Variation Archive EVA current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style, and the variants were passed through the Variant Annotation Integrator to predict consequence. For every organism, the NCBI RefSeq curated models were used when available, followed by ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models were possible. Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then to bigBed format. No functional annotations were provided by the EVA (e.g., missense, nonsense, etc). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. 
Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac10, and danRer11, the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, see the makedoc corresponding to the version of interest. For example, the EVA Release 8 MakeDoc. Data Access Note: It is not recommended to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry. The data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. Use the corresponding version number for the track of interest, e.g., evaSnp8.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal5/bbi/evaSnp8.bb -chrom=chr21 -start=0 -end=100000000 stdout Credits This track was produced from the European Variation Archive release data. Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq, as well as ensembl gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. 
The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739; PMC: PMC8728205 Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2. PMID: 26740527; PMC: PMC4848401 evaSnp8 EVA SNP Release 8 Short Genetic Variants from European Variant Archive Release 8 Variation and Repeats Description These tracks contain mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) for the chicken galGal5 genome. The dbSNP database at NCBI no longer hosts non-human variants. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode. Searching, details, and filtering To navigate to an individual variant, type or paste the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. Clicking an item in the graphical display opens a page with data about that variant.
Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID, separated by spaces, describing all possible AA changes. Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms, the gene models used were the NCBI RefSeq curated when available, if not, then ensembl genes, or finally, UCSC mappings of RefSeq if neither of the previous models was possible. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence Ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution — A single nucleotide in the reference is replaced by another, alternate allele. deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). 
E.g., a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion — One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. delins — Similar to a tandem repeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet. Methods Data were downloaded from the European Variation Archive EVA current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style, and the variants were passed through the Variant Annotation Integrator to predict consequence. For every organism, the NCBI RefSeq curated models were used when available, followed by ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models were possible. 
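The gene-model fallback described above amounts to a fixed priority order. A minimal sketch follows, assuming the set of gene-model sources available for an assembly is already known; the labels and the function name choose_gene_models are hypothetical and are not UCSC table or tool names.

```python
# Illustrative sketch of the fallback order stated in the Methods text.
# The labels below are descriptive placeholders, not UCSC table names.
GENE_MODEL_PRIORITY = (
    "NCBI RefSeq curated",    # preferred when available
    "Ensembl genes",          # used if RefSeq curated is missing
    "UCSC RefSeq mappings",   # last resort
)


def choose_gene_models(available):
    """Return the highest-priority gene-model source present in `available`.

    `available` is any collection of labels; None is returned when no
    source in the priority list is present.
    """
    for source in GENE_MODEL_PRIORITY:
        if source in available:
            return source
    return None
```

For an assembly without curated RefSeq models, calling choose_gene_models({"Ensembl genes", "UCSC RefSeq mappings"}) selects "Ensembl genes".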
Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then to bigBed format. No functional annotations were provided by the EVA (e.g., missense, nonsense, etc). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac10, and danRer11, the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, see the makedoc corresponding to the version of interest. For example, the EVA Release 8 MakeDoc. 
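The consequence-to-color binning described in this Methods section can be sketched as a lookup table. This is an illustration only and not UCSC's implementation: the bin labels and the function name most_deleterious_bin are hypothetical, and ranking the bins from most to least deleterious in the order they are listed is an assumption.

```python
# Consequence terms per display bin, copied from the Methods text.
# ASSUMPTION: bins are ranked most to least deleterious in listing order.
BINS = {
    "protein_altering_or_splice": [
        "exon_loss_variant", "frameshift_variant", "inframe_deletion",
        "inframe_insertion", "initiator_codon_variant", "missense_variant",
        "splice_acceptor_variant", "splice_donor_variant",
        "splice_region_variant", "stop_gained", "stop_lost",
        "coding_sequence_variant", "transcript_ablation",
    ],
    "synonymous": ["synonymous_variant", "stop_retained_variant"],
    "noncoding_or_utr": [
        "5_prime_UTR_variant", "3_prime_UTR_variant",
        "complex_transcript_variant", "non_coding_transcript_exon_variant",
    ],
    "intergenic_or_intronic": [
        "upstream_gene_variant", "downstream_gene_variant", "intron_variant",
        "intergenic_variant", "NMD_transcript_variant",
        "no_sequence_alteration",
    ],
}

BIN_RANK = list(BINS)  # dicts keep insertion order, so index 0 ranks highest
TERM_TO_BIN = {term: name for name, terms in BINS.items() for term in terms}


def most_deleterious_bin(ucsc_classes):
    """Return the highest-ranked bin among a variant's consequence terms.

    Unknown terms are ignored; None is returned if no term is recognized.
    """
    found = {TERM_TO_BIN[t] for t in ucsc_classes if t in TERM_TO_BIN}
    if not found:
        return None
    return min(found, key=BIN_RANK.index)
```

A variant annotated as both intron_variant and missense_variant on different transcripts would fall in the protein-altering bin under this ranking.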
Data Access Note: Using LiftOver to convert SNPs between assemblies is not recommended; the following FAQ entry explains how to convert SNPs between assemblies. The data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. Use the corresponding version number for the track of interest, e.g., evaSnp8.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:
bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal5/bbi/evaSnp8.bb -chrom=chr21 -start=0 -end=100000000 stdout
Credits This track was produced from the European Variation Archive release data. Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq, as well as Ensembl gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739; PMC: PMC8728205 Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2.
PMID: 26740527; PMC: PMC4848401 evaSnp7 EVA SNP Release 7 Short Genetic Variants from European Variant Archive Release 7 Variation and Repeats Description These tracks contain mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) for the chicken galGal5 genome. The dbSNP database at NCBI no longer hosts non-human variants. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode. Searching, details, and filtering Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. 
Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID, separated by spaces, describing all possible AA changes. Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms, the gene models used were the NCBI RefSeq curated when available, if not, then ensembl genes, or finally, UCSC mappings of RefSeq if neither of the previous models was possible. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence Ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution — A single nucleotide in the reference is replaced by another, alternate allele. deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion — One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. 
delins — Similar to a tandem repeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet. Methods Data were downloaded from the European Variation Archive EVA current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style, and the variants were passed through the Variant Annotation Integrator to predict consequence. For every organism, the NCBI RefSeq curated models were used when available, followed by ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models were possible. Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then to bigBed format. No functional annotations were provided by the EVA (e.g., missense, nonsense, etc). 
These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac10, and danRer11, the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, see the makedoc corresponding to the version of interest. For example, the EVA Release 8 MakeDoc. Data Access Note: It is not recommended to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry. The data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. Use the corresponding version number for the track of interest, e.g., evaSnp8.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal5/bbi/evaSnp8.bb -chrom=chr21 -start=0 -end=100000000 stdout Credits This track was produced from the European Variation Archive release data. 
Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq, as well as Ensembl gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739; PMC: PMC8728205 Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2. PMID: 26740527; PMC: PMC4848401 evaSnp6 EVA SNP Release 6 Short Genetic Variants from European Variant Archive Release 6 Variation and Repeats Description These tracks contain mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) for the chicken galGal5 genome. The dbSNP database at NCBI no longer hosts non-human variants. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode.
Searching, details, and filtering Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID, separated by spaces, describing all possible AA changes. Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms, the gene models used were the NCBI RefSeq curated when available, if not, then ensembl genes, or finally, UCSC mappings of RefSeq if neither of the previous models was possible. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. 
Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence Ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution — A single nucleotide in the reference is replaced by another, alternate allele. deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion — One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. delins — Similar to a tandem repeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet. Methods Data were downloaded from the European Variation Archive EVA current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style, and the variants were passed through the Variant Annotation Integrator to predict consequence. For every organism, the NCBI RefSeq curated models were used when available, followed by ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models were possible. 
Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then to bigBed format. No functional annotations were provided by the EVA (e.g., missense, nonsense, etc). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac10, and danRer11, the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, see the makedoc corresponding to the version of interest. For example, the EVA Release 8 MakeDoc. 
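The binning described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to build the track: the function name `color_bin` is hypothetical, the input is assumed to be the comma-separated ucscClass string described in the Mouse-over section, and the order of the bins in the list is assumed to be the severity ranking (most deleterious first).

```python
# Bins copied from the Methods text. The list order is ASSUMED to be the
# severity ranking, with the most potentially deleterious bin first.
COLOR_BINS = [
    ("Protein-altering variants and splice site variants", {
        "exon_loss_variant", "frameshift_variant", "inframe_deletion",
        "inframe_insertion", "initiator_codon_variant", "missense_variant",
        "splice_acceptor_variant", "splice_donor_variant",
        "splice_region_variant", "stop_gained", "stop_lost",
        "coding_sequence_variant", "transcript_ablation",
    }),
    ("Synonymous codon variants", {
        "synonymous_variant", "stop_retained_variant",
    }),
    ("Non-coding transcript or Untranslated Region (UTR) variants", {
        "5_prime_UTR_variant", "3_prime_UTR_variant",
        "complex_transcript_variant", "non_coding_transcript_exon_variant",
    }),
    ("Intergenic and intronic variants", {
        "upstream_gene_variant", "downstream_gene_variant", "intron_variant",
        "intergenic_variant", "NMD_transcript_variant", "no_sequence_alteration",
    }),
]


def color_bin(ucsc_class: str) -> str:
    """Hypothetical helper: pick the most severe bin among a variant's terms."""
    terms = {t.strip() for t in ucsc_class.split(",") if t.strip()}
    for bin_name, members in COLOR_BINS:
        if terms & members:
            return bin_name
    # Term not listed in the Methods text.
    return "unbinned"
```

For example, a variant annotated as `intron_variant,missense_variant` on different transcripts would fall into the protein-altering bin, because that bin is checked first.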
Data Access Note: It is not recommended to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry. The data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. Use the corresponding version number for the track of interest, e.g., evaSnp8.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal5/bbi/evaSnp8.bb -chrom=chr21 -start=0 -end=100000000 stdout Credits This track was produced from the European Variation Archive release data. Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq, as well as ensembl gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739. PMID: PMC8728205. Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2. 
PMID: 26740527; PMC: PMC4848401 evaSnp5 EVA SNP Release 5 Short Genetic Variants from European Variant Archive Release 5 Variation and Repeats Description This track contains mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) Release 5 for the chicken galGal5 genome. The dbSNP database at NCBI no longer hosts non-human variants. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode. Searching, details, and filtering Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. 
Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID, separated by spaces, describing all possible AA changes. Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms, the gene models used were the NCBI RefSeq curated when available, if not, then Ensembl genes, or finally, UCSC mappings of RefSeq if neither of the previous models was possible. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution — A single nucleotide in the reference is replaced by another, alternate allele. deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion — One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g.,
a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. delins — Similar to tandemRepeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet. Methods Data were downloaded from the European Variation Archive EVA release 5 (2023-9-7) current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style, and the variants were passed through the Variant Annotation Integrator to predict consequence. For every organism, the NCBI RefSeq curated models were used when available, followed by Ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models was possible. 
Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then to bigBed format.
No functional annotations were provided by the EVA (e.g., missense, nonsense, etc.). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac8, danRer10, and danRer11, the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, read the EVA Release 5 MakeDoc. Data Access Note: It is not recommended to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry. The data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called evaSnp5.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal5/bbi/evaSnp5.bb -chrom=chr21 -start=0 -end=100000000 stdout Credits This track was produced from the European Variation Archive release 5 data.
Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq as well as ensembl gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739. PMID: PMC8728205. Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2. PMID: 26740527; PMC: PMC4848401 evaSnp4 EVA SNP Release 4 Short Genetic Variants from European Variant Archive Release 4 Variation and Repeats Description This track contains mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) Release 4 for the chicken galGal5 genome. The dbSNP database at NCBI no longer hosts non-human variants. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode. 
Searching, details, and filtering Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID separated by spaces describing all possible AA changes. Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms the gene models used were the NCBI RefSeq curated when available, if not then ensembl genes, or finally UCSC mappings of RefSeq if neither of the previous models was possible. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. 
Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution — A single nucleotide in the reference is replaced by another, alternate allele. deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion — One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. delins — Similar to tandemRepeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet. Methods Data were downloaded from the European Variation Archive EVA release 4 (2022-11-21) current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style, and the variants were passed through the Variant Annotation Integrator to predict consequence. For every organism, the NCBI RefSeq curated models were used when available, followed by Ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models was possible. 
Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then to bigBed format. No functional annotations were provided by the EVA (e.g., missense, nonsense, etc.). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac8, danRer10, and danRer11, the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, read the EVA Release 4 MakeDoc. Data Access Note: It is not recommended to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry.
The data can be explored interactively with the Table Browser, or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called evaSnp4.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal5/bbi/evaSnp4.bb -chrom=chr21 -start=0 -end=100000000 stdout Credits This track was produced from the European Variation Archive release 4 data. Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq as well as ensembl gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739. PMID: PMC8728205. Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2. PMID: 26740527; PMC: PMC4848401 evaSnp EVA SNP Release 3 Short Genetic Variants from European Variant Archive Release 3 Variation and Repeats Description This track contains mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) Release 3 for the chicken galGal5 genome. 
The dbSNP database at NCBI no longer hosts non-human variants. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode. Searching, details, and filtering Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. 
Likewise, multiple HGVS.p terms may be shown for each rsID, separated by spaces, describing all possible AA changes. Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms, the gene models used were ncbiRefSeqCurated, except for mm39, which used ncbiRefSeqSelect. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution — A single nucleotide in the reference is replaced by another, alternate allele. deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion — One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. delins — Similar to tandemRepeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet.
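The Ref/Alt conventions in the list above can be illustrated with a small Python sketch. The variant class shown in the track is the one reported by EVA; the function below is only a hypothetical approximation, written to show how the example alleles relate to the terms. The name `guess_so_term` and the prefix-based rules for the shared anchor nucleotide are assumptions, and tandem repeats are not handled.

```python
def guess_so_term(ref: str, alt: str) -> str:
    """Hypothetical helper: approximate the SO term from Ref and Alt alleles.

    Assumes indels carry one shared leading (anchor) nucleotide, as in the
    examples Ref = GA / Alt = G and Ref = G / Alt = GT.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not ref or not alt:
        # Nothing specific can be said; fall back to the parent term.
        return "sequence alteration"
    if len(ref) == len(alt):
        # Equal lengths: one base is a substitution, several are an MNV.
        return "substitution" if len(ref) == 1 else "multipleNucleotideVariant"
    if len(ref) > len(alt) and ref.startswith(alt):
        # Alt is just the anchor (plus retained bases): bases were removed.
        return "deletion"
    if len(alt) > len(ref) and alt.startswith(ref):
        # Ref is just the anchor (plus retained bases): bases were added.
        return "insertion"
    # Different lengths without a shared prefix: treat as delins.
    return "delins"
```

Applied to the examples in the list, `guess_so_term("GA", "G")` gives `deletion` and `guess_so_term("G", "GT")` gives `insertion`.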
Methods Data were downloaded from the European Variation Archive EVA release 3 (2022-02-24) current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style, a few problematic variants were removed, and the variants passed through the Variant Annotation Integrator to predict consequence. For every organism the ncbiRefSeqCurated gene models were used to predict the consequences, except for mm39 which used the ncbiRefSeqSelect models. Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then bigBed format. No functional annotations were provided by the EVA (e.g., missense, nonsense, etc). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. 
E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the persepective the genomic sequence. For complete documentation of the processing of these tracks, read the EVA Release 3 MakeDoc. Data Access Note: It is not recommeneded to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry. The data can be explored interactively with the Table Browser, or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called evaSnp.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal5/bbi/evaSnp.bb -chrom=chr21 -start=0 -end=100000000 stdout Credits This track was produced from the European Variation Archive release 3 data. Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739. PMID: PMC8728205. Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. 
Bioinformatics. 2016 May 1;32(9):1430-2. PMID: 26740527; PMC: PMC4848401 gap Gap Gap Locations Mapping and Sequencing Description This track shows the gaps in the Dec 2015 chicken genome assembly. Genome assembly procedures are covered in the NCBI assembly documentation. NCBI also provides specific information about this assembly. The definition of the gaps in this assembly is from the AGP file delivered with the sequence. The NCBI document AGP Specification describes the format of the AGP file. Gaps are represented as black boxes in this track. If the relative order and orientation of the contigs on either side of the gap are supported by read pair data, it is a bridged gap and a white line is drawn through the black box representing the gap. This assembly contains the following principal types of gaps: centromere - gaps for centromeres are included when they can be reasonably localized (count: 16; all of size 500,000 bases) contig - gaps between contigs in scaffolds (count: 381; size range: 10 - 50,000 bases) scaffold - gaps between scaffolds in chromosome assemblies (count: 819; size range: 13 - 156,025 bases) gc5BaseBw GC Percent GC Percent in 5-Base Windows Mapping and Sequencing Description The GC percent track shows the percentage of G (guanine) and C (cytosine) bases in 5-base windows. High GC content is typically associated with gene-rich areas. This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the "Graph configuration help" link for an explanation of the configuration options. Credits The data and presentation of this graph were prepared by Hiram Clawson. genscan Genscan Genes Genscan Gene Predictions Genes and Gene Predictions Description This track shows predictions from the Genscan program written by Chris Burge. 
The predictions are based on transcriptional, translational and donor/acceptor splicing signals as well as the length and compositional distributions of exons, introns and intergenic regions. For more information on the different gene tracks, see our Genes FAQ. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The track description page offers the following filter and configuration options: Color track by codons: Select the genomic codons option to color and label each codon in a zoomed-in display to facilitate validation and comparison of gene predictions. Go to the Coloring Gene Predictions and Annotations by Codon page for more information about this feature. Methods For a description of the Genscan program and the model that underlies it, refer to Burge and Karlin (1997) in the References section below. The splice site models used are described in more detail in Burge (1998) below. Credits Thanks to Chris Burge for providing the Genscan program. References Burge C. Modeling Dependencies in Pre-mRNA Splicing Signals. In: Salzberg S, Searls D, Kasif S, editors. Computational Methods in Molecular Biology. Amsterdam: Elsevier Science; 1998. p. 127-163. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 1997 Apr 25;268(1):78-94. PMID: 9149143 grcIncidentDb GRC Incident GRC Incident Database Mapping and Sequencing Description This track shows locations in the chicken assembly where assembly problems have been noted or resolved, as reported by the Genome Reference Consortium (GRC). If you would like to report an assembly problem, please use the GRC issue reporting system. Methods Data for this track are extracted from the GRC incident database from the specific species *_issues.gff3 file. The track is synchronized once daily to incorporate new updates. Credits The data and presentation of this track were prepared by Hiram Clawson. 
ucscToINSDC INSDC Accession at INSDC - International Nucleotide Sequence Database Collaboration Mapping and Sequencing Description This track associates UCSC Genome Browser chromosome names to accession names from the International Nucleotide Sequence Database Collaboration (INSDC). The data were downloaded from the NCBI assembly database. Credits The data for this track was prepared by Hiram Clawson. nestedRepeats Interrupted Rpts Fragments of Interrupted Repeats Joined by RepeatMasker ID Variation and Repeats Description This track shows joined fragments of interrupted repeats extracted from the output of the RepeatMasker program which screens DNA sequences for interspersed repeats and low complexity DNA sequences using the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka (2000) in the References section below. The detailed annotations from RepeatMasker are in the RepeatMasker track. This track shows fragments of original repeat insertions which have been interrupted by insertions of younger repeats or through local rearrangements. The fragments are joined using the ID column of RepeatMasker output. Display Conventions and Configuration In pack or full mode, each interrupted repeat is displayed as boxes (fragments) joined by horizontal lines, labeled with the repeat name. If all fragments are on the same strand, arrows are added to the horizontal line to indicate the strand. In dense or squish mode, labels and arrows are omitted and in dense mode, all items are collapsed to fit on a single row. Items are shaded according to the average identity score of their fragments. Usually, the shade of an item is similar to the shades of its fragments unless some fragments are much more diverged than others. The score displayed above is the average identity score, clipped to a range of 50% - 100% and then mapped to the range 0 - 1000 for shading in the browser. 
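As a rough illustration of the score mapping just described, the following Python sketch clips an average identity percentage to the 50% - 100% range and rescales it to 0 - 1000. The function name identity_to_score is hypothetical, and the assumption that the rescaling is linear is ours; neither is taken from the UCSC source code.

```python
def identity_to_score(avg_identity_pct: float) -> int:
    """Map an average identity percentage to a 0-1000 shading score.

    Assumption (not confirmed by the track documentation): the mapping
    is linear after clipping the identity to the range [50, 100].
    """
    # Clip to the 50% - 100% range mentioned in the description.
    clipped = min(100.0, max(50.0, avg_identity_pct))
    # Rescale [50, 100] onto [0, 1000].
    return round((clipped - 50.0) / 50.0 * 1000.0)


if __name__ == "__main__":
    for pct in (30.0, 50.0, 75.0, 100.0):
        print(pct, identity_to_score(pct))
```

Under this assumption, identities at or below 50% share the lightest shade, so strongly diverged fragments are not distinguished from one another by shading alone.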
Methods UCSC has used the most current versions of the RepeatMasker software and repeat libraries available to generate these data. Note that these versions may be newer than those that are publicly available on the Internet. Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. See the FAQ for more information. Credits Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track. References Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. https://www.repeatmasker.org/. 1996-2010. Repbase Update is described in: Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. PMID: 10973072 For a discussion of repeats in mammalian genomes, see: Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. PMID: 10607616 Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. PMID: 8994846 microsat Microsatellite Microsatellites - Di-nucleotide and Tri-nucleotide Repeats Variation and Repeats Description This track displays regions that are likely to be useful as microsatellite markers. These are sequences of at least 15 perfect di-nucleotide and tri-nucleotide repeats and tend to be highly polymorphic in the population. Methods The data shown in this track are a subset of the Simple Repeats track, selecting only those repeats of period 2 and 3, with 100% identity and no indels and with at least 15 copies of the repeat. The Simple Repeats track is created using the Tandem Repeats Finder. For more information about this program, see Benson (1999). Credits Tandem Repeats Finder was written by Gary Benson. References Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. PMID: 9862982; PMC: PMC148217 snp147Mult Mult. 
SNPs(147) Simple Nucleotide Polymorphisms (dbSNP 147) That Map to Multiple Genomic Loci Variation and Repeats Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 147, available from ftp.ncbi.nih.gov/snp. Only SNPs that have been mapped to multiple locations in the reference genome assembly are included in this subset. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The default maximum weight for this track is 3, unlike the other dbSNP build 147 tracks which have a maximum weight of 1. That enables these multiply-mapped SNPs to appear in the display, while by default they will not appear in the All SNPs(147) track because of its maximum weight filter. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. 
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. 
Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: 
downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. 
ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. 
Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP Weight can be 0, 1, 2, 3 or 10. Weight = 1 are the highest quality alignments. Weight = 0 and weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). 
Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. 
While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/chicken_9031/database/organism_data/ for galGal5 The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/chicken_9031/rs_fasta/ for galGal5. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b147_SNPContigLoc.bcp.gz and b147_ContigInfo.bcp.gz. b147_SNPMapInfo.bcp.gz provided the alignment weights. Functional classification was obtained from b147_SNPContigLocusId.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz . Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. 
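The in-del reclassification described in the Insertions/Deletions paragraph above can be sketched in Python as follows. This is a minimal illustration, not UCSC's implementation: the function name refine_indel_class is hypothetical, and treating "-" as a zero-length allele is an assumption about how alleles are written.

```python
from typing import Sequence


def allele_length(allele: str) -> int:
    """Length of an allele; '-' is assumed to denote an empty allele."""
    return 0 if allele == "-" else len(allele)


def refine_indel_class(ref_allele: str, observed: Sequence[str]) -> str:
    """Refine dbSNP's 'in-del' class into 'insertion' or 'deletion'.

    Follows the rule in the text: compare the reference allele length
    with the lengths of all other observed alleles.
    """
    ref_len = allele_length(ref_allele)
    # "Other" observed alleles are those that differ from the reference.
    others = [a for a in observed if a != ref_allele]
    if not others:
        return "in-del"  # nothing to compare against; leave unchanged
    if all(ref_len < allele_length(a) for a in others):
        return "insertion"  # reference is shorter than every other allele
    if all(ref_len > allele_length(a) for a in others):
        return "deletion"  # reference is longer than every other allele
    return "in-del"  # mixed lengths: keep dbSNP's class


if __name__ == "__main__":
    print(refine_indel_class("-", ["-", "AT"]))
    print(refine_indel_class("AT", ["-", "AT"]))
    print(refine_indel_class("A", ["-", "A", "AT"]))
```

When the other observed alleles are both shorter and longer than the reference, neither rule applies and the sketch leaves the class as 'in-del'.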
Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server (snp147*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 xenoMrna Other mRNAs Non-Chicken mRNAs from GenBank mRNA and EST Description This track displays translated blat alignments of vertebrate and invertebrate mRNA in GenBank from organisms other than chicken. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The strand information (+/-) for this track is in two parts. The first + indicates the orientation of the query sequence whose translated protein produced the match (here always 5' to 3', hence +). The second + or - indicates the orientation of the matching translated genomic sequence. Because the two orientations of a DNA sequence give different predicted protein sequences, there are four combinations. ++ is not the same as --, nor is +- the same as -+. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the mRNA display. For example, to apply the filter to all mRNAs expressed in a specific organ, type the name of the organ in the tissue box. 
To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. If "include" is selected, the browser will display only those mRNAs that match the filter criteria. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare mRNAs against the genomic sequence. For more information about this option, go to the Codon and Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods The mRNAs were aligned against the chicken genome using translated blat. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. Only those alignments having a base identity level within 1% of the best and at least 25% base identity with the genomic sequence were kept. Credits The mRNA track was produced at UCSC from mRNA sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. 
PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 xenoRefGene Other RefSeq Non-Chicken RefSeq Genes Genes and Gene Predictions Description This track shows known protein-coding and non-protein-coding genes for organisms other than chicken, taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark). The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely. Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. If you wish to see only the coding genes, click this box. Methods The RNAs were aligned against the chicken genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. 
Only alignments having a base identity level within 0.5% of the best and at least 25% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 ucscToRefSeq RefSeq Acc RefSeq Accession Mapping and Sequencing Description This track associates UCSC Genome Browser chromosome names to accession identifiers from the NCBI Reference Sequence Database (RefSeq). The data were downloaded from the NCBI assembly database. Credits The data for this track was prepared by Hiram Clawson. simpleRepeat Simple Repeats Simple Tandem Repeats by TRF Variation and Repeats Description This track displays simple tandem repeats (possibly imperfect repeats) located by Tandem Repeats Finder (TRF) which is specialized for this purpose. These repeats can occur within coding regions of genes and may be quite polymorphic. Repeat expansions are sometimes associated with specific diseases. Methods For more information about the TRF program, see Benson (1999). Credits TRF was written by Gary Benson. References Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. 
PMID: 9862982; PMC: PMC148217 intronEst Spliced ESTs Chicken ESTs That Have Been Spliced mRNA and EST Description This track shows alignments between chicken expressed sequence tags (ESTs) in GenBank and the genome that show signs of splicing when aligned against the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. To be considered spliced, an EST must show evidence of at least one canonical intron (i.e., the genomic sequence between EST alignment blocks must be at least 32 bases in length and have GT/AG ends). By requiring splicing, the level of contamination in the EST databases is drastically reduced at the expense of eliminating many genuine 3' ESTs. For a display of all ESTs (including unspliced), see the chicken EST track. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, darker shading indicates a larger number of aligned ESTs. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. 
For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. 
These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, chicken ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence are displayed in this track. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 uniprot UniProt UniProt SwissProt/TrEMBL Protein Annotations Genes and Gene Predictions Description This track shows protein sequences and annotations on them from the UniProt/SwissProt database, mapped to genomic coordinates. UniProt/SwissProt data has been curated from scientific publications by the UniProt staff, UniProt/TrEMBL data has been predicted by various computational algorithms. The annotations are divided into multiple subtracks, based on their "feature type" in UniProt. 
The first two subtracks below, one for SwissProt and one for TrEMBL, show the alignments of protein sequences to the genome; all other subtracks below are the protein annotations mapped through these alignments to the genome.

Track Name - Description
UCSC Alignment, SwissProt = curated protein sequences - Protein sequences from SwissProt mapped to the genome. All other tracks are (start,end) SwissProt annotations on these sequences mapped through this alignment. Even protein sequences without a single curated annotation (splice isoforms) are visible in this track. Each UniProt protein has one main isoform, which is colored dark blue. Alternative isoforms are sequences that do not have annotations on them and are colored light blue. They can be hidden with the TrEMBL/Isoform filter (see below).
UCSC Alignment, TrEMBL = predicted protein sequences - Protein sequences from TrEMBL mapped to the genome. All other tracks below are (start,end) TrEMBL annotations mapped to the genome using this track. This track is hidden by default. To show it, click its checkbox on the track configuration page.
UniProt Signal Peptides - Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains - Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains - Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains - Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains - Polypeptide chain in mature protein after post-processing.
UniProt Regions of Interest - Regions that have been experimentally defined, such as the role of a region in mediating protein-protein interactions or some other biological process.
UniProt Domains - Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds - Disulfide bonds.
UniProt Amino Acid Modifications - Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations - Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations - Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts - Differences between GenBank sequences and the UniProt sequence.
UniProt Repeats - Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations - All other annotations, e.g. compositional bias.

For consistency and convenience for users of mutation-related tracks, the subtrack "UniProt/SwissProt Variants" is a copy of the track "UniProt Variants" in the track group "Phenotype and Literature", or "Variation and Repeats", depending on the assembly.

Display Conventions and Configuration
Genomic locations of UniProt/SwissProt annotations are labeled with a short name for the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide" etc.). A click on them shows the full annotation and provides a link to the UniProt/SwissProt record for more details. TrEMBL annotations are always shown in light blue, except in the Signal Peptides, Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks. Mouse over a feature to see the full UniProt annotation comment. For variants, the mouse over will show the full name of the UniProt disease acronym. The subtracks for domains related to subcellular location are sorted from outside to inside of the cell: Signal peptide, extracellular, transmembrane, and cytoplasmic. Features in the "UniProt Modifications" (modified residues) track are drawn in light green. Disulfide bonds are shown in dark grey. Topological domains are shown in maroon and zinc finger regions in olive green. Duplicate annotations are removed as far as possible: if a TrEMBL annotation has the same genome position and same feature type, comment, disease and mutated amino acids as a SwissProt annotation, it is not shown again.
Two annotations mapped through different protein sequence alignments but with the same genome coordinates are only shown once. On the configuration page of this track, you can choose to hide any TrEMBL annotations. This filter will also hide the UniProt alternative isoform protein sequences because both types of information are less relevant to most users. Please contact us if you want more detailed filtering features. Note that for the human hg38 assembly and SwissProt annotations, there is also a public track hub prepared by UniProt itself, with genome annotations maintained by UniProt using their own mapping method based on those Gencode/Ensembl gene models that are annotated in UniProt for a given protein. For proteins that differ from the genome, UniProt's mapping method will, in most cases, map a protein and its annotations to an unexpected location (see below for details on UCSC's mapping method). Methods Briefly, UniProt protein sequences were aligned to the transcripts associated with the protein, the top-scoring alignments were retained, and the result was projected to the genome through a transcript-to-genome alignment. Depending on the genome, the transcript-genome alignment was either provided by the source database (NCBI RefSeq), created at UCSC (UCSC RefSeq) or derived from the transcripts (Ensembl/Augustus). The transcript set is NCBI RefSeq for hg38 and UCSC RefSeq for hg19 (due to alt/fix haplotype misplacements in the NCBI RefSeq set on hg19). For other genomes, RefSeq, Ensembl and Augustus are tried, in this order. The resulting protein-genome alignments of this process are available in the file formats for liftOver or pslMap from our data archive (see "Data Access" section below). An important step of the mapping process protein -> transcript -> genome is filtering the alignment from protein to transcript.
Due to differences between the UniProt proteins and the transcripts (proteins were made many years before the transcripts were made, and human genomes have variants), the transcript with the highest BLAST score when aligning the protein to all transcripts is not always the correct transcript for a protein sequence. Therefore, the protein sequence is aligned to only a very short list of one or sometimes more transcripts, selected by a three-step procedure:
1. Use transcripts directly annotated by UniProt: for organisms that have a RefSeq transcript track, proteins are aligned to the RefSeq transcripts that are annotated by UniProt for this particular protein.
2. Use transcripts for NCBI Gene ID annotated by UniProt: if no transcripts are annotated on the protein, or the annotated ones have been deprecated by NCBI, but an NCBI Gene ID is annotated, the RefSeq transcripts for this Gene ID are used. This can result in multiple matching transcripts for a protein.
3. Use best matching transcript: if no NCBI Gene is annotated, then BLAST scores are used to pick the transcripts. There can be multiple transcripts for one protein, as their coding sequences can be identical. All transcripts within 1% of the highest observed BLAST score are used.
For strategies 2 and 3, many of the transcripts found do not differ in coding sequence, so the resulting alignments on the genome will be identical. Therefore, any identical alignments are removed in a final filtering step. The details page of these alignments will contain a list of all transcripts that result in the same protein-genome alignment. On hg38, only a handful of edge cases (pseudogenes, very recently added proteins) remain in 2023 where strategy 3 has to be used.
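As an illustration only, the three-step transcript selection described above could be sketched in Python as follows. The function name select_transcripts, its arguments and the in-memory dictionaries are hypothetical and are not taken from the UCSC build script; the sketch also assumes that step 3 is used whenever steps 1 and 2 yield no usable transcript.

```python
from typing import Dict, List, Optional, Set


def select_transcripts(
    annotated: List[str],                     # transcripts annotated by UniProt (hypothetical input)
    gene_id: Optional[str],                   # NCBI Gene ID annotated by UniProt, if any
    gene_to_transcripts: Dict[str, List[str]],
    blast_scores: Dict[str, float],           # transcript -> BLAST score for this protein
    known_transcripts: Set[str],              # transcripts present in the transcript track
    tolerance: float = 0.01,                  # "within 1% of the highest observed score"
) -> List[str]:
    # Step 1: transcripts directly annotated by UniProt and still present.
    direct = [t for t in annotated if t in known_transcripts]
    if direct:
        return direct

    # Step 2: transcripts of the annotated NCBI Gene ID.
    if gene_id is not None:
        by_gene = [t for t in gene_to_transcripts.get(gene_id, [])
                   if t in known_transcripts]
        if by_gene:
            return by_gene

    # Step 3: best BLAST hit plus everything within the tolerance of it.
    if not blast_scores:
        return []
    best = max(blast_scores.values())
    return sorted(t for t, s in blast_scores.items()
                  if s >= best * (1.0 - tolerance))
```

Identical protein-genome alignments produced by several selected transcripts would still need to be collapsed afterwards, as described above.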
In other words, when an NCBI or UCSC RefSeq track is used for the mapping and to align a protein sequence to the correct transcript, we use a three-stage process:
1. If UniProt has annotated a given RefSeq transcript for a given protein sequence, the protein is aligned to this transcript. Any difference in the version suffix is tolerated in this comparison.
2. If no transcript is annotated or the transcript cannot be found in the NCBI/UCSC RefSeq track, the UniProt-annotated NCBI Gene ID is resolved to a set of NCBI RefSeq transcript IDs via the most current version of NCBI genes tables. Only the top match of the resulting alignments and all others within 1% of its score are used for the mapping.
3. If no transcript can be found after step (2), the protein is aligned to all transcripts; the top match and all others within 1% of its score are used.
This system was designed to resolve the problem of incorrect mappings of proteins, mostly on hg38, due to differences between the SwissProt sequences and the genome reference sequence, which has changed since the proteins were defined. The problem is most pronounced for gene families composed of either very repetitive or very similar proteins. To make sure that the alignments always go to the best chromosome location, all _alt and _fix reference patch sequences are ignored for the alignment, so the patches are entirely free of UniProt annotations. Please contact us if you have feedback on this process or example edge cases. We are not aware of a way to evaluate the results completely and in an automated manner. Proteins were aligned to transcripts with TBLASTN, converted to PSL, filtered with pslReps (93% query coverage, keep alignments within top 1% score), lifted to genome positions with pslMap and filtered again with pslReps. UniProt annotations were obtained from the UniProt XML file. The UniProt annotations were then mapped to the genome through the alignment described above using the pslMap program.
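To illustrate what mapping an annotation through an alignment means, the following Python sketch projects an interval on the aligned sequence through a list of ungapped alignment blocks. It is a simplified illustration and not the pslMap implementation: the block representation (query start, target start, size), the function name map_interval, and the assumption that query and target coordinates use the same units on the plus strand are all hypothetical.

```python
from typing import List, Tuple

# One ungapped alignment block: (query_start, target_start, size),
# 0-based, half-open, plus strand only (assumption for this sketch).
Block = Tuple[int, int, int]


def map_interval(q_start: int, q_end: int,
                 blocks: List[Block]) -> List[Tuple[int, int]]:
    """Project the query interval [q_start, q_end) onto the target.

    Returns one target interval per block that overlaps the query
    interval; parts of the interval that fall into alignment gaps
    are dropped.
    """
    mapped = []
    for b_query, b_target, size in blocks:
        lo = max(q_start, b_query)
        hi = min(q_end, b_query + size)
        if lo < hi:
            offset = b_target - b_query
            mapped.append((lo + offset, hi + offset))
    return mapped


# An annotation that spans two blocks is split into two target pieces.
blocks = [(0, 1000, 30), (30, 2000, 30)]
pieces = map_interval(20, 40, blocks)
```

In this toy case the annotation covers the end of the first block and the start of the second, so it is reported as two separate target intervals, much like a feature that crosses an intron.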
This approach draws heavily on the LS-SNP pipeline by Mark Diekhans. Like all Genome Browser source code, the main script used to build this track can be found on GitHub. Older releases This track is automatically updated on an ongoing basis, every 2-3 months. The current version name is always shown on the track details page; it includes the release of UniProt, the version of the transcript set and a unique MD5 that is based on the protein sequences, the transcript sequences, the mapping file between both and the transcript-genome alignment. The exact transcript that was used for the alignment is shown when clicking a protein alignment in one of the two alignment tracks. For reproducibility of older analysis results and for manual inspection, previous versions of this track are available for browsing in the form of the UCSC UniProt Archive Track Hub (click this link to connect the hub now). The underlying data of all releases of this track (past and current) can be obtained from our downloads server, including the UniProt protein-to-genome alignment. Data Access The raw data of the current track can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text by our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/galGal5/uniprot/unipStruct.bb -chrom=chr6 -start=0 -end=1000000 stdout
Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
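For scripted access, a range query like the bigBedToBed example above could be assembled and its tab-separated output parsed along these lines. This is a minimal Python sketch: the helper names are hypothetical, only the first four BED columns are read, and the bigBedToBed binary is assumed to be on the PATH if the command is actually run.

```python
from typing import Dict, List, Optional, Union


def bigbed_to_bed_command(url: str, chrom: str, start: int, end: int,
                          out: str = "stdout") -> List[str]:
    """Build the argument list for a range-restricted bigBedToBed call."""
    return ["bigBedToBed", url,
            f"-chrom={chrom}", f"-start={start}", f"-end={end}", out]


def parse_bed_line(line: str) -> Dict[str, Union[str, int, None]]:
    """Parse the first four columns of one tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    name: Optional[str] = fields[3] if len(fields) > 3 else None
    return {"chrom": fields[0], "start": int(fields[1]),
            "end": int(fields[2]), "name": name}
```

The argument list can be passed to subprocess.run(cmd, capture_output=True, text=True, check=True), and each non-empty line of the captured standard output can then be handed to parse_bed_line.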
Lifting from UniProt to genome coordinates in pipelines To facilitate mapping protein coordinates to the genome, we provide the alignment files in formats that are suitable for our command line tools. Our command line programs liftOver or pslMap can be used to map coordinates on protein sequences to genome coordinates. The filenames are unipToGenome.over.chain.gz (liftOver) and unipToGenomeLift.psl.gz (pslMap). Example commands:
wget -q https://hgdownload.soe.ucsc.edu/goldenPath/archive/hg38/uniprot/2022_03/unipToGenome.over.chain.gz
wget -q https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver
chmod a+x liftOver
echo 'Q99697 1 10 annotationOnProtein' > prot.bed
liftOver prot.bed unipToGenome.over.chain.gz genome.bed
cat genome.bed
Credits This track was created by Maximilian Haeussler at UCSC, with a lot of input from Chris Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff, Alejo Mujica, Regeneron Pharmaceuticals and Pia Riestra, GeneDx. Thanks to UniProt for making all data available for download. References UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5. PMID: 22102590; PMC: PMC3245120 Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat. 2004 May;23(5):464-70. PMID: 15108278
unipConflict Seq. Conflicts UniProt Sequence Conflicts Genes and Gene Predictions
unipRepeat Repeats UniProt Repeats Genes and Gene Predictions
unipStruct Structure UniProt Protein Primary/Secondary Structure Annotations Genes and Gene Predictions
unipOther Other Annot.
UniProt Other Annotations Genes and Gene Predictions
unipMut Mutations UniProt Amino Acid Mutations Genes and Gene Predictions
unipModif AA Modifications UniProt Amino Acid Modifications Genes and Gene Predictions
unipDomain Domains UniProt Domains Genes and Gene Predictions
unipDisulfBond Disulf. Bonds UniProt Disulfide Bonds Genes and Gene Predictions
unipChain Chains UniProt Mature Protein Products (Polypeptide Chains) Genes and Gene Predictions
unipLocCytopl Cytoplasmic UniProt Cytoplasmic Domains Genes and Gene Predictions
unipLocTransMemb Transmembrane UniProt Transmembrane Domains Genes and Gene Predictions
unipInterest Interest UniProt Regions of Interest Genes and Gene Predictions
unipLocExtra Extracellular UniProt Extracellular Domain Genes and Gene Predictions
unipLocSignal Signal Peptide UniProt Signal Peptides Genes and Gene Predictions
unipAliTrembl TrEMBL Aln. UCSC alignment of TrEMBL proteins to genome Genes and Gene Predictions
unipAliSwissprot SwissProt Aln. UCSC alignment of SwissProt proteins to genome (dark blue: main isoform, light blue: alternative isoforms) Genes and Gene Predictions
windowmaskerSdust WM + SDust Genomic Intervals Masked by WindowMasker + SDust Variation and Repeats Description This track depicts masked sequence as determined by WindowMasker. The WindowMasker tool is included in the NCBI C++ toolkit. The source code for the entire toolkit is available from the NCBI FTP site. Methods To create this track, WindowMasker was run with the following parameters:
windowmasker -mk_counts true -input galGal5.fa -output wm_counts
windowmasker -ustat wm_counts -sdust true -input galGal5.fa -output repeats.bed
The repeats.bed (BED3) file was loaded into the "windowmaskerSdust" table for this track. References Morgulis A, Gertz EM, Schäffer AA, Agarwala R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics. 2006 Jan 15;22(2):134-41. PMID: 16287941
chainNetAquChr2 Golden eagle Chain/Net Golden eagle (Oct.
2014 (aquChr-1.0.2/aquChr2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of golden eagle (Oct. 2014 (aquChr-1.0.2/aquChr2)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both golden eagle and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the golden eagle assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best golden eagle/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The golden eagle sequence used in this annotation is from the Oct. 2014 (aquChr-1.0.2/aquChr2) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the golden eagle/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single golden eagle chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetAquChr2Viewnet Net Golden eagle (Oct. 2014 (aquChr-1.0.2/aquChr2)), Chain and Net Alignments Comparative Genomics netAquChr2 Golden eagle Net Golden eagle (Oct. 2014 (aquChr-1.0.2/aquChr2)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of golden eagle (Oct. 2014 (aquChr-1.0.2/aquChr2)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both golden eagle and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the golden eagle assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best golden eagle/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The golden eagle sequence used in this annotation is from the Oct. 2014 (aquChr-1.0.2/aquChr2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. 
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap):
Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it.
Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
NonSyn - a match to a chromosome different from the gap in the level above.
Methods Chain track Transposons that have been inserted since the golden eagle/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single golden eagle chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
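As a small worked illustration of the substitution matrix quoted above, the following Python sketch sums the matrix entries over the aligned base pairs of a single gapless block. It is not the axtChain implementation: the function name block_score is hypothetical, only uppercase A, C, G and T are handled, and the gap costs that also enter a chain's score are ignored.

```python
BASES = "ACGT"

# Substitution scores as listed in the track description (rows and columns: A, C, G, T).
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]

INDEX = {base: i for i, base in enumerate(BASES)}


def block_score(query: str, target: str) -> int:
    """Score one ungapped block by summing per-column matrix entries."""
    if len(query) != len(target):
        raise ValueError("an ungapped block needs sequences of equal length")
    return sum(MATRIX[INDEX[q]][INDEX[t]] for q, t in zip(query, target))


perfect = block_score("ACGT", "ACGT")      # 91 + 100 + 100 + 91
transitions = block_score("AC", "GT")      # A/G and C/T each score -31
```

Because matches score about 100 and the minimum chain score is 5000, a chain made only of perfect matches would need roughly 50 or more aligned bases before gap costs are taken into account.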
The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W.
Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetAquChr2Viewchain Chain Golden eagle (Oct. 2014 (aquChr-1.0.2/aquChr2)), Chain and Net Alignments Comparative Genomics chainAquChr2 Golden eagle Chain Golden eagle (Oct. 2014 (aquChr-1.0.2/aquChr2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of golden eagle (Oct. 2014 (aquChr-1.0.2/aquChr2)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both golden eagle and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the golden eagle assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best golden eagle/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The golden eagle sequence used in this annotation is from the Oct. 2014 (aquChr-1.0.2/aquChr2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the golden eagle/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single golden eagle chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position      1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap     625    660    700    750    900   1400   4000   8000  16000  32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
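The greedy placement described above can be pictured with a much-simplified Python sketch. It treats each chain as a single half-open interval on the target genome, visits chains from highest to lowest score, and keeps only the parts not already covered. The function name place_chains and the tuple layout are assumptions made for illustration; real chains contain internal gaps, and chainNet also records the level hierarchy, which this sketch omits.

```python
def place_chains(chains):
    """Greedy 1-D placement; a simplified sketch, not the chainNet algorithm.

    chains: list of (name, score, start, end) half-open target intervals.
    Returns a list of (name, start, end) pieces that survive trimming.
    """
    covered = []  # intervals already claimed by higher-scoring chains
    placed = []
    for name, _score, start, end in sorted(chains, key=lambda c: -c[1]):
        pieces = [(start, end)]
        for c_start, c_end in covered:
            remaining = []
            for s, e in pieces:
                if c_end <= s or c_start >= e:
                    remaining.append((s, e))  # no overlap, keep whole piece
                    continue
                if s < c_start:
                    remaining.append((s, c_start))  # part left of the overlap
                if c_end < e:
                    remaining.append((c_end, e))  # part right of the overlap
            pieces = remaining
        for s, e in pieces:
            covered.append((s, e))
            placed.append((name, s, e))
    return placed
```

With the made-up input [("a", 9000, 0, 100), ("b", 7000, 50, 150), ("c", 6000, 20, 60)], chain "a" is placed whole, "b" is trimmed to 100-150, and "c" is dropped because it is already fully covered.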
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetRn6 Rat Chain/Net Rat (Jul. 2014 (RGSC 6.0/rn6)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. 
Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rat (Jul. 2014 (RGSC 6.0/rn6)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rat and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rat assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rat/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The rat sequence used in this annotation is from the Jul. 2014 (RGSC 6.0/rn6) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. 
chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rat/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rat chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position      1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap     625    660    700    750    900   1400   4000   8000  16000  32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W.
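The minimum-score cutoff mentioned above can be illustrated as a filter over chain-format text. The Python sketch below assumes the standard UCSC chain layout, in which each chain begins with a header line of the form "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id" followed by its block lines. The function name filter_chains is hypothetical, and the track itself was presumably filtered inside the alignment pipeline rather than by a script like this.

```python
def filter_chains(lines, min_score=5000):
    """Keep only chains whose header score is at least min_score.

    A sketch assuming UCSC chain-format input; lines is an iterable of
    strings without trailing newlines. Block lines and the blank separator
    line are kept or dropped together with their header.
    """
    kept = []
    keep_current = False
    for line in lines:
        if line.startswith("chain "):
            # Field 1 of the header is the chain score.
            keep_current = float(line.split()[1]) >= min_score
        if keep_current:
            kept.append(line)
    return kept
```

Given a header with score 7200 and another with score 1200, only the first chain and its block lines would be returned at the default cutoff.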
Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetRn6Viewnet Net Rat (Jul. 2014 (RGSC 6.0/rn6)), Chain and Net Alignments Comparative Genomics netRn6 Rat Net Rat (Jul. 2014 (RGSC 6.0/rn6)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rat (Jul. 2014 (RGSC 6.0/rn6)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rat and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rat assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rat/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The rat sequence used in this annotation is from the Jul. 2014 (RGSC 6.0/rn6) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rat/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rat chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position      1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap     625    660    700    750    900   1400   4000   8000  16000  32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
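The loose linear gap table quoted in the Methods above lists costs only at selected gap sizes. One plausible reading, sketched below in Python, is that costs for other sizes are obtained by linear interpolation between neighbouring table positions. The interpolation rule, the choice of column (qGap for a gap in the query only, tGap for the target only, bothGap applied to the combined size when both are gapped), and the function names are assumptions for illustration, not a statement of how axtChain computes gap penalties.

```python
from bisect import bisect_right

# Values copied from the -linearGap=loose table above.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
T_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
BOTH_GAP = [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000]


def interpolate(size, costs):
    """Piecewise-linear cost for a gap of the given size (assumed rule)."""
    if size <= POSITIONS[0]:
        return float(costs[0])
    if size >= POSITIONS[-1]:
        i = len(POSITIONS) - 2  # extend the last segment's slope (assumption)
    else:
        i = bisect_right(POSITIONS, size) - 1
    x0, x1 = POSITIONS[i], POSITIONS[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(dq, dt):
    """Cost of a gap spanning dq query bases and dt target bases."""
    if dt == 0:
        return interpolate(dq, Q_GAP)
    if dq == 0:
        return interpolate(dt, T_GAP)
    return interpolate(dq + dt, BOTH_GAP)
```

Under these assumptions a 61-base gap in one species falls halfway between the 11 and 111 entries, giving a cost halfway between 450 and 600.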
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetRn6Viewchain Chain Rat (Jul. 2014 (RGSC 6.0/rn6)), Chain and Net Alignments Comparative Genomics chainRn6 Rat Chain Rat (Jul. 2014 (RGSC 6.0/rn6)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). 
The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rat (Jul. 2014 (RGSC 6.0/rn6)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rat and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rat assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rat/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The rat sequence used in this annotation is from the Jul. 2014 (RGSC 6.0/rn6) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. 
To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rat/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rat chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. 
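To make the chaining step above more concrete, here is a deliberately simplified Python sketch. It uses a quadratic scan over all earlier blocks instead of a kd-tree, so it illustrates only the dynamic-programming recurrence, not axtChain's implementation. The block tuple layout, the function name best_chain, and the caller-supplied gap_cost function are all assumptions.

```python
def best_chain(blocks, gap_cost):
    """Return (score, block indices) of the best-scoring chain.

    blocks: list of (t_start, t_end, q_start, q_end, score) gapless blocks
    with half-open coordinates. gap_cost(dq, dt) is a penalty supplied by
    the caller. Quadratic-time sketch; the kd-tree speed-up is omitted.
    """
    if not blocks:
        return 0, []
    order = sorted(range(len(blocks)), key=lambda i: blocks[i][0])
    best = {}  # best chain score ending at each block
    prev = {}  # predecessor block on that best chain
    for pos, i in enumerate(order):
        t0, _t1, q0, _q1, score = blocks[i]
        best[i], prev[i] = score, None
        for j in order[:pos]:
            _pt0, pt1, _pq0, pq1, _ = blocks[j]
            # A predecessor must end before this block starts on both axes.
            if pt1 <= t0 and pq1 <= q0:
                candidate = best[j] + score - gap_cost(q0 - pq1, t0 - pt1)
                if candidate > best[i]:
                    best[i], prev[i] = candidate, j
    end = max(best, key=best.get)
    chain = []
    cur = end
    while cur is not None:
        chain.append(cur)
        cur = prev[cur]
    return best[end], chain[::-1]
```

In practice the gap penalty would come from a gap table like the one given for this track; a toy penalty such as lambda dq, dt: 300 + dq + dt is enough to try the function out.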
The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position      1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap     625    660    700    750    900   1400   4000   8000  16000  32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.
The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMm10 Mouse Chain/Net Mouse (Dec. 2011 (GRCm38/mm10)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of mouse (Dec. 2011 (GRCm38/mm10)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best mouse/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The mouse sequence used in this annotation is from the Dec. 2011 (GRCm38/mm10) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the mouse/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position      1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap     625    660    700    750    900   1400   4000   8000  16000  32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMm10Viewnet Net Mouse (Dec. 2011 (GRCm38/mm10)), Chain and Net Alignments Comparative Genomics netMm10 Mouse Net Mouse (Dec. 2011 (GRCm38/mm10)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). 
The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of mouse (Dec. 2011 (GRCm38/mm10)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best mouse/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The mouse sequence used in this annotation is from the Dec. 2011 (GRCm38/mm10) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. 
To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the mouse/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks.
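As a rough illustration of the chaining step, the Python sketch below finds one maximally scoring chain of gapless blocks with a plain quadratic dynamic program. It is not the axtChain implementation: the kd-tree is omitted, the Block type and function names are hypothetical, and the gap cost assumes that the "loose" linear gap table listed in the Methods sections is linearly interpolated between its positions and clamped beyond the last one.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple

# "loose" linear gap table as listed in the Methods sections (tGap equals qGap).
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
BOTH_GAP = [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000]


@dataclass(frozen=True)
class Block:
    """A gapless aligned block (hypothetical type; half-open coordinates)."""
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    score: int


def interpolate(size: int, costs: List[int]) -> float:
    """Assumption: linear interpolation between table positions, clamped at the ends."""
    if size <= POSITION[0]:
        return costs[0]
    if size >= POSITION[-1]:
        return costs[-1]
    i = bisect_right(POSITION, size) - 1
    x0, x1 = POSITION[i], POSITION[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(dt: int, dq: int) -> float:
    """Cost of a gap of dt target bases and dq query bases between two blocks."""
    if dt == 0 and dq == 0:
        return 0
    if dt == 0 or dq == 0:
        return interpolate(max(dt, dq), Q_GAP)   # single-sided gap
    return interpolate(dt + dq, BOTH_GAP)        # double-sided gap


def best_chain(blocks: List[Block]) -> Tuple[float, List[Block]]:
    """Return (score, blocks) of the best chain; O(n^2), no kd-tree."""
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    if not order:
        return 0, []
    best = [b.score for b in order]   # best chain score ending at each block
    prev = [-1] * len(order)          # back-pointers for traceback
    for i, cur in enumerate(order):
        for j in range(i):
            p = order[j]
            dt, dq = cur.t_start - p.t_end, cur.q_start - p.q_end
            if dt < 0 or dq < 0:
                continue              # p must precede cur in both sequences
            cand = best[j] + cur.score - gap_cost(dt, dq)
            if cand > best[i]:
                best[i], prev[i] = cand, j
    i = max(range(len(order)), key=best.__getitem__)
    total, chain = best[i], []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    chain.reverse()
    return total, chain
```

In this sketch, a block that would require a large double-sided gap is left out of the chain whenever the gap penalty exceeds the score it contributes, which is the trade-off the gap table controls.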
The following matrix was used:

        A      C      G      T
A      91    -90    -25   -100
C     -90    100   -100    -25
G     -25   -100    100    -90
T    -100    -25    -90     91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.
The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMm10Viewchain Chain Mouse (Dec. 2011 (GRCm38/mm10)), Chain and Net Alignments Comparative Genomics chainMm10 Mouse Chain Mouse (Dec. 2011 (GRCm38/mm10)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of mouse (Dec. 2011 (GRCm38/mm10)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. 
This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best mouse/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The mouse sequence used in this annotation is from the Dec. 2011 (GRCm38/mm10) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.
Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the mouse/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

        A      C      G      T
A      91    -90    -25   -100
C     -90    100   -100    -25
G     -25   -100    100    -90
T    -100    -25    -90     91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain.
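The placement step just described can be pictured with the short Python sketch below. It is a simplified stand-in, not the chainNet program: chains are reduced to hypothetical (name, score, blocks) tuples on the chicken (target) coordinate, only the aligned blocks of a chain count as covered, and the nesting of levels is not tracked.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]                 # half-open [start, end) on the target
Chain = Tuple[str, int, List[Interval]]    # (name, score, aligned blocks); hypothetical layout


def subtract(span: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of span that no interval in covered overlaps.

    Assumes the intervals in covered do not overlap one another.
    """
    start, end = span
    pieces = []
    for c_start, c_end in sorted(covered):
        if c_end <= start or c_start >= end:
            continue                       # no overlap with what is left of span
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def place_chains(chains: List[Chain]) -> Dict[str, List[Interval]]:
    """Place chains highest score first, trimming each to still-uncovered sequence."""
    covered: List[Interval] = []
    placed: Dict[str, List[Interval]] = {}
    for name, _score, blocks in sorted(chains, key=lambda c: -c[1]):
        kept: List[Interval] = []
        for block in blocks:
            kept.extend(subtract(block, covered))
        if kept:                           # chains that are fully covered are dropped
            placed[name] = kept
            covered.extend(kept)
    return placed
```

For example, with a chain A scoring 9000 that has blocks (0, 100) and (200, 300), and a chain B scoring 6000 with a single block (50, 250), place_chains keeps A whole and trims B to (100, 200), the gap in A.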
During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetHg38 Human Chain/Net Human (Dec. 
2013 (GRCh38/hg38)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of human (Dec. 2013 (GRCh38/hg38)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both human and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the human assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best human/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The human sequence used in this annotation is from the Dec. 2013 (GRCh38/hg38) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the human/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single human chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

        A      C      G      T
A      91    -90    -25   -100
C     -90    100   -100    -25
G     -25   -100    100    -90
T    -100    -25    -90     91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetHg38Viewnet Net Human (Dec. 2013 (GRCh38/hg38)), Chain and Net Alignments Comparative Genomics netHg38 Human Net Human (Dec. 2013 (GRCh38/hg38)) Alignment net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of human (Dec. 2013 (GRCh38/hg38)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both human and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. 
Single lines indicate gaps that are largely due to a deletion in the human assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best human/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The human sequence used in this annotation is from the Dec. 2013 (GRCh38/hg38) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps.
Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the human/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single human chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

        A      C      G      T
A      91    -90    -25   -100
C     -90    100   -100    -25
G     -25   -100    100    -90
T    -100    -25    -90     91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first.
The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetHg38Viewchain Chain Human (Dec. 2013 (GRCh38/hg38)), Chain and Net Alignments Comparative Genomics chainHg38 Human Chain Human (Dec. 2013 (GRCh38/hg38)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of human (Dec. 2013 (GRCh38/hg38)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both human and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the human assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best human/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The human sequence used in this annotation is from the Dec. 2013 (GRCh38/hg38) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the human/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single human chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMelGal5 Turkey Chain/Net Turkey (Nov. 2014 (Turkey_5.0/melGal5)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of turkey (Nov. 2014 (Turkey_5.0/melGal5)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both turkey and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. 
Single lines indicate gaps that are largely due to a deletion in the turkey assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best turkey/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The turkey sequence used in this annotation is from the Nov. 2014 (Turkey_5.0/melGal5) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. 
Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the turkey/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single turkey chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first.
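As an illustration of how the loose gap table listed in the chain methods above might be read, the following Python sketch looks up a gap penalty for a given gap size. It assumes that costs are linearly interpolated between the listed positions and extrapolated past the last position using the slope of the final segment, and that a gap present in both sequences is looked up by the sum of the two gap sizes. These rules, and the helper names `interpolate` and `gap_cost`, are assumptions made for illustration and are not taken from the axtChain source.

```python
from bisect import bisect_right

# Values copied from the -linearGap=loose table in the text.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
T_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
BOTH_GAP = [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000]


def interpolate(size, positions, costs):
    """Piecewise-linear lookup of a gap cost (assumed rule, see lead-in)."""
    if size <= positions[0]:
        return float(costs[0])
    if size >= positions[-1]:
        # Past the table: extend the last segment's slope.
        i = len(positions) - 1
    else:
        # Index of the first listed position strictly greater than size.
        i = bisect_right(positions, size)
    x0, x1 = positions[i - 1], positions[i]
    y0, y1 = costs[i - 1], costs[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(q_gap_size, t_gap_size):
    """Penalty for a gap of q_gap_size query bases and t_gap_size target bases."""
    if q_gap_size == 0 and t_gap_size == 0:
        return 0.0
    if t_gap_size == 0:
        return interpolate(q_gap_size, POSITION, Q_GAP)
    if q_gap_size == 0:
        return interpolate(t_gap_size, POSITION, T_GAP)
    # Double-sided gap: looked up by combined size (assumption).
    return interpolate(q_gap_size + t_gap_size, POSITION, BOTH_GAP)


if __name__ == "__main__":
    print(gap_cost(11, 0))   # single-sided gap at a listed position
    print(gap_cost(0, 61))   # single-sided gap between listed positions
    print(gap_cost(1, 1))    # double-sided gap
```

Under these assumptions the per-base cost falls steeply as gaps grow, which is consistent with the statement that this scoring system allows longer gaps than a traditional affine scheme.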
The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetMelGal5Viewnet Net Turkey (Nov. 2014 (Turkey_5.0/melGal5)), Chain and Net Alignments Comparative Genomics netMelGal5 Turkey Net Turkey (Nov. 2014 (Turkey_5.0/melGal5)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of turkey (Nov. 2014 (Turkey_5.0/melGal5)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both turkey and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the turkey assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best turkey/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The turkey sequence used in this annotation is from the Nov. 2014 (Turkey_5.0/melGal5) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the turkey/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single turkey chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMelGal5Viewchain Chain Turkey (Nov. 2014 (Turkey_5.0/melGal5)), Chain and Net Alignments Comparative Genomics chainMelGal5 Turkey Chain Turkey (Nov. 2014 (Turkey_5.0/melGal5)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of turkey (Nov. 2014 (Turkey_5.0/melGal5)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both turkey and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. 
The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the turkey assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best turkey/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The turkey sequence used in this annotation is from the Nov. 2014 (Turkey_5.0/melGal5) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. 
Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the turkey/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single turkey chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first.
The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetXenTro9 X. 
tropicalis Chain/Net X. tropicalis (Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of X. tropicalis (Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both X. tropicalis and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the X. tropicalis assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best X. tropicalis/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The X. tropicalis sequence used in this annotation is from the Jul. 
2016 (Xenopus_tropicalis_v9.1/xenTro9) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the X. tropicalis/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single X. 
tropicalis chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetXenTro9Viewnet Net X. tropicalis (Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)), Chain and Net Alignments Comparative Genomics netXenTro9 X. tropicalis Net X. tropicalis (Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of X. tropicalis (Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both X. tropicalis and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the X. tropicalis assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best X. tropicalis/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The X. tropicalis sequence used in this annotation is from the Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. 
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the X. tropicalis/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single X. tropicalis chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91
Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track.
The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W.
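As a rough illustration of how the substitution matrix and the score cutoff above are applied, the following Python sketch sums per-column scores over one gapless block and checks a chain score against the minimum of 1000. The names block_score and passes_cutoff and the example sequences are hypothetical. This is not the axtChain implementation; real chain scores also subtract gap penalties, and only the bases A, C, G and T are handled here.

```python
# Substitution scores copied from the matrix above
# (rows and columns in A, C, G, T order).
BASES = "ACGT"
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
MIN_SCORE = 1000  # chains scoring below this were discarded


def block_score(query, target):
    """Sum substitution scores over one ungapped block (hypothetical helper)."""
    if len(query) != len(target):
        raise ValueError("a gapless block needs sequences of equal length")
    total = 0
    for q, t in zip(query.upper(), target.upper()):
        # str.index raises ValueError for anything other than A, C, G, T
        total += MATRIX[BASES.index(q)][BASES.index(t)]
    return total


def passes_cutoff(chain_score, min_score=MIN_SCORE):
    """Return True if a chain with this total score would be kept."""
    return chain_score >= min_score


if __name__ == "__main__":
    print(block_score("ACGT", "ACGT"))  # four matches
    print(block_score("A", "G"))        # one transition
    print(passes_cutoff(999), passes_cutoff(1000))
```

Because the matrix is symmetric, the order of the two sequences does not change the block score.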
Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetXenTro9Viewchain Chain X. tropicalis (Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)), Chain and Net Alignments Comparative Genomics chainXenTro9 X. tropicalis Chain X. tropicalis (Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of X. tropicalis (Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both X. tropicalis and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the X. tropicalis assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best X. tropicalis/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The X. tropicalis sequence used in this annotation is from the Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the X. tropicalis/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single X. tropicalis chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91
Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
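The placement step described above can be sketched in Python as a greedy pass over chains sorted by score. This is a simplified illustration under stated assumptions, not the chainNet program: chains are given as lists of half-open block intervals on the reference genome, the names subtract and build_net are hypothetical, and a chain's level is approximated as one more than the deepest already-placed chain whose span encloses it.

```python
def subtract(piece, covered):
    """Return the parts of half-open interval `piece` not overlapped by `covered`."""
    pieces = [piece]
    for cs, ce in covered:
        remaining = []
        for ps, pe in pieces:
            if ce <= ps or cs >= pe:      # no overlap, keep the piece whole
                remaining.append((ps, pe))
                continue
            if ps < cs:                   # keep the part left of the covered region
                remaining.append((ps, cs))
            if ce < pe:                   # keep the part right of the covered region
                remaining.append((ce, pe))
        pieces = remaining
    return pieces


def build_net(chains):
    """Place chains from highest to lowest score, trimming to uncovered bases.

    chains: dict mapping a chain name to (score, [(start, end), ...]).
    Returns a list of (name, level, start, end) for every placed piece.
    """
    covered = []   # intervals already claimed by higher-scoring chains
    spans = []     # (start, end, level) of each chain placed so far
    net = []
    for name, (score, blocks) in sorted(chains.items(), key=lambda kv: -kv[1][0]):
        kept = []
        for block in blocks:
            kept.extend(subtract(block, covered))
        if not kept:
            continue                      # fully covered by better chains
        lo = min(s for s, _ in kept)
        hi = max(e for _, e in kept)
        # Assumption: nest under the deepest placed chain that spans this one.
        level = 1 + max((lvl for s, e, lvl in spans if s <= lo and hi <= e), default=0)
        covered.extend(kept)
        spans.append((lo, hi, level))
        net.extend((name, level, s, e) for s, e in kept)
    return net


if __name__ == "__main__":
    example = {
        "A": (9000, [(0, 40), (60, 100)]),
        "B": (5000, [(30, 50)]),
        "C": (2000, [(52, 58)]),
    }
    for row in build_net(example):
        print(row)
```

In the example, chain B loses the bases it shares with chain A and lands in A's gap at level 2, which mirrors the trimming and nesting described in the text.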
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetDanRer10 Zebrafish Chain/Net Zebrafish (Sep. 2014 (GRCz10/danRer10)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. 
Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebrafish (Sep. 2014 (GRCz10/danRer10)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebrafish and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebrafish assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebrafish/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebrafish sequence used in this annotation is from the Sep. 2014 (GRCz10/danRer10) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. 
To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebrafish/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single zebrafish chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. 
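To make the chaining step concrete, here is a minimal Python sketch of a dynamic program that finds the best-scoring chain of gapless blocks. It is an illustration under assumptions, not the axtChain algorithm: it compares all pairs of blocks in quadratic time instead of using a kd-tree, it does not trim overlapping blocks, and the function best_chain, the block tuple layout and the gap_cost callback are hypothetical.

```python
def best_chain(blocks, gap_cost):
    """Find the maximally scoring chain of blocks.

    blocks: list of (q_start, t_start, length, score), with length > 0.
    gap_cost(dq, dt): penalty for joining two blocks separated by dq query
        bases and dt target bases (caller-supplied assumption).
    Returns (best_score, [block indices in chain order]).
    """
    if not blocks:
        return 0, []
    # Process blocks in target order so every valid predecessor comes first.
    order = sorted(range(len(blocks)), key=lambda i: (blocks[i][1], blocks[i][0]))
    best = {}   # best chain score ending at each block
    prev = {}   # predecessor of each block in its best chain
    for pos, i in enumerate(order):
        q, t, _, score = blocks[i]
        best[i], prev[i] = score, None
        for j in order[:pos]:
            qj, tj, nj, _ = blocks[j]
            dq = q - (qj + nj)
            dt = t - (tj + nj)
            if dq < 0 or dt < 0:
                continue  # block j does not end before block i on both sequences
            candidate = best[j] + score - gap_cost(dq, dt)
            if candidate > best[i]:
                best[i], prev[i] = candidate, j
    end = max(best, key=best.get)
    chain = []
    i = end
    while i is not None:
        chain.append(i)
        i = prev[i]
    return best[end], chain[::-1]


if __name__ == "__main__":
    blocks = [(0, 0, 10, 900), (15, 12, 10, 800), (5, 30, 10, 300)]
    print(best_chain(blocks, lambda dq, dt: 10 * (dq + dt)))
```

In the example, the third block cannot join the chain because it starts before the earlier blocks end on the query sequence, so the best chain links the first two blocks and pays for a gap of 5 query bases and 2 target bases.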
The following matrix was used:
       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.
The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetDanRer10Viewnet Net Zebrafish (Sep. 2014 (GRCz10/danRer10)), Chain and Net Alignments Comparative Genomics netDanRer10 Zebrafish Net Zebrafish (Sep. 2014 (GRCz10/danRer10)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebrafish (Sep. 2014 (GRCz10/danRer10)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebrafish and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebrafish assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. 
This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebrafish/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebrafish sequence used in this annotation is from the Sep. 2014 (GRCz10/danRer10) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. 
Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebrafish/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single zebrafish chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain.
During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetDanRer10Viewchain Chain Zebrafish (Sep. 2014 (GRCz10/danRer10)), Chain and Net Alignments Comparative Genomics chainDanRer10 Zebrafish Chain Zebrafish (Sep. 
2014 (GRCz10/danRer10)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebrafish (Sep. 2014 (GRCz10/danRer10)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebrafish and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebrafish assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebrafish/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebrafish sequence used in this annotation is from the Sep. 2014 (GRCz10/danRer10) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebrafish/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
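For readers unfamiliar with the axt format mentioned above, the following Python sketch reads axt records from text. It assumes the commonly documented layout: a nine-field summary line (alignment number, primary chromosome, start, end, aligning chromosome, start, end, strand, score) followed by the two aligned sequence lines and a blank line. The names AxtRecord and parse_axt and the sample record are made up for illustration, so check the layout against the official axt format description before relying on it.

```python
from typing import Iterable, Iterator, NamedTuple


class AxtRecord(NamedTuple):
    index: int
    primary_name: str
    primary_start: int   # 1-based, inclusive in the axt layout assumed here
    primary_end: int
    aligning_name: str
    aligning_start: int
    aligning_end: int
    aligning_strand: str
    score: int
    primary_seq: str
    aligning_seq: str


def parse_axt(lines: Iterable[str]) -> Iterator[AxtRecord]:
    """Yield one AxtRecord per alignment block (hypothetical helper)."""
    it = (line.rstrip("\n") for line in lines)
    for line in it:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank separators and comment lines
        fields = line.split()
        if len(fields) != 9:
            raise ValueError(f"expected 9 summary fields, got {len(fields)}")
        primary_seq = next(it, None)
        aligning_seq = next(it, None)
        if primary_seq is None or aligning_seq is None:
            raise ValueError("truncated axt record")
        yield AxtRecord(
            int(fields[0]), fields[1], int(fields[2]), int(fields[3]),
            fields[4], int(fields[5]), int(fields[6]), fields[7],
            int(fields[8]), primary_seq, aligning_seq,
        )


if __name__ == "__main__":
    # Made-up record, for illustration only.
    sample = "0 chr1 101 108 chr5 2001 2008 + 636\nACGTACGT\nACGTAC-T\n\n"
    for record in parse_axt(sample.splitlines()):
        print(record.primary_name, record.primary_start, record.score)
```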
The axt alignments were fed into axtChain, which organizes all alignments between a single zebrafish chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961