cytoBandIdeo Chromosome Band (Ideogram) bed 4 + Chromosome Bands Localized by FISH Mapping Clones (for Ideogram) 1 0.1 0 0 0 127 127 127 0 0 0 map 1 group map\ longLabel Chromosome Bands Localized by FISH Mapping Clones (for Ideogram)\ priority .1\ shortLabel Chromosome Band (Ideogram)\ track cytoBandIdeo\ type bed 4 +\ visibility dense\ netRBestGorGor1 Gorilla RBest Net netAlign gorGor1 chainGorGor1 Gorilla (Oct. 2008 (Sanger 0.1/gorGor1)) Reciprocal Best Alignment Net 0 1 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Alignment Net\ otherDb gorGor1\ parent rBestNet\ priority 1\ shortLabel $o_Organism RBest Net\ spectrum on\ track netRBestGorGor1\ type netAlign gorGor1 chainGorGor1\ visibility hide\ netSyntenyPanTro2 Chimp Syn Net netAlign panTro2 chainPanTro2 Chimp (Mar. 2006 (CGSC 2.1/panTro2)) Syntenic Alignment Net 0 1 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb panTro2\ parent syntenicNet\ priority 1\ shortLabel $o_Organism Syn Net\ spectrum on\ track netSyntenyPanTro2\ type netAlign panTro2 chainPanTro2\ visibility hide\ encodeEgaspPartAceCons ACEScan Cons Alt genePred ACEScan Conserved Alternative Exon Predictions 0 1 100 12 100 177 133 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 100,12,100\ longLabel ACEScan Conserved Alternative Exon Predictions\ parent encodeEgaspPartial\ priority 1\ shortLabel ACEScan Cons Alt\ track encodeEgaspPartAceCons\ encodeEgaspFullAceview AceView genePred AceView Gene Predictions 0 1 22 150 20 138 202 137 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 22,150,20\ longLabel AceView Gene Predictions\ parent encodeEgaspFull\ priority 1\ shortLabel AceView\ track encodeEgaspFullAceview\ 
encodeAffyChIpHl60PvalBrg1Hr00 Affy Brg1 RA 0h wig 0.0 534.54 Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 0hrs) P-Value 0 1 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 0hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 1\ shortLabel Affy Brg1 RA 0h\ subGroups factor=Brg1 time=0h\ track encodeAffyChIpHl60PvalBrg1Hr00\ encodeTransFragsAffyDistal Affy Dist bed 4 Affy Distal 0 1 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Affy Distal\ parent encodeTransFrags\ priority 1\ shortLabel Affy Dist\ track encodeTransFragsAffyDistal\ affyExonTissues Affy Exon Tissues expRatio Affymetrix Exon Array 1.0: Normal Tissues 2 1 0 0 0 127 127 127 0 0 0

Methods

RNA samples from 11 tissues (obtained from a commercial source) were hybridized to Affymetrix Human Exon 1.0 ST arrays. Three replicate experiments were performed for each tissue, for a total of 33 arrays. The raw intensity signal from the arrays was normalized with a quantile normalization method and then run through the PLIER algorithm. The normalized data were then converted to median-centered log-ratios, which are displayed in green for negative log-ratios (below-median expression) and in red for positive log-ratios (above-median expression).
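
The processing steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used to build the track. PLIER summarization is omitted. The replicate arrays are assumed to have been averaged into one value per tissue, and the median is assumed to be taken across tissues for each probe set. The function names `quantile_normalize`, `median_centered_log_ratios`, and `display_color` are hypothetical, and so is the `"neutral"` label for a log-ratio of exactly zero.

```python
import math
import statistics
from typing import Dict, List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize equal-length arrays (one list of probe intensities per chip).

    Each value is replaced by the mean, across all arrays, of the values holding
    the same rank, so every array ends up with an identical distribution.
    Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [statistics.fmean(s[i] for s in sorted_arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        out = [0.0] * n
        for rank, idx in enumerate(sorted(range(n), key=lambda i: a[i])):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def median_centered_log_ratios(expression: Dict[str, float]) -> Dict[str, float]:
    """Map tissue -> log2(value / median over tissues) for one probe set (assumed scheme)."""
    median = statistics.median(expression.values())
    return {tissue: math.log2(value / median) for tissue, value in expression.items()}


def display_color(log_ratio: float) -> str:
    """Green below the median, red above it; 'neutral' at exactly zero (assumption)."""
    if log_ratio < 0:
        return "green"
    if log_ratio > 0:
        return "red"
    return "neutral"
```

Take a probe set with values 4.0, 1.0, and 2.0 in three tissues. Its median is 2.0, so the log-ratios are 1.0 (red), -1.0 (green), and 0.0.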

The probe sets for this microarray track are shown in the Affy Exon Probes track.

Credits

This track was produced by Andy Pohl, Kayla Smith, and Pauline Fujita of the Genome Browser group at UCSC, Melissa Cline of the Ares lab at UCSC, and Chuck Sugnet at Affymetrix. It is based on sample exon array data available from Affymetrix, produced by Tyson Clark.

References

Pohl AA, Sugnet CW, Clark TA, Smith K, Fujita PA, Cline MS. Affy Exon Tissues: Exon Levels in Normal Tissues in Human, Mouse, and Rat. Bioinformatics. 2009 Sept 15;25(18):2442-3.


\ expression 1 expScale 3.0\ expStep 0.5\ expTable affyExonTissuesExps\ group expression\ groupings affyExonTissuesGroups\ longLabel Affymetrix Exon Array 1.0: Normal Tissues\ priority 1.0\ shortLabel Affy Exon Tissues\ superTrack affyAllExonSuper dense\ track affyExonTissues\ type expRatio\ visibility full\ encodeAffyChIpHl60SignalStrictH3K9K14DHr00 Affy H3K9ac2 0h wig -2.78 3.97 Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict Signal 0 1 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 1\ shortLabel Affy H3K9ac2 0h\ subGroups factor=H3K9K14ac2 time=0h\ track encodeAffyChIpHl60SignalStrictH3K9K14DHr00\ encodeAffyChIpHl60SitesStrictH3K9K14DHr00 Affy H3K9ac2 0h bed 3 . Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict Sites 0 1 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 1\ shortLabel Affy H3K9ac2 0h\ subGroups factor=H3K9K14ac2 time=0h\ track encodeAffyChIpHl60SitesStrictH3K9K14DHr00\ encodeAffyChIpHl60PvalStrictH3K9K14DHr00 Affy H3K9ac2 0h wig 0 696.62 Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict P-Value 0 1 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 1\ shortLabel Affy H3K9ac2 0h\ subGroups 
factor=H3K9K14ac2 time=0h\ track encodeAffyChIpHl60PvalStrictH3K9K14DHr00\ encodeAffyRnaGm06990SitesIntronsProximal Affy In Prx GM06990 bed 4 . Affy Intronic Proximal GM06990 Transfrags 0 1 248 0 8 251 127 131 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 248,0,8\ longLabel Affy Intronic Proximal GM06990 Transfrags\ parent encodeNoncodingTransFrags\ priority 1\ shortLabel Affy In Prx GM06990\ subGroups region=intronicProximal celltype=gm06990 source=affy\ track encodeAffyRnaGm06990SitesIntronsProximal\ encodeAffyRnaGm06990Signal Affy RNA GM06990 wig -1168.00 1686.5 Affymetrix PolyA+ RNA (GM06990) Signal 0 1 150 90 0 202 172 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 150,90,0\ longLabel Affymetrix PolyA+ RNA (GM06990) Signal\ parent encodeAffyRnaSignal\ priority 1\ shortLabel Affy RNA GM06990\ track encodeAffyRnaGm06990Signal\ encodeAffyRnaGm06990Sites Affy RNA GM06990 bed 3 . 
Affymetrix PolyA+ RNA (GM06990) Sites 0 1 150 90 0 202 172 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 150,90,0\ longLabel Affymetrix PolyA+ RNA (GM06990) Sites\ parent encodeAffyRnaTransfrags\ priority 1\ shortLabel Affy RNA GM06990\ track encodeAffyRnaGm06990Sites\ snpArrayAffy6 Affy SNP 6.0 bed 6 + Affymetrix SNP 6.0 0 1 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Affymetrix SNP 6.0\ parent snpArray\ priority 1\ shortLabel Affy SNP 6.0\ track snpArrayAffy6\ type bed 6 +\ encodeHapMapAlleleFreqCEU Allele Freq CEU bed 6 + HapMap Minor Allele Frequencies CEPH (CEU) 0 1 0 0 0 127 127 127 1 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18, encodeVariation 1 longLabel HapMap Minor Allele Frequencies CEPH (CEU)\ parent encodeHapMapAlleleFreq\ priority 1\ shortLabel Allele Freq CEU\ track encodeHapMapAlleleFreqCEU\ encodeRegulomeAmpliconOdd Amplicon (Odd) bed 5 . Amplicon (Odd) 0 1 0 0 0 127 127 127 1 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 1 longLabel Amplicon (Odd)\ parent encodeRegulomeAmplicon\ priority 1\ shortLabel Amplicon (Odd)\ track encodeRegulomeAmpliconOdd\ encodeEgaspUpdAugustusAbinitio Augustus Update genePred Augustus Ab Initio Gene Predictions 0 1 12 50 200 133 152 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,50,200\ longLabel Augustus Ab Initio Gene Predictions\ parent encodeEgaspUpdate\ priority 1\ shortLabel Augustus Update\ track encodeEgaspUpdAugustusAbinitio\ encodeBuFirstExonCerebrum BU Cere. Cortex bed 12 + Boston University First Exon Activity in Cerebral Cortex 0 1 0 0 0 127 127 127 0 0 10 chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX, encodeTxLevels 1 longLabel Boston University First Exon Activity in Cerebral Cortex\ parent encodeBuFirstExon\ priority 1\ shortLabel BU Cere. 
Cortex\ track encodeBuFirstExonCerebrum\ encodeRegulomeQualityCACO2 CACO2 bed 5 . CACO2 Quality 0 1 60 50 240 157 152 247 1 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 1 color 60,50,240\ longLabel CACO2 Quality\ parent encodeRegulomeQuality\ priority 1\ shortLabel CACO2\ track encodeRegulomeQualityCACO2\ encodeRegulomeProbCACO2 CACO2 bedGraph 4 CACO2 DNaseI HSs 0 1 60 50 240 157 152 247 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 60,50,240\ longLabel CACO2 DNaseI HSs\ parent encodeRegulomeProb\ priority 1\ shortLabel CACO2\ track encodeRegulomeProbCACO2\ encodeRegulomeBaseCACO2 CACO2 wig 0.0 3.0 CACO2 DNaseI Sensitivity 0 1 60 50 240 157 152 247 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 60,50,240\ longLabel CACO2 DNaseI Sensitivity\ parent encodeRegulomeBase\ priority 1\ shortLabel CACO2\ track encodeRegulomeBaseCACO2\ cccTrendPvalBd CCC Bipolar Dis chromGraph Case Control Consortium bipolar disorder trend -log10 P-value 0 1 0 0 0 127 127 127 0 0 0 phenDis 0 longLabel Case Control Consortium bipolar disorder trend -log10 P-value\ parent caseControl\ priority 1\ shortLabel CCC Bipolar Dis\ track cccTrendPvalBd\ cytoBand Chromosome Band bed 4 + Chromosome Bands Localized by FISH Mapping Clones 0 1 0 0 0 127 127 127 0 0 0

Description

The chromosome band track represents the approximate location of bands seen on Giemsa-stained chromosomes. Chromosomes are displayed in the browser with the short arm first. Cytologically identified bands on the chromosome are numbered outward from the centromere on the short (p) and long (q) arms. At low resolution, bands are classified using the nomenclature [chromosome][arm][band], where band is a single digit. Examples of bands on chromosome 3 include 3p2, 3p1, cen (the centromere), 3q1, and 3q2. At a finer resolution, some of the bands are subdivided into sub-bands by adding a second digit to the band number, e.g. 3p26. This resolution produces about 500 bands. A final subdivision into a total of 862 sub-bands is made by adding a period and another digit to the band, resulting in 3p26.3, 3p26.2, etc.
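
A small parser makes the band nomenclature described above concrete. This Python sketch is an illustration only. The names `parse_band`, `display_order_key`, and the `Band` tuple are hypothetical. Only chromosomes 1-22, X, and Y are accepted, and `cen` is not treated as a band name. The sort key follows the layout described above: the short arm comes first, with p-arm numbers decreasing toward the centromere, followed by q-arm numbers increasing away from it.

```python
import re
from typing import NamedTuple, Optional, Tuple

# [chromosome][arm][band][optional sub-band digit][optional .sub-sub-band]
BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(?:(\d)(?:\.(\d+))?)?$")


class Band(NamedTuple):
    chromosome: str
    arm: str                     # "p" (short arm) or "q" (long arm)
    band: int                    # single digit, e.g. the 2 in 3p2
    sub_band: Optional[int]      # second digit, e.g. the 6 in 3p26
    sub_sub_band: Optional[int]  # digits after the period, e.g. the 3 in 3p26.3

    @property
    def resolution(self) -> int:
        """1 = band, 2 = sub-band, 3 = sub-band with a period suffix."""
        return 1 + (self.sub_band is not None) + (self.sub_sub_band is not None)


def parse_band(name: str) -> Band:
    m = BAND_RE.match(name)
    if m is None:
        raise ValueError(f"not a band name: {name!r}")
    chrom, arm, band, sub, subsub = m.groups()
    return Band(chrom, arm, int(band),
                int(sub) if sub is not None else None,
                int(subsub) if subsub is not None else None)


def display_order_key(name: str) -> Tuple[int, Tuple[int, ...]]:
    """Sort key: p arm first (counting down to the centromere), then q arm (counting up)."""
    b = parse_band(name)
    digits = (b.band, b.sub_band or 0, b.sub_sub_band or 0)
    if b.arm == "p":
        return (0, tuple(-d for d in digits))
    return (1, digits)
```

For instance, `parse_band("3p26.3")` yields chromosome 3, arm p, band 2, sub-band 6, and sub-sub-band 3. Sorting `["3q2", "3p1", "3q1", "3p2"]` with `display_order_key` reproduces the order 3p2, 3p1, 3q1, 3q2 used in the example above.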

Methods

A full description of the method by which the chromosome band locations are estimated can be found in Furey and Haussler, 2003.

Barbara Trask, Vivian Cheung, Norma Nowak, and others in the BAC Resource Consortium used fluorescence in situ hybridization (FISH) to determine a cytogenetic location for large genomic clones on the chromosomes. The results from these experiments are the primary source of information used in estimating the chromosome band locations. For more information about the process, see BAC Resource Consortium et al., 2001, and the accompanying web site, Human BAC Resource.

BAC clone placements in the human sequence are determined at UCSC using a combination of full BAC clone sequence, BAC end sequence, and STS marker information.

Credits

We would like to thank all the labs that have contributed to this resource:

References

Furey TS, Haussler D. Integration of the cytogenetic map with the draft human genome sequence. Hum Mol Genet. 2003;12(9):1037-44.

BAC Resource Consortium, Cheung VG, Nowak N, Jang W, Kirsch IR, Zhao S, Chen XN, Furey TS, Kim UJ, Kuo WL, et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature. 2001;409:953-8.

\ \ map 1 group map\ longLabel Chromosome Bands Localized by FISH Mapping Clones\ priority 1\ shortLabel Chromosome Band\ track cytoBand\ type bed 4 +\ visibility hide\ snp131Common Common SNPs (131) bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 131) from HapMap and/or 1000Genomes 3 1 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$ varRep 1 longLabel Simple Nucleotide Polymorphisms (dbSNP build 131) from HapMap and/or 1000Genomes\ parent snp131Composite\ priority 1\ shortLabel Common SNPs (131)\ subGroups view=common\ track snp131Common\ visibility pack\ encodePseudogeneConsensus Consensus Pseudogenes genePred Consensus of Yale, Havana-Gencode, UCSC and GIS ENCODE Pseudogenes 0 1 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$ encodeGenes 1 longLabel Consensus of Yale, Havana-Gencode, UCSC and GIS ENCODE Pseudogenes\ parent encodePseudogene\ priority 1\ shortLabel Consensus Pseudogenes\ track encodePseudogeneConsensus\ url http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$\ url2 http://www.pseudogene.org/cgi-bin/search-results.cgi?tax_id=9606&set_search=25&criterion0=pgene_by_acc&operator0=%3D&sort=0&output=html&searchValue0_0=$$\ url2Label Yale Pseudogene Link:\ urlLabel Vega Genes Link:\ urlName gene\ kiddEichlerDiscAbc14 Discordant ABC14 bed 12 HGSV Individual ABC14 (CEPH) Discordant Clone End Alignments 0 1 0 0 0 127 127 127 0 0 0 http://mrhgsv.gs.washington.edu/cgi-bin/hgc?i=$$&c=$S&l=$[&r=$]&db=$D&position=$S:$[-$] varRep 1 longLabel HGSV Individual ABC14 (CEPH) Discordant Clone End Alignments\ parent kiddEichlerDisc\ priority 1\ shortLabel Discordant ABC14\ track kiddEichlerDiscAbc14\ encodeDNDSsmall dN/dS 0.0 to 0.2 bed 4 + ENCODE Exons dN/dS 0.0 to 0.2 0 1 0 0 0 127 127 127 0 0 21 
chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel ENCODE Exons dN/dS 0.0 to 0.2\ parent encodeDNDS\ priority 1\ shortLabel dN/dS 0.0 to 0.2\ track encodeDNDSsmall\ encodeNhgriDnaseHsNonAct DNase CD4 Unact. bed 5 . NHGRI DNaseI-Hypersensitive Sites (CD4+ T-Cells Unactivated) 0 1 0 0 0 127 127 127 1 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 longLabel NHGRI DNaseI-Hypersensitive Sites (CD4+ T-Cells Unactivated)\ parent encodeNhgriDnaseHs\ priority 1\ shortLabel DNase CD4 Unact.\ track encodeNhgriDnaseHsNonAct\ encodeAffyEc1BrainCerebellumSignal EC1 Sgnl BrainC wig 0 62385 Affy Ext Trans Signal (1-base window) (Brain Cerebellum) 0 1 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 0 color 248,0,8\ longLabel Affy Ext Trans Signal (1-base window) (Brain Cerebellum)\ parent encodeAffyEcSignal\ priority 1\ shortLabel EC1 Sgnl BrainC\ track encodeAffyEc1BrainCerebellumSignal\ encodeAffyEc1BrainCerebellumSites EC1 Sites BrainC bed 3 . Affy Ext Trans Sites (1-base window) (Brain Cerebellum) 0 1 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 1 color 248,0,8\ longLabel Affy Ext Trans Sites (1-base window) (Brain Cerebellum)\ parent encodeAffyEcSites\ priority 1\ shortLabel EC1 Sites BrainC\ track encodeAffyEc1BrainCerebellumSites\ eioJcviNASPos EIO/JCVI CD34+ NAS bed 3 . 
CD34+ cells Nuclease Accessible sites 0 1 100 30 150 177 142 202 0 0 0 regulation 1 color 100,30,150\ longLabel CD34+ cells Nuclease Accessible sites\ parent eioJcviNAS\ priority 1\ shortLabel EIO/JCVI CD34+ NAS\ track eioJcviNASPos\ encodeUncFaireSignal FAIRE Signal bedGraph 4 University of North Carolina FAIRE Signal 0 1 20 150 20 50 100 50 0 0 21 chr1,chr4,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chr10,chrX, encodeChrom 0 longLabel University of North Carolina FAIRE Signal\ parent encodeUncFaire\ priority 1\ shortLabel FAIRE Signal\ track encodeUncFaireSignal\ fox2ClipSeq FOX2 CLIP-seq bed 9 . FOX2 adaptor-trimmed CLIP-seq reads 3 1 0 0 0 127 127 127 0 0 0 regulation 1 itemRgb on\ longLabel FOX2 adaptor-trimmed CLIP-seq reads\ noInherit on\ noScoreFilter .\ parent fox2ClipSeqCompViewreads\ priority 1\ shortLabel FOX2 CLIP-seq\ subGroups view=reads\ track fox2ClipSeq\ type bed 9 .\ encodeAllGencodeExonic Gencode Exonic bed 4 Consensus Gencode Exonic 0 1 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Consensus Gencode Exonic\ parent encodeWorkshopSelections\ priority 1\ shortLabel Gencode Exonic\ track encodeAllGencodeExonic\ encodeGencodeGeneKnownMar07 Gencode Ref genePred Gencode Reference Genes 0 1 33 91 51 144 173 153 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 33,91,51\ longLabel Gencode Reference Genes\ parent encodeGencodeGeneMar07\ priority 1\ shortLabel Gencode Ref\ track encodeGencodeGeneKnownMar07\ encodeGencodeGeneKnownOct05 Gencode Ref genePred Gencode Reference Genes 0 1 33 91 51 144 173 153 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 33,91,51\ longLabel Gencode Reference 
Genes\ parent encodeGencodeGeneOct05\ priority 1\ shortLabel Gencode Ref\ track encodeGencodeGeneKnownOct05\ encodeGisRnaPetMCF7 GIS RNA MCF7 bed 12 Gene Identification Signature Paired-End Tags of PolyA+ RNA (log phase MCF7) 0 1 0 0 0 127 127 127 0 0 23 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 longLabel Gene Identification Signature Paired-End Tags of PolyA+ RNA (log phase MCF7)\ parent encodeGisRnaPet\ priority 1\ shortLabel GIS RNA MCF7\ track encodeGisRnaPetMCF7\ snpRecombHotspotHapmap HapMap bed 3 . Oxford Recombination Hotspots from HapMap Phase I Release 16c.1 0 1 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Oxford Recombination Hotspots from HapMap Phase I Release 16c.1\ parent snpRecombHotspot\ priority 1\ shortLabel HapMap\ track snpRecombHotspotHapmap\ encodeHapMapCovCEU HapMap Cov CEU wig 0.0 100.0 HapMap Resequencing Coverage CEPH (CEU) 0 1 0 0 0 127 127 127 0 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18, encodeVariation 0 longLabel HapMap Resequencing Coverage CEPH (CEU)\ parent encodeHapMapCov\ priority 1\ shortLabel HapMap Cov CEU\ track encodeHapMapCovCEU\ snpRecombRateHapmap HapMap Phase I bedGraph 4 Oxford Recombination Rates from HapMap Phase I Release 16c.1 0 1 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 longLabel Oxford Recombination Rates from HapMap Phase I Release 16c.1\ parent snpRecombRate\ priority 1\ shortLabel HapMap Phase I\ track snpRecombRateHapmap\ hapmapSnpsASW HapMap SNPs ASW bed 6 + HapMap SNPs from the ASW Population (African Ancestry in SouthWestern United States) 0 1 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HapMap SNPs from the ASW Population (African Ancestry in SouthWestern United States)\ parent hapmapSnps\ priority 1\ shortLabel HapMap SNPs ASW\ track hapmapSnpsASW\ hgdpHzyAfrica Hetzgty Africa bedGraph 4 Human 
Genome Diversity Proj Smoothd Expec Heterozygosity (Africa) 0 1 224 0 0 239 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 224, 0, 0\ longLabel Human Genome Diversity Proj Smoothd Expec Heterozygosity (Africa)\ parent hgdpHzy\ priority 1\ shortLabel Hetzgty Africa\ track hgdpHzyAfrica\ hgdpIhsBantu iHS Bantu bedGraph 4 Human Genome Diversity Project iHS (Bantu populations in Africa) 0 1 224 0 0 239 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 color 224, 0, 0\ longLabel Human Genome Diversity Project iHS (Bantu populations in Africa)\ parent hgdpIhs\ priority 1\ shortLabel iHS Bantu\ track hgdpIhsBantu\ encodeGencodeIntronsProximal Intronic Prox bed 4 . Gencode Intronic Proximal Regions 0 1 0 0 0 127 127 127 0 0 0 encodeAnalysis 1 longLabel Gencode Intronic Proximal Regions\ parent encodeGencodeRegions\ priority 1\ shortLabel Intronic Prox\ track encodeGencodeIntronsProximal\ iscaRetrospectiveBenign ISCA Ret Benign gvf Internat. Stds. for Cytogen. Arrays Consort. (ISCA) - Retrospective (Benign) 0 1 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Internat. Stds. for Cytogen. Arrays Consort. 
(ISCA) - Retrospective (Benign)\ parent iscaRetrospectiveComposite\ priority 1\ shortLabel ISCA Ret Benign\ track iscaRetrospectiveBenign\ L1_LINE L1_LINE bed 5 L1 LINEs for Intersection 0 1 0 0 0 127 127 127 0 0 0 encodeAnalysis 1 longLabel L1 LINEs for Intersection\ parent encodeWorkshopIntersections\ priority 1\ shortLabel L1_LINE\ track L1_LINE\ hapmapLdYri LD YRI bed 4 + Linkage Disequilibrium for the Yoruba (YRI) 0 1 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 longLabel Linkage Disequilibrium for the Yoruba (YRI)\ parent hapmapLd\ priority 1\ shortLabel LD YRI\ track hapmapLdYri\ encodeUcsdChipHeLaH3H4dmH3K4_p0 LI H3K4me2 -gIF bedGraph 4 Ludwig Institute ChIP-chip: H3K4me2 ab, HeLa cells, no gamma interferon 0 1 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: H3K4me2 ab, HeLa cells, no gamma interferon\ parent encodeLIChIPgIF\ priority 1\ shortLabel LI H3K4me2 -gIF\ track encodeUcsdChipHeLaH3H4dmH3K4_p0\ encodeUcsdNgHeLaH3K4me3_p0 LI Ng H3K4m3 -gIF bedGraph 4 Ludwig Institute/UCSD ChIP/Chip Ng: HeLa, H3K4me3, no gamma interferon 0 1 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Ludwig Institute/UCSD ChIP/Chip Ng: HeLa, H3K4me3, no gamma interferon\ parent encodeUcsdNgGif\ priority 1\ shortLabel LI Ng H3K4m3 -gIF\ track encodeUcsdNgHeLaH3K4me3_p0\ encodeUcsdChipRnapHela_f LI Pol2 HeLa bedGraph 4 Ludwig Institute ChIP-chip: Pol2 8WG16 ab, HeLa cells 0 1 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Ludwig Institute ChIP-chip: Pol2 8WG16 ab, HeLa 
cells\ parent encodeLIChIP\ priority 1\ shortLabel LI Pol2 HeLa\ track encodeUcsdChipRnapHela_f\ jaxQtlAsIs MGI Mouse QTL bed 4 . MGI Mouse QTLs Coarsely Mapped to Human 0 1 200 100 0 227 177 127 0 0 0 http://www.informatics.jax.org/searches/accession_report.cgi?id=$$ phenDis 1 color 200,100,0\ longLabel MGI Mouse QTLs Coarsely Mapped to $Organism\ parent jaxQtlMapped\ priority 1\ shortLabel MGI Mouse QTL\ track jaxQtlAsIs\ encodeMlaganPhastConsEl MLAGAN PhastCons bed 5 . MLAGAN PhastCons Conserved Elements 0 1 170 100 50 212 177 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 170,100,50\ longLabel MLAGAN PhastCons Conserved Elements\ parent encodeMlaganElements\ priority 1\ shortLabel MLAGAN PhastCons\ track encodeMlaganPhastConsEl\ encodeMlaganPhastCons MLAGAN PhastCons wig 0.0 1.0 MLAGAN PhastCons Conservation 0 1 170 100 50 212 177 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 0 autoScale Off\ color 170,100,50\ longLabel MLAGAN PhastCons Conservation\ maxHeightPixels 100:25:11\ noInherit on\ parent encodeMlaganCons\ priority 1\ shortLabel MLAGAN PhastCons\ track encodeMlaganPhastCons\ type wig 0.0 1.0\ windowingFunction mean\ encodeNhgriDnaseHs2 NHGRI DNAseI HS bed 4 NHGRI DNAseI HS 0 1 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel NHGRI DNAseI HS\ parent encodeDnase\ priority 1\ shortLabel NHGRI DNAseI HS\ track encodeNhgriDnaseHs2\ nimhBipolarUs NIMH Bipolar Us chromGraph NIMH Bipolar disorder (US) -log10 P-value 0 1 0 0 0 127 127 127 0 0 0 phenDis 0 longLabel NIMH Bipolar disorder (US) -log10 P-value\ parent nimhBipolar\ priority 1\ shortLabel NIMH Bipolar Us\ track nimhBipolarUs\ numtS NumtS bed 6 . 
Human NumtS 0 1 0 60 120 127 157 187 1 0 0

Description and display conventions

NumtS (nuclear mitochondrial sequences) are mitochondrial fragments inserted into nuclear genomic sequences. The most widely accepted hypothesis for their origin is that, in the presence of mutagenic agents or under stress conditions, fragments of mtDNA escape from mitochondria, reach the nucleus, and insert into chromosomes during break repair. NumtS can also derive from the duplication of existing genomic fragments. NumtS can contaminate human mtDNA sequencing, and as a consequence, false evidence of low-level heteroplasmy has frequently been reported. The Bioinformatics group led by M. Attimonelli (Bari, Italy) has produced the RHNumtS compilation, which annotates more than 500 human NumtS. The group designed the Human NumtS tracks described below to give the scientific community access to the compilation and to support comparative genomics analyses that include NumtS data.

The NumtS tracks show the high-scoring segment pairs (HSPs) obtained by aligning the mitochondrial reference genome (NC_012920) with the hg18 release of the human genome.

  1. "NumtS (Nuclear mitochondrial Sequences)" Track

    The "NumtS (Nuclear mitochondrial Sequences)" track shows the mapping of the HSPs returned by BlastN onto the nuclear genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows matches the strand of the alignment. For every item, a link pointing to the mitochondrial mapping is provided, allowing fast cross-referencing among the genomic contexts of the NumtS.

  2. "NumtS assembled" Track

    The "NumtS assembled" track shows items obtained by assembling HSPs that are annotated in the "NumtS" track and meet the following conditions:

    Exceptions to the second condition arise when a long repetitive element is present between two HSPs.

  3. "NumtS on mitochondrion" Track

    The "NumtS on mitochondrion" track shows the mapping of the HSPs onto the mitochondrial genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows matches the strand of the alignment. For every item, a link pointing to the nuclear mapping is provided.

  4. "NumtS on mitochondrion with chromosome placement" Track

    The "NumtS on mitochondrion with chromosome placement" track shows the mapping of the HSPs onto the mitochondrial genome. Here, however, the items are coloured according to the colours assigned to each human chromosome in the UCSC Genome Browser. No shading is provided. For every item, a link pointing to the nuclear mapping is provided.
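
The display conventions above take the strand from the alignment and the shading from the similarity. They can be sketched as a conversion from one BlastN HSP to a BED 6 line, which is the format declared for the NumtS track (`type bed 6 .` with `useScore 1`). This Python sketch is illustrative and is not the authors' code. It rests on three assumptions. First, nuclear coordinates are 1-based and inclusive, as reported by BLAST. Second, a reversed start/end pair marks a minus-strand hit. Third, percent identity is scaled linearly to the 0-1000 BED score; this mapping is hypothetical. The gray-level bounds are taken from the score ranges listed for `useScore` in the UCSC BED format documentation.

```python
from typing import NamedTuple


class Hsp(NamedTuple):
    chrom: str       # nuclear chromosome carrying the hit
    s_start: int     # 1-based start on the chromosome, as reported by BLAST
    s_end: int       # 1-based end; s_start > s_end indicates a minus-strand hit
    pident: float    # percent identity, 0-100
    name: str        # hypothetical item name


# Upper score bound of each gray level used with useScore 1 (lightest first);
# scores above the last bound get the darkest level.
GRAY_LEVEL_BOUNDS = (166, 277, 388, 499, 611, 722, 833)


def gray_level(score: int) -> int:
    """Return 0 (lightest) to 7 (darkest) for a BED score in 0-1000."""
    for level, bound in enumerate(GRAY_LEVEL_BOUNDS):
        if score <= bound:
            return level
    return len(GRAY_LEVEL_BOUNDS)


def hsp_to_bed6(hsp: Hsp) -> str:
    """Format one HSP as a tab-separated BED 6 line."""
    strand = "+" if hsp.s_start <= hsp.s_end else "-"
    start = min(hsp.s_start, hsp.s_end) - 1      # BED chromStart is 0-based
    end = max(hsp.s_start, hsp.s_end)            # BED chromEnd is exclusive
    score = round(hsp.pident * 10)               # hypothetical: 100% identity -> 1000
    return "\t".join([hsp.chrom, str(start), str(end), hsp.name, str(score), strand])
```

Consider a 92.5% identity plus-strand hit at chr1:1000-1200. It becomes the tab-separated line `chr1 999 1200 numt_1 925 +`, whose score falls in the darkest gray level.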

Methods

NumtS mappings were obtained by running Blast2seq (program: BlastN) between each chromosome of the human genome hg18 build and the human mitochondrial reference sequence (rCRS, accession NC_012920), with the e-value threshold set to 1e-03. The HSPs were assembled using spreadsheet interpolation and manual inspection.
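
The e-value filtering in the Methods can be sketched in Python. The sketch assumes the HSPs were exported in the standard 12-column BLAST tabular format (for example, BLAST+ `-outfmt 6`), with the rCRS as query and a chromosome as subject. The column order below is the default for that format. The `Hit` tuple and the `parse_hits` function are hypothetical names. The subsequent assembly of HSPs, done by spreadsheet interpolation and manual inspection, is not modeled.

```python
import csv
from typing import Iterable, List, NamedTuple

# Default column order of BLAST tabular output (e.g. BLAST+ -outfmt 6).
COLUMNS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")

E_VALUE_THRESHOLD = 1e-3  # threshold stated in the Methods


class Hit(NamedTuple):
    mito_start: int     # position on the rCRS (query)
    mito_end: int
    chrom: str          # subject sequence id, assumed to be a chromosome name
    chrom_start: int
    chrom_end: int
    pident: float
    evalue: float


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Keep HSPs whose e-value does not exceed the threshold."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(COLUMNS, row))
        evalue = float(rec["evalue"])
        if evalue > E_VALUE_THRESHOLD:
            continue
        hits.append(Hit(int(rec["qstart"]), int(rec["qend"]), rec["sseqid"],
                        int(rec["sstart"]), int(rec["send"]),
                        float(rec["pident"]), evalue))
    return hits
```

Because `csv.reader` accepts any iterable of strings, the function can read an open file handle directly as well as an in-memory list of lines.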

Verification

NumtS predicted in silico were validated by PCR amplification and sequencing of blood-extracted DNA from a healthy individual of European origin. PCR amplification was successful for 275 NumtS and yielded amplicons of the expected length. All PCR fragments were sequenced on both strands and submitted to the EMBL databank.

Furthermore, 541 NumtS were validated by merging NumtS nuclear coordinates with HapMap annotations. The analysis was carried out on eight HapMap individuals (NA18517, NA18507, NA18956, NA19240, NA18555, NA12878, NA19129, NA12156). For each sample, clones with a single best concordant placement were considered, according to the fosmid end-sequence-pair analysis described in Kidd et al., 2008. The analysis showed that 541 NumtS had been sequenced in these samples, with at least 30 bp sequenced for each.
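
The 30 bp coverage criterion can be illustrated as an interval-overlap count. This Python sketch is hypothetical and rests on three assumptions. The clone intervals for each chromosome have already been restricted to clones with a single best concordant placement. All coordinates are half-open and on the same assembly. "Sequenced" means covered by the union of those clone intervals. The function names are illustrative only.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome

MIN_COVERED_BP = 30  # minimum sequenced length per NumtS, from the text above


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(target: Interval, clones: List[Interval]) -> int:
    """Number of bases of `target` covered by at least one clone interval."""
    t_start, t_end = target
    return sum(max(0, min(t_end, c_end) - max(t_start, c_start))
               for c_start, c_end in merge(clones))


def supported_numts(numts: Dict[str, Tuple[str, int, int]],
                    clones: Dict[str, List[Interval]]) -> List[str]:
    """numts: name -> (chrom, start, end); clones: chrom -> clone intervals."""
    return [name for name, (chrom, start, end) in numts.items()
            if covered_bases((start, end), clones.get(chrom, [])) >= MIN_COVERED_BP]
```

Merging the clone intervals first keeps overlapping clones from inflating the covered length of a NumtS.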

Credits

These data were provided by Domenico Simone and Marcella Attimonelli at the Department of Biochemistry and Molecular Biology "Ernesto Quagliariello" (University of Bari, Italy). Primer design was carried out by Francesco Calabrese and Giuseppe Mineccia. PCR validation was carried out by Martin Lang, Domenico Simone, and Giuseppe Gasparre. Merging with HapMap annotations was performed by Domenico Simone.

References

Simone D, Calabrese FM, Lang M, Gasparre G, Attimonelli M. Validation and UCSC tracks of the extended RHNumtS compilation (submitted).

Lascaro D, Castellana S, Gasparre G, Romeo G, Saccone S, Attimonelli M. The RHNumtS compilation: features and bioinformatics approaches to locate and quantify Human NumtS. BMC Genomics. 2008 June 3;9:267.

Kidd JM, Cooper GM, Donahue WF, et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008;453(7191):56-64.

\ \ \ \ varRep 1 color 0,60,120\ html numtSeq\ longLabel Human NumtS\ parent numtSeq\ priority 1\ shortLabel NumtS\ track numtS\ type bed 6 .\ useScore 1\ hapmapLdPhYri Phased YRI ld2 Linkage Disequilibrium for the Yoruba (YRI) from phased genotypes 0 1 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 longLabel Linkage Disequilibrium for the Yoruba (YRI) from phased genotypes\ parent hapmapLdPh\ priority 1\ shortLabel Phased YRI\ track hapmapLdPhYri\ encodeGencodeRaceFragsPrimer RACEfrags Primer genePred Gencode 5' RACE primer 0 1 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 longLabel Gencode 5' RACE primer\ parent encodeGencodeRaceFrags\ priority 1\ shortLabel RACEfrags Primer\ track encodeGencodeRaceFragsPrimer\ encodeRikenCagePlus Riken CAGE + bedGraph 4 Riken CAGE Plus Strand - Predicted Gene Start Sites 0 1 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 109,51,43\ longLabel Riken CAGE Plus Strand - Predicted Gene Start Sites\ parent encodeRikenCage\ priority 1\ shortLabel Riken CAGE +\ track encodeRikenCagePlus\ encodeRikenCageMappedTagsPositive Riken CAGE MT + bedGraph 4 Riken CAGE Mapped Tags overlap count, Plus strand - TEST TRACK ONLY 0 1 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 109,51,43\ longLabel Riken CAGE Mapped Tags overlap count, Plus strand - TEST TRACK ONLY\ parent encodeRikenCageMappedTagsScore\ priority 1\ shortLabel Riken CAGE MT +\ track encodeRikenCageMappedTagsPositive\ decodeSexAveraged Sex Average bigWig 0.0 108.804 deCODE recombination map, sex-average 2 1 109 51 43 182 153 149 0 0 22 
chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 109,51,43\ configurable on\ longLabel deCODE recombination map, sex-average\ parent avgView\ priority 1\ shortLabel Sex Average\ subGroups view=avg\ track decodeSexAveraged\ type bigWig 0.0 108.804\ cnpSharp Sharp CNPs bed 4 + Copy Number Polymorphisms from BAC Microarray Analysis (Sharp) 0 1 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows 160 regions detected as putative copy number \ polymorphisms by BAC microarray analysis in a population of 47 individuals, \ comprising 8 Chinese, 4 Japanese, 10 Czech, 2 Druze, 7 Biaka, 9 Mbuti, \ and 7 Amerindians.

\ \

Methods

\

\ All hybridizations were performed in duplicate, incorporating a dye reversal, \ using a custom array of 2,194 BACs confirmed by end sequencing or FISH, \ targeted to regions of the genome flanked by segmental duplications. \ The false positive rate was estimated at ~3 clones per 4,000 tested.
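As a rough illustration of what this rate implies for a single array, the sketch below multiplies the quoted false positive rate by the number of clones on the custom array. It assumes that the rate applies uniformly and independently to every clone on one array. The description does not state this, and the function name `expected_false_positives` is hypothetical.

```python
def expected_false_positives(clones_on_array: int, fp_per_clone: float) -> float:
    """Expected number of false-positive clones on one array.

    Assumes every clone independently has the same false-positive
    probability (an illustrative assumption, not the published analysis).
    """
    return clones_on_array * fp_per_clone


# Figures quoted in the Sharp CNP description.
SHARP_CLONES = 2194          # BACs on the custom array
SHARP_FP_RATE = 3 / 4000     # ~3 false-positive clones per 4,000 tested

per_array = expected_false_positives(SHARP_CLONES, SHARP_FP_RATE)
print(f"~{per_array:.1f} expected false-positive clones per array")
```

Under that assumption, roughly one or two clones per array would be expected to be false positives.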

\ \

References

\

\ Sharp A.J., Locke D.P., McGrath S.D., Cheng Z., Bailey J.A., Samonte R.V., \ Pertz L.M., Clark R.A., Schwartz S., Segraves R., Oseroff V.V., Albertson D.G., \ Pinkel D. and Eichler E.E. \ Segmental duplications and copy number variation in the human genome. \ Am J Hum Genet 77(1), 78-88 (2005).

\ varRep 1 longLabel Copy Number Polymorphisms from BAC Microarray Analysis (Sharp)\ noInherit on\ parent cnp\ priority 1\ shortLabel Sharp CNPs\ track cnpSharp\ type bed 4 +\ encodeSangerChipH3K4me1 SI H3K4m1 GM06990 bedGraph 4 Sanger Institute ChIP/Chip (H3K4me1 ab, GM06990 cells) 0 1 10 10 130 132 132 192 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 10,10,130\ longLabel Sanger Institute ChIP/Chip (H3K4me1 ab, GM06990 cells)\ parent encodeSangerChipH3H4\ priority 1\ shortLabel SI H3K4m1 GM06990\ track encodeSangerChipH3K4me1\ tajdSnpAd SNPs AD bed 4 . SNPs from African Descent 0 1 200 100 0 0 100 200 0 0 0 varRep 1 altColor 0,100,200\ color 200,100,0\ longLabel SNPs from African Descent\ parent tajdSnp\ priority 1\ shortLabel SNPs AD\ track tajdSnpAd\ stanfordChipGMO6990GABP Stan GMO6690 GABP bedGraph 4 Stanford ChIP-chip (GMO6990 cells, GABP ChIP) 0 1 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (GMO6990 cells, GABP ChIP)\ parent stanfordChip\ priority 1\ shortLabel Stan GMO6690 GABP\ track stanfordChipGMO6990GABP\ encodeStanfordChipGMO6990GABP Stan GMO6690 GABP bedGraph 4 Stanford ChIP-chip (GMO6990 cells, GABP ChIP) 0 1 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (GMO6990 cells, GABP ChIP)\ parent encodeStanfordChipJohnson\ priority 1\ shortLabel Stan GMO6690 GABP\ track encodeStanfordChipGMO6990GABP\ encodeStanfordChipHCT116Sp1 Stan HCT116 Sp1 bedGraph 4 Stanford ChIP-chip (HCT116 cells, Sp1 ChIP) 0 1 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford 
ChIP-chip (HCT116 cells, Sp1 ChIP)\ parent encodeStanfordChip\ priority 1\ shortLabel Stan HCT116 Sp1\ track encodeStanfordChipHCT116Sp1\ encodeStanfordMethBe2C Stan Meth Be2C bedGraph 4 Stanford Methylation Digest (Be2C cells) 0 1 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest (Be2C cells)\ parent encodeStanfordMeth\ priority 1\ shortLabel Stan Meth Be2C\ track encodeStanfordMethBe2C\ encodeStanfordMethSmoothedBe2C Stan Meth Sc Be2C bedGraph 4 Stanford Methylation Digest Smoothed Score (Be2C cells) 0 1 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest Smoothed Score (Be2C cells)\ parent encodeStanfordMethSmoothed\ priority 1\ shortLabel Stan Meth Sc Be2C\ track encodeStanfordMethSmoothedBe2C\ encodeStanfordPromotersAGS Stan Pro AGS bed 9 + Stanford Promoter Activity (AGS cells) 0 1 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (AGS cells)\ parent encodeStanfordPromoters\ priority 1\ shortLabel Stan Pro AGS\ track encodeStanfordPromotersAGS\ encodeStanfordChipSmoothedHCT116Sp1 Stan Sc HCT116 Sp1 bedGraph 4 Stanford ChIP-chip Smoothed Score (HCT116 cells, Sp1 ChIP) 0 1 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip Smoothed Score (HCT116 cells, Sp1 ChIP)\ parent encodeStanfordChipSmoothed\ priority 1\ shortLabel Stan Sc HCT116 Sp1\ track encodeStanfordChipSmoothedHCT116Sp1\ encodeStanfordNRSFEnriched Stanf NRSF Enriched bed 6 . 
Stanford NRSF/REST Enriched 0 1 0 128 0 127 191 127 0 0 23 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,chrY,chrM, encodeChip 1 color 0,128,0\ longLabel Stanford NRSF/REST Enriched\ parent encodeStanfordNRSF\ priority 1\ shortLabel Stanf NRSF Enriched\ track encodeStanfordNRSFEnriched\ encodeTbaPhastCons TBA PhastCons wig 0.0 1.0 TBA PhastCons Conservation 0 1 170 100 50 212 177 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 0 autoScale Off\ color 170,100,50\ longLabel TBA PhastCons Conservation\ maxHeightPixels 100:25:11\ noInherit on\ parent encodeTbaCons\ priority 1\ shortLabel TBA PhastCons\ track encodeTbaPhastCons\ type wig 0.0 1.0\ windowingFunction mean\ encodeTbaPhastConsEl TBA PhastCons bed 5 . TBA PhastCons Conserved Elements 0 1 170 100 50 212 177 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 170,100,50\ longLabel TBA PhastCons Conserved Elements\ parent encodeTbaElements\ priority 1\ shortLabel TBA PhastCons\ track encodeTbaPhastConsEl\ hiSeqDepthTopPt1Pct Top 0.001 Depth bed 3 Top 0.001 of Read Depth Distribution 0 1 139 69 19 197 162 137 0 0 0 map 1 longLabel Top 0.001 of Read Depth Distribution\ parent hiSeqDepth\ priority 1\ shortLabel Top 0.001 Depth\ track hiSeqDepthTopPt1Pct\ encodeAllUnionEl Union bed 5 . 
TBA and MLAGAN PhastCons/BinCons/GERP Union Conserved Elements 0 1 80 70 180 167 162 217 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,70,180\ longLabel TBA and MLAGAN PhastCons/BinCons/GERP Union Conserved Elements\ parent encodeAllElements\ priority 1\ shortLabel Union\ track encodeAllUnionEl\ encodeUtexChipHeLaMycRaw UT Myc HeLa bedGraph 4 University of Texas, Austin ChIP-chip (c-Myc, HeLa) 0 1 120 30 50 187 142 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 120,30,50\ longLabel University of Texas, Austin ChIP-chip (c-Myc, HeLa)\ parent encodeUtexChip\ priority 1\ shortLabel UT Myc HeLa\ subGroups dataType=raw\ track encodeUtexChipHeLaMycRaw\ encodeUvaDnaRep0 UVa DNA Rep 0h bed 3 . University of Virginia Temporal Profiling of DNA Replication (0-2 hrs) 0 1 60 75 60 10 130 10 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 longLabel University of Virginia Temporal Profiling of DNA Replication (0-2 hrs)\ parent encodeUvaDnaRep\ priority 1\ shortLabel UVa DNA Rep 0h\ track encodeUvaDnaRep0\ encodeUvaDnaRepEarly UVa DNA Rep Early bed 3 . University of Virginia Temporal Profiling of DNA Replication (Early) 0 1 50 50 100 152 152 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 color 50,50,100\ longLabel University of Virginia Temporal Profiling of DNA Replication (Early)\ parent encodeUvaDnaRepSeg\ priority 1\ shortLabel UVa DNA Rep Early\ track encodeUvaDnaRepEarly\ encodeUvaDnaRepOriginsNSGM UVa Ori-NS GM bed 3 . 
University of Virginia DNA Replication Origins, Ori-NS, GM06990 0 1 205 0 0 230 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 color 205,0,0\ dataVersion May 2007\ longLabel University of Virginia DNA Replication Origins, Ori-NS, GM06990\ origAssembly hg17\ parent encodeUvaDnaRepOrigins\ priority 1\ shortLabel UVa Ori-NS GM\ track encodeUvaDnaRepOriginsNSGM\ kiddEichlerValidAbc14 Validated ABC14 bed 9 HGSV Individual ABC14 (CEPH) Validated Sites of Structural Variation 0 1 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HGSV Individual ABC14 (CEPH) Validated Sites of Structural Variation\ parent kiddEichlerValid\ priority 1\ shortLabel Validated ABC14\ track kiddEichlerValidAbc14\ hgdpXpehhBantu XP-EHH Bantu bedGraph 4 Human Genome Diversity Project XP-EHH (Bantu populations in Africa) 0 1 224 0 0 239 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 224, 0, 0\ longLabel Human Genome Diversity Project XP-EHH (Bantu populations in Africa)\ parent hgdpXpehh\ priority 1\ shortLabel XP-EHH Bantu\ track hgdpXpehhBantu\ encodeYaleChIPSTAT1HeLaMaskLess36mer36bpPval Yale 36-36 PVal bedGraph 4 Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 36-mer, 36bp Win, P-Values 0 1 50 50 200 152 152 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,50,200\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 36-mer, 36bp Win, P-Values\ parent encodeYaleChIPSTAT1Pval\ priority 1\ shortLabel Yale 36-36 PVal\ track encodeYaleChIPSTAT1HeLaMaskLess36mer36bpPval\ encodeYaleChIPSTAT1HeLaMaskLess36mer36bpSig Yale 36-36 Sig bedGraph 4 Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 36-mer, 36bp Win, Signal 0 1 112 63 175 224 66 81 0 0 21 
chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 altColor 224,66,81\ color 112,63,175\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 36-mer, 36bp Win, Signal\ parent encodeYaleChIPSTAT1Sig\ priority 1\ shortLabel Yale 36-36 Sig\ track encodeYaleChIPSTAT1HeLaMaskLess36mer36bpSig\ encodeYaleChIPSTAT1HeLaMaskLess36mer36bpSite Yale 36-36 Sites bed . Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 36-mer, 36bp Win, Binding Sites 0 1 200 50 50 50 50 200 0 0 18 chr1,chr10,chr11,chr13,chr14,chr15,chr16,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 altColor 50,50,200\ color 200,50,50\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 36-mer, 36bp Win, Binding Sites\ parent encodeYaleChIPSTAT1Sites\ priority 1\ shortLabel Yale 36-36 Sites\ track encodeYaleChIPSTAT1HeLaMaskLess36mer36bpSite\ encodeYaleMASNB4RNANprotTMFWDMless36mer36bp Yale NB4 NgF RNA bedGraph 4 Yale NB4 RNA Trans Map, MAS Array, Forward Direction, NimbleGen Protocol 0 1 200 50 50 50 50 200 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 0 altColor 50,50,200\ color 200,50,50\ longLabel Yale NB4 RNA Trans Map, MAS Array, Forward Direction, NimbleGen Protocol\ parent encodeYaleMASPlacRNATransMap\ priority 1\ shortLabel Yale NB4 NgF RNA\ track encodeYaleMASNB4RNANprotTMFWDMless36mer36bp\ encodeYaleMASNB4RNANProtTarsFWDMless36mer36bp Yale NB4 NgF TAR bed 6 . 
Yale NB4 RNA TARs, MAS array, Forward Direction, NimbleGen Protocol 0 1 200 50 50 50 50 200 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 1 altColor 50,50,200\ color 200,50,50\ longLabel Yale NB4 RNA TARs, MAS array, Forward Direction, NimbleGen Protocol\ parent encodeYaleMASPlacRNATars\ priority 1\ shortLabel Yale NB4 NgF TAR\ track encodeYaleMASNB4RNANProtTarsFWDMless36mer36bp\ encodeYaleAffyNeutRNATransMapAll Yale RNA Neu Sum wig -2730 3394 Yale Neutrophil RNA Transcript Map, Summary 0 1 50 70 50 152 162 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,70,50\ longLabel Yale Neutrophil RNA Transcript Map, Summary\ parent encodeYaleAffyRNATransMap\ priority 1\ shortLabel Yale RNA Neu Sum\ subGroups celltype=neutro samples=summary\ track encodeYaleAffyNeutRNATransMapAll\ encodeYaleAffyNeutRNATarsAll Yale TAR Neu Sum bed 3 . Yale Neutrophil RNA Transcriptionally Active Region (TAR), Summary 0 1 50 70 50 152 162 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,70,50\ longLabel Yale Neutrophil RNA Transcriptionally Active Region (TAR), Summary\ parent encodeYaleAffyRNATars\ priority 1\ shortLabel Yale TAR Neu Sum\ subGroups celltype=neutro samples=summary\ track encodeYaleAffyNeutRNATarsAll\ hapmapLdHotspotYRI YRI bedGraph 4 Hotspots of Linkage Disequilibrium in the Yoruban HapMap (YRI) 0 1 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 longLabel Hotspots of Linkage Disequilibrium in the Yoruban HapMap (YRI)\ parent hapmapLdHotspot\ priority 1\ shortLabel YRI\ track hapmapLdHotspotYRI\ netRBestCalJac1 Marmoset RBest Net netAlign calJac1 chainCalJac1 Marmoset (June 2007 (WUGSC 2.0.2/calJac1)) Reciprocal Best Alignment Net 0 2 
0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Alignment Net\ otherDb calJac1\ parent rBestNet\ priority 2\ shortLabel $o_Organism RBest Net\ spectrum on\ track netRBestCalJac1\ type netAlign calJac1 chainCalJac1\ visibility hide\ netSyntenyPonAbe2 Orangutan Syn Net netAlign ponAbe2 chainPonAbe2 Orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)) Syntenic Alignment Net 0 2 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb ponAbe2\ parent syntenicNet\ priority 2\ shortLabel $o_Organism Syn Net\ spectrum on\ track netSyntenyPonAbe2\ type netAlign ponAbe2 chainPonAbe2\ visibility hide\ encodeEgaspPartAceOther ACEScan Other genePred ACEScan Unconserved Alternative and Constitutive Exon Predictions 0 2 66 12 133 160 133 194 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 66,12,133\ longLabel ACEScan Unconserved Alternative and Constitutive Exon Predictions\ parent encodeEgaspPartial\ priority 2\ shortLabel ACEScan Other\ track encodeEgaspPartAceOther\ encodeAffyChIpHl60SitesBrg1Hr00 Affy Brg1 RA 0h bed 3 . Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 0hrs) Sites 0 2 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 225,0,0\ longLabel Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 0hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 2\ shortLabel Affy Brg1 RA 0h\ subGroups factor=Brg1 time=0h\ track encodeAffyChIpHl60SitesBrg1Hr00\ affyAllExonProbes Affy Exon Probes bed 6 . Affymetrix Exon Array 1.0: Probesets 3 2 0 0 0 127 127 127 1 0 0

Description

\

\ The Exon GeneChip contains over one million probe \ sets\ designed to interrogate individual exons rather than the 3' ends of transcripts,\ as in traditional GeneChips. Exons were derived from a variety of\ annotations that have been grouped into three classes: Core, Extended\ and Full. \

\ \

\ Probe sets are colored by class, with Core probe sets the\ darkest and Full probe sets the lightest. Additionally, probe\ sets that do not overlap the exons of a transcript cluster but fall\ within its introns are considered bounded by that transcript\ cluster and are colored slightly lighter. Probe sets that overlap the\ coding portion of the Core class are colored slightly darker.
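The shading rule above can be sketched as a small function. The track's settings include `useScore 1` and `spectrum on`, which suggests the shade is carried in the BED score field (0 to 1000, with higher scores drawn darker). The sketch assumes that encoding. The class base scores, the `STEP` adjustment and the function name `probe_set_score` are hypothetical, because the description does not give the actual values.

```python
from enum import Enum


class ProbeSetClass(Enum):
    CORE = "core"
    EXTENDED = "extended"
    FULL = "full"


# Hypothetical base scores on the 0-1000 BED scale; with useScore,
# higher scores are drawn darker. Real values are not documented here.
BASE_SCORE = {
    ProbeSetClass.CORE: 900,      # darkest
    ProbeSetClass.EXTENDED: 600,
    ProbeSetClass.FULL: 300,      # lightest
}
STEP = 100  # hypothetical size of a "slightly lighter/darker" adjustment


def probe_set_score(cls: ProbeSetClass,
                    bounded_by_cluster: bool = False,
                    overlaps_core_coding: bool = False) -> int:
    """Return an illustrative display score for one probe set."""
    score = BASE_SCORE[cls]
    if bounded_by_cluster:
        # Falls within a transcript cluster's introns, not its exons: lighter.
        score -= STEP
    if overlaps_core_coding:
        # Overlaps the coding portion of the Core class: darker.
        score += STEP
    # Clamp to the valid BED score range.
    return max(0, min(1000, score))


print(probe_set_score(ProbeSetClass.CORE, overlaps_core_coding=True))
print(probe_set_score(ProbeSetClass.FULL, bounded_by_cluster=True))
```

Using additive steps keeps the class ordering intact in the common case, while the clamp guards the 0 to 1000 range.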

\

\ The expression data measured with these probe sets can be displayed by turning\ on the Affy Exon Tissues track.

\ \

Credits and References

\

\ The exons interrogated by the probe sets displayed in this track are\ from the Affymetrix Exon 1.0 GeneChip and were derived from a\ number of sources. In addition to the millions of cDNA sequences\ contributed to the \ GenBank, \ dbEST and \ RefSeq \ databases by\ individual labs and scientists, the following annotations were used:\

\ Ensembl: \ Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J,\ Curwen V, Down T, et al.\ The Ensembl genome database project.\ Nucleic Acids Research. 2002 Jan 1;30(1):38-41.

\

\ Exoniphy: Siepel, A., Haussler, D. \ Computational identification of evolutionarily conserved \ exons.\ Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, \ 177-186 (2004).

\

\ Geneid Genes:\ Parra, G., Blanco, E., Guigo, R. \ Geneid in Drosophila.\ Genome Res. 10(4), 511-515 (2000).

\

\ Genscan Genes:\ Burge, C., Karlin, S. \ Prediction of Complete Gene Structures in Human Genomic DNA.\ J. Mol. Biol. 268(1), 78-94 (1997).

\

\ microRNA:\ Griffiths-Jones, S. \ The microRNA Registry. \ Nucl. Acids Res. 32, D109-D111 (2004).

\

\ MITOMAP:\ Brandon, M. C., Lott, M. T., Nguyen, K. C., Spolim, S., Navathe, S. B., \ Baldi, P. & Wallace, D. C.\ MITOMAP: a human mitochondrial genome database--2004 update\ Nucl. Acids Res. 33(Database Issue):D611-613 (2005).

\

\ RNA Genes:\ Lowe, T. M., Eddy, S. R. \ tRNAscan-SE: A Program for Improved Detection of Transfer RNA \ Genes in Genomic Sequence.\ Nucleic Acids Res., 25(5), 955-964 (1997).

\

\ SGP Genes: \ Wiehe, T., Gebauer-Jung, S., Mitchell-Olds, T., Guigo, R. \ SGP-1: prediction and validation of homologous genes based on \ sequence alignments.\ Genome Res., 11(9), 1574-83 (2001).

\

\ Twinscan Genes:\ Korf, I., Flicek, P., Duan, D., Brent, M.R. \ Integrating genomic homology into gene structure prediction.\ Bioinformatics 17, S140-148 (2001).\

\ Vega Genes \ and Pseudogenes: The HAVANA group, \ Wellcome Trust Sanger \ Institute.

\ expression 1 exonArrows on\ group expression\ longLabel Affymetrix Exon Array 1.0: Probesets\ priority 2.0\ shortLabel Affy Exon Probes\ spectrum on\ superTrack affyAllExonSuper dense\ track affyAllExonProbes\ type bed 6 .\ useScore 1\ visibility pack\ encodeAffyChIpHl60SignalStrictH3K9K14DHr02 Affy H3K9ac2 2h wig -2.78 3.97 Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict Signal 0 2 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 2\ shortLabel Affy H3K9ac2 2h\ subGroups factor=H3K9K14ac2 time=2h\ track encodeAffyChIpHl60SignalStrictH3K9K14DHr02\ encodeAffyChIpHl60SitesStrictH3K9K14DHr02 Affy H3K9ac2 2h bed 3 . Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict Sites 0 2 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 2\ shortLabel Affy H3K9ac2 2h\ subGroups factor=H3K9K14ac2 time=2h\ track encodeAffyChIpHl60SitesStrictH3K9K14DHr02\ encodeAffyChIpHl60PvalStrictH3K9K14DHr02 Affy H3K9ac2 2h wig 0 696.62 Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict P-Value 0 2 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 2\ shortLabel Affy H3K9ac2 2h\ subGroups factor=H3K9K14ac2 time=2h\ track 
encodeAffyChIpHl60PvalStrictH3K9K14DHr02\ encodeAffyRnaHeLaSitesIntronsProximal Affy In Prx HeLa bed 4 . Affy Intronic Proximal HeLa Transfrags 0 2 236 0 20 245 127 137 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 236,0,20\ longLabel Affy Intronic Proximal HeLa Transfrags\ parent encodeNoncodingTransFrags\ priority 2\ shortLabel Affy In Prx HeLa\ subGroups region=intronicProximal celltype=hela source=affy\ track encodeAffyRnaHeLaSitesIntronsProximal\ encodeTransFragsAffyIntronicDistal Affy Intron Dist bed 4 Affy Intronic Distal 0 2 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Affy Intronic Distal\ parent encodeTransFrags\ priority 2\ shortLabel Affy Intron Dist\ track encodeTransFragsAffyIntronicDistal\ encodeAffyRnaHeLaSignal Affy RNA HeLa wig -1168.00 1686.5 Affymetrix PolyA+ RNA (HeLaS3) Signal 0 2 220 132 12 237 193 133 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 220,132,12\ longLabel Affymetrix PolyA+ RNA (HeLaS3) Signal\ parent encodeAffyRnaSignal\ priority 2\ shortLabel Affy RNA HeLa\ track encodeAffyRnaHeLaSignal\ encodeAffyRnaHeLaSites Affy RNA HeLa bed 3 . 
Affymetrix PolyA+ RNA (HeLaS3) Sites 0 2 220 132 12 237 193 133 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 220,132,12\ longLabel Affymetrix PolyA+ RNA (HeLaS3) Sites\ parent encodeAffyRnaTransfrags\ priority 2\ shortLabel Affy RNA HeLa\ track encodeAffyRnaHeLaSites\ snpArrayAffy6SV Affy SNP 6.0 SV bed 6 + Affymetrix SNP 6.0 Structural Variation 0 2 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Affymetrix SNP 6.0 Structural Variation\ parent snpArray\ priority 2\ shortLabel Affy SNP 6.0 SV\ track snpArrayAffy6SV\ type bed 6 +\ encodeHapMapAlleleFreqCHB Allele Freq CHB bed 6 + HapMap Minor Allele Frequencies Chinese (CHB) 0 2 0 0 0 127 127 127 1 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18, encodeVariation 1 longLabel HapMap Minor Allele Frequencies Chinese (CHB)\ parent encodeHapMapAlleleFreq\ priority 2\ shortLabel Allele Freq CHB\ track encodeHapMapAlleleFreqCHB\ encodeRegulomeAmpliconEven Amplicon (Even) bed 5 . 
Amplicon (Even) 0 2 0 0 0 127 127 127 1 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 1 longLabel Amplicon (Even)\ parent encodeRegulomeAmplicon\ priority 2\ shortLabel Amplicon (Even)\ track encodeRegulomeAmpliconEven\ encodeEgaspUpdAugustusEst Augustus/EST Upd genePred Augustus + EST/Protein Evidence Gene Predictions 0 2 12 65 165 133 160 210 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,65,165\ longLabel Augustus + EST/Protein Evidence Gene Predictions\ parent encodeEgaspUpdate\ priority 2\ shortLabel Augustus/EST Upd\ track encodeEgaspUpdAugustusEst\ encodeBuFirstExonColon BU Colon bed 12 + Boston University First Exon Activity in Colon 0 2 0 0 0 127 127 127 0 0 10 chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX, encodeTxLevels 1 longLabel Boston University First Exon Activity in Colon\ parent encodeBuFirstExon\ priority 2\ shortLabel BU Colon\ track encodeBuFirstExonColon\ cccTrendPvalCad CCC Coronary Art chromGraph Case Control Consortium coronary artery disease trend -log10 P-value 0 2 0 0 0 127 127 127 0 0 0 phenDis 0 longLabel Case Control Consortium coronary artery disease trend -log10 P-value\ parent caseControl\ priority 2\ shortLabel CCC Coronary Art\ track cccTrendPvalCad\ hapmapLdHotspotCEU CEU bedGraph 4 Hotspots of Linkage Disequilibrium in the CEPH HapMap (CEU) 0 2 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 longLabel Hotspots of Linkage Disequilibrium in the CEPH HapMap (CEU)\ parent hapmapLdHotspot\ priority 2\ shortLabel CEU\ track hapmapLdHotspotCEU\ fox2ClipSeqDensityForwardStrand Density Forward wig 0 2401 FOX2 adaptor-trimmed CLIP-seq Density Forward Strand 2 2 0 0 0 127 127 127 0 0 0 regulation 0 configurable on\ graphTypeDefault Bar\ longLabel FOX2 adaptor-trimmed CLIP-seq Density Forward 
Strand\ maxHeightPixels 128:36:16\ noInherit on\ parent fox2ClipSeqCompViewdensity\ priority 2\ shortLabel Density Forward\ spanList 1\ subGroups view=density\ track fox2ClipSeqDensityForwardStrand\ type wig 0 2401\ windowingFunction mean\ kiddEichlerDiscAbc13 Discordant ABC13 bed 12 HGSV Individual ABC13 (Yoruba) Discordant Clone End Alignments 0 2 0 0 0 127 127 127 0 0 0 http://mrhgsv.gs.washington.edu/cgi-bin/hgc?i=$$&c=$S&l=$[&r=$]&db=$D&position=$S:$[-$] varRep 1 longLabel HGSV Individual ABC13 (Yoruba) Discordant Clone End Alignments\ parent kiddEichlerDisc\ priority 2\ shortLabel Discordant ABC13\ track kiddEichlerDiscAbc13\ encodeDNDSmedium dN/dS 0.2 to 0.5 bed 4 + ENCODE Exons dN/dS 0.2 to 0.5 0 2 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel ENCODE Exons dN/dS 0.2 to 0.5\ parent encodeDNDS\ priority 2\ shortLabel dN/dS 0.2 to 0.5\ track encodeDNDSmedium\ encodeNhgriDnaseHsAct DNase CD4 Activ. bed 5 . 
NHGRI DNaseI-Hypersensitive Sites (CD4+ T-Cells Activated) 0 2 0 0 0 127 127 127 1 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 longLabel NHGRI DNaseI-Hypersensitive Sites (CD4+ T-Cells Activated)\ parent encodeNhgriDnaseHs\ priority 2\ shortLabel DNase CD4 Activ.\ track encodeNhgriDnaseHsAct\ encodeEgaspFullDogfish DOGFISH-C genePred DOGFISH-C Gene Predictions 0 2 12 20 150 133 137 202 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,20,150\ longLabel DOGFISH-C Gene Predictions\ parent encodeEgaspFull\ priority 2\ shortLabel DOGFISH-C\ track encodeEgaspFullDogfish\ encodeAffyEc51BrainCerebellumSignal EC51 Sgnl BrainC wig 0 62385 Affy Ext Trans Signal (51-base window) (Brain Cerebellum) 0 2 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 0 color 248,0,8\ longLabel Affy Ext Trans Signal (51-base window) (Brain Cerebellum)\ parent encodeAffyEcSignal\ priority 2\ shortLabel EC51 Sgnl BrainC\ track encodeAffyEc51BrainCerebellumSignal\ encodeAffyEc51BrainCerebellumSites EC51 Sites BrainC bed 3 . Affy Ext Trans Sites (51-base window) (Brain Cerebellum) 0 2 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 1 color 248,0,8\ longLabel Affy Ext Trans Sites (51-base window) (Brain Cerebellum)\ parent encodeAffyEcSites\ priority 2\ shortLabel EC51 Sites BrainC\ track encodeAffyEc51BrainCerebellumSites\ eioJcviNASNeg EIO/JCVI CD34- NAS bed 3 . 
CD34- cells Nuclease Accessible sites 0 2 100 30 250 177 142 252 0 0 0 regulation 1 color 100,30,250\ longLabel CD34- cells Nuclease Accessible sites\ parent eioJcviNAS\ priority 2\ shortLabel EIO/JCVI CD34- NAS\ track eioJcviNASNeg\ encodeUncFairePeaks FAIRE PeakFinder bedGraph 4 University of North Carolina FAIRE Peaks (PeakFinder) 0 2 20 150 20 50 100 50 0 0 21 chr1,chr4,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chr10,chrX, encodeChrom 0 autoScale off\ longLabel University of North Carolina FAIRE Peaks (PeakFinder)\ maxHeightPixels 128:24:16\ noInherit on\ parent encodeUncFaire\ priority 2\ shortLabel FAIRE PeakFinder\ spanList 38\ track encodeUncFairePeaks\ type bedGraph 4\ viewLimits 0.4:3.7\ windowingFunction mean\ encodeGencodeGenePutativeMar07 Gencode Putative genePred Gencode Putative Genes 0 2 84 188 0 169 221 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 84,188,0\ longLabel Gencode Putative Genes\ parent encodeGencodeGeneMar07\ priority 2\ shortLabel Gencode Putative\ track encodeGencodeGenePutativeMar07\ encodeGencodeGenePutativeOct05 Gencode Putative genePred Gencode Putative Genes 0 2 84 188 0 169 221 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 84,188,0\ longLabel Gencode Putative Genes\ parent encodeGencodeGeneOct05\ priority 2\ shortLabel Gencode Putative\ track encodeGencodeGenePutativeOct05\ encodeGisRnaPetHCT116 GIS RNA HCT116 bed 12 Gene Identification Signature Paired-End Tags of PolyA+ RNA (5FU-stim HCT116) 0 2 58 119 40 156 187 147 0 0 23 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 58,119,40\ longLabel Gene Identification Signature Paired-End Tags of PolyA+ RNA (5FU-stim HCT116)\ 
parent encodeGisRnaPet\ priority 2\ shortLabel GIS RNA HCT116\ track encodeGisRnaPetHCT116\ encodeRegulomeQualityGM06990 GM06990 bed 5 . GM06990 Quality 0 2 90 50 210 172 152 232 1 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 1 color 90,50,210\ longLabel GM06990 Quality\ parent encodeRegulomeQuality\ priority 2\ shortLabel GM06990\ track encodeRegulomeQualityGM06990\ encodeRegulomeProbGM06990 GM06990 bedGraph 4 GM06990 DNaseI HSs 0 2 90 50 210 172 152 232 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 90,50,210\ longLabel GM06990 DNaseI HSs\ parent encodeRegulomeProb\ priority 2\ shortLabel GM06990\ track encodeRegulomeProbGM06990\ encodeRegulomeBaseGM06990 GM06990 wig 0.0 3.0 GM06990 DNaseI Sensitivity 0 2 90 50 210 172 152 232 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 90,50,210\ longLabel GM06990 DNaseI Sensitivity\ parent encodeRegulomeBase\ priority 2\ shortLabel GM06990\ track encodeRegulomeBaseGM06990\ encodeHapMapCovCHB HapMap Cov CHB wig 0.0 100.0 HapMap Resequencing Coverage Chinese (CHB) 0 2 0 0 0 127 127 127 0 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18, encodeVariation 0 longLabel HapMap Resequencing Coverage Chinese (CHB)\ parent encodeHapMapCov\ priority 2\ shortLabel HapMap Cov CHB\ track encodeHapMapCovCHB\ snpRecombRateHapmapPhase2 HapMap Phase II bedGraph 4 Oxford Recombination Rates from HapMap Phase II Release 21 0 2 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 longLabel Oxford Recombination Rates from HapMap Phase II Release 21\ parent snpRecombRate\ priority 2\ shortLabel HapMap Phase II\ track snpRecombRateHapmapPhase2\ hapmapSnpsCEU HapMap SNPs CEU bed 6 + HapMap SNPs from the CEU Population (Northern and Western European Ancestry in Utah, US - CEPH) 0 2 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HapMap SNPs from the CEU Population 
(Northern and Western European Ancestry in Utah, US - CEPH)\ parent hapmapSnps\ priority 2\ shortLabel HapMap SNPs CEU\ track hapmapSnpsCEU\ encodePseudogeneHavana Havana-Gencode Pseudogenes genePred Havana-Gencode Annotated Pseudogenes and Immunoglobulin Segments 0 2 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$ encodeGenes 1 longLabel Havana-Gencode Annotated Pseudogenes and Immunoglobulin Segments\ parent encodePseudogene\ priority 2\ shortLabel Havana-Gencode Pseudogenes\ track encodePseudogeneHavana\ url http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$\ urlLabel Vega Genes Link:\ hgdpHzyBantu Hetzgty Bantu bedGraph 4 Human Genome Diversity Proj Smoothd Expec Heterozygosity (Bantu pops. in Africa) 0 2 224 0 0 239 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 224, 0, 0\ longLabel Human Genome Diversity Proj Smoothd Expec Heterozygosity (Bantu pops. in Africa)\ parent hgdpHzy\ priority 2\ shortLabel Hetzgty Bantu\ track hgdpHzyBantu\ cnpIafrate Iafrate CNPs bed 4 + Copy Number Polymorphisms from BAC Microarray Analysis (Iafrate) 0 2 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows 255 regions detected as putative copy number polymorphisms by BAC microarray analysis \ in a population of 55 individuals, 16 of whom had previously characterized chromosomal abnormalities.\ \

Methods

\

\ All hybridizations were performed in duplicate, incorporating a dye reversal, on proprietary 1 Mb \ GenomeChip V1.2 Human BAC Arrays comprising 2,632 BAC clones (Spectral Genomics, Houston, TX). \ The false-positive rate was estimated at ~1 clone per 5,264 tested. \

\ Further information is available at \ http://projects.tcag.ca/variation.\ \
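The Python sketch below gives a rough sense of what the stated false-positive rate implies. It multiplies a number of clone tests by the rate of ~1 per 5,264. As a simplification not stated in the original, it assumes that each of the 55 individuals contributes one call per clone on the 2,632-clone array. The constant and function names are illustrative.

```python
# Stated estimate: ~1 false-positive clone per 5,264 clones tested.
FALSE_POSITIVE_RATE = 1 / 5264

CLONES_PER_ARRAY = 2632  # BAC clones on the GenomeChip V1.2 array
INDIVIDUALS = 55         # individuals in the study population


def expected_false_positives(n_clone_tests: int, rate: float = FALSE_POSITIVE_RATE) -> float:
    """Expected number of false-positive clone calls among n_clone_tests tests."""
    return n_clone_tests * rate


# Assumption (not from the source): one call per clone per individual.
n_tests = CLONES_PER_ARRAY * INDIVIDUALS
expected = expected_false_positives(n_tests)
print(f"{n_tests} clone tests -> {expected:.1f} expected false positives")
```

Under that assumption, about 27.5 false-positive clone calls would be expected across the whole study. A different definition of a "test", such as counting each replicate hybridization separately, scales the figure proportionally.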

References

\

\ Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C (2004) \ Detection of large-scale variation in the human genome. Nature Genet 36:949-951\ varRep 1 longLabel Copy Number Polymorphisms from BAC Microarray Analysis (Iafrate)\ noInherit on\ parent cnp\ priority 2\ shortLabel Iafrate CNPs\ track cnpIafrate\ type bed 4 +\ hgdpIhsMideast iHS Mideast bedGraph 4 Human Genome Diversity Project iHS (Mideast) 0 2 0 0 200 127 127 227 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 color 0,0,200\ longLabel Human Genome Diversity Project iHS (Mideast)\ parent hgdpIhs\ priority 2\ shortLabel iHS Mideast\ track hgdpIhsMideast\ encodeGencodeIntronsDistal Intronic Dist bed 4 . Gencode Intronic Distal Regions 0 2 0 0 0 127 127 127 0 0 0 encodeAnalysis 1 longLabel Gencode Intronic Distal Regions\ parent encodeGencodeRegions\ priority 2\ shortLabel Intronic Dist\ track encodeGencodeIntronsDistal\ encodeAllIntronsProximal Intronic Prox bed 4 Consensus Intronic Proximal 0 2 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Consensus Intronic Proximal\ parent encodeWorkshopSelections\ priority 2\ shortLabel Intronic Prox\ track encodeAllIntronsProximal\ iscaRetrospectiveLikelyBenign ISCA Ret Lik.Ben gvf Internat. Stds. for Cytogen. Arrays Consort. (ISCA) - Retrospective (Likely Benign) 0 2 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Internat. Stds. for Cytogen. Arrays Consort. 
(ISCA) - Retrospective (Likely Benign)\ parent iscaRetrospectiveComposite\ priority 2\ shortLabel ISCA Ret Lik.Ben\ track iscaRetrospectiveLikelyBenign\ L2_LINE L2_LINE bed 5 L2 LINEs for Intersection 0 2 0 0 0 127 127 127 0 0 0 encodeAnalysis 1 longLabel L2 LINEs for Intersection\ parent encodeWorkshopIntersections\ priority 2\ shortLabel L2_LINE\ track L2_LINE\ hapmapLdCeu LD CEU bed 4 + Linkage Disequilibrium for the CEPH (CEU) 0 2 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 longLabel Linkage Disequilibrium for the CEPH (CEU)\ parent hapmapLd\ priority 2\ shortLabel LD CEU\ track hapmapLdCeu\ encodeUcsdChipHeLaH3H4dmH3K4_p30 LI H3K4me2 +gIF bedGraph 4 Ludwig Institute ChIP-chip: H3K4me2 ab, HeLa cells, 30 min. after gamma interferon 0 2 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: H3K4me2 ab, HeLa cells, 30 min. 
after gamma interferon\ parent encodeLIChIPgIF\ priority 2\ shortLabel LI H3K4me2 +gIF\ track encodeUcsdChipHeLaH3H4dmH3K4_p30\ encodeUcsdNgHeLaH3K4me3_p30 LI Ng H3K4m3 +gIF bedGraph 4 Ludwig Institute/UCSD ChIP/Chip Ng: HeLa, H3K4me3, 30 min after gamma interferon 0 2 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Ludwig Institute/UCSD ChIP/Chip Ng: HeLa, H3K4me3, 30 min after gamma interferon\ parent encodeUcsdNgGif\ priority 2\ shortLabel LI Ng H3K4m3 +gIF\ track encodeUcsdNgHeLaH3K4me3_p30\ encodeUcsdChipRnapThp1_f LI Pol2 THP1 bedGraph 4 Ludwig Institute ChIP-chip: Pol2 8WG16 ab, THP1 cells 0 2 0 63 135 127 159 195 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 0,63,135\ longLabel Ludwig Institute ChIP-chip: Pol2 8WG16 ab, THP1 cells\ parent encodeLIChIP\ priority 2\ shortLabel LI Pol2 THP1\ track encodeUcsdChipRnapThp1_f\ jaxQtlPadded MGI Mouse QTL Padded bed 4 . MGI Mouse QTL Peak-Score Markers Padded to 100k and Coarsely Mapped to Human 0 2 200 100 0 227 177 127 0 0 0 http://www.informatics.jax.org/searches/accession_report.cgi?id=$$ phenDis 1 color 200,100,0\ longLabel MGI Mouse QTL Peak-Score Markers Padded to 100k and Coarsely Mapped to $Organism\ parent jaxQtlMapped\ priority 2\ shortLabel MGI Mouse QTL Padded\ track jaxQtlPadded\ snp131Misc Misc SNPs (131) bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 131) not from HapMap and/or 1000Genomes 1 2 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$ varRep 1 longLabel Simple Nucleotide Polymorphisms (dbSNP build 131) not from HapMap and/or 1000Genomes\ parent snp131Composite\ priority 2\ shortLabel Misc SNPs (131)\ subGroups view=misc\ track snp131Misc\ visibility dense\ encodeMlaganBinConsEl MLAGAN BinCons bed 5 . 
MLAGAN BinCons Conserved Elements 0 2 170 5 10 212 130 132 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 170,5,10\ longLabel MLAGAN BinCons Conserved Elements\ parent encodeMlaganElements\ priority 2\ shortLabel MLAGAN BinCons\ track encodeMlaganBinConsEl\ encodeMlaganBinCons MLAGAN BinCons wig 0.0 1.0 MLAGAN BinCons Conservation 0 2 190 70 80 222 162 167 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 0 color 190,70,80\ longLabel MLAGAN BinCons Conservation\ parent encodeMlaganCons\ priority 2\ shortLabel MLAGAN BinCons\ track encodeMlaganBinCons\ encodeAllNcUnionEl NC Union bed 5 . TBA and MLAGAN PhastCons/BinCons/GERP Union NonCoding Conserved Elements 0 2 80 105 145 167 180 200 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,105,145\ longLabel TBA and MLAGAN PhastCons/BinCons/GERP Union NonCoding Conserved Elements\ parent encodeAllElements\ priority 2\ shortLabel NC Union\ track encodeAllNcUnionEl\ nimhBipolarDe NIMH Bipolar De chromGraph NIMH Bipolar disorder (German) -log10 P-value 0 2 0 0 0 127 127 127 0 0 0 phenDis 0 longLabel NIMH Bipolar disorder (German) -log10 P-value\ parent nimhBipolar\ priority 2\ shortLabel NIMH Bipolar De\ track nimhBipolarDe\ numtSAssembled NumtS assembled bed 12 . Human NumtS assembled 0 2 0 60 120 127 157 187 1 0 0

Description and display conventions

\

\ NumtS (nuclear mitochondrial sequences) are fragments of mitochondrial DNA inserted into nuclear genomic sequences. The most widely accepted hypothesis for their origin is that, in the presence of mutagenic agents or under stress conditions, mtDNA fragments escape from the mitochondria, reach the nucleus and are inserted into chromosomes during the repair of chromosomal breaks; NumtS can also arise through duplication of existing genomic fragments. NumtS can contaminate human mtDNA sequencing, and as a result false evidence of low-level heteroplasmy has frequently been reported.\ The Bioinformatics group led by M. Attimonelli (Bari, Italy) has produced the RHNumtS compilation, which annotates more than 500 human NumtS. To give the scientific community access to the compilation and to support comparative genomic analyses that include NumtS data, the group designed the Human NumtS tracks described below.\

\ \

\ The NumtS tracks show the High-scoring Segment Pairs (HSPs) obtained by aligning the human mitochondrial reference genome (NC_012920) with the hg18 release of the human genome.\

\
    \
  1. "NumtS (Nuclear mitochondrial Sequences)" Track\

    \ The "NumtS" track shows the mapping of the HSPs returned by BlastN onto the nuclear genome. The shading of each item reflects the similarity reported by BlastN, and the direction of the arrows matches the strand of the alignment. Each item links to its mitochondrial mapping, allowing quick navigation between the nuclear and mitochondrial contexts of each NumtS.\

    \
  2. "NumtS assembled" Track\

    \ The "NumtS assembled" track shows items obtained by assembling HSPs from the "NumtS" track that fulfill the following conditions:\

    \

    \ An exception to the second condition is made when a long repetitive element lies between two HSPs.\

    \
  3. "NumtS on mitochondrion" Track\

    \ The "NumtS on mitochondrion" track shows the mapping of the HSPs onto the mitochondrial genome. The shading of each item reflects the similarity reported by BlastN, and the direction of the arrows matches the strand of the alignment. Each item links to its nuclear mapping.\

    \ \
  4. "NumtS on mitochondrion with chromosome placement" Track\

    \ The "NumtS on mitochondrion with chromosome placement" track shows the mapping of the HSPs onto the mitochondrial genome, but its items are coloured according to the colours assigned to each human chromosome in the UCSC Genome Browser. No shading is applied. Each item links to its nuclear mapping.\

    \
\ \
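The track settings for "NumtS assembled" include `useScore 1`, which tells the browser to shade items by their BED score. The Python sketch below shows how a similarity value could translate into a gray level. It uses the nine score bins listed in the UCSC BED format documentation. The linear mapping from BlastN percent identity to a 0-1000 score is a hypothetical choice for illustration, not the authors' documented scoring. The function names are illustrative.

```python
from bisect import bisect_left

# Upper bounds of the first eight gray levels for useScore shading
# (<=166, 167-277, ..., 834-944); scores >= 945 get the darkest level.
SHADE_BOUNDS = [166, 277, 388, 499, 611, 722, 833, 944]


def identity_to_score(pident: float) -> int:
    """Hypothetical linear mapping of percent identity (0-100) to a BED score (0-1000)."""
    return max(0, min(1000, round(pident * 10)))


def shade_level(score: int) -> int:
    """Gray level from 0 (lightest) to 8 (darkest) for a BED score."""
    # bisect_left counts how many bin upper bounds lie strictly below the score.
    return bisect_left(SHADE_BOUNDS, score)


for pident in (50.0, 80.0, 95.5):
    score = identity_to_score(pident)
    print(pident, score, shade_level(score))
```

With this mapping, NumtS that are closer to the mitochondrial reference are drawn darker.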

Methods

\

\ NumtS mappings were obtained by running Blast2seq (program: BlastN) between each chromosome of the hg18 build of the human genome and the human mitochondrial reference sequence (rCRS, AC: NC_012920), with an E-value threshold of 1e-03. The HSPs were assembled by spreadsheet interpolation and manual inspection.\
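The Python sketch below illustrates the filtering step. It assumes the alignments come from the BLAST+ `blastn` program rather than the legacy Blast2seq tool named above, for example `blastn -query rCRS.fa -subject chr1.fa -evalue 1e-3 -outfmt 6`, with rCRS as the query and one chromosome as the subject. Under that assumption, it reads the default tabular output, re-applies the 1e-03 E-value threshold, and converts each HSP to a BED-style interval with a strand. File names, `Hsp` and `parse_hsps` are illustrative.

```python
from dataclasses import dataclass

# Column order of the default BLAST+ tabular output ("-outfmt 6").
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass
class Hsp:
    chrom: str     # subject sequence ID (nuclear chromosome)
    start: int     # 0-based start on the chromosome (BED convention)
    end: int       # end coordinate, exclusive
    strand: str    # "+" or "-" relative to the mitochondrial query
    pident: float  # percent identity reported by BLAST
    evalue: float


def parse_hsps(lines, max_evalue=1e-3):
    """Keep HSPs with E-value <= max_evalue and convert them to BED-style intervals."""
    hsps = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        evalue = float(rec["evalue"])
        if evalue > max_evalue:
            continue
        sstart, send = int(rec["sstart"]), int(rec["send"])
        # BLAST reports minus-strand subject hits with sstart > send.
        strand = "+" if sstart <= send else "-"
        lo, hi = min(sstart, send), max(sstart, send)
        hsps.append(Hsp(rec["sseqid"], lo - 1, hi, strand,
                        float(rec["pident"]), evalue))
    return hsps


# Made-up example rows: the second one fails the E-value threshold.
example = [
    "NC_012920\tchr1\t95.50\t200\t9\t0\t1\t200\t1000\t801\t1e-80\t350",
    "NC_012920\tchr2\t80.00\t40\t8\t0\t500\t539\t100\t139\t0.5\t30",
]
for hsp in parse_hsps(example):
    print(hsp)
```

The strand recovered here corresponds to the arrow direction described for the NumtS tracks. Assembling HSPs into "NumtS assembled" items is not shown, because the assembly conditions are not given in this text.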

\ \

Verification

\

\ NumtS predicted in silico were validated by PCR amplification and sequencing of DNA extracted from the blood of a healthy individual of European origin. PCR amplification succeeded for 275 NumtS and yielded amplicons of the expected length. All PCR fragments were sequenced on both strands and submitted to the EMBL databank.\

\

\ In addition, 541 NumtS were validated by cross-referencing NumtS nuclear coordinates with HapMap annotations. The analysis was carried out on eight HapMap individuals (NA18517, NA18507, NA18956, NA19240, NA18555, NA12878, NA19129, NA12156). For each sample, only clones with a single best concordant placement (according to the fosmid end-sequence-pair analysis described in Kidd et al., 2008) were considered. The analysis showed that at least 30 bp of each of these 541 NumtS had been sequenced in these samples.\
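The Python sketch below illustrates the "at least 30 bp" criterion as an interval-overlap test. It is a simplification, not the authors' pipeline. It assumes each clone placement can be treated as a contiguous interval on the same chromosome, and that a NumtS counts as covered when a single clone overlaps it by at least 30 bp. The coordinates, names and function signatures are hypothetical toy values.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def numts_with_coverage(numts, clones, min_bp=30):
    """Return names of NumtS overlapped by at least min_bp of a single clone placement.

    numts:  dict mapping NumtS name -> (chrom, start, end)
    clones: iterable of (chrom, start, end) clone placements
    """
    clones = list(clones)
    covered = set()
    for name, (chrom, start, end) in numts.items():
        for c_chrom, c_start, c_end in clones:
            if c_chrom == chrom and overlap_bp(start, end, c_start, c_end) >= min_bp:
                covered.add(name)
                break  # one qualifying clone is enough
    return covered


# Hypothetical toy data: numt_a overlaps a clone by 50 bp, numt_b by only 20 bp.
numts = {"numt_a": ("chr1", 1000, 1300), "numt_b": ("chr2", 5000, 5100)}
clones = [("chr1", 1250, 40000), ("chr2", 5080, 45000)]
print(sorted(numts_with_coverage(numts, clones)))
```

The nested loop is adequate for a sketch. For genome-wide data, an interval tree or a sorted sweep would be the usual choice.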

\ \

Credits

\

\ These data were provided by Domenico Simone and Marcella Attimonelli at the Department of Biochemistry and Molecular Biology "Ernesto Quagliariello" (University of Bari, Italy). Primer design was carried out by Francesco Calabrese and Giuseppe Mineccia. PCR validation was carried out by Martin Lang, Domenico Simone and Giuseppe Gasparre. Merging with HapMap annotations was performed by Domenico Simone.\

\ \

References

\

\ Simone D, Calabrese FM, Lang M, Gasparre G, Attimonelli M. Validation and UCSC tracks of the extended RHNumtS compilation (submitted).\

\ \

\ Lascaro D, Castellana S, Gasparre G, Romeo G, Saccone S, Attimonelli M. The RHNumtS compilation: features and bioinformatics approaches to locate and quantify Human NumtS. BMC Genomics. 2008 Jun 3;9:267.\

\ \

\ Kidd JM, Cooper GM, Donahue WF, et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008;453(7191):56-64.\

\ \ \ \ varRep 1 color 0,60,120\ html numtSeq\ longLabel Human NumtS assembled\ parent numtSeq\ priority 2\ shortLabel NumtS assembled\ track numtSAssembled\ type bed 12 .\ useScore 1\ snpRecombHotspotPerlegen Perlegen bed 3 . Oxford Recombination Hotspots from Perlegen Data 0 2 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Oxford Recombination Hotspots from Perlegen Data\ parent snpRecombHotspot\ priority 2\ shortLabel Perlegen\ track snpRecombHotspotPerlegen\ hapmapLdPhCeu Phased CEU ld2 Linkage Disequilibrium for the CEPH (CEU) from phased genotypes 0 2 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 longLabel Linkage Disequilibrium for the CEPH (CEU) from phased genotypes\ parent hapmapLdPh\ priority 2\ shortLabel Phased CEU\ track hapmapLdPhCeu\ encodeGencodeRaceFragsBrain RACEfrags Brain genePred Gencode RACEfrags from Brain 0 2 248 0 8 251 127 131 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 248,0,8\ longLabel Gencode RACEfrags from Brain\ parent encodeGencodeRaceFrags\ priority 2\ shortLabel RACEfrags Brain\ track encodeGencodeRaceFragsBrain\ encodeRegulomeDnaseHs Regulome DNAseI HS bed 4 Regulome DNAseI HS 0 2 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Regulome DNAseI HS\ parent encodeDnase\ priority 2\ shortLabel Regulome DNAseI HS\ track encodeRegulomeDnaseHs\ encodeRikenCageMinus Riken CAGE - bedGraph 4 Riken CAGE Minus Strand - Predicted Gene Start Sites 0 2 43 51 109 149 153 182 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 43,51,109\ longLabel Riken CAGE Minus Strand - Predicted Gene Start Sites\ parent encodeRikenCage\ priority 2\ 
shortLabel Riken CAGE -\ track encodeRikenCageMinus\ encodeRikenCageMappedTagsNegative Riken CAGE MT - bedGraph 4 Riken CAGE Mapped Tags overlap count, Minus strand - TEST TRACK ONLY 0 2 43 51 109 149 153 182 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 43,51,109\ longLabel Riken CAGE Mapped Tags overlap count, Minus strand - TEST TRACK ONLY\ parent encodeRikenCageMappedTagsScore\ priority 2\ shortLabel Riken CAGE MT -\ track encodeRikenCageMappedTagsNegative\ decodeSexAveragedCarrier Sex Average Carrier bigWig 0.0 76.046 deCODE recombination map, sex-average carrier 2 2 209 45 51 232 150 153 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 209,45,51\ configurable on\ longLabel deCODE recombination map, sex-average carrier\ parent avgView off\ priority 2\ shortLabel Sex Average Carrier\ subGroups view=avg\ track decodeSexAveragedCarrier\ type bigWig 0.0 76.046\ encodeSangerChipH3K4me2 SI H3K4m2 GM06990 bedGraph 4 Sanger Institute ChIP/Chip (H3K4me2 ab, GM06990 cells) 0 2 10 10 130 132 132 192 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 10,10,130\ longLabel Sanger Institute ChIP/Chip (H3K4me2 ab, GM06990 cells)\ parent encodeSangerChipH3H4\ priority 2\ shortLabel SI H3K4m2 GM06990\ track encodeSangerChipH3K4me2\ stanfordChipGMO6990SRF Stan GMO6690 SRF bedGraph 4 Stanford ChIP-chip (GMO6990 cells, SRF ChIP) 0 2 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (GMO6990 cells, SRF ChIP)\ parent stanfordChip\ priority 2\ shortLabel Stan GMO6690 SRF\ track stanfordChipGMO6990SRF\ encodeStanfordChipGMO6990SRF Stan GMO6690 SRF bedGraph 4 
Stanford ChIP-chip (GMO6990 cells, SRF ChIP) 0 2 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (GMO6990 cells, SRF ChIP)\ parent encodeStanfordChipJohnson\ priority 2\ shortLabel Stan GMO6690 SRF\ track encodeStanfordChipGMO6990SRF\ encodeStanfordChipHCT116Sp3 Stan HCT116 Sp3 bedGraph 4 Stanford ChIP-chip (HCT116 cells, Sp3 ChIP) 0 2 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (HCT116 cells, Sp3 ChIP)\ parent encodeStanfordChip\ priority 2\ shortLabel Stan HCT116 Sp3\ track encodeStanfordChipHCT116Sp3\ encodeStanfordMethCRL1690 Stan Meth CRL1690 bedGraph 4 Stanford Methylation Digest (CRL1690 cells) 0 2 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest (CRL1690 cells)\ parent encodeStanfordMeth\ priority 2\ shortLabel Stan Meth CRL1690\ track encodeStanfordMethCRL1690\ encodeStanfordMethSmoothedCRL1690 Stan Meth Sc CRL1690 bedGraph 4 Stanford Methylation Digest Smoothed Score (CRL1690 cells) 0 2 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest Smoothed Score (CRL1690 cells)\ parent encodeStanfordMethSmoothed\ priority 2\ shortLabel Stan Meth Sc CRL1690\ track encodeStanfordMethSmoothedCRL1690\ encodeStanfordPromotersBe2C Stan Pro Be2c bed 9 + Stanford Promoter Activity (Be2c cells) 0 2 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (Be2c cells)\ parent 
encodeStanfordPromoters\ priority 2\ shortLabel Stan Pro Be2c\ track encodeStanfordPromotersBe2C\ encodeStanfordChipSmoothedHCT116Sp3 Stan Sc HCT116 Sp3 bedGraph 4 Stanford ChIP-chip Smoothed Score (HCT116 cells, Sp3 ChIP) 0 2 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip Smoothed Score (HCT116 cells, Sp3 ChIP)\ parent encodeStanfordChipSmoothed\ priority 2\ shortLabel Stan Sc HCT116 Sp3\ track encodeStanfordChipSmoothedHCT116Sp3\ encodeStanfordNRSFControl Stanf NRSF Control bed 6 . Stanford NRSF/REST Control 0 2 0 128 0 127 191 127 0 0 23 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,chrY,chrM, encodeChip 1 color 0,128,0\ longLabel Stanford NRSF/REST Control\ parent encodeStanfordNRSF\ priority 2\ shortLabel Stanf NRSF Control\ track encodeStanfordNRSFControl\ mapGenethon STS Markers bed 5 + Various STS Markers 0 2 0 0 0 127 127 127 0 0 0 map 1 group map\ longLabel Various STS Markers\ priority 2\ shortLabel STS Markers\ track mapGenethon\ type bed 5 +\ visibility hide\ tajdAd Tajima's D AD bedGraph 4 Tajima's D from African Descent 0 2 200 100 0 0 100 200 0 0 0 varRep 0 altColor 0,100,200\ color 200,100,0\ longLabel Tajima's D from African Descent\ parent tajD\ priority 2\ shortLabel Tajima's D AD\ track tajdAd\ encodeTbaBinCons TBA BinCons wig 0.0 1.0 TBA BinCons Conservation 0 2 190 70 80 222 162 167 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 0 color 190,70,80\ longLabel TBA BinCons Conservation\ parent encodeTbaCons\ priority 2\ shortLabel TBA BinCons\ track encodeTbaBinCons\ encodeTbaBinConsEl TBA BinCons bed 5 . 
TBA BinCons Conserved Elements 0 2 190 70 80 222 162 167 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 190,70,80\ longLabel TBA BinCons Conserved Elements\ parent encodeTbaElements\ priority 2\ shortLabel TBA BinCons\ track encodeTbaBinConsEl\ hiSeqDepthTopPt5Pct Top 0.005 Depth bed 3 Top 0.005 of Read Depth Distribution 0 2 139 69 19 197 162 137 0 0 0 map 1 longLabel Top 0.005 of Read Depth Distribution\ parent hiSeqDepth\ priority 2\ shortLabel Top 0.005 Depth\ track hiSeqDepthTopPt5Pct\ encodeUtexChip2091fibMycRaw UT Myc Fb bedGraph 4 University of Texas, Austin ChIP-chip (c-Myc, 2091 fibroblasts) 0 2 120 30 50 187 142 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 120,30,50\ longLabel University of Texas, Austin ChIP-chip (c-Myc, 2091 fibroblasts)\ parent encodeUtexChip\ priority 2\ shortLabel UT Myc Fb\ subGroups dataType=raw\ track encodeUtexChip2091fibMycRaw\ encodeUvaDnaRep2 UVa DNA Rep 2h bed 3 . University of Virginia Temporal Profiling of DNA Replication (2-4 hrs) 0 2 60 75 60 10 130 10 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 longLabel University of Virginia Temporal Profiling of DNA Replication (2-4 hrs)\ parent encodeUvaDnaRep\ priority 2\ shortLabel UVa DNA Rep 2h\ track encodeUvaDnaRep2\ encodeUvaDnaRepMid UVa DNA Rep Mid bed 3 . 
University of Virginia Temporal Profiling of DNA Replication (Mid) 0 2 30 80 130 142 167 192 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 color 30,80,130\ longLabel University of Virginia Temporal Profiling of DNA Replication (Mid)\ parent encodeUvaDnaRepSeg\ priority 2\ shortLabel UVa DNA Rep Mid\ track encodeUvaDnaRepMid\ encodeUvaDnaRepOriginsNSHela UVa Ori-NS HeLa bed 3 . University of Virginia DNA Replication Origins, Ori-NS, HeLa 0 2 250 0 0 252 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 color 250,0,0\ dataVersion May 2007\ longLabel University of Virginia DNA Replication Origins, Ori-NS, HeLa\ origAssembly hg17\ parent encodeUvaDnaRepOrigins\ priority 2\ shortLabel UVa Ori-NS HeLa\ track encodeUvaDnaRepOriginsNSHela\ kiddEichlerValidAbc13 Validated ABC13 bed 9 HGSV Individual ABC13 (Yoruba) Validated Sites of Structural Variation 0 2 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HGSV Individual ABC13 (Yoruba) Validated Sites of Structural Variation\ parent kiddEichlerValid\ priority 2\ shortLabel Validated ABC13\ track kiddEichlerValidAbc13\ hgdpXpehhMideast XP-EHH Mideast bedGraph 4 Human Genome Diversity Project XP-EHH (Mideast) 0 2 0 0 200 127 127 227 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 0,0,200\ longLabel Human Genome Diversity Project XP-EHH (Mideast)\ parent hgdpXpehh\ priority 2\ shortLabel XP-EHH Mideast\ track hgdpXpehhMideast\ encodeYaleChIPSTAT1HeLaMaskLess50mer38bpPval Yale 50-38 PVal bedGraph 4 Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 38bp Win, P-Values 0 2 50 50 200 152 152 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,50,200\ 
longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 38bp Win, P-Values\ parent encodeYaleChIPSTAT1Pval\ priority 2\ shortLabel Yale 50-38 PVal\ track encodeYaleChIPSTAT1HeLaMaskLess50mer38bpPval\ encodeYaleChIPSTAT1HeLaMaskLess50mer38bpSig Yale 50-38 Sig bedGraph 4 Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 38bp Win, Signal 0 2 112 63 175 224 66 81 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 altColor 224,66,81\ color 112,63,175\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 38bp Win, Signal\ parent encodeYaleChIPSTAT1Sig\ priority 2\ shortLabel Yale 50-38 Sig\ track encodeYaleChIPSTAT1HeLaMaskLess50mer38bpSig\ encodeYaleChIPSTAT1HeLaMaskLess50mer38bpSite Yale 50-38 Sites bed . Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 38bp Win, Binding Sites 0 2 200 50 50 50 50 200 0 0 18 chr1,chr10,chr11,chr13,chr14,chr15,chr16,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 altColor 50,50,200\ color 200,50,50\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 38bp Win, Binding Sites\ parent encodeYaleChIPSTAT1Sites\ priority 2\ shortLabel Yale 50-38 Sites\ track encodeYaleChIPSTAT1HeLaMaskLess50mer38bpSite\ encodeYaleMASNB4RNANprotTMREVMless36mer36bp Yale NB4 NgR RNA bedGraph 4 Yale NB4 RNA Trans Map, MAS Array, Reverse Direction, NimbleGen Protocol 0 2 50 50 200 200 50 50 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 0 altColor 200,50,50\ color 50,50,200\ longLabel Yale NB4 RNA Trans Map, MAS Array, Reverse Direction, NimbleGen Protocol\ parent encodeYaleMASPlacRNATransMap\ priority 2\ shortLabel Yale NB4 NgR RNA\ track encodeYaleMASNB4RNANprotTMREVMless36mer36bp\ encodeYaleMASNB4RNANProtTarsREVMless36mer36bp Yale NB4 NgR TAR bed 6 . 
Yale NB4 RNA TARs, MAS array, Reverse Direction, NimbleGen Protocol 0 2 50 50 200 200 50 50 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 1 altColor 200,50,50\ color 50,50,200\ longLabel Yale NB4 RNA TARs, MAS array, Reverse Direction, NimbleGen Protocol\ parent encodeYaleMASPlacRNATars\ priority 2\ shortLabel Yale NB4 NgR TAR\ track encodeYaleMASNB4RNANProtTarsREVMless36mer36bp\ encodeYaleAffyNeutRNATransMap01 Yale RNA Neu 1 wig -2730 3394 Yale Neutrophil RNA Transcript Map, Sample 1 0 2 50 205 50 152 230 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,205,50\ longLabel Yale Neutrophil RNA Transcript Map, Sample 1\ parent encodeYaleAffyRNATransMap\ priority 2\ shortLabel Yale RNA Neu 1\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATransMap01\ encodeYaleAffyNeutRNATars01 Yale TAR Neu 1 bed 3 . Yale Neutrophil RNA Transcriptionally Active Region, Sample 1 0 2 50 205 50 152 230 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,205,50\ longLabel Yale Neutrophil RNA Transcriptionally Active Region, Sample 1\ parent encodeYaleAffyRNATars\ priority 2\ shortLabel Yale TAR Neu 1\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATars01\ netRBestTarSyr1 Tarsier RBest Net netAlign tarSyr1 chainTarSyr1 Tarsier (Aug. 2008 (Broad/tarSyr1)) Reciprocal Best Alignment Net 0 3 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Alignment Net\ otherDb tarSyr1\ parent rBestNet\ priority 3\ shortLabel $o_Organism RBest Net\ spectrum on\ track netRBestTarSyr1\ type netAlign tarSyr1 chainTarSyr1\ visibility hide\ netSyntenyRheMac2 Rhesus Syn Net netAlign rheMac2 chainRheMac2 Rhesus (Jan. 
2006 (MGSC Merged 1.0/rheMac2)) Syntenic Alignment Net 0 3 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb rheMac2\ parent syntenicNet\ priority 3\ shortLabel $o_Organism Syn Net\ spectrum on\ track netSyntenyRheMac2\ type netAlign rheMac2 chainRheMac2\ visibility hide\ encodeAffyChIpHl60PvalBrg1Hr02 Affy Brg1 RA 2h wig 0.0 534.54 Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 2hrs) P-Value 0 3 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 2hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 3\ shortLabel Affy Brg1 RA 2h\ subGroups factor=Brg1 time=2h\ track encodeAffyChIpHl60PvalBrg1Hr02\ encodeAffyChIpHl60SignalStrictH3K9K14DHr08 Affy H3K9ac2 8h wig -2.78 3.97 Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 8hrs) Strict Signal 0 3 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 8hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 3\ shortLabel Affy H3K9ac2 8h\ subGroups factor=H3K9K14ac2 time=8h\ track encodeAffyChIpHl60SignalStrictH3K9K14DHr08\ encodeAffyChIpHl60SitesStrictH3K9K14DHr08 Affy H3K9ac2 8h bed 3 . 
Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 8hrs) Strict Sites 0 3 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 8hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 3\ shortLabel Affy H3K9ac2 8h\ subGroups factor=H3K9K14ac2 time=8h\ track encodeAffyChIpHl60SitesStrictH3K9K14DHr08\ encodeAffyChIpHl60PvalStrictH3K9K14DHr08 Affy H3K9ac2 8h wig 0 696.62 Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 8hrs) Strict P-Value 0 3 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 8hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 3\ shortLabel Affy H3K9ac2 8h\ subGroups factor=H3K9K14ac2 time=8h\ track encodeAffyChIpHl60PvalStrictH3K9K14DHr08\ encodeAffyRnaHl60SitesHr00IntronsProximal Affy In Prx HL60 bed 4 . 
Affy Intronic Proximal HL60 Transfrags 0 3 224 0 32 239 127 143 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 224,0,32\ longLabel Affy Intronic Proximal HL60 Transfrags\ parent encodeNoncodingTransFrags\ priority 3\ shortLabel Affy In Prx HL60\ subGroups region=intronicProximal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr00IntronsProximal\ encodeTransFragsAffyProximal Affy Prox bed 4 Affy Proximal 0 3 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Affy Proximal\ parent encodeTransFrags\ priority 3\ shortLabel Affy Prox\ track encodeTransFragsAffyProximal\ encodeAffyRnaHl60SignalHr00 Affy RNA RA 0h wig -1168.00 1686.5 Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 0hrs) Signal 0 3 50 50 150 152 152 202 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,50,150\ longLabel Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 0hrs) Signal\ parent encodeAffyRnaSignal\ priority 3\ shortLabel Affy RNA RA 0h\ track encodeAffyRnaHl60SignalHr00\ encodeAffyRnaHl60SitesHr00 Affy RNA RA 0h bed 3 . 
Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 0hrs) Sites 0 3 50 50 150 152 152 202 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,50,150\ longLabel Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 0hrs) Sites\ parent encodeAffyRnaTransfrags\ priority 3\ shortLabel Affy RNA RA 0h\ track encodeAffyRnaHl60SitesHr00\ snpArrayAffy5 Affy SNP 5.0 bed 6 + Affymetrix SNP 5.0 0 3 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Affymetrix SNP 5.0\ parent snpArray off\ priority 3\ shortLabel Affy SNP 5.0\ track snpArrayAffy5\ type bed 6 +\ encodeHapMapAlleleFreqJPT Allele Freq JPT bed 6 + HapMap Minor Allele Frequencies Japanese (JPT) 0 3 0 0 0 127 127 127 1 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18, encodeVariation 1 longLabel HapMap Minor Allele Frequencies Japanese (JPT)\ parent encodeHapMapAlleleFreq\ priority 3\ shortLabel Allele Freq JPT\ track encodeHapMapAlleleFreqJPT\ Alu_SINE Alu_SINE bed 5 Alu SINEs for Intersection 0 3 0 0 0 127 127 127 0 0 0 encodeAnalysis 1 longLabel Alu SINEs for Intersection\ parent encodeWorkshopIntersections\ priority 3\ shortLabel Alu_SINE\ track Alu_SINE\ encodeEgaspUpdAugustusDual August/Mouse Upd genePred Augustus + Mouse Homology Gene Predictions 0 3 12 85 135 133 170 195 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,85,135\ longLabel Augustus + Mouse Homology Gene Predictions\ parent encodeEgaspUpdate\ priority 3\ shortLabel August/Mouse Upd\ track encodeEgaspUpdAugustusDual\ encodeEgaspPartAugustusAbinitio Augustus genePred Augustus Ab Initio Gene Predictions 0 3 12 50 200 133 152 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,50,200\ longLabel Augustus Ab Initio Gene Predictions\ parent encodeEgaspPartial\ priority 3\ 
shortLabel Augustus\ track encodeEgaspPartAugustusAbinitio\ encodeBuFirstExonHeart BU Heart bed 12 + Boston University First Exon Activity in Heart 0 3 0 0 0 127 127 127 0 0 10 chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX, encodeTxLevels 1 longLabel Boston University First Exon Activity in Heart\ parent encodeBuFirstExon\ priority 3\ shortLabel BU Heart\ track encodeBuFirstExonHeart\ cccTrendPvalCd CCC Crohns Dis chromGraph Case Control Consortium Crohn's disease trend -log10 P-value 0 3 0 0 0 127 127 127 0 0 0 phenDis 0 longLabel Case Control Consortium Crohn's disease trend -log10 P-value\ parent caseControl\ priority 3\ shortLabel CCC Crohns Dis\ track cccTrendPvalCd\ hapmapLdHotspotCJ CHB + JPT bedGraph 4 Hotspots of Linkage Disequilibrium in the Chinese/Japanese HapMap (CHB and JPT) 0 3 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 longLabel Hotspots of Linkage Disequilibrium in the Chinese/Japanese HapMap (CHB and JPT)\ parent hapmapLdHotspot\ priority 3\ shortLabel CHB + JPT\ track hapmapLdHotspotCJ\ encodeDnaseHs Combined DNAseI HS bed 4 Combined DNAseI HS 0 3 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Combined DNAseI HS\ parent encodeDnase\ priority 3\ shortLabel Combined DNAseI HS\ track encodeDnaseHs\ fox2ClipSeqDensityReverseStrand Density Reverse wig 0 1406 FOX2 adaptor-trimmed CLIP-seq Density Reverse Strand 2 3 0 0 0 127 127 127 0 0 0 regulation 0 configurable on\ graphTypeDefault Bar\ longLabel FOX2 adaptor-trimmed CLIP-seq Density Reverse Strand\ maxHeightPixels 128:36:16\ noInherit on\ parent fox2ClipSeqCompViewdensity\ priority 3\ shortLabel Density Reverse\ spanList 1\ subGroups view=density\ track fox2ClipSeqDensityReverseStrand\ type wig 0 1406\ windowingFunction mean\ kiddEichlerDiscAbc12 
Discordant ABC12 bed 12 HGSV Individual ABC12 (CEPH) Discordant Clone End Alignments 0 3 0 0 0 127 127 127 0 0 0 http://mrhgsv.gs.washington.edu/cgi-bin/hgc?i=$$&c=$S&l=$[&r=$]&db=$D&position=$S:$[-$] varRep 1 longLabel HGSV Individual ABC12 (CEPH) Discordant Clone End Alignments\ parent kiddEichlerDisc\ priority 3\ shortLabel Discordant ABC12\ track kiddEichlerDiscAbc12\ encodeDNDSlarge dN/dS 0.5 to 1.5 bed 4 + ENCODE Exons dN/dS 0.5 to 1.5 0 3 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel ENCODE Exons dN/dS 0.5 to 1.5\ parent encodeDNDS\ priority 3\ shortLabel dN/dS 0.5 to 1.5\ track encodeDNDSlarge\ encodeAffyEc1BrainFrontalLobeSignal EC1 Sgnl BrainF wig 0 62385 Affy Ext Trans Signal (1-base window) (Brain Frontal Lobe) 0 3 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 0 color 248,0,8\ longLabel Affy Ext Trans Signal (1-base window) (Brain Frontal Lobe)\ parent encodeAffyEcSignal\ priority 3\ shortLabel EC1 Sgnl BrainF\ track encodeAffyEc1BrainFrontalLobeSignal\ encodeAffyEc1BrainFrontalLobeSites EC1 Sites BrainF bed 3 . 
Affy Ext Trans Sites (1-base window) (Brain Frontal Lobe) 0 3 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 1 color 248,0,8\ longLabel Affy Ext Trans Sites (1-base window) (Brain Frontal Lobe)\ parent encodeAffyEcSites\ priority 3\ shortLabel EC1 Sites BrainF\ track encodeAffyEc1BrainFrontalLobeSites\ encodeEgaspFullEnsembl Ensembl genePred Ensembl Gene Predictions 0 3 22 150 20 138 202 137 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 22,150,20\ longLabel Ensembl Gene Predictions\ parent encodeEgaspFull\ priority 3\ shortLabel Ensembl\ track encodeEgaspFullEnsembl\ encodeUncFairePeaksChipotle FAIRE ChIPOTle bedGraph 4 University of North Carolina FAIRE Peaks (ChIPOTle) 0 3 0 0 255 50 100 50 0 0 21 chr1,chr4,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chr10,chrX, encodeChrom 0 autoScale off\ color 0,0,255\ longLabel University of North Carolina FAIRE Peaks (ChIPOTle)\ maxHeightPixels 128:24:16\ noInherit on\ parent encodeUncFaire\ priority 3\ shortLabel FAIRE ChIPOTle\ spanList 38\ track encodeUncFairePeaksChipotle\ type bedGraph 4\ viewLimits 0.4:3.7\ windowingFunction mean\ encodeGencodeGenePolymorphicMar07 Gencode Polymorph genePred Gencode Polymorphic 0 3 160 32 240 207 143 247 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 160,32,240\ longLabel Gencode Polymorphic\ parent encodeGencodeGeneMar07\ priority 3\ shortLabel Gencode Polymorph\ track encodeGencodeGenePolymorphicMar07\ encodeGencodeGenePseudoOct05 Gencode Pseudo genePred Gencode Pseudogenes 0 3 0 91 191 127 173 223 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 0,91,191\ longLabel Gencode Pseudogenes\ parent encodeGencodeGeneOct05\ priority 3\ shortLabel 
Gencode Pseudo\ track encodeGencodeGenePseudoOct05\ ntHumChimpCodingDiff H-C Coding Diffs bed 9 . Neandertal Alleles in Human/Chimp Coding Non-synonymous Differences in Human Lineage 0 3 0 0 0 127 127 127 0 0 24 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY,

Description

\

\ This track displays Neandertal alleles for human-chimp protein-coding differences on the human lineage, using orangutan as the outgroup to determine which allele is more likely to be ancestral. \

\ \

Display Conventions and Configuration

\

\ Neandertal ancestral alleles are colored blue; derived \ (human) alleles are colored green. \

\

\ The item names show the number of Neandertal reads supporting the ancestral and derived alleles (each count followed by its base), then the ancestral and derived codons enclosed in parentheses. For example, if no Neandertal reads matched the ancestral base G, three Neandertal reads matched the derived base A, and the ancestral and derived codons were GTA and ATA respectively, the item name would be "0G>3A(GTA>ATA)". If N Neandertal reads match neither the ancestral nor the derived base, "+N?" is added before the codons (e.g. "0G>3A+1?(GTA>ATA)" when one read matches neither).\
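The naming convention above is regular enough to build and parse programmatically. The sketch below is a hypothetical helper, not part of the track's own tooling: the `ItemName` record, the function names, and the assumption that bases and codons use only the letters A, C, G and T are all illustrative choices inferred from the examples in the text.

```python
import re
from typing import NamedTuple


class ItemName(NamedTuple):
    anc_reads: int    # Neandertal reads matching the ancestral base
    anc_base: str
    der_reads: int    # Neandertal reads matching the derived base
    der_base: str
    other_reads: int  # reads matching neither base (0 when "+N?" is absent)
    anc_codon: str
    der_codon: str


# Assumed grammar: <n><base>><n><base>[+<n>?](<codon>><codon>)
_NAME_RE = re.compile(
    r"^(\d+)([ACGT])>(\d+)([ACGT])(?:\+(\d+)\?)?\(([ACGT]{3})>([ACGT]{3})\)$"
)


def format_name(n: ItemName) -> str:
    """Build an item name; the "+N?" part is emitted only when N > 0."""
    other = f"+{n.other_reads}?" if n.other_reads else ""
    return (f"{n.anc_reads}{n.anc_base}>{n.der_reads}{n.der_base}"
            f"{other}({n.anc_codon}>{n.der_codon})")


def parse_name(name: str) -> ItemName:
    """Split an item name back into its read counts, bases and codons."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized item name: {name!r}")
    anc_n, anc_b, der_n, der_b, other, anc_c, der_c = m.groups()
    return ItemName(int(anc_n), anc_b, int(der_n), der_b,
                    int(other) if other else 0, anc_c, der_c)
```

For instance, `parse_name("0G>3A(GTA>ATA)")` yields zero ancestral reads, three derived reads and no unmatched reads, and `format_name` reproduces the original string.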

\ \

Methods

\

\ Neandertal DNA was extracted from a ~49,000-year-old bone (Sidrón 1253) excavated in El Sidrón cave, Asturias, Spain. Non-synonymous changes that occurred on the human lineage since the split from chimpanzee were identified by aligning human, chimpanzee and orangutan protein sequences for all orthologous proteins in HomoloGene (Build 58). Comparing these three species allowed each human/chimpanzee difference to be assigned to the lineage on which it arose. An Agilent custom oligonucleotide array covering the 13,841 non-synonymous changes inferred to have occurred on the human lineage was designed and used to capture Neandertal sequences. \
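The lineage assignment described above follows an outgroup rule: whichever of human or chimpanzee matches orangutan is taken to carry the ancestral state. The sketch below applies that simple parsimony rule to a single aligned amino-acid position. It is an illustration under that assumption, not the authors' exact procedure; the function name and the tuple it returns are hypothetical.

```python
from typing import Optional, Tuple


def polarize(human: str, chimp: str, orang: str) -> Optional[Tuple[str, str, str]]:
    """Assign a human/chimp difference at one aligned position to a lineage.

    Returns (lineage, ancestral, derived), or None when the position is not a
    human/chimp difference or the outgroup cannot resolve its direction.
    """
    if human == chimp:
        return None  # no human/chimp difference at this position
    if chimp == orang:
        # Chimp matches the outgroup, so the change arose on the human lineage
        return ("human", chimp, human)
    if human == orang:
        # Human matches the outgroup, so the change arose on the chimp lineage
        return ("chimp", human, chimp)
    return None  # all three residues differ; direction is unresolved
```

Only positions returning `("human", ...)` would correspond to the human-lineage changes targeted by the capture array.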

\ \

Reference

\

\ Burbano HA, Hodges E, Green RE, Briggs AW, Krause J, Meyer M, Good JM, Maricic T, Johnson PLF, Xuan Z, et al. Targeted Investigation of the Neandertal Genome by Array-Based Sequence Capture. Science. 2010 May 7;328(5979):723-5.\

\ neandertal 1 chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY\ group neandertal\ itemRgb on\ longLabel Neandertal Alleles in Human/Chimp Coding Non-synonymous Differences in Human Lineage\ noScoreFilter .\ priority 3\ shortLabel H-C Coding Diffs\ track ntHumChimpCodingDiff\ type bed 9 .\ visibility hide\ encodeHapMapCovJPT HapMap Cov JPT wig 0.0 100.0 HapMap Resequencing Coverage Japanese (JPT) 0 3 0 0 0 127 127 127 0 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18, encodeVariation 0 longLabel HapMap Resequencing Coverage Japanese (JPT)\ parent encodeHapMapCov\ priority 3\ shortLabel HapMap Cov JPT\ track encodeHapMapCovJPT\ hapmapSnpsCHB HapMap SNPs CHB bed 6 + HapMap SNPs from the CHB Population (Han Chinese in Beijing, China) 0 3 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HapMap SNPs from the CHB Population (Han Chinese in Beijing, China)\ parent hapmapSnps\ priority 3\ shortLabel HapMap SNPs CHB\ track hapmapSnpsCHB\ hgdpHzyMideast Hetzgty Mideast bedGraph 4 Human Genome Diversity Proj Smoothd Expec Heterozygosity (Mideast) 0 3 0 0 200 127 127 227 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 0,0,200\ longLabel Human Genome Diversity Proj Smoothd Expec Heterozygosity (Mideast)\ parent hgdpHzy\ priority 3\ shortLabel Hetzgty Mideast\ track hgdpHzyMideast\ hgdpIhsEurope iHS Europe bedGraph 4 Human Genome Diversity Project iHS (Europe) 0 3 240 144 0 247 199 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 color 240,144,0\ longLabel Human Genome Diversity Project iHS (Europe)\ parent hgdpIhs\ priority 3\ shortLabel iHS Europe\ track hgdpIhsEurope\ encodeGencodeIntergenicProximal Intergenic Prox bed 4 . 
Gencode Intergenic Proximal Regions 0 3 0 0 0 127 127 127 0 0 0 encodeAnalysis 1 longLabel Gencode Intergenic Proximal Regions\ parent encodeGencodeRegions\ priority 3\ shortLabel Intergenic Prox\ track encodeGencodeIntergenicProximal\ encodeAllIntersectEl Intersect bed 5 . TBA and MLAGAN PhastCons/BinCons/GERP Intersection Conserved Elements 0 3 80 145 105 167 200 180 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,145,105\ longLabel TBA and MLAGAN PhastCons/BinCons/GERP Intersection Conserved Elements\ parent encodeAllElements\ priority 3\ shortLabel Intersect\ track encodeAllIntersectEl\ encodeAllIntronsDistal Intronic Dist bed 4 Consensus Intronic Distal 0 3 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Consensus Intronic Distal\ parent encodeWorkshopSelections\ priority 3\ shortLabel Intronic Dist\ track encodeAllIntronsDistal\ iscaRetrospectivePathogenic ISCA Ret Pathog. gvf Internat. Stds. for Cytogen. Arrays Consort. (ISCA) - Retrospective (Pathogenic) 0 3 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Internat. Stds. for Cytogen. Arrays Consort. 
(ISCA) - Retrospective (Pathogenic)\ parent iscaRetrospectiveComposite\ priority 3\ shortLabel ISCA Ret Pathog.\ track iscaRetrospectivePathogenic\ hapmapLdChb LD CHB bed 4 + Linkage Disequilibrium for the Han Chinese (CHB) 0 3 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 longLabel Linkage Disequilibrium for the Han Chinese (CHB)\ parent hapmapLd\ priority 3\ shortLabel LD CHB\ track hapmapLdChb\ encodeUcsdChipHeLaH3H4tmH3K4_p0 LI H3K4me3 -gIF bedGraph 4 Ludwig Institute ChIP-chip: H3K4me3 ab, HeLa cells, no gamma interferon 0 3 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: H3K4me3 ab, HeLa cells, no gamma interferon\ parent encodeLIChIPgIF\ priority 3\ shortLabel LI H3K4me3 -gIF\ track encodeUcsdChipHeLaH3H4tmH3K4_p0\ encodeUcsdNgHeLaRnap_p0 LI Ng Pol2 -gIF bedGraph 4 Ludwig Institute/UCSD ChIP/Chip Ng: HeLa, Pol2, no gamma interferon 0 3 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Ludwig Institute/UCSD ChIP/Chip Ng: HeLa, Pol2, no gamma interferon\ parent encodeUcsdNgGif\ priority 3\ shortLabel LI Ng Pol2 -gIF\ track encodeUcsdNgHeLaRnap_p0\ encodeUcsdChipRnapImr90_f LI Pol2 IMR90 bedGraph 4 Ludwig Institute ChIP-chip: Pol2 8WG16 ab, IMR90 cells 0 3 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: Pol2 8WG16 ab, IMR90 cells\ parent encodeLIChIP\ priority 3\ shortLabel LI Pol2 IMR90\ track encodeUcsdChipRnapImr90_f\ encodeMlaganGerpEl MLAGAN GERP bed 5 . 
MLAGAN GERP Conserved Elements 0 3 120 80 120 187 167 187 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 120,80,120\ longLabel MLAGAN GERP Conserved Elements\ parent encodeMlaganElements\ priority 3\ shortLabel MLAGAN GERP\ track encodeMlaganGerpEl\ encodeMlaganGerpCons MLAGAN GERP Cons wig 0.0 3.0 MLAGAN GERP Conservation 0 3 120 80 120 187 167 187 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 0 autoScale Off\ color 120,80,120\ longLabel MLAGAN GERP Conservation\ maxHeightPixels 100:25:11\ noInherit on\ parent encodeMlaganCons\ priority 3\ shortLabel MLAGAN GERP Cons\ track encodeMlaganGerpCons\ type wig 0.0 3.0\ windowingFunction mean\ snp131NonUnique Non-Unique SNPs bed 6 + Non-uniquely Mapped Simple Nucleotide Polymorphisms (dbSNP build 131) 0 3 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$ varRep 1 longLabel Non-uniquely Mapped Simple Nucleotide Polymorphisms (dbSNP build 131)\ parent snp131Composite\ priority 3\ shortLabel Non-Unique SNPs\ subGroups view=nonu\ track snp131NonUnique\ visibility hide\ numtSMitochondrion NumtS on mitochon bed 6 . Human NumtS on mitochondrion 0 3 0 60 120 127 157 187 1 0 0

Description and display conventions

\

\ NumtS (Nuclear mitochondrial sequences) are mitochondrial fragments inserted in nuclear genomic sequences. The most credited hypothesis concerning their generation suggests that in presence of mutagenic agents or under stress conditions fragments of mtDNA escape from mitochondria, reach the nucleus and insert into chromosomes during break repair, although NumtS can derive from duplication of genomic fragments. NumtS may be cause of contamination during human mtDNA sequencing and hence frequent false low heteroplasmic evidences have been reported.\ The Bioinformatics group chaired by M.Attimonelli (Bari, Italy) has produced the RHNumtS compilation annotating more than 500 Human NumtS. To allow the scientific community to access to the compilation and to perform genomics comparative analyses inclusive of the NumtS data, the group has designed the Human NumtS tracks below described.\

\ \

\ The NumtS tracks show the high-scoring segment pairs (HSPs) obtained by aligning the mitochondrial reference genome (NC_012920) with the hg18 release of the human genome.\

\
    \
  1. "NumtS (Nuclear mitochondrial Sequences)" Track\

    \ The "NumtS" track shows the mapping on the nuclear genome of the HSPs returned by BlastN. The shading of each item reflects the similarity reported by BlastN, and the direction of the arrows matches the strand of the alignment. Each item includes a link to its mitochondrial mapping, allowing quick cross-referencing between the nuclear and mitochondrial contexts of each NumtS.\

    \
  2. "NumtS assembled" Track\

    \ The "NumtS assembled" track shows items obtained by assembling HSPs from the "NumtS" track that fulfill the following conditions:\

    \

    \ Exceptions to the second condition are made when a long repetitive element lies between two HSPs.\

    \
  3. "NumtS on mitochondrion" Track\

    \ The "NumtS on mitochondrion" track shows the mapping of the HSPs on the mitochondrial genome. The shading of each item reflects the similarity reported by BlastN, and the direction of the arrows matches the strand of the alignment. Each item includes a link to its nuclear mapping.\

    \ \
  4. "NumtS on mitochondrion with chromosome placement" Track\

    \ The "NumtS on mitochondrion with chromosome placement" track shows the mapping of the HSPs on the mitochondrial genome, but the items are coloured according to the colour assigned to each human chromosome in the UCSC Genome Browser. No shading is applied. Each item includes a link to its nuclear mapping.\

    \
\ \
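The display rules above (shading by similarity, arrow direction from the alignment strand) can be sketched as a conversion from one HSP to a BED 6 line, the format these tracks are declared with (`type bed 6 .` plus `useScore 1`, which shades items by their 0-1000 score). This is a hypothetical illustration, not the authors' pipeline: the `Hsp` record, the convention that a reversed interval (start > end, as in BLAST tabular output) marks a minus-strand hit, and the identity-to-score scaling are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    chrom: str     # target sequence name, e.g. "chr1"
    s_start: int   # 1-based, inclusive; s_start > s_end marks a minus-strand hit
    s_end: int
    pident: float  # percent identity reported by BlastN
    name: str


def hsp_to_bed6(h: Hsp) -> str:
    """Render one HSP as a tab-separated BED 6 line."""
    strand = "+" if h.s_start <= h.s_end else "-"
    lo, hi = sorted((h.s_start, h.s_end))
    # BED uses 0-based, half-open coordinates: shift only the start.
    chrom_start, chrom_end = lo - 1, hi
    # Hypothetical scaling: 0-100 % identity mapped onto the 0-1000 score range,
    # so that useScore shading grows darker with higher similarity.
    score = max(0, min(1000, round(h.pident * 10)))
    return "\t".join([h.chrom, str(chrom_start), str(chrom_end),
                      h.name, str(score), strand])
```

A minus-strand hit such as `Hsp("chr2", 500, 200, 99.0, "x")` becomes an interval from 199 to 500 on the `-` strand with score 990.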

Methods

\

\ NumtS mappings were obtained by running Blast2seq (program: BlastN) between each chromosome of the human genome (hg18 build) and the human mitochondrial reference sequence (rCRS, AC: NC_012920), with an e-value threshold of 1e-03. The HSPs were assembled using spreadsheet interpolation and manual inspection.\
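The e-value cutoff above amounts to keeping only HSPs with E ≤ 1e-03. A minimal filter is sketched below, assuming the alignments were saved in BLAST's 12-column tabular format (as produced, for example, by `blastn -outfmt 6`), where the e-value is the 11th column. The column layout and the function name are assumptions for illustration; the Methods text does not state which output format was used.

```python
from typing import Iterable, Iterator, List

EVALUE_THRESHOLD = 1e-3  # threshold stated in the Methods text


def filter_hsps(lines: Iterable[str],
                max_evalue: float = EVALUE_THRESHOLD) -> Iterator[List[str]]:
    """Yield the fields of tabular BLAST hits whose e-value passes the cutoff."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        # Default tabular layout: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore -> e-value is index 10.
        if float(fields[10]) <= max_evalue:
            yield fields
```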

\ \

Verification

\

\ NumtS predicted in silico were validated by PCR amplification and sequencing of blood-extracted DNA from a healthy individual of European origin. PCR amplification succeeded for 275 NumtS and yielded amplicons of the expected length. All PCR fragments were sequenced on both strands and submitted to the EMBL databank.\

\

\ In addition, 541 NumtS were validated by intersecting their nuclear coordinates with HapMap annotations. The analysis was carried out on eight HapMap individuals (NA18517, NA18507, NA18956, NA19240, NA18555, NA12878, NA19129, NA12156). For each sample, only clones with a single best concordant placement (according to the fosmid end-sequence-pair analysis described in Kidd et al., 2008) were considered. The analysis showed that at least 30 bp of each of these 541 NumtS had been sequenced in these samples.\
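The HapMap check above is essentially an interval-overlap test with a 30 bp minimum. The sketch below counts a NumtS as supported when any single clone interval on the same chromosome overlaps it by at least 30 bp. Whether the original analysis required the 30 bp from one clone or summed across clones is not stated, so the single-clone rule, the data layout and the function names are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlap_len(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the overlap between two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def supported_numts(numts: Dict[str, Interval], clones: List[Interval],
                    min_bp: int = 30) -> List[str]:
    """Names of NumtS overlapped by at least min_bp of some clone interval."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in clones:
        by_chrom[chrom].append((start, end))
    hits = []
    for name, (chrom, start, end) in numts.items():
        if any(overlap_len(start, end, cs, ce) >= min_bp
               for cs, ce in by_chrom[chrom]):
            hits.append(name)
    return hits
```

A linear scan per chromosome keeps the sketch short; for genome-scale inputs an interval tree or a sorted sweep would be the usual choice.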

\ \

Credits

\

\ These data were provided by Domenico Simone and Marcella Attimonelli at the Department of Biochemistry and Molecular Biology "Ernesto Quagliariello" (University of Bari, Italy). Primer design was carried out by Francesco Calabrese and Giuseppe Mineccia. PCR validation was carried out by Martin Lang, Domenico Simone and Giuseppe Gasparre. Merging with HapMap annotations was performed by Domenico Simone.\

\ \

References

\

\ Simone D, Calabrese FM, Lang M, Gasparre G, Attimonelli M: Validation and UCSC tracks of the extended RHNumtS compilation (submitted). \

\ \

\ Lascaro D, Castellana S, Gasparre G, Romeo G, Saccone S, Attimonelli M. The RHNumtS compilation: features and bioinformatics approaches to locate and quantify Human NumtS. BMC Genomics. 2008 Jun 3;9:267.\

\ \

\ Kidd JM, Cooper GM, Donahue WF, et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008;453(7191):56-64.\

\ \ \ \ varRep 1 color 0,60,120\ html numtSeq\ longLabel Human NumtS on mitochondrion\ parent numtSeq\ priority 3\ shortLabel NumtS on mitochon\ track numtSMitochondrion\ type bed 6 .\ useScore 1\ snpRecombRatePerlegen Perlegen bedGraph 4 Oxford Recombination Rates from Perlegen Data 0 3 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 longLabel Oxford Recombination Rates from Perlegen Data\ parent snpRecombRate\ priority 3\ shortLabel Perlegen\ track snpRecombRatePerlegen\ encodeGencodeRaceFragsColon RACEfrags Colon genePred Gencode RACEfrags from Colon 0 3 250 5 255 252 130 255 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 250,5,255\ longLabel Gencode RACEfrags from Colon\ parent encodeGencodeRaceFrags\ priority 3\ shortLabel RACEfrags Colon\ track encodeGencodeRaceFragsColon\ cnpSebat Sebat CNPs bed 4 + Copy Number Polymorphisms from ROMA (Sebat) 0 3 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows 81 regions detected as putative copy number polymorphisms by representational \ oligonucleotide microarray analysis (ROMA) in a population of 20 normal individuals. \ \

Methods

\

\ Following digestion with BglII or HindIII, genomic DNA was hybridized to a custom array of 85,000 oligonucleotide probes, selected to be free of common repeats and to have unique homology within the human genome. The average resolution of the array is ~35 kb; however, only intervals in which three consecutive probes showed concordant signals were scored as CNPs. All hybridizations were performed in duplicate with a dye reversal, and the false-positive rate was estimated at ~6%.\
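The scoring rule above (a CNP is called only where three consecutive probes agree) can be illustrated as a run-detection pass over probe log-ratios ordered by genomic position. This is a simplified, hypothetical sketch: the `threshold` value is an assumed parameter not given in the text, "concordant" is interpreted as all probes deviating in the same direction, and the duplicate dye-reversal comparison is not modeled.

```python
from typing import List, Optional, Tuple


def call_cnp_intervals(ratios: List[float], threshold: float,
                       min_run: int = 3) -> List[Tuple[int, int, str]]:
    """Return (first_probe, last_probe, direction) for each run of at least
    min_run consecutive probes that all exceed +threshold ("gain") or all
    fall below -threshold ("loss"). Probes must be sorted by position."""

    def state(r: float) -> Optional[str]:
        if r >= threshold:
            return "gain"
        if r <= -threshold:
            return "loss"
        return None

    calls = []
    i, n = 0, len(ratios)
    while i < n:
        s = state(ratios[i])
        if s is None:
            i += 1
            continue
        j = i
        # Extend the run while the next probe deviates in the same direction
        while j + 1 < n and state(ratios[j + 1]) == s:
            j += 1
        if j - i + 1 >= min_run:
            calls.append((i, j, s))
        i = j + 1
    return calls
```

A two-probe excursion is ignored, while runs of three or more are reported with their direction.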
\ Note that the CNP intervals reported by Sebat et al. (2004) were converted from the April 2003 (build 33) assembly to the July 2003 (build 34) assembly using liftOver.\

References

\

\ Sebat J, Lakshmi B, Troge J, Alexander J,\ Young J, Lundin P, Maner S, Massa H, Walker M, Chi M, Navin N, Lucito R, Healy\ J, Hicks J, Ye K, Reiner A, Gilliam TC, Trask B, Patterson N, Zetterberg A, Wigler\ M (2004) Large-Scale\ Copy Number Polymorphism in the Human Genome. Science 305:525-528\ varRep 1 longLabel Copy Number Polymorphisms from ROMA (Sebat)\ noInherit on\ parent cnp\ priority 3\ shortLabel Sebat CNPs\ track cnpSebat\ type bed 4 +\ decodeSexAveragedNonCarrier Sex Average Non-carrier bigWig 0.0 113.023 deCODE recombination map, sex-average non-carrier 2 3 252 79 89 253 167 172 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 252,79,89\ configurable on\ longLabel deCODE recombination map, sex-average non-carrier\ parent avgView off\ priority 3\ shortLabel Sex Average Non-carrier\ subGroups view=avg\ track decodeSexAveragedNonCarrier\ type bigWig 0.0 113.023\ encodeSangerChipH3K4me3 SI H3K4m3 GM06990 bedGraph 4 Sanger Institute ChIP/Chip (H3K4me3 ab, GM06990 cells) 0 3 10 10 130 132 132 192 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 10,10,130\ longLabel Sanger Institute ChIP/Chip (H3K4me3 ab, GM06990 cells)\ parent encodeSangerChipH3H4\ priority 3\ shortLabel SI H3K4m3 GM06990\ track encodeSangerChipH3K4me3\ encodeRegulomeQualitySKNSH SKNSH bed 5 . 
SKNSH Quality 0 3 120 50 180 187 152 217 1 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 1 color 120,50,180\ longLabel SKNSH Quality\ parent encodeRegulomeQuality\ priority 3\ shortLabel SKNSH\ track encodeRegulomeQualitySKNSH\ encodeRegulomeProbSKNSH SKNSH bedGraph 4 SKNSH DNaseI HSs 0 3 120 50 180 187 152 217 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 120,50,180\ longLabel SKNSH DNaseI HSs\ parent encodeRegulomeProb\ priority 3\ shortLabel SKNSH\ track encodeRegulomeProbSKNSH\ encodeRegulomeBaseSKNSH SKNSH wig 0.0 3.0 SKNSH DNaseI Sensitivity 0 3 120 50 180 187 152 217 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 120,50,180\ longLabel SKNSH DNaseI Sensitivity\ parent encodeRegulomeBase\ priority 3\ shortLabel SKNSH\ track encodeRegulomeBaseSKNSH\ tajdSnpEd SNPs ED bed 4 . SNPs from European Descent 0 3 200 100 0 0 100 200 0 0 0 varRep 1 altColor 0,100,200\ color 200,100,0\ longLabel SNPs from European Descent\ parent tajdSnp\ priority 3\ shortLabel SNPs ED\ track tajdSnpEd\ stanfordChipHepG2GABP Stan HepG2 GABP bedGraph 4 Stanford ChIP-chip (HepG2 cells, GABP ChIP) 0 3 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (HepG2 cells, GABP ChIP)\ parent stanfordChip\ priority 3\ shortLabel Stan HepG2 GABP\ track stanfordChipHepG2GABP\ encodeStanfordChipHepG2GABP Stan HepG2 GABP bedGraph 4 Stanford ChIP-chip (HepG2 cells, GABP ChIP) 0 3 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (HepG2 cells, GABP ChIP)\ parent encodeStanfordChipJohnson\ priority 3\ shortLabel Stan HepG2 GABP\ track encodeStanfordChipHepG2GABP\ encodeStanfordChipJurkatSp1 Stan Jurkat Sp1 bedGraph 4 Stanford ChIP-chip 
(Jurkat cells, Sp1 ChIP) 0 3 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (Jurkat cells, Sp1 ChIP)\ parent encodeStanfordChip\ priority 3\ shortLabel Stan Jurkat Sp1\ track encodeStanfordChipJurkatSp1\ encodeStanfordMethHCT116 Stan Meth HCT116 bedGraph 4 Stanford Methylation Digest (HCT116 cells) 0 3 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest (HCT116 cells)\ parent encodeStanfordMeth\ priority 3\ shortLabel Stan Meth HCT116\ track encodeStanfordMethHCT116\ encodeStanfordMethSmoothedHCT116 Stan Meth Sc HCT116 bedGraph 4 Stanford Methylation Digest Smoothed Score (HCT116 cells) 0 3 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest Smoothed Score (HCT116 cells)\ parent encodeStanfordMethSmoothed\ priority 3\ shortLabel Stan Meth Sc HCT116\ track encodeStanfordMethSmoothedHCT116\ encodeStanfordPromotersCRL1690 Stan Pro CRL1690 bed 9 + Stanford Promoter Activity (CRL1690 cells) 0 3 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (CRL1690 cells)\ parent encodeStanfordPromoters\ priority 3\ shortLabel Stan Pro CRL1690\ track encodeStanfordPromotersCRL1690\ encodeStanfordChipSmoothedJurkatSp1 Stan Sc Jurkat Sp1 bedGraph 4 Stanford ChIP-chip Smoothed Score (Jurkat cells, Sp1 ChIP) 0 3 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip Smoothed Score (Jurkat cells, Sp1 ChIP)\ 
parent encodeStanfordChipSmoothed\ priority 3\ shortLabel Stan Sc Jurkat Sp1\ track encodeStanfordChipSmoothedJurkatSp1\ stsMarker STS Markers bed 5 + STS Markers on Genetic (blue), FISH (green) and RH (black) Maps 1 3 0 0 0 128 128 255 0 0 0

Description

\

This track shows the locations of Sequence Tagged Site (STS) markers along the draft assembly.

\ \

Method

\ These STSs were mapped using genetic mapping (Genethon and Marshfield maps), radiation hybrid mapping (Stanford, Whitehead RH, and GeneMap99 maps) or YAC mapping (the Whitehead YAC map). Prior to August 2001, this track also showed the approximate positions of fluorescent in situ hybridization (FISH) mapped clones. In the August 2001 and later assemblies, the FISH clones are displayed in a separate track.

\ \

Using the Filter

\

The track filter can be used to change the color or include/exclude a map data set \ within the track. This is helpful when many items are shown in the track\ display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. In the pulldown menu, select the map whose data you would like to highlight or exclude in the display. By default, the "All Genetic" option is selected.\
  2. Choose the color or display characteristic that will be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not\ display data from the map selected in the pulldown list. If "include" is selected, the browser\ will display only data from the selected map.\
  3. When you have finished configuring the filter, click the Submit button.\
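The include/exclude/color behaviour described in the steps above can be modeled as a small filter over items tagged with their source map. This is a hypothetical illustration of the semantics, not the browser's implementation: the item dictionaries, the mode names, and the `MAP_GROUPS` entry (built from the genetic maps named in the Method section) are assumptions.

```python
from typing import Dict, List, Optional, Set

# Hypothetical grouping: the Method text names Genethon and Marshfield as the genetic maps.
MAP_GROUPS: Dict[str, Set[str]] = {"All Genetic": {"Genethon", "Marshfield"}}


def apply_map_filter(items: List[Dict[str, str]], selected: str, mode: str,
                     color: Optional[str] = None) -> List[Dict[str, str]]:
    """Filter STS items by source map.

    mode "include" keeps only items from the selected map, "exclude" drops
    them, and "color" keeps every item but tags matching ones with `color`.
    """
    wanted = MAP_GROUPS.get(selected, {selected})

    def matches(item: Dict[str, str]) -> bool:
        return item["map"] in wanted

    if mode == "include":
        return [it for it in items if matches(it)]
    if mode == "exclude":
        return [it for it in items if not matches(it)]
    if mode == "color":
        if color is None:
            raise ValueError("mode 'color' requires a color")
        return [dict(it, color=color) if matches(it) else it for it in items]
    raise ValueError(f"unknown mode: {mode!r}")
```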

\ \

Credits

\

Many thanks to the researchers who worked on these\ maps, and to Greg Schuler, Arek Kasprzyk, Wonhee Jang,\ Terry Furey and Sanja Rogic for helping\ process the data. Additional data on the individual maps can be\ found at the following links:

\ \ \ map 1 altColor 128,128,255,\ group map\ longLabel STS Markers on Genetic (blue), FISH (green) and RH (black) Maps\ priority 3\ shortLabel STS Markers\ track stsMarker\ type bed 5 +\ visibility dense\ encodeTbaGerpEl TBA GERP bed 5 . TBA GERP Conserved Elements 0 3 120 80 120 187 167 187 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 120,80,120\ longLabel TBA GERP Conserved Elements\ parent encodeTbaElements\ priority 3\ shortLabel TBA GERP\ track encodeTbaGerpEl\ encodeTbaGerpCons TBA GERP Cons wig 0.0 3.0 TBA GERP Conservation 0 3 120 80 120 187 167 187 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 0 autoScale Off\ color 120,80,120\ longLabel TBA GERP Conservation\ maxHeightPixels 100:25:11\ noInherit on\ parent encodeTbaCons\ priority 3\ shortLabel TBA GERP Cons\ track encodeTbaGerpCons\ type wig 0.0 3.0\ windowingFunction mean\ hiSeqDepthTop1Pct Top 0.01 Depth bed 3 Top 0.01 of Read Depth Distribution 0 3 139 69 19 197 162 137 0 0 0 map 1 longLabel Top 0.01 of Read Depth Distribution\ parent hiSeqDepth\ priority 3\ shortLabel Top 0.01 Depth\ track hiSeqDepthTop1Pct\ encodeUtexChip2091fibMycStimRaw UT Myc st-Fb bedGraph 4 University of Texas, Austin ChIP-chip (c-Myc, FBS-stimulated 2091 fibroblasts) 0 3 120 30 50 187 142 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 120,30,50\ longLabel University of Texas, Austin ChIP-chip (c-Myc, FBS-stimulated 2091 fibroblasts)\ parent encodeUtexChip\ priority 3\ shortLabel UT Myc st-Fb\ subGroups dataType=raw\ track encodeUtexChip2091fibMycStimRaw\ encodeUvaDnaRep4 UVa DNA Rep 4h bed 3 . 
University of Virginia Temporal Profiling of DNA Replication (4-6 hrs) 0 3 60 75 60 10 130 10 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 longLabel University of Virginia Temporal Profiling of DNA Replication (4-6 hrs)\ parent encodeUvaDnaRep\ priority 3\ shortLabel UVa DNA Rep 4h\ track encodeUvaDnaRep4\ encodeUvaDnaRepLate UVa DNA Rep Late bed 3 . University of Virginia Temporal Profiling of DNA Replication (Late) 0 3 10 110 160 132 182 207 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 color 10,110,160\ longLabel University of Virginia Temporal Profiling of DNA Replication (Late)\ parent encodeUvaDnaRepSeg\ priority 3\ shortLabel UVa DNA Rep Late\ track encodeUvaDnaRepLate\ encodeUvaDnaRepOriginsBubbleHela UVa Ori-Bubble HeLa bed 3 . University of Virginia DNA Replication Origins, Ori-Bubble, HeLa 0 3 0 0 128 127 127 191 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 color 0,0,128\ dataVersion May 2007\ longLabel University of Virginia DNA Replication Origins, Ori-Bubble, HeLa\ origAssembly hg17\ parent encodeUvaDnaRepOrigins\ priority 3\ shortLabel UVa Ori-Bubble HeLa\ track encodeUvaDnaRepOriginsBubbleHela\ kiddEichlerValidAbc12 Validated ABC12 bed 9 HGSV Individual ABC12 (CEPH) Validated Sites of Structural Variation 0 3 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HGSV Individual ABC12 (CEPH) Validated Sites of Structural Variation\ parent kiddEichlerValid\ priority 3\ shortLabel Validated ABC12\ track kiddEichlerValidAbc12\ hgdpXpehhEurope XP-EHH Europe bedGraph 4 Human Genome Diversity Project XP-EHH (Europe) 0 3 240 144 0 247 199 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 
240,144,0\ longLabel Human Genome Diversity Project XP-EHH (Europe)\ parent hgdpXpehh\ priority 3\ shortLabel XP-EHH Europe\ track hgdpXpehhEurope\ encodeYaleChIPSTAT1HeLaMaskLess50mer50bpPval Yale 50-50 PVal bedGraph 4 Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 50bp Win, P-Values 0 3 50 50 200 152 152 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,50,200\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 50bp Win, P-Values\ parent encodeYaleChIPSTAT1Pval\ priority 3\ shortLabel Yale 50-50 PVal\ track encodeYaleChIPSTAT1HeLaMaskLess50mer50bpPval\ encodeYaleChIPSTAT1HeLaMaskLess50mer50bpSig Yale 50-50 Sig bedGraph 4 Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 50bp Win, Signal 0 3 112 63 175 224 66 81 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 altColor 224,66,81\ color 112,63,175\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 50bp Win, Signal\ parent encodeYaleChIPSTAT1Sig\ priority 3\ shortLabel Yale 50-50 Sig\ track encodeYaleChIPSTAT1HeLaMaskLess50mer50bpSig\ encodeYaleChIPSTAT1HeLaMaskLess50mer50bpSite Yale 50-50 Sites bed . 
Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 50bp Win, Binding Sites 0 3 200 50 50 50 50 200 0 0 18 chr1,chr10,chr11,chr13,chr14,chr15,chr16,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 altColor 50,50,200\ color 200,50,50\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) Maskless 50-mer, 50bp Win, Binding Sites\ parent encodeYaleChIPSTAT1Sites\ priority 3\ shortLabel Yale 50-50 Sites\ track encodeYaleChIPSTAT1HeLaMaskLess50mer50bpSite\ encodeYaleMASPlacRNANprotTMFWDMless36mer36bp Yale Plc NgF RNA bedGraph 4 Yale Placenta RNA Trans Map, MAS Array, Forward Direction, NimbleGen Protocol 0 3 200 50 50 50 50 200 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 0 altColor 50,50,200\ color 200,50,50\ longLabel Yale Placenta RNA Trans Map, MAS Array, Forward Direction, NimbleGen Protocol\ parent encodeYaleMASPlacRNATransMap\ priority 3\ shortLabel Yale Plc NgF RNA\ track encodeYaleMASPlacRNANprotTMFWDMless36mer36bp\ encodeYaleMASPlacRNANprotTarsFWDMless36mer36bp Yale Plc NgF TAR bed 6 . 
Yale Placenta RNA TARs, MAS array, Forward Direction, NimbleGen Protocol 0 3 200 50 50 50 50 200 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 1 altColor 50,50,200\ color 200,50,50\ longLabel Yale Placenta RNA TARs, MAS array, Forward Direction, NimbleGen Protocol\ parent encodeYaleMASPlacRNATars\ priority 3\ shortLabel Yale Plc NgF TAR\ track encodeYaleMASPlacRNANprotTarsFWDMless36mer36bp\ encodePseudogeneYale Yale Pseudogenes genePred Yale Pseudogene Predictions 0 3 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, http://www.pseudogene.org/cgi-bin/search-results.cgi?tax_id=9606&set_search=25&criterion0=pgene_by_acc&operator0=%3D&sort=1&output=html&searchValue0_0=$$ encodeGenes 1 longLabel Yale Pseudogene Predictions\ parent encodePseudogene\ priority 3\ shortLabel Yale Pseudogenes\ track encodePseudogeneYale\ url http://www.pseudogene.org/cgi-bin/search-results.cgi?tax_id=9606&set_search=25&criterion0=pgene_by_acc&operator0=%3D&sort=1&output=html&searchValue0_0=$$\ urlLabel Yale Pseudogene Link:\ urlName gene\ encodeYaleAffyNeutRNATransMap02 Yale RNA Neu 2 wig -2730 3394 Yale Neutrophil RNA Transcript Map, Sample 2 0 3 50 190 50 152 222 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,190,50\ longLabel Yale Neutrophil RNA Transcript Map, Sample 2\ parent encodeYaleAffyRNATransMap\ priority 3\ shortLabel Yale RNA Neu 2\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATransMap02\ encodeYaleAffyNeutRNATars02 Yale TAR Neu 2 bed 3 . 
Yale Neutrophil RNA Transcriptionally Active Region, Sample 2 0 3 50 190 50 152 222 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,190,50\ longLabel Yale Neutrophil RNA Transcriptionally Active Region, Sample 2\ parent encodeYaleAffyRNATars\ priority 3\ shortLabel Yale TAR Neu 2\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATars02\ encodeEgaspFullEnsemblPseudo Ensembl Pseudo genePred Ensembl Pseudogene Predictions 0 3.1 130 130 130 192 192 192 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 130,130,130\ longLabel Ensembl Pseudogene Predictions\ parent encodeEgaspFull\ priority 3.1\ shortLabel Ensembl Pseudo\ track encodeEgaspFullEnsemblPseudo\ ntSssZScorePMVar Sel Swp Scan (S) bigWig -8.8332 33.637199 Selective Sweep Scan (S) on Neandertal vs. Human Polymorphisms (Z-Score +- Variance) 0 3.1 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,

Description

This track shows the S score (Z-score ± variance) for positive selection in humans within a 100 kb window surrounding each polymorphic position in the five modern human sequences and the human reference genome, as described in Green et al., Supplemental Online Material Text 13 (Burbano et al.). A positive score indicates that Neandertal carries more derived alleles than expected, given the frequency of derived alleles in humans. A negative score indicates fewer derived alleles in Neandertal than expected, and may indicate an episode of positive selection in early humans.

To view the polymorphic sites on which the S score was computed, open the S SNPs track.

Methods

Green et al. identified single-base sites that are polymorphic among five modern human genomes of diverse ancestry (in the Modern Human Seq track) plus the human reference genome. CpG sites were excluded because of their higher mutation rate. The ancestral or derived state of each single nucleotide polymorphism (SNP) was determined by comparison with the chimpanzee genome. The SNPs are displayed in the S SNPs track. SNPs with higher frequencies of the derived allele in modern humans were more likely to show the derived allele in Neandertals, and this relationship was used to calculate the expected number of derived alleles in Neandertal within a given region of the human genome. The observed numbers of derived alleles were compared to the expected numbers to identify regions where Neandertals carry fewer derived alleles than expected given the human allelic states. The score assigned to each SNP is the z-score of the observed versus expected counts, relative to the variance in the expected count of derived alleles within the 100,000-base window around the SNP.
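
The observed-versus-expected comparison can be sketched in Python. This is an illustration and not the published method, which is described in Green et al., Supplemental Online Material Text 13. The sketch makes three assumptions: each SNP in a window contributes one sampled Neandertal allele, SNPs are independent, and the probability of a Neandertal derived allele depends only on how many of the six human genomes carry the derived allele. The probability table `P_DERIVED` and the input layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One record per SNP in a 100,000-base window (hypothetical input layout):
# (number of the six human genomes carrying the derived allele,
#  1 if the sampled Neandertal allele is derived, else 0)
Snp = Tuple[int, int]


def s_score(window: List[Snp], p_derived_by_count: Dict[int, float]) -> float:
    """Z-score of observed vs. expected Neandertal derived alleles in a window.

    Simplifying assumption: SNPs are independent Bernoulli trials, so the
    variance of the expected count is the sum of p * (1 - p).
    """
    observed = sum(neandertal_derived for _, neandertal_derived in window)
    probs = [p_derived_by_count[human_count] for human_count, _ in window]
    expected = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    if variance == 0:
        raise ValueError("expected count has zero variance in this window")
    return (observed - expected) / math.sqrt(variance)


# Hypothetical probabilities that Neandertal carries the derived allele,
# rising with the number of human genomes (1 to 5 at a polymorphic site) that carry it.
P_DERIVED = {1: 0.2, 2: 0.3, 3: 0.4, 4: 0.5, 5: 0.6}

# Derived allele common in humans but absent from Neandertal: a negative score.
window = [(5, 0), (5, 0), (4, 0), (4, 0)]
print(round(s_score(window, P_DERIVED), 2))  # prints -2.22
```

A window in which Neandertal matches the expectation exactly, such as `[(4, 1), (4, 0)]`, scores 0.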

Note: to display both the score and the variance in a single track in the UCSC Genome Browser, the scores were modified as follows. At the SNP position, the displayed value is the score plus the variance. At the position following the SNP, the displayed value is the score minus the variance. When viewing large regions (at least 100,000 bases), the default mean+whiskers condensation of the scores indicates the range covered by the variance.
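
The encoding in the note can be illustrated with a short Python sketch that emits bedGraph-style (chrom, start, end, value) rows. The track itself is stored as bigWig. The function name, the 0-based start coordinates and the one-base row width are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def encode_score_and_variance(
    snps: Iterable[Tuple[str, int, float, float]],
) -> List[Tuple[str, int, int, float]]:
    """Expand (chrom, pos, score, variance) into two one-base rows per SNP."""
    rows = []
    for chrom, pos, score, variance in snps:
        rows.append((chrom, pos, pos + 1, score + variance))      # at the SNP
        rows.append((chrom, pos + 1, pos + 2, score - variance))  # next base
    return rows


for row in encode_score_and_variance([("chr1", 100, -2.0, 0.5)]):
    print(*row, sep="\t")
```

Two SNPs at adjacent positions would produce overlapping rows. The sketch does not attempt to resolve that case.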

Reference

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A Draft Sequence of the Neandertal Genome. Science. 2010 May 7;328(5979):710-22.

\ neandertal 0 autoScale off\ chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX\ group neandertal\ longLabel Selective Sweep Scan (S) on Neandertal vs. Human Polymorphisms (Z-Score +- Variance)\ maxHeightPixels 128:32:11\ priority 3.1\ shortLabel Sel Swp Scan (S)\ track ntSssZScorePMVar\ type bigWig -8.8332 33.637199\ viewLimits -5:1\ visibility hide\ windowingFunction mean\ yLineMark -2\ yLineOnOff on\ ntSssTop5p 5% Lowest S bed 5 Selective Sweep Scan (S): 5% Smallest S scores 0 3.2 0 0 0 127 127 127 1 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,

Description

This track shows regions of the human genome with a strong signal for depletion of Neandertal-derived alleles (regions from the Sel Swp Scan (S) track with S scores in the lowest 5%), which may indicate an episode of positive selection in early humans.

Display Conventions and Configuration

Grayscale shading is used as a rough indicator of the strength of the score: the darker the item, the stronger its negative score. The strongest negative score (-8.7011) is shaded black, and the shading lightens from dark to light gray as the negative score weakens (the weakest score is -4.3202).
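
One way to realize such a scale is a linear mapping from S onto the 0 to 1000 range of the BED score field, with 1000 as the darkest shade. The description gives only the two endpoints. The linear form and the function below are therefore assumptions, not the track's documented behavior.

```python
STRONGEST_S = -8.7011  # shaded black, per the track description
WEAKEST_S = -4.3202    # lightest gray, per the track description


def s_to_bed_score(s: float) -> int:
    """Map an S score linearly onto 0..1000 (assumed scale; 1000 = darkest)."""
    # Clamp to the documented range so out-of-range inputs stay valid scores.
    s = max(STRONGEST_S, min(WEAKEST_S, s))
    fraction = (WEAKEST_S - s) / (WEAKEST_S - STRONGEST_S)
    return round(fraction * 1000)


print(s_to_bed_score(-8.7011), s_to_bed_score(-4.3202))  # prints 1000 0
```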

Methods

Green et al. identified single-base sites that are polymorphic among five modern human genomes of diverse ancestry (in the Modern Human Seq track) plus the human reference genome. They determined the ancestral or derived state of each single nucleotide polymorphism (SNP) by comparison with the chimpanzee genome. The SNPs are displayed in the S SNPs track. The human allele states were used to estimate an expected number of derived alleles in Neandertal in the 100,000-base window around each SNP. A measure called the S score, displayed in the Sel Swp Scan (S) track, was developed to compare the observed number of derived alleles in Neandertal in each window to the expected number. An S score significantly less than zero indicates a reduction of Neandertal-derived alleles (or an increase of human-derived alleles not found in Neandertal). This is consistent with positive selection in the human lineage since divergence from Neandertals.

Genomic regions of 25,000 or more bases in which all polymorphic sites were at least 2 standard deviations below the expected value were identified, and S was recomputed on each such region. Regions with S scores in the lowest 5% (strongest negative scores) were prioritized for further analysis, as described in Green et al.
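
The region search in the paragraph above can be sketched in Python. The input is the SNPs on one chromosome, sorted by position, each with its per-SNP z-score. Several choices here are assumptions:

- a region is measured from its first to its last qualifying SNP;
- "at least 2 standard deviations below" is treated as z <= -2;
- the lowest 5% is taken by simple ranking.

The recomputation of S on each region is not shown.

```python
import math
from typing import List, Optional, Tuple


def candidate_regions(
    snps: List[Tuple[int, float]],  # (position, z-score), sorted by position
    min_length: int = 25_000,
    z_cutoff: float = -2.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) spans of consecutive SNPs that are all at or below z_cutoff."""
    regions: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    run_end: Optional[int] = None
    for pos, z in snps:
        if z <= z_cutoff:
            if run_start is None:
                run_start = pos
            run_end = pos
        else:
            # A SNP above the cutoff ends the current run.
            if run_start is not None and run_end - run_start >= min_length:
                regions.append((run_start, run_end))
            run_start = run_end = None
    if run_start is not None and run_end - run_start >= min_length:
        regions.append((run_start, run_end))
    return regions


def lowest_fraction(region_scores: List[float], fraction: float = 0.05) -> List[int]:
    """Indices of the regions whose recomputed S falls in the lowest `fraction`."""
    n_keep = max(1, math.ceil(len(region_scores) * fraction))
    order = sorted(range(len(region_scores)), key=lambda i: region_scores[i])
    return sorted(order[:n_keep])


example = [(0, -2.5), (10_000, -3.0), (30_000, -2.1),
           (31_000, 0.5), (40_000, -2.2), (45_000, -2.4)]
print(candidate_regions(example))  # prints [(0, 30000)]
```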

Credits

This track was produced at UCSC using data generated by Ed Green.

References

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A Draft Sequence of the Neandertal Genome. Science. 2010 May 7;328(5979):710-22.

\ \ neandertal 1 bedNameLabel Score\ chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX\ group neandertal\ longLabel Selective Sweep Scan (S): 5% Smallest S scores\ noScoreFilter .\ priority 3.2\ shortLabel 5% Lowest S\ track ntSssTop5p\ type bed 5\ useScore .\ visibility hide\ ntSssSnps S SNPs bed 9 SNPS Used for Selective Sweep Scan (S) 0 3.3 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,

Description

This track shows single nucleotide polymorphisms (SNPs) used in a genome-wide scan for signals of positive selection in the human lineage since divergence from the Neandertal lineage.

SNP labels represent the ancestral (A) or derived (D) status of alleles, determined by comparison with the chimpanzee reference genome. Status is given for the human reference assembly, for five modern human genomes of diverse ancestry (see the Modern Human Seq track), and for Neandertals. The first six characters of an item name show the status of the allele (A, D, or _ if not known) in six genomes: human reference, San, Yoruba, Han, Papuan, and French, in that order. These characters are followed by a colon and the number of derived alleles found in Neandertals (suffixed D). Then come a comma and the number of ancestral alleles found in Neandertals (suffixed A). For example, a SNP labeled AAADAA:0D,2A has the ancestral allele in the human reference genome and in all of the modern human genomes except Han. Among Neandertals, two instances of the ancestral allele were found, but no instances of the derived allele.
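
The label format can be decoded mechanically. This Python sketch assumes every label follows the pattern of the example above:

- six status characters;
- a colon;
- a derived count suffixed D;
- a comma;
- an ancestral count suffixed A.

The genome names in `GENOMES` simply follow the order given in the description. `SnpLabel` and `parse_label` are hypothetical names.

```python
import re
from typing import Dict, NamedTuple

# Order of the six status characters, as listed in the track description.
GENOMES = ("reference", "San", "Yoruba", "Han", "Papuan", "French")
# Assumed label pattern, inferred from the example "AAADAA:0D,2A".
LABEL_RE = re.compile(r"([AD_]{6}):(\d+)D,(\d+)A")


class SnpLabel(NamedTuple):
    status: Dict[str, str]  # genome name -> "A", "D", or "_" (unknown)
    neandertal_derived: int
    neandertal_ancestral: int


def parse_label(label: str) -> SnpLabel:
    match = LABEL_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized S SNPs label: {label!r}")
    states, derived, ancestral = match.groups()
    return SnpLabel(dict(zip(GENOMES, states)), int(derived), int(ancestral))


label = parse_label("AAADAA:0D,2A")
print([g for g, s in label.status.items() if s == "D"], label.neandertal_ancestral)
# prints ['Han'] 2
```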

SNPs are colored red when at least four of the six modern human genomes are derived while all observed Neandertal alleles are ancestral. An overrepresentation of such SNPs in a region would imply that the region had undergone positive selection in the modern human lineage since divergence from Neandertals. The Sel Swp Scan (S) track displays a signal calculated from these SNPs. The 5% Lowest S track contains the regions in which that signal most strongly indicates selective pressure on the modern human lineage.

Display Conventions and Configuration

Red SNPs are those where at least four of the six modern human genomes are derived while all observed Neandertal alleles are ancestral. All other SNPs are black.
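
The coloring rule can be written as a small predicate over the parts of a label: the six status characters and the two Neandertal counts. This Python sketch is illustrative. The description does not say how SNPs with no observed Neandertal allele are treated, so leaving them black here is an assumption.

```python
def is_red(states: str, neandertal_derived: int, neandertal_ancestral: int) -> bool:
    """True if >= 4 of the six human genomes are derived and no Neandertal allele is."""
    human_derived = states.count("D")
    # Assumption: a SNP with no observed Neandertal allele is not colored red.
    all_neandertal_ancestral = neandertal_derived == 0 and neandertal_ancestral > 0
    return human_derived >= 4 and all_neandertal_ancestral


print(is_red("DDDDAA", 0, 2), is_red("AAADAA", 0, 2))  # prints True False
```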

Methods

For the purposes of this analysis, SNPs were defined as single-base sites that are polymorphic among five modern human genomes of diverse ancestry (see the Modern Human Seq track) plus the human reference genome. SNPs at CpG sites were excluded because of the higher mutation rate at such sites. Ancestral or derived state was determined by comparison with the chimpanzee genome.
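
The ascertainment rules above can be sketched as a Python function that classifies one site. Several details are assumptions for illustration:

- the input layout;
- the caller-supplied `is_cpg` flag, since how CpG context is detected is not described here;
- the restriction to biallelic sites whose chimpanzee base matches one of the human alleles.

```python
from typing import Dict, Optional


def classify_site(
    human_alleles: Dict[str, Optional[str]],  # genome -> base, None if not called
    chimp_allele: Optional[str],
    is_cpg: bool,
) -> Optional[Dict[str, str]]:
    """Return genome -> "A"/"D"/"_" for a usable SNP, or None if the site is skipped."""
    called = {base for base in human_alleles.values() if base is not None}
    if is_cpg or len(called) != 2:
        return None  # CpG site, monomorphic, or (by assumption) not biallelic
    if chimp_allele not in called:
        return None  # assumption: the outgroup must match one human allele to polarize
    return {
        genome: "_" if base is None else ("A" if base == chimp_allele else "D")
        for genome, base in human_alleles.items()
    }


site = {"reference": "C", "San": "T", "Yoruba": "C",
        "Han": "C", "Papuan": None, "French": "C"}
states = classify_site(site, chimp_allele="C", is_cpg=False)
print("".join(states.values()))  # prints ADAA_A
```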

Credits

This track was produced at UCSC using data generated by Ed Green.

Reference

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A Draft Sequence of the Neandertal Genome. Science. 2010 May 7;328(5979):710-22.

\ neandertal 1 chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX\ exonArrows off\ group neandertal\ itemRgb on\ longLabel SNPS Used for Selective Sweep Scan (S)\ noScoreFilter .\ priority 3.3\ shortLabel S SNPs\ track ntSssSnps\ type bed 9\ visibility hide\ ntOoaHaplo Cand. Gene Flow bed 9 + Candidate Regions for Gene Flow from Neandertal to Non-African Modern Humans 0 3.8 0 0 0 127 127 127 0 0 10 chr1,chr4,chr5,chr6,chr9,chr10,chr15,chr17,chr20,chr22,

Description

This track shows 13 regions of the human genome in which there is considerably more haplotype diversity among non-African genomes than among African genomes. If there was gene flow from Neandertals to modern humans, these deeply divergent haplotypes, which exist only in non-African populations, would be predicted to have entered the human gene pool from Neandertals. Of the 12 candidate gene flow regions with tag SNP data, there are 10 in which Neandertals match the deep haplotype clade unique to non-Africans (out of Africa, OOA). They do not match the cosmopolitan haplotype clade shared by Africans and non-Africans (cosmopolitan, COS).

The table below was copied from Table 5, "Non-African haplotypes match Neandertal at an unexpected rate", in Green et al.:

Region                        Genomic Size  ST    Avg Freq in OOA  AM   DM   AN   DN   Qualitative Assessment
chr1:168,110,001-168,220,000  110,000       2.9   6.3%             5    10   1    0    OOA
chr1:223,760,001-223,910,000  150,000       2.8   6.3%             1    4    0    0    OOA
chr4:171,180,001-171,280,000  100,000       1.9   5.2%             1    2    0    0    OOA
chr5:28,950,001-29,070,000    120,000       3.8   3.1%             16   16   6    0    OOA
chr6:66,160,001-66,260,000    100,000       5.7   28.1%            6    6    0    0    OOA
chr9:32,940,001-33,040,000    100,000       2.8   4.2%             7    14   0    0    OOA
chr10:4,820,001-4,920,000     100,000       2.6   9.4%             9    5    0    0    OOA
chr10:38,000,001-38,160,000   160,000       3.5   8.3%             5    9    2    0    OOA
chr10:69,630,001-69,740,000   110,000       4.2   19.8%            2    2    0    1    OOA
chr15:45,250,001-45,350,000   100,000       2.5   1.1%             5    6    1    0    OOA
chr17:35,500,001-35,600,000   100,000       2.9   (no tags)        n/a  n/a  n/a  n/a  n/a
chr20:20,030,001-20,140,000   110,000       5.1   64.6%            0    0    10   5    COS
chr22:30,690,001-30,820,000   130,000       3.5   4.2%             0    2    5    2    COS

ST = estimated ratio of OOA/African gene tree depth.
Avg Freq in OOA = average (across tag SNPs in the region) of the population frequency, in the 48 OOA individuals, of the OOA-only allele for each tag SNP.
AM = Neandertal has ancestral allele and matches OOA-specific clade.
DM = Neandertal has derived allele and matches OOA-specific clade.
AN = Neandertal has ancestral allele and does not match OOA-specific clade.
DN = Neandertal has derived allele and does not match OOA-specific clade.
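
The Average Frequency in OOA column can be illustrated with a short Python sketch. The legend does not say whether the frequency is counted over individuals or chromosomes. This sketch assumes 96 chromosomes (48 diploid individuals) and takes, as hypothetical input, the count of the OOA-only allele at each tag SNP in a region.

```python
from typing import List

OOA_CHROMOSOMES = 2 * 48  # assumption: 48 diploid OOA individuals


def average_ooa_frequency(ooa_only_allele_counts: List[int]) -> float:
    """Mean, across a region's tag SNPs, of the OOA-only allele frequency."""
    if not ooa_only_allele_counts:
        raise ValueError("region has no tag SNPs")
    freqs = [count / OOA_CHROMOSOMES for count in ooa_only_allele_counts]
    return sum(freqs) / len(freqs)


print(average_ooa_frequency([6, 6]))  # prints 0.0625
```

A region with no tag SNPs, like the chr17 region in the table, has no defined average and raises an error here.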

Display Conventions and Configuration

A region is colored green if its qualitative assessment is OOA, blue if COS, and gray if unknown (no tag SNPs in the region).

Methods

Green et al. used 1,263,750 Perlegen Class A SNPs, identified in 71 individuals of diverse ancestry (see Hinds et al.), to identify 13 candidate gene flow regions (Supplemental Online Material Text 17). Twenty-four individuals of European ancestry and 24 individuals of Han Chinese ancestry (48 in all) were used to represent the non-African population. The remaining 23 individuals, of African American ancestry, were used to represent the African population.

From the 1,263,750 Perlegen Class A SNPs, they identified 166 tag SNPs for which they had data from the Neandertals. In 12 of the regions, these tag SNPs separate the haplotype clades found only in non-Africans (OOA) from the cosmopolitan haplotype clades shared between Africans and non-Africans (COS). Of the 13 regions, one had no tag SNPs and so could not be assessed, two were COS, and 10 were OOA (see the Qualitative Assessment column of the table above).

Overall, the Neandertals match the deep clade unique to non-Africans (OOA) at 133 of the 166 tag SNPs. Green et al. assessed the rate at which Neandertal matches each of these clades by further subdividing the 166 tag SNPs in two ways: by their ancestral or derived status in Neandertal, and by whether or not they matched the OOA-specific clade (the AM, DM, AN and DN columns of the table). Candidate regions were qualitatively assessed as OOA matches for Neandertal when the proportion of tag SNPs matching the OOA-specific clade was well above 50%.
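
As a consistency check, the AM, DM, AN and DN columns of the table can be tallied in Python. They account for 166 tag SNPs, 133 of which (AM + DM) match the OOA-specific clade. The `assess` function and its 0.75 cutoff are hypothetical. The cutoff is a stand-in for "much more than 50%", chosen only because it reproduces the Qualitative Assessment column. Green et al. describe the assessment as qualitative rather than threshold-based.

```python
from typing import Dict, Tuple

# (AM, DM, AN, DN) per region, transcribed from the table above.
# chr17:35,500,001-35,600,000 is left out because it had no tag SNPs.
TAG_COUNTS: Dict[str, Tuple[int, int, int, int]] = {
    "chr1:168,110,001-168,220,000": (5, 10, 1, 0),
    "chr1:223,760,001-223,910,000": (1, 4, 0, 0),
    "chr4:171,180,001-171,280,000": (1, 2, 0, 0),
    "chr5:28,950,001-29,070,000": (16, 16, 6, 0),
    "chr6:66,160,001-66,260,000": (6, 6, 0, 0),
    "chr9:32,940,001-33,040,000": (7, 14, 0, 0),
    "chr10:4,820,001-4,920,000": (9, 5, 0, 0),
    "chr10:38,000,001-38,160,000": (5, 9, 2, 0),
    "chr10:69,630,001-69,740,000": (2, 2, 0, 1),
    "chr15:45,250,001-45,350,000": (5, 6, 1, 0),
    "chr20:20,030,001-20,140,000": (0, 0, 10, 5),
    "chr22:30,690,001-30,820,000": (0, 2, 5, 2),
}


def ooa_match_fraction(counts: Tuple[int, int, int, int]) -> float:
    """Fraction of a region's tag SNPs at which Neandertal matches the OOA clade."""
    am, dm, an, dn = counts
    return (am + dm) / (am + dm + an + dn)


def assess(counts: Tuple[int, int, int, int], cutoff: float = 0.75) -> str:
    """Hypothetical thresholded version of the qualitative OOA/COS call."""
    fraction = ooa_match_fraction(counts)
    if fraction >= cutoff:
        return "OOA"
    if fraction <= 1 - cutoff:
        return "COS"
    return "unclear"


total = sum(sum(c) for c in TAG_COUNTS.values())
matches = sum(am + dm for am, dm, _, _ in TAG_COUNTS.values())
print(matches, total)  # prints 133 166
```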

Credits

This track was produced at UCSC using data generated by Ed Green.

References

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A Draft Sequence of the Neandertal Genome. Science. 2010 May 7;328(5979):710-22.

Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. Whole-genome patterns of common DNA variation in three human populations. Science. 2005 Feb 18;307(5712):1072-9.

\ neandertal 1 chromosomes chr1,chr4,chr5,chr6,chr9,chr10,chr15,chr17,chr20,chr22\ exonArrows off\ group neandertal\ itemRgb on\ longLabel Candidate Regions for Gene Flow from Neandertal to Non-African Modern Humans\ noScoreFilter .\ priority 3.8\ shortLabel Cand. Gene Flow\ track ntOoaHaplo\ type bed 9 +\ visibility hide\ netRBestOtoGar1 Bushbaby RBest Net netAlign otoGar1 chainOtoGar1 Bushbaby (Dec. 2006 (Broad/otoGar1)) Reciprocal Best Alignment Net 0 4 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Alignment Net\ otherDb otoGar1\ parent rBestNet\ priority 4\ shortLabel $o_Organism RBest Net\ spectrum on\ track netRBestOtoGar1\ type netAlign otoGar1 chainOtoGar1\ visibility hide\ snpArrayAffy250Nsp Affy 250KNsp bed 6 + Affymetrix GeneChip Human Mapping 250K Nsp 0 4 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Affymetrix GeneChip Human Mapping 250K Nsp\ parent snpArray off\ priority 4\ shortLabel Affy 250KNsp\ track snpArrayAffy250Nsp\ type bed 6 +\ encodeAffyChIpHl60SitesBrg1Hr02 Affy Brg1 RA 2h bed 3 . 
Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 2hrs) Sites 0 4 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 225,0,0\ longLabel Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 2hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 4\ shortLabel Affy Brg1 RA 2h\ subGroups factor=Brg1 time=2h\ track encodeAffyChIpHl60SitesBrg1Hr02\ encodeAffyChIpHl60SignalStrictH3K9K14DHr32 Affy H3K9ac2 32h wig -2.78 3.97 Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict Signal 0 4 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 4\ shortLabel Affy H3K9ac2 32h\ subGroups factor=H3K9K14ac2 time=32h\ track encodeAffyChIpHl60SignalStrictH3K9K14DHr32\ encodeAffyChIpHl60SitesStrictH3K9K14DHr32 Affy H3K9ac2 32h bed 3 . 
Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict Sites 0 4 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 4\ shortLabel Affy H3K9ac2 32h\ subGroups factor=H3K9K14ac2 time=32h\ track encodeAffyChIpHl60SitesStrictH3K9K14DHr32\ encodeAffyChIpHl60PvalStrictH3K9K14DHr32 Affy H3K9ac2 32h wig 0 696.62 Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict P-Value 0 4 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 4\ shortLabel Affy H3K9ac2 32h\ subGroups factor=H3K9K14ac2 time=32h\ track encodeAffyChIpHl60PvalStrictH3K9K14DHr32\ encodeAffyRnaHl60SitesHr02IntronsProximal Affy In PrxHL60 2h bed 4 . 
Affy Intronic Proximal HL60 Retinoic 2hr Transfrags 0 4 212 0 44 233 127 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 212,0,44\ longLabel Affy Intronic Proximal HL60 Retinoic 2hr Transfrags\ parent encodeNoncodingTransFrags\ priority 4\ shortLabel Affy In PrxHL60 2h\ subGroups region=intronicProximal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr02IntronsProximal\ encodeAffyRnaHl60SignalHr02 Affy RNA RA 2h wig -1168.00 1686.5 Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 2hrs) Signal 0 4 50 50 180 152 152 217 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,50,180\ longLabel Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 2hrs) Signal\ parent encodeAffyRnaSignal\ priority 4\ shortLabel Affy RNA RA 2h\ track encodeAffyRnaHl60SignalHr02\ encodeAffyRnaHl60SitesHr02 Affy RNA RA 2h bed 3 . 
Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 2hrs) Sites 0 4 50 50 180 152 152 217 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,50,180\ longLabel Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 2hrs) Sites\ parent encodeAffyRnaTransfrags\ priority 4\ shortLabel Affy RNA RA 2h\ track encodeAffyRnaHl60SitesHr02\ encodeHapMapAlleleFreqYRI Allele Freq YRI bed 6 + HapMap Minor Allele Frequencies Yoruban (YRI) 0 4 0 0 0 127 127 127 1 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18, encodeVariation 1 longLabel HapMap Minor Allele Frequencies Yoruban (YRI)\ parent encodeHapMapAlleleFreq\ priority 4\ shortLabel Allele Freq YRI\ track encodeHapMapAlleleFreqYRI\ encodeEgaspUpdAugustusAny August/EST/Ms Upd genePred Augustus + EST/Protein Evidence + Mouse Homology Gene Predictions 0 4 12 100 100 133 177 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,100,100\ longLabel Augustus + EST/Protein Evidence + Mouse Homology Gene Predictions\ parent encodeEgaspUpdate\ priority 4\ shortLabel August/EST/Ms Upd\ track encodeEgaspUpdAugustusAny\ encodeEgaspPartAugustusEst Augustus/EST genePred Augustus + EST/Protein Evidence Gene Predictions 0 4 12 65 165 133 160 210 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,65,165\ longLabel Augustus + EST/Protein Evidence Gene Predictions\ parent encodeEgaspPartial\ priority 4\ shortLabel Augustus/EST\ track encodeEgaspPartAugustusEst\ encodeBuFirstExonKidney BU Kidney bed 12 + Boston University First Exon Activity in Kidney 0 4 0 0 0 127 127 127 0 0 10 chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX, encodeTxLevels 1 longLabel Boston University First Exon Activity in Kidney\ parent encodeBuFirstExon\ priority 4\ shortLabel BU 
Kidney\ track encodeBuFirstExonKidney\ cccTrendPvalHt CCC Hypertension chromGraph Case Control Consortium hypertension trend -log10 P-value 0 4 0 0 0 127 127 127 0 0 0 phenDis 0 longLabel Case Control Consortium hypertension trend -log10 P-value\ parent caseControl\ priority 4\ shortLabel CCC Hypertension\ track cccTrendPvalHt\ kiddEichlerDiscAbc11 Discordant ABC11 bed 12 HGSV Individual ABC11 (China) Discordant Clone End Alignments 0 4 0 0 0 127 127 127 0 0 0 http://mrhgsv.gs.washington.edu/cgi-bin/hgc?i=$$&c=$S&l=$[&r=$]&db=$D&position=$S:$[-$] varRep 1 longLabel HGSV Individual ABC11 (China) Discordant Clone End Alignments\ parent kiddEichlerDisc\ priority 4\ shortLabel Discordant ABC11\ track kiddEichlerDiscAbc11\ encodeDNDShuge dN/dS 1.5 to inf bed 4 + ENCODE Exons dN/dS 1.5 to inf 0 4 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel ENCODE Exons dN/dS 1.5 to inf\ parent encodeDNDS\ priority 4\ shortLabel dN/dS 1.5 to inf\ track encodeDNDShuge\ encodeAffyEc51BrainFrontalLobeSignal EC51 Sgnl BrainF wig 0 62385 Affy Ext Trans Signal (51-base window) (Brain Frontal Lobe) 0 4 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 0 color 248,0,8\ longLabel Affy Ext Trans Signal (51-base window) (Brain Frontal Lobe)\ parent encodeAffyEcSignal\ priority 4\ shortLabel EC51 Sgnl BrainF\ track encodeAffyEc51BrainFrontalLobeSignal\ encodeAffyEc51BrainFrontalLobeSites EC51 Site BrainF bed 3 . 
Affy Ext Trans Sites (51-base window) (Brain Frontal Lobe) 0 4 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 1 color 248,0,8\ longLabel Affy Ext Trans Sites (51-base window) (Brain Frontal Lobe)\ parent encodeAffyEcSites\ priority 4\ shortLabel EC51 Site BrainF\ track encodeAffyEc51BrainFrontalLobeSites\ encodeEgaspFullExogean Exogean genePred Exogean Gene Predictions 0 4 100 12 100 177 133 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 100,12,100\ longLabel Exogean Gene Predictions\ parent encodeEgaspFull\ priority 4\ shortLabel Exogean\ track encodeEgaspFullExogean\ decodeFemale Female bigWig 0.0 90.808 deCODE recombination map, female 0 4 200 60 200 227 157 227 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 200,60,200\ configurable on\ longLabel deCODE recombination map, female\ parent femaleView\ priority 4\ shortLabel Female\ subGroups view=female\ track decodeFemale\ type bigWig 0.0 90.808\ fox2ClipClusters FOX2 clusters bed 4 . 
FOX2 binding site clusters 3 4 0 0 0 127 127 127 0 0 0 regulation 1 longLabel FOX2 binding site clusters\ noInherit on\ noScoreFilter .\ parent fox2ClipSeqCompViewclusters\ priority 4\ shortLabel FOX2 clusters\ subGroups view=clusters\ track fox2ClipClusters\ type bed 4 .\ encodeGencodeGenePseudoMar07 Gencode Pseudo genePred Gencode Pseudogenes 0 4 0 91 191 127 173 223 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 0,91,191\ longLabel Gencode Pseudogenes\ parent encodeGencodeGeneMar07\ priority 4\ shortLabel Gencode Pseudo\ track encodeGencodeGenePseudoMar07\ encodeHapMapCovYRI HapMap Cov YRI wig 0.0 100.0 HapMap Resequencing Coverage Yoruban (YRI) 0 4 0 0 0 127 127 127 0 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18, encodeVariation 0 longLabel HapMap Resequencing Coverage Yoruban (YRI)\ parent encodeHapMapCov\ priority 4\ shortLabel HapMap Cov YRI\ track encodeHapMapCovYRI\ hapmapSnpsCHD HapMap SNPs CHD bed 6 + HapMap SNPs from the CHD Population (Chinese Ancestry in Metropolitan Denver, CO, US) 0 4 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HapMap SNPs from the CHD Population (Chinese Ancestry in Metropolitan Denver, CO, US)\ parent hapmapSnps\ priority 4\ shortLabel HapMap SNPs CHD\ track hapmapSnpsCHD\ hgdpHzyEurope Hetzgty Europe bedGraph 4 Human Genome Diversity Proj Smoothd Expec Heterozygosity (Europe) 0 4 240 144 0 247 199 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 240,144,0\ longLabel Human Genome Diversity Proj Smoothd Expec Heterozygosity (Europe)\ parent hgdpHzy\ priority 4\ shortLabel Hetzgty Europe\ track hgdpHzyEurope\ encodeRegulomeQualityHuh7 Huh7 bed 5 . 
Huh7 Quality 0 4 150 50 150 202 152 202 1 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 1 color 150,50,150\ longLabel Huh7 Quality\ parent encodeRegulomeQuality\ priority 4\ shortLabel Huh7\ track encodeRegulomeQualityHuh7\ encodeRegulomeProbHuh7 Huh7 bedGraph 4 Huh7 DNaseI HSs 0 4 150 50 150 202 152 202 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 150,50,150\ longLabel Huh7 DNaseI HSs\ parent encodeRegulomeProb\ priority 4\ shortLabel Huh7\ track encodeRegulomeProbHuh7\ encodeRegulomeBaseHuh7 Huh7 wig 0.0 3.0 Huh7 DNaseI Sensitivity 0 4 150 50 150 202 152 202 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 150,50,150\ longLabel Huh7 DNaseI Sensitivity\ parent encodeRegulomeBase\ priority 4\ shortLabel Huh7\ track encodeRegulomeBaseHuh7\ hgdpIhsSAsia iHS S. Asia bedGraph 4 Human Genome Diversity Project iHS (South Asia) 0 4 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 color 0,0,0\ longLabel Human Genome Diversity Project iHS (South Asia)\ parent hgdpIhs\ priority 4\ shortLabel iHS S. Asia\ track hgdpIhsSAsia\ encodeGencodeIntergenicDistal Intergenic Dist bed 4 . Gencode Intergenic Distal Regions 0 4 0 0 0 127 127 127 0 0 0 encodeAnalysis 1 longLabel Gencode Intergenic Distal Regions\ parent encodeGencodeRegions\ priority 4\ shortLabel Intergenic Dist\ track encodeGencodeIntergenicDistal\ encodeAllIntergenicProximal Intergenic Prox bed 4 Consensus Intergenic Proximal 0 4 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Consensus Intergenic Proximal\ parent encodeWorkshopSelections\ priority 4\ shortLabel Intergenic Prox\ track encodeAllIntergenicProximal\ iscaRetrospectiveLikelyPathogenic ISCA Ret Lik.Path. gvf Internat. Stds. 
for Cytogen. Arrays Consort. (ISCA) - Retrospective (Likely Pathogenic) 0 4 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Internat. Stds. for Cytogen. Arrays Consort. (ISCA) - Retrospective (Likely Pathogenic)\ parent iscaRetrospectiveComposite\ priority 4\ shortLabel ISCA Ret Lik.Path.\ track iscaRetrospectiveLikelyPathogenic\ hapmapLdJpt LD JPT bed 4 + Linkage Disequilibrium for the Japanese from Tokyo (JPT) 0 4 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 longLabel Linkage Disequilibrium for the Japanese from Tokyo (JPT)\ parent hapmapLd\ priority 4\ shortLabel LD JPT\ track hapmapLdJpt\ encodeUcsdChipHeLaH3H4tmH3K4_p30 LI H3K4me3 +gIF bedGraph 4 Ludwig Institute ChIP-chip: H3K4me3 ab, HeLa cells, 30 min. after gamma interferon 0 4 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: H3K4me3 ab, HeLa cells, 30 min. 
after gamma interferon\ parent encodeLIChIPgIF\ priority 4\ shortLabel LI H3K4me3 +gIF\ track encodeUcsdChipHeLaH3H4tmH3K4_p30\ encodeUcsdNgHeLaRnap_p30 LI Ng Pol2 +gIF bedGraph 4 Ludwig Institute/UCSD ChIP/Chip Ng: HeLa, Pol2, 30 min after gamma interferon 0 4 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Ludwig Institute/UCSD ChIP/Chip Ng: HeLa, Pol2, 30 min after gamma interferon\ parent encodeUcsdNgGif\ priority 4\ shortLabel LI Ng Pol2 +gIF\ track encodeUcsdNgHeLaRnap_p30\ encodeUcsdChipRnapHct116_f LI Pol2 HCT116 bedGraph 4 Ludwig Institute ChIP-chip: Pol2 8WG16 ab, HCT116 cells 0 4 58 119 40 156 187 147 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 58,119,40\ longLabel Ludwig Institute ChIP-chip: Pol2 8WG16 ab, HCT116 cells\ parent encodeLIChIP\ priority 4\ shortLabel LI Pol2 HCT116\ track encodeUcsdChipRnapHct116_f\ encodeMlaganUnionEl MLAGAN Union bed 5 . MLAGAN PhastCons/BinCons/GERP Union Conserved Elements 0 4 80 70 180 167 162 217 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,70,180\ longLabel MLAGAN PhastCons/BinCons/GERP Union Conserved Elements\ parent encodeMlaganElements\ priority 4\ shortLabel MLAGAN Union\ track encodeMlaganUnionEl\ netSyntenyMm8 Mouse Syn Net netAlign mm8 chainMm8 Mouse (Feb. 2006 (NCBI36/mm8)) Syntenic Alignment Net 0 4 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb mm8\ parent syntenicNet\ priority 4\ shortLabel Mouse Syn Net\ spectrum on\ track netSyntenyMm8\ type netAlign mm8 chainMm8\ visibility hide\ encodeAllNcIntersectEl NC Intersect bed 5 . 
TBA and MLAGAN PhastCons/BinCons/GERP Intersection NonCoding Conserved Elements 0 4 80 180 80 167 217 167 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,180,80\ longLabel TBA and MLAGAN PhastCons/BinCons/GERP Intersection NonCoding Conserved Elements\ parent encodeAllElements\ priority 4\ shortLabel NC Intersect\ track encodeAllNcIntersectEl\ ntSeqContigs Neandertal Cntgs bam Neandertal Sequence Contigs Generated by Genotype Caller 0 4 0 0 0 127 127 127 0 0 0

Description

\

\ The Neandertal Sequence Contigs track shows consensus contigs \ called (after duplicate reads from each library were merged) from \ overlapping, non-redundant reads that passed mapping and base\ quality criteria. \

\ \

Display Conventions and Configuration

\

\ The contigs (query sequences) from each of the six samples are contained in separate\ subtracks. Use the checkboxes to select which samples will be\ displayed in the browser. Click and drag the sample name to reorder the\ subtracks. The order in which the subtracks appear in the subtrack list will\ be the order in which they display in the browser.\

\ The query sequences in the SAM/BAM alignment representation\ are normalized to the + strand of the reference genome\ (see the SAM Format Specification\ for more information on the SAM/BAM file format). If a query sequence was\ originally the reverse complement of what has been stored and aligned, it has the\ following\ flag set:\

\
(0x10) Read is on '-' strand.\
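The flag test above can be illustrated with a minimal, standard-library-only sketch. The helper names `is_reverse` and `original_read` are hypothetical. The sketch checks the 0x10 bit and reverse-complements SEQ to recover the read as it was originally sequenced.

```python
# Minimal sketch: decode the SAM FLAG reverse-strand bit (0x10) and undo the
# + strand normalization of SEQ. Helper names are hypothetical.

REVERSE_STRAND = 0x10  # SAM FLAG bit: SEQ is stored reverse-complemented

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def is_reverse(flag: int) -> bool:
    """Return True if the 0x10 ("read is on '-' strand") bit is set."""
    # Parentheses matter: in Python, '&' binds more loosely than '!='.
    return (flag & REVERSE_STRAND) != 0


def original_read(seq: str, flag: int) -> str:
    """Return the read as sequenced, given the stored SEQ and its FLAG."""
    if is_reverse(flag):
        return seq.translate(_COMPLEMENT)[::-1]  # reverse complement
    return seq


if __name__ == "__main__":
    print(is_reverse(16), is_reverse(0))  # True False
    print(original_read("AACG", 16))      # CGTT
```

Because FLAG is a bit field, other bits (for example 0x1 for paired reads) do not affect this test.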

\

\ BAM/SAM alignment representations also have tags. Some tags are predefined and others (those beginning\ with X, Y or Z) are defined by the aligner or data submitter. \ The following tag is associated with this track:\

\

\

\ The item labels and display\ colors of features within this track can be configured through the controls at\ the top of the track description page.\

\ \

\ \

Methods

\

\ All Neandertal sequence reads from each of the six samples were aligned\ to the human (hg16) genome using the short read aligner/mapper \ ANFO.\

\

\ To reduce the effects of sequencing error, the alignments of Neandertal reads to\ the human and chimpanzee reference genomes were used to construct human-based \ and chimpanzee-based consensus "minicontigs". To generate the consensus, \ uniquely placed, overlapping alignments were selected (ANFO MAPQ ≥ 90) and \ these were merged into a single multi-sequence alignment using the common \ reference genome sequence.\
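The selection step described above can be sketched as follows. This assumes each alignment has already been reduced to reference coordinates (0-based, half-open) plus its MAPQ. The MAPQ ≥ 90 cutoff is from the text. Treating "overlapping" as sharing at least one reference base, and the names `Alignment` and `minicontig_groups`, are assumptions. Building the multi-sequence alignment itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_MAPQ = 90  # cutoff quoted in the text (ANFO MAPQ >= 90)


@dataclass
class Alignment:
    chrom: str
    start: int  # 0-based, inclusive, on the reference
    end: int    # 0-based, exclusive, on the reference
    mapq: int


def minicontig_groups(alns: List[Alignment],
                      min_mapq: int = MIN_MAPQ) -> List[List[Alignment]]:
    """Group uniquely placed alignments that overlap on the reference.

    Each group is a candidate for merging into one multi-sequence alignment
    via the shared reference coordinates (hypothetical helper).
    """
    kept = sorted((a for a in alns if a.mapq >= min_mapq),
                  key=lambda a: (a.chrom, a.start, a.end))
    groups: List[List[Alignment]] = []
    cur_chrom: Optional[str] = None
    cur_end = -1
    for a in kept:
        if groups and a.chrom == cur_chrom and a.start < cur_end:
            groups[-1].append(a)           # overlaps the current group
            cur_end = max(cur_end, a.end)
        else:
            groups.append([a])             # start a new group
            cur_chrom, cur_end = a.chrom, a.end
    return groups
```

Sorting by position lets a single pass build the groups. An alignment joins the current group only if it starts before the group's rightmost end.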

\

\ At each position in the resulting alignment, for each observed base and each possible original base, (i) the likelihood of the observation was calculated, (ii) the likely length of single-stranded overhangs was estimated, and (iii) the potential for ancient DNA damage was considered using the Briggs-Johnson model (Briggs et al. 2007). \ If most observations at a given position showed a gap, the consensus became a \ gap; otherwise the base with the highest quality score (calculated by dividing \ each likelihood by the total likelihood) was used as the consensus.\
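The per-column decision rule can be sketched under simplifying assumptions. `simple_likelihood` is a hypothetical flat-error model standing in for the real likelihood, which also accounts for overhang length and Briggs-Johnson damage. Observations are assumed independent, so their likelihoods are multiplied. "Most observations" is read as a strict majority. The quality of each candidate base is its likelihood divided by the total likelihood, as the text states.

```python
import math
from typing import Callable, List, Optional, Tuple

BASES = "ACGT"
GAP = "-"


def simple_likelihood(obs: str, true_base: str, err: float = 0.01) -> float:
    """Hypothetical stand-in for P(observed base | original base).

    Only a flat sequencing error rate is modeled here; no damage model.
    """
    return 1.0 - err if obs == true_base else err / 3.0


def call_column(
    observations: List[str],
    likelihood: Callable[[str, str], float] = simple_likelihood,
) -> Tuple[str, Optional[float]]:
    """Call (consensus, quality) for one column of the merged alignment."""
    n_gaps = sum(1 for o in observations if o == GAP)
    bases = [o for o in observations if o != GAP]
    # "Most observations show a gap" is read here as a strict majority.
    if n_gaps > len(observations) / 2 or not bases:
        return GAP, None
    # Combine independent observations in log space to avoid underflow.
    log_l = {t: sum(math.log(likelihood(o, t)) for o in bases) for t in BASES}
    top = max(log_l.values())
    scaled = {t: math.exp(v - top) for t, v in log_l.items()}
    total = sum(scaled.values())
    # Quality: each candidate's likelihood divided by the total likelihood.
    quality = {t: s / total for t, s in scaled.items()}
    best = max(BASES, key=quality.__getitem__)
    return best, quality[best]
```

With two disagreeing observations such as `["A", "G"]`, the two bases split the quality at roughly 0.5 each. This is the low-quality pattern expected at heterozygous sites.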

\

\ At the current coverage, heterozygous sites appear as low-quality bases, \ with the second base (not shown) having a likelihood similar to that of the consensus \ base. \ Likewise, heterozygous indels are included only by chance or may show up as \ stretches of low-quality bases. \

\ \

Credits

\

\ This track was produced at UCSC using data generated by\ Ed Green.\

\ \

Reference

\

\ Briggs AW, Stenzel U, Johnson PL, Green RE, Kelso J, Prüfer K, Meyer M, \ Krause J et al.\ Patterns of damage in genomic DNA sequences from a Neandertal.\ Proc Natl Acad Sci USA. 2007 Sep 11;104(37):14616-21.\

\

\ Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N,\ Li H, Zhai W, Fritz MH et al.\ A Draft Sequence of the Neandertal Genome.\ Science. 2010 May 7;328(5979):710-22.\

\ neandertal 1 aliQualRange 0:254\ allButtonPair on\ baseColorDefault diffBases\ baseColorUseSequence lfExtra\ compositeTrack on\ dimensions dimensionX=sample\ dragAndDrop subTracks\ group neandertal\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Neandertal Sequence Contigs Generated by Genotype Caller\ noColorTag .\ priority 4\ shortLabel Neandertal Cntgs\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 100\ showNames off\ sortOrder sample=+\ subGroup1 sample Sample All=All Feld1=Feld1 Mez1=Mez1 Sid1253=Sid1253 Vi33dot16=Vi33.16 Vi33dot25=Vi33.25 Vi33dot26=Vi33.26\ track ntSeqContigs\ type bam\ visibility hide\ numtSMitochondrionChrPlacement NumtS chr colored bed 9 . Human NumtS on mitochondrion with chromosome placement 0 4 0 0 0 127 127 127 0 0 0

Description and display conventions

\

\ NumtS (nuclear mitochondrial sequences) are fragments of mitochondrial DNA inserted into nuclear genomic sequences. The most widely accepted hypothesis for their origin is that, in the presence of mutagenic agents or under stress conditions, fragments of mtDNA escape from mitochondria, reach the nucleus and insert into chromosomes during break repair, although NumtS can also derive from duplication of genomic fragments. NumtS can contaminate human mtDNA sequencing, and false reports of low-level heteroplasmy have therefore been frequent.\ The bioinformatics group led by M. Attimonelli (Bari, Italy) has produced the RHNumtS compilation, which annotates more than 500 human NumtS. To give the scientific community access to the compilation and to support comparative genomic analyses that include NumtS data, the group designed the Human NumtS tracks described below.\

\ \

\ The NumtS tracks show the high-scoring segment pairs (HSPs) obtained by aligning the mitochondrial reference genome (NC_012920) with the hg18 release of the human genome.\

\
    \
  1. "NumtS (Nuclear mitochondrial Sequences)" Track\

    \ The "NumtS" track shows where the HSPs returned by BlastN map on the nuclear genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows matches the strand of the alignment. Each item links to its mitochondrial mapping, allowing quick navigation between the nuclear and mitochondrial contexts of each NumtS.\

    \
    \ \
  2. "NumtS assembled" Track\

    \ The "NumtS assembled" track shows items obtained by assembling HSPs annotated in the "NumtS" track fulfilling the following conditions:\

    \

    \ Exceptions for the second condition arise when a long repetitive element is present between two HSPs.\

    \
    \ \
  3. "NumtS on mitochondrion" Track\

    \ The "NumtS on mitochondrion" track shows the mapping of the HSPs on the mitochondrial genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows is concordant with the strand of the alignment. For every item, a link pointing to the nuclear mapping is provided.\

    \ \
  4. "NumtS on mitochondrion with chromosome placement" Track\

    \ The "NumtS on mitochondrion with chromosome placement" track shows the mapping of the HSPs on the mitochondrial genome, but the items are colored according to the colors assigned to each human chromosome in the UCSC Genome Browser. No shading is used. Each item links to its nuclear mapping.\

    \
\ \
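These tracks are stored as BED 9 with `itemRgb on`, so the per-item shading described above ends up in the ninth (itemRgb) column. The following sketch shows one way to turn a BlastN percent identity into a gray shade and format a BED9 record. The linear scale, the 60% floor and the identity-to-score mapping are hypothetical. The original tracks may shade items differently.

```python
def shade_for_identity(identity: float, floor: float = 60.0) -> str:
    """Map BlastN percent identity to an 'R,G,B' gray for the itemRgb column.

    Hypothetical scale: identities at or below `floor` get light gray (200),
    100% identity gets black (0), linearly in between.
    """
    clamped = min(max(identity, floor), 100.0)
    frac = (clamped - floor) / (100.0 - floor)
    level = round(200 * (1.0 - frac))
    return f"{level},{level},{level}"


def bed9_line(chrom: str, start: int, end: int, name: str,
              identity: float, strand: str) -> str:
    """Format one HSP as a tab-separated BED9 record (0-based, half-open)."""
    score = round(identity * 10)  # hypothetical 0-1000 score
    fields = [chrom, str(start), str(end), name, str(score), strand,
              str(start), str(end),  # thickStart/thickEnd span the whole item
              shade_for_identity(identity)]
    return "\t".join(fields)
```

Darker items then correspond to more similar HSPs. The strand column controls the arrow direction.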

Methods

\

\ NumtS mappings were obtained by running Blast2seq (program: BlastN) between each chromosome of the human genome hg18 build and the human mitochondrial reference sequence (rCRS, AC: NC_012920), with the E-value threshold set to 1e-03. The HSPs were assembled by spreadsheet interpolation and manual inspection.\
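The HSP extraction step can be sketched as follows, assuming tabular BLAST output in the default 12-column layout of BLAST+ `-outfmt 6`. This is an assumption: the original Blast2seq runs may have used a different output format. The sketch also assumes the mitochondrial sequence is the query and the chromosome is the subject. It keeps HSPs at or below the stated E-value threshold and converts subject coordinates to 0-based, half-open nuclear intervals with a strand. `Hsp` and `parse_hsps` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple

EVALUE_MAX = 1e-3  # threshold stated in the Methods text


class Hsp(NamedTuple):
    chrom: str        # nuclear sequence (BLAST subject)
    start: int        # 0-based, inclusive (BED convention)
    end: int          # 0-based, exclusive
    strand: str       # '+' or '-' relative to the chromosome
    mt_start: int     # 1-based start on the mitochondrial query
    mt_end: int       # 1-based end on the mitochondrial query
    identity: float   # percent identity reported by BLAST
    evalue: float


def parse_hsps(lines: Iterable[str],
               evalue_max: float = EVALUE_MAX) -> Iterator[Hsp]:
    """Parse 12-column BLAST tabular rows (mtDNA query vs. chromosome subject)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        evalue = float(f[10])
        if evalue > evalue_max:
            continue
        sstart, send = int(f[8]), int(f[9])
        # BLAST reports sstart > send when the hit is on the subject's minus strand.
        strand = "+" if sstart <= send else "-"
        lo, hi = min(sstart, send), max(sstart, send)
        yield Hsp(f[1], lo - 1, hi, strand, int(f[6]), int(f[7]),
                  float(f[2]), evalue)
```

The 1-based inclusive BLAST coordinates become BED-style intervals by subtracting one from the start only.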

\ \

Verification

\

\ NumtS predicted in silico were validated by carrying out PCR amplification and sequencing on blood-extracted DNA of a healthy individual of European origin. PCR amplification was successful for 275 NumtS and provided amplicons of the expected length. All PCR fragments were sequenced on both strands, and submitted to the EMBL databank.\

\

\ Furthermore, 541 NumtS were validated by merging NumtS nuclear coordinates with HapMap annotations. The analysis was carried out on eight HapMap individuals (NA18517, NA18507, NA18956, NA19240, NA18555, NA12878, NA19129, NA12156). For each sample, clones with a single best concordant placement (according to the fosmid end-sequence-pair analysis described in Kidd et al., 2008) were considered. The analysis showed that at least 30 bp of each of these 541 NumtS had been sequenced in these samples.\
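The coordinate merge can be sketched as an interval-overlap test. This assumes that "at least 30bp" means a NumtS overlaps some sequenced interval (for example, a concordantly placed clone) by 30 bases or more. Both the reading and the names `overlap_bp` and `supported_numts` are hypothetical. The pairwise scan is fine for a sketch; an interval tree would scale better.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def supported_numts(numts: List[Interval], sequenced: List[Interval],
                    min_bp: int = 30) -> List[Interval]:
    """Keep NumtS covered by at least `min_bp` bases of some sequenced interval."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in sequenced:
        by_chrom[chrom].append((start, end))
    return [n for n in numts
            if any(overlap_bp(n[1], n[2], s, e) >= min_bp
                   for s, e in by_chrom.get(n[0], []))]
```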

\ \

Credits

\

\ These data were provided by Domenico Simone and Marcella Attimonelli at the Department of Biochemistry and Molecular Biology "Ernesto Quagliariello" (University of Bari, Italy). Primers were designed by Francesco Calabrese and Giuseppe Mineccia. PCR validation was carried out by Martin Lang, Domenico Simone and Giuseppe Gasparre. Merging with HapMap annotations was performed by Domenico Simone.\

\ \

References

\

\ Simone D, Calabrese FM, Lang M, Gasparre G, Attimonelli M. Validation and UCSC tracks of the extended RHNumtS compilation (submitted). \

\ \

\ Lascaro D, Castellana S, Gasparre G, Romeo G, Saccone S, Attimonelli M. The RHNumtS compilation: features and bioinformatics approaches to locate and quantify Human NumtS. BMC Genomics. 2008 Jun 3;9:267.\

\ \

\ Kidd JM, Cooper GM, Donahue WF, et al.\ Mapping and sequencing of structural variation from eight human genomes.\ Nature. 2008;453(7191):56-64.\

\ \ \ \ varRep 1 html numtSeq\ itemRgb on\ longLabel Human NumtS on mitochondrion with chromosome placement\ parent numtSeq\ priority 4\ shortLabel NumtS chr colored\ track numtSMitochondrionChrPlacement\ type bed 9 .\ encodeGencodeRaceFragsHeart RACEfrags Heart genePred Gencode RACEfrags from Heart 0 4 225 30 255 240 142 255 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 225,30,255\ longLabel Gencode RACEfrags from Heart\ parent encodeGencodeRaceFrags\ priority 4\ shortLabel RACEfrags Heart\ track encodeGencodeRaceFragsHeart\ encodeSangerChipH3ac SI H3ac GM06990 bedGraph 4 Sanger Institute ChIP/Chip (H3ac ab, GM06990 cells) 0 4 10 10 130 132 132 192 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 10,10,130\ longLabel Sanger Institute ChIP/Chip (H3ac ab, GM06990 cells)\ parent encodeSangerChipH3H4\ priority 4\ shortLabel SI H3ac GM06990\ track encodeSangerChipH3ac\ stanfordChipHepG2SRF Stan HepG2 SRF bedGraph 4 Stanford ChIP-chip (HepG2 cells, SRF ChIP) 0 4 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (HepG2 cells, SRF ChIP)\ parent stanfordChip\ priority 4\ shortLabel Stan HepG2 SRF\ track stanfordChipHepG2SRF\ encodeStanfordChipHepG2SRF Stan HepG2 SRF bedGraph 4 Stanford ChIP-chip (HepG2 cells, SRF ChIP) 0 4 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (HepG2 cells, SRF ChIP)\ parent encodeStanfordChipJohnson\ priority 4\ shortLabel Stan HepG2 SRF\ track encodeStanfordChipHepG2SRF\ encodeStanfordChipJurkatSp3 Stan Jurkat Sp3 bedGraph 4 Stanford ChIP-chip (Jurkat cells, Sp3 ChIP) 0 4 120 0 20 150 0 25 0 0 
21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (Jurkat cells, Sp3 ChIP)\ parent encodeStanfordChip\ priority 4\ shortLabel Stan Jurkat Sp3\ track encodeStanfordChipJurkatSp3\ encodeStanfordMethHT1080 Stan Meth HT1080 bedGraph 4 Stanford Methylation Digest (HT1080 cells) 0 4 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest (HT1080 cells)\ parent encodeStanfordMeth\ priority 4\ shortLabel Stan Meth HT1080\ track encodeStanfordMethHT1080\ encodeStanfordMethSmoothedHT1080 Stan Meth Sc HT1080 bedGraph 4 Stanford Methylation Digest Smoothed Score (HT1080 cells) 0 4 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest Smoothed Score (HT1080 cells)\ parent encodeStanfordMethSmoothed\ priority 4\ shortLabel Stan Meth Sc HT1080\ track encodeStanfordMethSmoothedHT1080\ encodeStanfordPromotersG402 Stan Pro G402 bed 9 + Stanford Promoter Activity (G402 cells) 0 4 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (G402 cells)\ parent encodeStanfordPromoters\ priority 4\ shortLabel Stan Pro G402\ track encodeStanfordPromotersG402\ encodeStanfordChipSmoothedJurkatSp3 Stan Sc Jurkat Sp3 bedGraph 4 Stanford ChIP-chip Smoothed Score (Jurkat cells, Sp3 ChIP) 0 4 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip Smoothed Score (Jurkat cells, Sp3 ChIP)\ parent encodeStanfordChipSmoothed\ priority 4\ shortLabel Stan Sc Jurkat 
Sp3\ track encodeStanfordChipSmoothedJurkatSp3\ stsMap STS Markers bed 5 + STS Markers on Genetic (blue) and Radiation Hybrid (black) Maps 1 4 0 0 0 128 128 255 0 0 0

Description

\

This track shows locations of Sequence Tagged Site (STS) markers\ along the draft assembly. These markers have been mapped using\ genetic mapping (Genethon, Marshfield, and deCODE maps), radiation\ hybrid mapping (Stanford, Whitehead RH, and GeneMap99 maps), or\ YAC mapping (the Whitehead YAC map) techniques. Since August 2001,\ this track no longer displays fluorescent in situ hybridization (FISH)\ clones, which are now displayed in a separate track.

\ \

Genetic map markers are shown in blue; radiation hybrid map markers\ are shown in black. When a marker maps to multiple positions in the\ genome, it is shown in a lighter color.
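The coloring convention can be expressed as a small function. The exact RGB values, the 50% blend toward white for multiply placed markers, and treating every non-genetic marker as black are assumptions for illustration, not the browser's code. Blending pure blue halfway to white gives 128,128,255, which happens to match the `altColor` value in this track's settings.

```python
from typing import Tuple

RGB = Tuple[int, int, int]

GENETIC_COLOR: RGB = (0, 0, 255)  # blue (exact RGB is an assumption)
RH_COLOR: RGB = (0, 0, 0)         # black


def lighten(rgb: RGB, amount: float = 0.5) -> RGB:
    """Blend a color toward white by `amount` (0 = unchanged, 1 = white)."""
    r, g, b = (round(c + (255 - c) * amount) for c in rgb)
    return (r, g, b)


def marker_color(on_genetic_map: bool, n_placements: int) -> RGB:
    """Blue for genetic-map markers, black otherwise; lighter if multi-placed."""
    base = GENETIC_COLOR if on_genetic_map else RH_COLOR
    return lighten(base) if n_placements > 1 else base
```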

\ \

Methods

\

Positions of STS markers are determined using both full sequences\ and primer information. Full sequences are aligned using blat,\ while isPCR (Jim Kent) and ePCR are used to find\ locations using primer information. Both sets of placements are\ combined to give final positions. In nearly all cases, full sequence\ and primer-based locations are in agreement, but in cases of\ disagreement, full sequence positions are used. Sequence and primer\ information for the markers was obtained from the primary sites for\ each of the maps, and from UniSTS.\ \
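The combination rule above can be sketched as follows: full-sequence placements take precedence, and primer-based placements fill in only when no full-sequence placement exists. The sketch simplifies to at most one placement per method per marker. The `slop` tolerance used to decide "agreement" and all names are hypothetical. The returned status label is there only to make disagreements easy to audit.

```python
from typing import Optional, Tuple

Placement = Tuple[str, int, int]  # (chrom, start, end)


def agree(a: Placement, b: Placement, slop: int = 50) -> bool:
    """Same chromosome and both ends within `slop` bases (hypothetical tolerance)."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= slop and abs(a[2] - b[2]) <= slop


def combine(full_seq: Optional[Placement],
            primer: Optional[Placement],
            slop: int = 50) -> Tuple[Optional[Placement], str]:
    """Pick a final position; the full-sequence placement wins on disagreement."""
    if full_seq is not None and primer is not None:
        status = "agree" if agree(full_seq, primer, slop) else "disagree"
        return full_seq, status
    if full_seq is not None:
        return full_seq, "full_sequence_only"
    if primer is not None:
        return primer, "primer_only"
    return None, "unplaced"
```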

Using the Filter

\

The track filter can be used to change the color or include/exclude\ a set of map data within the track. This is helpful when many items\ are shown in the track display, especially when only some are relevant\ to the current task. To use the filter: \

\

When you have finished configuring the filter, click the\ Submit button.

\ \

Credits

\

This track was designed and implemented by Terry Furey. Many\ thanks to the researchers who worked on these maps, and to Greg\ Schuler, Arek Kasprzyk, Wonhee Jang, and Sanja Rogic for helping\ process the data. Additional data on the individual maps can be found\ at the following links:\

\

\ \ map 1 altColor 128,128,255,\ group map\ longLabel STS Markers on Genetic (blue) and Radiation Hybrid (black) Maps\ priority 4\ shortLabel STS Markers\ track stsMap\ type bed 5 +\ visibility dense\ tajdEd Tajima's D ED bedGraph 4 Tajima's D from European Descent 0 4 200 100 0 0 100 200 0 0 0 varRep 0 altColor 0,100,200\ color 200,100,0\ longLabel Tajima's D from European Descent\ parent tajD\ priority 4\ shortLabel Tajima's D ED\ track tajdEd\ encodeTbaUnionEl TBA Union bed 5 . TBA PhastCons/BinCons/GERP Union Conserved Elements 0 4 80 70 180 167 162 217 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,70,180\ longLabel TBA PhastCons/BinCons/GERP Union Conserved Elements\ parent encodeTbaElements\ priority 4\ shortLabel TBA Union\ track encodeTbaUnionEl\ hiSeqDepthTop5Pct Top 0.05 Depth bed 3 Top 0.05 of Read Depth Distribution 0 4 139 69 19 197 162 137 0 0 0 map 1 longLabel Top 0.05 of Read Depth Distribution\ parent hiSeqDepth\ priority 4\ shortLabel Top 0.05 Depth\ track hiSeqDepthTop5Pct\ cnpFosmid Tuzun Fosmids bed 4 + Structural Variation identified by Fosmids (Tuzun) 0 4 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Structural Variation identified by Fosmids (Tuzun)\ noInherit on\ parent cnp\ priority 4\ shortLabel Tuzun Fosmids\ track cnpFosmid\ type bed 4 +\ encodePseudogeneUcsc UCSC Retrogenes genePred UCSC Retrogene Predictions 0 4 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 longLabel UCSC Retrogene Predictions\ parent encodePseudogene\ priority 4\ shortLabel UCSC Retrogenes\ track encodePseudogeneUcsc\ encodeUtexChip2091fibE2F4Raw UT E2F4 Fb bedGraph 4 University of Texas, Austin ChIP-chip (E2F4, 2091 fibroblasts) 0 4 120 30 50 187 142 152 0 0 21 
chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 120,30,50\ longLabel University of Texas, Austin ChIP-chip (E2F4, 2091 fibroblasts)\ parent encodeUtexChip\ priority 4\ shortLabel UT E2F4 Fb\ subGroups dataType=raw\ track encodeUtexChip2091fibE2F4Raw\ encodeUvaDnaRep6 UVa DNA Rep 6h bed 3 . University of Virginia Temporal Profiling of DNA Replication (6-8 hrs) 0 4 60 75 60 10 130 10 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 longLabel University of Virginia Temporal Profiling of DNA Replication (6-8 hrs)\ parent encodeUvaDnaRep\ priority 4\ shortLabel UVa DNA Rep 6h\ track encodeUvaDnaRep6\ encodeUvaDnaRepPanS UVa DNA Rep PanS bed 3 . University of Virginia Temporal Profiling of DNA Replication (PanS) 0 4 100 50 50 177 152 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 color 100,50,50\ longLabel University of Virginia Temporal Profiling of DNA Replication (PanS)\ parent encodeUvaDnaRepSeg\ priority 4\ shortLabel UVa DNA Rep PanS\ track encodeUvaDnaRepPanS\ encodeUvaDnaRepOriginsTR50Hela UVa Ori-TR50 HeLa bed 3 . 
University of Virginia DNA Replication Origins, Ori-TR50, HeLa 0 4 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 dataVersion May 2007\ longLabel University of Virginia DNA Replication Origins, Ori-TR50, HeLa\ origAssembly hg17\ parent encodeUvaDnaRepOrigins\ priority 4\ shortLabel UVa Ori-TR50 HeLa\ track encodeUvaDnaRepOriginsTR50Hela\ kiddEichlerValidAbc11 Validated ABC11 bed 9 HGSV Individual ABC11 (China) Validated Sites of Structural Variation 0 4 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HGSV Individual ABC11 (China) Validated Sites of Structural Variation\ parent kiddEichlerValid\ priority 4\ shortLabel Validated ABC11\ track kiddEichlerValidAbc11\ hgdpXpehhSAsia XP-EHH S. Asia bedGraph 4 Human Genome Diversity Project XP-EHH (South Asia) 0 4 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 0,0,0\ longLabel Human Genome Diversity Project XP-EHH (South Asia)\ parent hgdpXpehh\ priority 4\ shortLabel XP-EHH S. 
Asia\ track hgdpXpehhSAsia\ encodeTransFragsYaleIntergenicDistal Yale Intergen Dist bed 4 Yale Intergenic Distal 0 4 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Yale Intergenic Distal\ parent encodeTransFrags\ priority 4\ shortLabel Yale Intergen Dist\ track encodeTransFragsYaleIntergenicDistal\ encodeYaleChIPSTAT1HeLaBingRenPval Yale LI PVal bedGraph 4 Yale ChIP/Chip (STAT1 ab, Hela cells) LI/UCSD PCR Amplicon, P-Values 0 4 50 50 200 152 152 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,50,200\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) LI/UCSD PCR Amplicon, P-Values\ parent encodeYaleChIPSTAT1Pval\ priority 4\ shortLabel Yale LI PVal\ track encodeYaleChIPSTAT1HeLaBingRenPval\ encodeYaleChIPSTAT1HeLaBingRenSig Yale LI Sig bedGraph 4 Yale ChIP/Chip (STAT1 ab, Hela cells) LI/UCSD PCR Amplicon, Signal 0 4 112 63 175 224 66 81 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 altColor 224,66,81\ color 112,63,175\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) LI/UCSD PCR Amplicon, Signal\ parent encodeYaleChIPSTAT1Sig\ priority 4\ shortLabel Yale LI Sig\ track encodeYaleChIPSTAT1HeLaBingRenSig\ encodeYaleChIPSTAT1HeLaBingRenSites Yale LI Sites bed . 
Yale ChIP/Chip (STAT1 ab, Hela cells) LI/UCSD PCR Amplicon, Binding Sites 0 4 200 50 50 50 50 200 0 0 18 chr1,chr10,chr11,chr13,chr14,chr15,chr16,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 altColor 50,50,200\ color 200,50,50\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells) LI/UCSD PCR Amplicon, Binding Sites\ parent encodeYaleChIPSTAT1Sites\ priority 4\ shortLabel Yale LI Sites\ track encodeYaleChIPSTAT1HeLaBingRenSites\ encodeYaleMASPlacRNANprotTMREVMless36mer36bp Yale Plc NgR RNA bedGraph 4 Yale Placenta RNA Trans Map, MAS Array, Reverse Direction, NimbleGen Protocol 0 4 50 50 200 200 50 50 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 0 altColor 200,50,50\ color 50,50,200\ longLabel Yale Placenta RNA Trans Map, MAS Array, Reverse Direction, NimbleGen Protocol\ parent encodeYaleMASPlacRNATransMap\ priority 4\ shortLabel Yale Plc NgR RNA\ track encodeYaleMASPlacRNANprotTMREVMless36mer36bp\ encodeYaleMASPlacRNANprotTarsREVMless36mer36bp Yale Plc NgR TAR bed 6 . Yale Placenta RNA TARs, MAS array, Reverse Direction, NimbleGen Protocol 0 4 50 50 200 200 50 50 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 1 altColor 200,50,50\ color 50,50,200\ longLabel Yale Placenta RNA TARs, MAS array, Reverse Direction, NimbleGen Protocol\ parent encodeYaleMASPlacRNATars\ priority 4\ shortLabel Yale Plc NgR TAR\ track encodeYaleMASPlacRNANprotTarsREVMless36mer36bp\ encodeYaleAffyNeutRNATransMap03 Yale RNA Neu 3 wig -2730 3394 Yale Neutrophil RNA Transcript Map, Sample 3 0 4 50 175 50 152 215 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,175,50\ longLabel Yale Neutrophil RNA Transcript Map, Sample 3\ parent encodeYaleAffyRNATransMap\ priority 4\ shortLabel Yale RNA Neu 3\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATransMap03\ encodeYaleAffyNeutRNATars03 Yale TAR Neu 3 bed 3 . 
Yale Neutrophil RNA Transcriptionally Active Region, Sample 3 0 4 50 175 50 152 215 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,175,50\ longLabel Yale Neutrophil RNA Transcriptionally Active Region, Sample 3\ parent encodeYaleAffyRNATars\ priority 4\ shortLabel Yale TAR Neu 3\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATars03\ ntSeqReads Neandertal Seq bam Neandertal Sequence Reads 0 4.1 0 0 0 127 127 127 0 0 0

Description

\

\ The Neandertal Seq track shows Neandertal sequence reads mapped to the human\ genome. The Neandertal sequence was generated from six Neandertal fossils found\ in Croatia, Germany, Spain and Russia.\

\ \

Display Conventions and Configuration

\

\ The sequence reads (query sequences) from each of the six samples are contained\ in separate subtracks. Use the checkboxes to select which samples \ will be displayed in the browser. Click and drag the sample name to\ reorder the subtracks. The order in which the subtracks appear in the subtrack\ list will be the order in which they display in the browser.\

\ The query sequences in the SAM/BAM alignment representation\ are normalized to the + strand of the reference genome\ (see the SAM Format Specification\ for more information on the SAM/BAM file format). If a query sequence was\ originally the reverse complement of what has been stored and aligned, it has the\ following\ flag set:\

\
(0x10) Read is on '-' strand.\

\

\

\ BAM/SAM alignment representations also have tags. Some tags are predefined and others (those beginning\ with X, Y or Z) are defined by the aligner or data submitter. \ The following tag is associated with this track: \

\

\
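SAM optional fields have the form `TAG:TYPE:VALUE`. The following minimal sketch splits such a field and checks whether a tag is in the locally defined namespace. Beyond the X/Y/Z rule given above, the SAM specification also reserves tags containing lowercase letters for local use. The helper names are hypothetical. Array (`B`) and hex (`H`) values are left as strings for brevity.

```python
from typing import Tuple, Union


def parse_tag(field: str) -> Tuple[str, str, Union[int, float, str]]:
    """Split a SAM optional field 'TAG:TYPE:VALUE' and convert simple types."""
    tag, typ, value = field.split(":", 2)  # VALUE may itself contain ':'
    if typ == "i":
        return tag, typ, int(value)
    if typ == "f":
        return tag, typ, float(value)
    return tag, typ, value  # A, Z, H and B values kept as strings here


def is_local_tag(tag: str) -> bool:
    """True for tags defined by the aligner or submitter rather than the spec."""
    return tag[0] in "XYZ" or any(c.islower() for c in tag)
```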

\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page.\

\ \

\ \

Methods

\

\ The Neandertal sequence was generated from six Neandertal fossils. Vi33.16\ (54.1% genome coverage), Vi33.25 (46.6%) and Vi33.26 (45.2%) were discovered in\ the Vindija cave in Croatia. Feld1 (0.1%) is from the Neandertal type specimen\ from the Neander Valley in Germany, Sid1253 (0.1%) is from El Sidron cave in\ Asturias, Spain, and Mez1 (2%) is from Mezmaiskaya Cave in the Caucasus,\ Russia.

\

\ To increase the fraction of endogenous Neandertal DNA in the sequencing\ libraries, restriction enzymes were used to deplete libraries of microbial DNA.\ This was done by identifying Neandertal sequencing reads whose best alignment\ was to a primate sequence, and selecting enzymes that would differentially cut\ non-primate fragments. These enzymes all contained CpG dinucleotides in their\ recognition sequences, reflecting the particularly low abundance of this\ dinucleotide in mammalian DNA. Sequencing was carried out on the 454 FLX and\ Titanium platforms and the Illumina GA. Neandertal reads were mapped to the\ human genome (hg16) using a custom mapper called\ ANFO. This custom\ alignment program was developed to take into account the characteristics of \ ancient DNA. Following the observation and implementation by Briggs \ et al., ANFO\ uses different substitution matrices for DNA thought to be double-stranded\ versus single-stranded and changes between them if doing so affords a better\ score.\
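The idea of switching between a double-stranded and a single-stranded substitution matrix "if doing so affords a better score" can be illustrated with a two-state dynamic program over an ungapped alignment. This is not ANFO's code, which performs gapped alignment with its own matrices. Both scoring functions and the switch penalty below are hypothetical toy values. In the toy single-stranded matrix, a reference C read as T (the deamination signature) is cheap, but a C read as C scores lower. Switching into that mode therefore only pays off where C→T changes cluster.

```python
from typing import Callable, Dict, List, Tuple

SWITCH_PENALTY = 3  # hypothetical cost of changing between modes


def ds_score(ref: str, read: str) -> int:
    """Hypothetical double-stranded matrix: uniform mismatch penalty."""
    return 2 if ref == read else -4


def ss_score(ref: str, read: str) -> int:
    """Hypothetical single-stranded matrix: C->T (deamination) is cheap,
    so a C that reads as C is slightly less expected and scores lower."""
    if ref == "C":
        return {"C": 1, "T": -1}.get(read, -4)
    return 2 if ref == read else -4


SCORERS: Dict[str, Callable[[str, str], int]] = {"ds": ds_score, "ss": ss_score}


def best_mode_path(ref: str, read: str,
                   switch: int = SWITCH_PENALTY) -> Tuple[int, List[str]]:
    """Max score of an ungapped alignment when each position may be scored in
    'ds' or 'ss' mode and every mode change costs `switch`."""
    if not ref or len(ref) != len(read):
        raise ValueError("expects a non-empty ungapped alignment")
    modes = list(SCORERS)
    score = {m: SCORERS[m](ref[0], read[0]) for m in modes}
    back: List[Dict[str, str]] = []
    for r, q in zip(ref[1:], read[1:]):
        new_score: Dict[str, int] = {}
        ptr: Dict[str, str] = {}
        for m in modes:
            # Either stay in mode m or pay `switch` to enter it from the other mode.
            cand = {p: score[p] - (0 if p == m else switch) for p in modes}
            prev = max(modes, key=cand.__getitem__)
            new_score[m] = cand[prev] + SCORERS[m](r, q)
            ptr[m] = prev
        score = new_score
        back.append(ptr)
    # Trace back from the best final mode to recover the per-position modes.
    end = max(modes, key=score.__getitem__)
    path = [end]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return score[end], path[::-1]
```

For example, in a read whose first two reference Cs appear as Ts, the best path starts in `ss` mode and switches to `ds` once the C→T changes stop.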

\ \

Credits

\

\ This track was produced at UCSC using data generated by\ Ed Green.\

\ \

References

\

\ Briggs AW, Good JM, Green RE, Krause J, Maricic T, Stenzel U, Lalueza-Fox C, \ Rudan P, Brajkovic D, Kucan Z et al.\ Targeted retrieval and analysis of five Neandertal mtDNA \ genomes. Science. 2009 Jul 17;325(5938):318-21.

\

\ Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N,\ Li H, Zhai W, Fritz MH et al.\ A Draft Sequence of the Neandertal Genome.\ Science. 2010 May 7;328(5979):710-22.\

\ neandertal 1 aliQualRange 0:254\ allButtonPair on\ bamColorMode gray\ bamGrayMode aliQual\ baseColorDefault diffBases\ baseColorUseSequence lfExtra\ compositeTrack on\ dimensions dimensionX=sample\ dragAndDrop subTracks\ group neandertal\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Neandertal Sequence Reads\ noColorTag .\ priority 4.1\ shortLabel Neandertal Seq\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 100\ showNames off\ sortOrder sample=+\ subGroup1 sample Sample Feld1=Feld1 Mez1=Mez1 Sid1253=Sid1253 Vi33dot16=Vi33.16 Vi33dot25=Vi33.25 Vi33dot26=Vi33.26\ track ntSeqReads\ type bam\ visibility hide\ netRBestMicMur1 Mouse lemur RBest Net netAlign micMur1 chainMicMur1 Mouse lemur (Jun. 2003 (Broad/micMur1)) Reciprocal Best Alignment Net 0 5 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Alignment Net\ otherDb micMur1\ parent rBestNet\ priority 5\ shortLabel $o_Organism RBest Net\ spectrum on\ track netRBestMicMur1\ type netAlign micMur1 chainMicMur1\ visibility hide\ snpArrayAffy250Sty Affy 250KSty bed 6 + Affymetrix GeneChip Human Mapping 250K Sty 0 5 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Affymetrix GeneChip Human Mapping 250K Sty\ parent snpArray off\ priority 5\ shortLabel Affy 250KSty\ track snpArrayAffy250Sty\ type bed 6 +\ encodeAffyChIpHl60PvalBrg1Hr08 Affy Brg1 RA 8h wig 0.0 534.54 Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 8hrs) P-Value 0 5 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 8hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 5\ shortLabel Affy Brg1 RA 8h\ subGroups factor=Brg1 time=8h\ track encodeAffyChIpHl60PvalBrg1Hr08\ encodeAffyChIpHl60SignalStrictHisH4Hr00 Affy H4Kac4 0h wig -2.78 3.97 Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict 
Signal 0 5 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 5\ shortLabel Affy H4Kac4 0h\ subGroups factor=H4Kac4 time=0h\ track encodeAffyChIpHl60SignalStrictHisH4Hr00\ encodeAffyChIpHl60SitesStrictHisH4Hr00 Affy H4Kac4 0h bed 3 . Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict Sites 0 5 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 5\ shortLabel Affy H4Kac4 0h\ subGroups factor=H4Kac4 time=0h\ track encodeAffyChIpHl60SitesStrictHisH4Hr00\ encodeAffyChIpHl60PvalStrictHisH4Hr00 Affy H4Kac4 0h wig 0 696.62 Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict P-Value 0 5 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 5\ shortLabel Affy H4Kac4 0h\ subGroups factor=H4Kac4 time=0h\ track encodeAffyChIpHl60PvalStrictHisH4Hr00\ encodeAffyRnaHl60SitesHr08IntronsProximal Affy In Prx HL60 8h bed 4 . 
Affy Intronic Proximal HL60 Retinoic 8h Transfrags 0 5 200 0 56 227 127 155 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 200,0,56\ longLabel Affy Intronic Proximal HL60 Retinoic 8h Transfrags\ parent encodeNoncodingTransFrags\ priority 5\ shortLabel Affy In Prx HL60 8h\ subGroups region=intronicProximal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr08IntronsProximal\ encodeAffyRnaHl60SignalHr08 Affy RNA RA 8h wig -1168.00 1686.5 Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 8hrs) Signal 0 5 50 50 210 152 152 232 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,50,210\ longLabel Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 8hrs) Signal\ parent encodeAffyRnaSignal\ priority 5\ shortLabel Affy RNA RA 8h\ track encodeAffyRnaHl60SignalHr08\ encodeAffyRnaHl60SitesHr08 Affy RNA RA 8h bed 3 . 
Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 8hrs) Sites 0 5 50 50 210 152 152 232 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,50,210\ longLabel Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 8hrs) Sites\ parent encodeAffyRnaTransfrags\ priority 5\ shortLabel Affy RNA RA 8h\ track encodeAffyRnaHl60SitesHr08\ encodeEgaspPartAugustusDual Augustus/Mouse genePred Augustus + Mouse Homology Gene Predictions 0 5 12 85 135 133 170 195 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,85,135\ longLabel Augustus + Mouse Homology Gene Predictions\ parent encodeEgaspPartial\ priority 5\ shortLabel Augustus/Mouse\ track encodeEgaspPartAugustusDual\ encodeBuFirstExonLiver BU Liver bed 12 + Boston University First Exon Activity in Liver 0 5 0 0 0 127 127 127 0 0 10 chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX, encodeTxLevels 1 longLabel Boston University First Exon Activity in Liver\ parent encodeBuFirstExon\ priority 5\ shortLabel BU Liver\ track encodeBuFirstExonLiver\ cccTrendPvalRa CCC Rheum Arth chromGraph Case Control Consortium rheumatoid arthritis trend -log10 P-value 0 5 0 0 0 127 127 127 0 0 0 phenDis 0 longLabel Case Control Consortium rheumatoid arthritis trend -log10 P-value\ parent caseControl\ priority 5\ shortLabel CCC Rheum Arth\ track cccTrendPvalRa\ bamSLDenisova Denisova bam Denisova Sequence Reads 0 5 0 0 0 127 127 127 0 0 0 \ \ \ \
Figure: Denisova Cave entrance in the Altai Mountains of Siberia, Russia, where the bones from which the DNA was sequenced were found. (Copyright (C) 2010, Johannes Krause)

Description

The Denisova track shows Denisova sequence reads mapped to the human genome. The Denisova sequence was generated from a phalanx bone excavated from Denisova Cave in the Altai Mountains in southern Siberia.

Methods

Denisova sequence libraries were prepared by treating DNA extracted from a single phalanx bone with two enzymes: uracil-DNA glycosylase, which removes uracil residues from DNA to leave abasic sites, and endonuclease VIII, which cuts DNA at the 5′ and 3′ sides of abasic sites. Subsequent incubation with T4 polynucleotide kinase and T4 DNA polymerase was used to generate phosphorylated blunt ends that are amenable to adaptor ligation. Because the great majority of uracil residues occur close to the ends of ancient DNA molecules, this procedure leads to only a moderate reduction in the average length of the molecules in the library, but a several-fold reduction in uracil-derived nucleotide misincorporation. Reads were aligned to the human reference sequence Mar. 2006 (NCBI36/hg18) using the Burrows-Wheeler Aligner.
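A deaminated cytosine (uracil) is read as thymine, so the uracil-derived misincorporation described above typically shows up as excess C→T mismatches concentrated near the 5′ ends of reads. The sketch below tallies that rate per read position. It is a hypothetical helper, `ct_mismatch_profile`, not part of this track or of the authors' pipeline, and it assumes each read is already aligned to its reference segment without gaps and on the same strand; it counts only C→T changes from the 5′ end.

```python
from typing import Iterable, List, Tuple


def ct_mismatch_profile(pairs: Iterable[Tuple[str, str]], max_pos: int = 10) -> List[float]:
    """C->T mismatch rate at each of the first `max_pos` read positions (5' end first).

    Each item in `pairs` is (reference, read): equal-length, gap-free strings
    already aligned to each other on the same strand (an assumption of this sketch).
    """
    ref_c = [0] * max_pos   # how often the reference has C at position i
    c_to_t = [0] * max_pos  # how often the read shows T where the reference has C
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("reference and read must be the same length")
        for i in range(min(max_pos, len(ref))):
            if ref[i].upper() == "C":
                ref_c[i] += 1
                if read[i].upper() == "T":
                    c_to_t[i] += 1
    # Positions where the reference never has a C report 0.0 instead of dividing by zero.
    return [c_to_t[i] / ref_c[i] if ref_c[i] else 0.0 for i in range(max_pos)]


if __name__ == "__main__":
    example = [("CCCC", "TCCC"), ("CCCC", "TTCC"), ("CACA", "CACA")]
    print(ct_mismatch_profile(example, max_pos=4))
```

On an untreated ancient DNA library one would expect the first few positions to carry the highest rates; the uracil-DNA glycosylase and endonuclease VIII treatment described above is meant to reduce that excess.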

Download the Denisova track data sets from the Genome Browser downloads server.

References

Briggs A.W., Stenzel U., Meyer M., Krause J., Kircher M., Pääbo S. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 2009 Dec 22;38(6):e87.

Reich D., Green R.E., Kircher M., Krause J., Patterson N., Durand E.Y., Viola B., Briggs A.W., Stenzel U., Johnson P.L.F. et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 2010 Dec 23;468:1053-1060.

Credits

This track was produced at UCSC using data generated by the Max Planck Institute for Evolutionary Anthropology.

\ denisova 1 aliQualRange 0:60\ bamColorMode gray\ bamGrayMode aliQual\ baseColorDefault diffBases\ baseColorUseSequence lfExtra\ group denisova\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Denisova Sequence Reads\ maxWindowToDraw 1000000\ noColorTag .\ pairEndsByName on\ priority 5\ shortLabel Denisova\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 100\ showNames off\ track bamSLDenisova\ type bam\ visibility hide\ kiddEichlerDiscAbc10 Discordant ABC10 bed 12 HGSV Individual ABC10 (Yoruba) Discordant Clone End Alignments 0 5 0 0 0 127 127 127 0 0 0 http://mrhgsv.gs.washington.edu/cgi-bin/hgc?i=$$&c=$S&l=$[&r=$]&db=$D&position=$S:$[-$] varRep 1 longLabel HGSV Individual ABC10 (Yoruba) Discordant Clone End Alignments\ parent kiddEichlerDisc\ priority 5\ shortLabel Discordant ABC10\ track kiddEichlerDiscAbc10\ encodeAffyEc1BrainHippocampusSignal EC1 Sgnl Hippoc wig 0 62385 Affy Ext Trans Signal (1-base window) (Brain Hippocampus) 0 5 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 0 color 248,0,8\ longLabel Affy Ext Trans Signal (1-base window) (Brain Hippocampus)\ parent encodeAffyEcSignal\ priority 5\ shortLabel EC1 Sgnl Hippoc\ track encodeAffyEc1BrainHippocampusSignal\ encodeAffyEc1BrainHippocampusSites EC1 Sites Hippoc bed 3 . 
Affy Ext Trans Sites (1-base window) (Brain Hippocampus) 0 5 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 1 color 248,0,8\ longLabel Affy Ext Trans Sites (1-base window) (Brain Hippocampus)\ parent encodeAffyEcSites\ priority 5\ shortLabel EC1 Sites Hippoc\ track encodeAffyEc1BrainHippocampusSites\ encodeEgaspUpdExogean Exogean Update genePred Exogean Gene Predictions 0 5 100 12 100 177 133 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 100,12,100\ longLabel Exogean Gene Predictions\ parent encodeEgaspUpdate\ priority 5\ shortLabel Exogean Update\ track encodeEgaspUpdExogean\ encodeEgaspFullExonhunter ExonHunter genePred ExonHunter Gene Predictions 0 5 12 20 150 133 137 202 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,20,150\ longLabel ExonHunter Gene Predictions\ parent encodeEgaspFull\ priority 5\ shortLabel ExonHunter\ track encodeEgaspFullExonhunter\ decodeFemaleCarrier Female Carrier bigWig 0.0 77.704 deCODE recombination map, female carrier 0 5 187 102 255 221 178 255 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 187,102,255\ configurable on\ longLabel deCODE recombination map, female carrier\ parent femaleView\ priority 5\ shortLabel Female Carrier\ subGroups view=female\ track decodeFemaleCarrier\ type bigWig 0.0 77.704\ encodeGencodeGenePolyAMar07 Gencode PolyA bed 9 . 
Gencode polyA Features 0 5 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 itemRgb on\ longLabel Gencode polyA Features\ noInherit on\ parent encodeGencodeGeneMar07\ priority 5\ shortLabel Gencode PolyA\ track encodeGencodeGenePolyAMar07\ type bed 9 .\ hapmapSnpsGIH HapMap SNPs GIH bed 6 + HapMap SNPs from the GIH Population (Gujarati Indians in Houston, TX, US) 0 5 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HapMap SNPs from the GIH Population (Gujarati Indians in Houston, TX, US)\ parent hapmapSnps\ priority 5\ shortLabel HapMap SNPs GIH\ track hapmapSnpsGIH\ encodeRegulomeQualityHepG2 HepG2 bed 5 . HepG2 Quality 0 5 180 50 120 217 152 187 1 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 1 color 180,50,120\ longLabel HepG2 Quality\ parent encodeRegulomeQuality\ priority 5\ shortLabel HepG2\ track encodeRegulomeQualityHepG2\ encodeRegulomeProbHepG2 HepG2 bedGraph 4 HepG2 DNaseI HSs 0 5 180 50 120 217 152 187 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 180,50,120\ longLabel HepG2 DNaseI HSs\ parent encodeRegulomeProb\ priority 5\ shortLabel HepG2\ track encodeRegulomeProbHepG2\ encodeRegulomeBaseHepG2 HepG2 wig 0.0 3.0 HepG2 DNaseI Sensitivity 0 5 180 50 120 217 152 187 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 180,50,120\ longLabel HepG2 DNaseI Sensitivity\ parent encodeRegulomeBase\ priority 5\ shortLabel HepG2\ track encodeRegulomeBaseHepG2\ hgdpHzySAsia Hetzgty S. Asia bedGraph 4 Human Genome Diversity Proj Smoothd Expec Heterozygosity (S. Asia) 0 5 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 0,0,0\ longLabel Human Genome Diversity Proj Smoothd Expec Heterozygosity (S. Asia)\ parent hgdpHzy\ priority 5\ shortLabel Hetzgty S. 
Asia\ track hgdpHzySAsia\ hgdpIhsEAsia iHS E. Asia bedGraph 4 Human Genome Diversity Project iHS (East Asia) 0 5 0 200 0 127 227 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 color 0,200,0\ longLabel Human Genome Diversity Project iHS (East Asia)\ parent hgdpIhs\ priority 5\ shortLabel iHS E. Asia\ track hgdpIhsEAsia\ encodeAllIntergenicDistal Intergenic Dist bed 4 Consensus Intergenic Distal 0 5 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Consensus Intergenic Distal\ parent encodeWorkshopSelections\ priority 5\ shortLabel Intergenic Dist\ track encodeAllIntergenicDistal\ iscaRetrospectiveUncertain ISCA Ret Uncert. gvf Internat. Stds. for Cytogen. Arrays Consort. (ISCA) - Retrospective (Uncertain) 0 5 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Internat. Stds. for Cytogen. Arrays Consort. 
(ISCA) - Retrospective (Uncertain)\ parent iscaRetrospectiveComposite\ priority 5\ shortLabel ISCA Ret Uncert.\ track iscaRetrospectiveUncertain\ encodeUcsdChipHeLaH3H4acH3_p0 LI H3ac -gIF bedGraph 4 Ludwig Institute ChIP-chip: H3ac ab, HeLa cells, no gamma interferon 0 5 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: H3ac ab, HeLa cells, no gamma interferon\ parent encodeLIChIPgIF\ priority 5\ shortLabel LI H3ac -gIF\ track encodeUcsdChipHeLaH3H4acH3_p0\ encodeUcsdChipTaf250Hela_f LI TAF1 HeLa bedGraph 4 Ludwig Institute ChIP-chip: TAF1 ab, HeLa cells 0 5 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Ludwig Institute ChIP-chip: TAF1 ab, HeLa cells\ parent encodeLIChIP\ priority 5\ shortLabel LI TAF1 HeLa\ track encodeUcsdChipTaf250Hela_f\ delMccarroll McCarroll Dels bed 4 . Deletions from Genotype Analysis (McCarroll) 0 5 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Deletions from Genotype Analysis (McCarroll)\ noInherit on\ parent cnp\ priority 5\ shortLabel McCarroll Dels\ track delMccarroll\ type bed 4 .\ encodeMlaganNcUnionEl MLAGAN NC Union bed 5 . 
MLAGAN PhastCons/BinCons/GERP Union NonCoding Conserved Elements 0 5 80 105 145 167 180 200 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,105,145\ longLabel MLAGAN PhastCons/BinCons/GERP Union NonCoding Conserved Elements\ parent encodeMlaganElements\ priority 5\ shortLabel MLAGAN NC Union\ track encodeMlaganNcUnionEl\ bamAllNumtSSorted NumtS SNPs bam Human NumtS on mitochondrion SNPs 3 5 0 0 0 127 127 127 0 0 1 chrM, varRep 1 aliQualRange 0:255\ bamColorMode strand\ bamGrayMode aliQual\ bamSkipPrintQualScore .\ baseColorDefault diffBases\ baseColorUseSequence lfExtra\ chromosomes chrM\ configurable on\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Human NumtS on mitochondrion SNPs\ maxWindowToDraw 1000000\ noColorTag .\ pairEndsByName on\ parent numtSeq\ priority 5\ shortLabel NumtS SNPs\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 100\ showNames on\ track bamAllNumtSSorted\ type bam\ visibility pack\ encodeGencodeOtherESTs Other ESTs bed 4 . 
Gencode Other ESTs 0 5 0 0 0 127 127 127 0 0 0 encodeAnalysis 1 longLabel Gencode Other ESTs\ parent encodeGencodeRegions\ priority 5\ shortLabel Other ESTs\ track encodeGencodeOtherESTs\ hapmapLdPhChbJpt Ph JPT+CHB ld2 LD for the Han Chinese + Japanese from Tokyo (JPT+CHB) from phased genotypes 0 5 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 longLabel LD for the Han Chinese + Japanese from Tokyo (JPT+CHB) from phased genotypes\ parent hapmapLdPh\ priority 5\ shortLabel Ph JPT+CHB\ track hapmapLdPhChbJpt\ encodeGencodeRaceFragsKidney RACEfrags Kidney genePred Gencode RACEfrags from Kidney 0 5 212 0 44 233 127 149 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 212,0,44\ longLabel Gencode RACEfrags from Kidney\ parent encodeGencodeRaceFrags\ priority 5\ shortLabel RACEfrags Kidney\ track encodeGencodeRaceFragsKidney\ netSyntenyRn4 Rat Syn Net netAlign rn4 chainRn4 Rat (Nov. 2004 (Baylor 3.4/rn4)) Syntenic Alignment Net 0 5 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb rn4\ parent syntenicNet\ priority 5\ shortLabel Rat Syn Net\ spectrum on\ track netSyntenyRn4\ type netAlign rn4 chainRn4\ visibility hide\ encodeSangerChipH4ac SI H4ac GM06990 bedGraph 4 Sanger Institute ChIP/Chip (H4ac ab, GM06990 cells) 0 5 10 10 130 132 132 192 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 10,10,130\ longLabel Sanger Institute ChIP/Chip (H4ac ab, GM06990 cells)\ parent encodeSangerChipH3H4\ priority 5\ shortLabel SI H4ac GM06990\ track encodeSangerChipH4ac\ tajdSnpXd SNPs XD bed 4 . 
SNPs from Chinese Descent 0 5 200 100 0 0 100 200 0 0 0 varRep 1 altColor 0,100,200\ color 200,100,0\ longLabel SNPs from Chinese Descent\ parent tajdSnp\ priority 5\ shortLabel SNPs XD\ track tajdSnpXd\ stanfordChipHeLaGABP Stan HeLa GABP bedGraph 4 Stanford ChIP-chip (HeLa cells, GABP ChIP) 0 5 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (HeLa cells, GABP ChIP)\ parent stanfordChip\ priority 5\ shortLabel Stan HeLa GABP\ track stanfordChipHeLaGABP\ encodeStanfordChipHeLaGABP Stan HeLa GABP bedGraph 4 Stanford ChIP-chip (HeLa cells, GABP ChIP) 0 5 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (HeLa cells, GABP ChIP)\ parent encodeStanfordChipJohnson\ priority 5\ shortLabel Stan HeLa GABP\ track encodeStanfordChipHeLaGABP\ encodeStanfordChipK562Sp1 Stan K562 Sp1 bedGraph 4 Stanford ChIP-chip (K562 cells, Sp1 ChIP) 0 5 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (K562 cells, Sp1 ChIP)\ parent encodeStanfordChip\ priority 5\ shortLabel Stan K562 Sp1\ track encodeStanfordChipK562Sp1\ encodeStanfordMethHepG2 Stan Meth HepG2 bedGraph 4 Stanford Methylation Digest (HepG2 cells) 0 5 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest (HepG2 cells)\ parent encodeStanfordMeth\ priority 5\ shortLabel Stan Meth HepG2\ track encodeStanfordMethHepG2\ encodeStanfordMethSmoothedHepG2 Stan Meth Sc HepG2 bedGraph 4 Stanford Methylation Digest Smoothed Score (HepG2 cells) 0 5 120 0 20 150 0 25 0 0 21 
chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest Smoothed Score (HepG2 cells)\ parent encodeStanfordMethSmoothed\ priority 5\ shortLabel Stan Meth Sc HepG2\ track encodeStanfordMethSmoothedHepG2\ encodeStanfordPromotersHCT116 Stan Pro HCT116 bed 9 + Stanford Promoter Activity (HCT116 cells) 0 5 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (HCT116 cells)\ parent encodeStanfordPromoters\ priority 5\ shortLabel Stan Pro HCT116\ track encodeStanfordPromotersHCT116\ encodeStanfordChipSmoothedK562Sp1 Stan Sc K562 Sp1 bedGraph 4 Stanford ChIP-chip Smoothed Score (K562 cells, Sp1 ChIP) 0 5 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip Smoothed Score (K562 cells, Sp1 ChIP)\ parent encodeStanfordChipSmoothed\ priority 5\ shortLabel Stan Sc K562 Sp1\ track encodeStanfordChipSmoothedK562Sp1\ stsMapMouse STS Markers bed 5 + STS Markers on Genetic Maps 1 5 0 0 0 128 128 255 0 0 0

This track shows the locations of Sequence-Tagged Site (STS) markers along the mouse draft assembly. These markers appear on the Mouse Genome Informatics (MGI) consensus mouse genetic map. Information about the genetic map and STS marker primer sequences is provided by the Mouse Genome Informatics database group at The Jackson Laboratory.

\ map 1 altColor 128,128,255,\ group map\ longLabel STS Markers on Genetic Maps\ priority 5\ shortLabel STS Markers\ track stsMapMouse\ type bed 5 +\ visibility dense\ encodeTbaNcUnionEl TBA NC Union bed 5 . TBA PhastCons/BinCons/GERP Union NonCoding Conserved Elements 0 5 80 105 145 167 180 200 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,105,145\ longLabel TBA PhastCons/BinCons/GERP Union NonCoding Conserved Elements\ parent encodeTbaElements\ priority 5\ shortLabel TBA NC Union\ track encodeTbaNcUnionEl\ hiSeqDepthTop10Pct Top 0.10 Depth bed 3 Top 0.10 of Read Depth Distribution 0 5 139 69 19 197 162 137 0 0 0 map 1 longLabel Top 0.10 of Read Depth Distribution\ parent hiSeqDepth\ priority 5\ shortLabel Top 0.10 Depth\ track hiSeqDepthTop10Pct\ encodePseudogeneUcsc2 UCSC Pseudogenes genePred UCSC Pseudogene Predictions 0 5 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 longLabel UCSC Pseudogene Predictions\ parent encodePseudogene\ priority 5\ shortLabel UCSC Pseudogenes\ track encodePseudogeneUcsc2\ encodeUtexChipHeLaMycPeaks UT Myc HeLa Pk bedGraph 4 University of Texas, Austin ChIP-chip (c-Myc, HeLa) Peaks 0 5 50 0 0 152 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,0,0\ longLabel University of Texas, Austin ChIP-chip (c-Myc, HeLa) Peaks\ parent encodeUtexChip\ priority 5\ shortLabel UT Myc HeLa Pk\ subGroups dataType=peaks\ track encodeUtexChipHeLaMycPeaks\ encodeUvaDnaRep8 UVa DNA Rep 8h bed 3 . 
University of Virginia Temporal Profiling of DNA Replication (8-10 hrs) 0 5 60 75 60 10 130 10 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 1 longLabel University of Virginia Temporal Profiling of DNA Replication (8-10 hrs)\ parent encodeUvaDnaRep\ priority 5\ shortLabel UVa DNA Rep 8h\ track encodeUvaDnaRep8\ kiddEichlerValidAbc10 Validated ABC10 bed 9 HGSV Individual ABC10 (Yoruba) Validated Sites of Structural Variation 0 5 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HGSV Individual ABC10 (Yoruba) Validated Sites of Structural Variation\ parent kiddEichlerValid\ priority 5\ shortLabel Validated ABC10\ track kiddEichlerValidAbc10\ hgdpXpehhEAsia XP-EHH E. Asia bedGraph 4 Human Genome Diversity Project XP-EHH (East Asia) 0 5 0 200 0 127 227 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 0,200,0\ longLabel Human Genome Diversity Project XP-EHH (East Asia)\ parent hgdpXpehh\ priority 5\ shortLabel XP-EHH E. 
Asia\ track hgdpXpehhEAsia\ encodeTransFragsYaleIntergenicProximal Yale Intergen Prox bed 4 Yale Intergenic Proximal 0 5 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Yale Intergenic Proximal\ parent encodeTransFrags\ priority 5\ shortLabel Yale Intergen Prox\ track encodeTransFragsYaleIntergenicProximal\ encodeYaleMASPlacRNATransMapFwdMless36mer36bp Yale Plc BtF RNA bedGraph 4 Yale Placenta RNA TransMap, MAS array, Forward Direction, Bertone Protocol 0 5 200 50 50 50 50 200 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 0 altColor 50,50,200\ color 200,50,50\ longLabel Yale Placenta RNA TransMap, MAS array, Forward Direction, Bertone Protocol\ parent encodeYaleMASPlacRNATransMap\ priority 5\ shortLabel Yale Plc BtF RNA\ track encodeYaleMASPlacRNATransMapFwdMless36mer36bp\ encodeYaleMASPlacRNATarsFwdMless36mer36bp Yale Plc BtF TAR bed 6 . Yale Placenta RNA TARs, MAS array, Forward Direction, Bertone Protocol 0 5 200 50 50 50 50 200 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 1 altColor 50,50,200\ color 200,50,50\ longLabel Yale Placenta RNA TARs, MAS array, Forward Direction, Bertone Protocol\ parent encodeYaleMASPlacRNATars\ priority 5\ shortLabel Yale Plc BtF TAR\ track encodeYaleMASPlacRNATarsFwdMless36mer36bp\ encodeYaleAffyNeutRNATransMap04 Yale RNA Neu 4 wig -2730 3394 Yale Neutrophil RNA Transcript Map, Sample 4 0 5 50 160 50 152 207 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,160,50\ longLabel Yale Neutrophil RNA Transcript Map, Sample 4\ parent encodeYaleAffyRNATransMap\ priority 5\ shortLabel Yale RNA Neu 4\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATransMap04\ encodeYaleAffyNeutRNATars04 Yale TAR Neu 4 bed 3 . 
Yale Neutrophil RNA Transcriptionally Active Region, Sample 4 0 5 50 160 50 152 207 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,160,50\ longLabel Yale Neutrophil RNA Transcriptionally Active Region, Sample 4\ parent encodeYaleAffyRNATars\ priority 5\ shortLabel Yale TAR Neu 4\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATars04\ netRBestTupBel1 Tree shrew RBest Net netAlign tupBel1 chainTupBel1 Tree shrew (Dec. 2006 (Broad/tupBel1)) Reciprocal Best Alignment Net 0 6 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Alignment Net\ otherDb tupBel1\ parent rBestNet\ priority 6\ shortLabel $o_Organism RBest Net\ spectrum on\ track netRBestTupBel1\ type netAlign tupBel1 chainTupBel1\ visibility hide\ encodeAffyChIpHl60SitesBrg1Hr08 Affy Brg1 RA 8h bed 3 . Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 8hrs) Sites 0 6 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 225,0,0\ longLabel Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 8hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 6\ shortLabel Affy Brg1 RA 8h\ subGroups factor=Brg1 time=8h\ track encodeAffyChIpHl60SitesBrg1Hr08\ encodeAffyChIpHl60SignalStrictHisH4Hr02 Affy H4Kac4 2h wig -2.78 3.97 Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict Signal 0 6 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 6\ shortLabel Affy H4Kac4 2h\ subGroups factor=H4Kac4 time=2h\ track encodeAffyChIpHl60SignalStrictHisH4Hr02\ 
encodeAffyChIpHl60SitesStrictHisH4Hr02 Affy H4Kac4 2h bed 3 . Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict Sites 0 6 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 6\ shortLabel Affy H4Kac4 2h\ subGroups factor=H4Kac4 time=2h\ track encodeAffyChIpHl60SitesStrictHisH4Hr02\ encodeAffyChIpHl60PvalStrictHisH4Hr02 Affy H4Kac4 2h wig 0 696.62 Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict P-Value 0 6 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 6\ shortLabel Affy H4Kac4 2h\ subGroups factor=H4Kac4 time=2h\ track encodeAffyChIpHl60PvalStrictHisH4Hr02\ encodeAffyRnaHl60SitesHr32IntronsProximal Affy In Prx HL60 32h bed 4 . 
Affy Intronic Proximal HL60 Retinoic 32h Transfrags 0 6 188 0 68 221 127 161 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 188,0,68\ longLabel Affy Intronic Proximal HL60 Retinoic 32h Transfrags\ parent encodeNoncodingTransFrags\ priority 6\ shortLabel Affy In Prx HL60 32h\ subGroups region=intronicProximal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr32IntronsProximal\ encodeAffyRnaHl60SignalHr32 Affy RNA RA 32h wig -1168.00 1686.5 Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 32hrs) Signal 0 6 50 50 240 152 152 247 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,50,240\ longLabel Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 32hrs) Signal\ parent encodeAffyRnaSignal\ priority 6\ shortLabel Affy RNA RA 32h\ track encodeAffyRnaHl60SignalHr32\ encodeAffyRnaHl60SitesHr32 Affy RNA RA 32h bed 3 . 
Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 32hrs) Sites 0 6 50 50 240 152 152 247 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,50,240\ longLabel Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyRnaTransfrags\ priority 6\ shortLabel Affy RNA RA 32h\ track encodeAffyRnaHl60SitesHr32\ encodeEgaspPartAugustusAny Augustus/EST/Mouse genePred Augustus + EST/Protein Evidence + Mouse Homology Gene Predictions 0 6 12 100 100 133 177 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,100,100\ longLabel Augustus + EST/Protein Evidence + Mouse Homology Gene Predictions\ parent encodeEgaspPartial\ priority 6\ shortLabel Augustus/EST/Mouse\ track encodeEgaspPartAugustusAny\ encodeBuFirstExonLung BU Lung bed 12 + Boston University First Exon Activity in Lung 0 6 0 0 0 127 127 127 0 0 10 chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX, encodeTxLevels 1 longLabel Boston University First Exon Activity in Lung\ parent encodeBuFirstExon\ priority 6\ shortLabel BU Lung\ track encodeBuFirstExonLung\ cccTrendPvalT1d CCC T1 Diabetes chromGraph Case Control Consortium type 1 diabetes trend -log10 P-value 0 6 0 0 0 127 127 127 0 0 0 phenDis 0 longLabel Case Control Consortium type 1 diabetes trend -log10 P-value\ parent caseControl\ priority 6\ shortLabel CCC T1 Diabetes\ track cccTrendPvalT1d\ delConrad Conrad Dels bed 8 . 
Deletions from Genotype Analysis (Conrad) 0 6 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Deletions from Genotype Analysis (Conrad)\ noInherit on\ parent cnp\ priority 6\ shortLabel Conrad Dels\ track delConrad\ type bed 8 .\ kiddEichlerDiscAbc9 Discordant ABC9 bed 12 HGSV Individual ABC9 (Japan) Discordant Clone End Alignments 0 6 0 0 0 127 127 127 0 0 0 http://mrhgsv.gs.washington.edu/cgi-bin/hgc?i=$$&c=$S&l=$[&r=$]&db=$D&position=$S:$[-$] varRep 1 longLabel HGSV Individual ABC9 (Japan) Discordant Clone End Alignments\ parent kiddEichlerDisc\ priority 6\ shortLabel Discordant ABC9\ track kiddEichlerDiscAbc9\ netSyntenyCanFam2 Dog Syn Net netAlign canFam2 chainCanFam2 Dog (May 2005 (Broad/canFam2)) Syntenic Alignment Net 0 6 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb canFam2\ parent syntenicNet\ priority 6\ shortLabel Dog Syn Net\ spectrum on\ track netSyntenyCanFam2\ type netAlign canFam2 chainCanFam2\ visibility hide\ encodeAffyEc51BrainHippocampusSignal EC51 Sgnl Hippoc wig 0 62385 Affy Ext Trans Signal (51-base window) (Brain Hippocampus) 0 6 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 0 color 248,0,8\ longLabel Affy Ext Trans Signal (51-base window) (Brain Hippocampus)\ parent encodeAffyEcSignal\ priority 6\ shortLabel EC51 Sgnl Hippoc\ track encodeAffyEc51BrainHippocampusSignal\ encodeAffyEc51BrainHippocampusSites EC51 Site Hippoc bed 3 . Affy Ext Trans Sites (51-base window) (Brain Hippocampus) 0 6 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 1 color 248,0,8\ longLabel Affy Ext Trans Sites (51-base window) (Brain Hippocampus)\ parent encodeAffyEcSites\ priority 6\ shortLabel EC51 Site Hippoc\ track encodeAffyEc51BrainHippocampusSites\ encodeGencodeExonic Exonic bed 4 . 
Gencode Exonic Regions 0 6 0 0 0 127 127 127 0 0 0 encodeAnalysis 1 longLabel Gencode Exonic Regions\ parent encodeGencodeRegions\ priority 6\ shortLabel Exonic\ track encodeGencodeExonic\ decodeFemaleNonCarrier Female Non-carrier bigWig 0.0 93.929 deCODE recombination map, female non-carrier 0 6 148 128 200 201 191 227 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 148,128,200\ configurable on\ longLabel deCODE recombination map, female non-carrier\ parent femaleView\ priority 6\ shortLabel Female Non-carrier\ subGroups view=female\ track decodeFemaleNonCarrier\ type bigWig 0.0 93.929\ encodeEgaspFullFgenesh Fgenesh++ genePred Fgenesh++ Gene Predictions 0 6 22 150 20 138 202 137 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 22,150,20\ longLabel Fgenesh++ Gene Predictions\ parent encodeEgaspFull\ priority 6\ shortLabel Fgenesh++\ track encodeEgaspFullFgenesh\ encodeEgaspUpdFgenesh FGenesh++ Upd genePred Fgenesh++ Gene Predictions 0 6 22 150 20 138 202 137 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 22,150,20\ longLabel Fgenesh++ Gene Predictions\ parent encodeEgaspUpdate\ priority 6\ shortLabel FGenesh++ Upd\ track encodeEgaspUpdFgenesh\ fishClones FISH Clones bed 5 + Clones Placed on Cytogenetic Map Using FISH 0 6 0 150 0 127 202 127 0 0 0

Description

\

\ This track shows the location of fluorescent in situ hybridization (FISH)-mapped clones along the draft assembly sequence. The locations of these clones were contributed as part of the BAC Consortium paper by Cheung, V.G. et al. (2001), listed in the References section below.

\

\ More information about the BAC clones, including how they may be obtained, can be found at the Human BAC Resource and the Clone Registry websites hosted by NCBI. To view Clone Registry information for a clone, click on the clone name at the top of the details page for that item.

\ \

Using the Filter

\

\ This track has a filter that can be used to change the color of, or include/exclude from the display, the dataset from an individual lab. This is helpful when many items are shown in the track display, especially when only some are relevant to the current task. The filter is located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu. To use the filter:\

    \
  1. In the pulldown menu, select the lab whose data you would like to highlight or exclude in the display.\
  2. Choose the color or display characteristic that will be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display clones from the lab selected in the pulldown list. If "include" is selected, the browser will display only clones from the selected lab.\

\

\ When you have finished configuring the filter, click the Submit button.
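The three filter behaviors described above (highlight, include, exclude) can be sketched as follows. This is a minimal illustration of the semantics only, not the browser's implementation; the `Clone` record, the `apply_filter` function, the mode names, and the clone and lab names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Clone:
    name: str
    lab: str  # contributing lab: the attribute the pulldown menu selects on

def apply_filter(clones: List[Clone], lab: str, mode: str,
                 color: Optional[str] = None) -> List[Tuple[Clone, Optional[str]]]:
    """Return (clone, highlight color or None) pairs for the clones to draw (hypothetical)."""
    if mode == "exclude":   # hide clones from the selected lab
        return [(c, None) for c in clones if c.lab != lab]
    if mode == "include":   # show only clones from the selected lab
        return [(c, None) for c in clones if c.lab == lab]
    if mode == "color":     # show everything, recoloring the selected lab's clones
        return [(c, color if c.lab == lab else None) for c in clones]
    raise ValueError(f"unknown filter mode: {mode!r}")

clones = [Clone("clone1", "labA"), Clone("clone2", "labB"), Clone("clone3", "labA")]
print([c.name for c, _ in apply_filter(clones, "labA", "exclude")])  # ['clone2']
```

Note that "include" and "exclude" are complementary: together they partition the clones by lab, while the color mode keeps every clone visible.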

\ \

Credits

\

\ We would like to thank all of the labs that have contributed to this resource:\

\ \

References

\

\ Cheung VG, Nowak N, Jang W, Kirsch IR, Zhao S, Chen X-N, Furey TS, Kim U-J, Kuo W-L, Olivier M et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature. 2001 Feb 15;409(6822):953-8.

\ map 1 color 0,150,0,\ group map\ longLabel Clones Placed on Cytogenetic Map Using FISH\ priority 6\ shortLabel FISH Clones\ track fishClones\ type bed 5 +\ visibility hide\ encodePseudogeneGIS GIS Pseudogenes genePred Genome Institute of Singapore (GIS) Pseudogenes 0 6 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 longLabel Genome Institute of Singapore (GIS) Pseudogenes\ parent encodePseudogene\ priority 6\ shortLabel GIS Pseudogenes\ track encodePseudogeneGIS\ hapmapSnpsJPT HapMap SNPs JPT bed 6 + HapMap SNPs from the JPT Population (Japanese in Tokyo, Japan) 0 6 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HapMap SNPs from the JPT Population (Japanese in Tokyo, Japan)\ parent hapmapSnps\ priority 6\ shortLabel HapMap SNPs JPT\ track hapmapSnpsJPT\ hgdpHzyEAsia Hetzgty E. Asia bedGraph 4 Human Genome Diversity Proj Smoothd Expec Heterozygosity (E. Asia) 0 6 0 200 0 127 227 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 0,200,0\ longLabel Human Genome Diversity Proj Smoothd Expec Heterozygosity (E. Asia)\ parent hgdpHzy\ priority 6\ shortLabel Hetzgty E. Asia\ track hgdpHzyEAsia\ hgdpIhsOceania iHS Oceania bedGraph 4 Human Genome Diversity Project iHS (Oceania) 0 6 0 200 200 127 227 227 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 color 0,200,200\ longLabel Human Genome Diversity Project iHS (Oceania)\ parent hgdpIhs\ priority 6\ shortLabel iHS Oceania\ track hgdpIhsOceania\ snpArrayIllumina650 Illumina 650 bed 6 + Illumina Human Hap 650v3 0 6 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Illumina Human Hap 650v3\ parent snpArray off\ priority 6\ shortLabel Illumina 650\ track snpArrayIllumina650\ type bed 6 +\ encodeRegulomeQualityK562 K562 bed 5 . 
K562 Quality 0 6 210 50 90 232 152 172 1 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 1 color 210,50,90\ longLabel K562 Quality\ parent encodeRegulomeQuality\ priority 6\ shortLabel K562\ track encodeRegulomeQualityK562\ encodeRegulomeProbK562 K562 bedGraph 4 K562 DNaseI HSs 0 6 210 50 90 232 152 172 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 210,50,90\ longLabel K562 DNaseI HSs\ parent encodeRegulomeProb\ priority 6\ shortLabel K562\ track encodeRegulomeProbK562\ encodeRegulomeBaseK562 K562 wig 0.0 3.0 K562 DNaseI Sensitivity 0 6 210 50 90 232 152 172 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 210,50,90\ longLabel K562 DNaseI Sensitivity\ parent encodeRegulomeBase\ priority 6\ shortLabel K562\ track encodeRegulomeBaseK562\ encodeUcsdChipHeLaH3H4acH3_p30 LI H3ac +gIF bedGraph 4 Ludwig Institute ChIP-chip: H3ac ab, HeLa cells, 30 min. after gamma interferon 0 6 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: H3ac ab, HeLa cells, 30 min. after gamma interferon\ parent encodeLIChIPgIF\ priority 6\ shortLabel LI H3ac +gIF\ track encodeUcsdChipHeLaH3H4acH3_p30\ encodeUcsdChipTaf250Thp1_f LI TAF1 THP1 bedGraph 4 Ludwig Institute ChIP-chip: TAF1 ab, THP1 cells 0 6 0 63 135 127 159 195 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 0,63,135\ longLabel Ludwig Institute ChIP-chip: TAF1 ab, THP1 cells\ parent encodeLIChIP\ priority 6\ shortLabel LI TAF1 THP1\ track encodeUcsdChipTaf250Thp1_f\ encodeMlaganIntersectEl MLAGAN Intersect bed 5 . 
MLAGAN PhastCons/BinCons/GERP Intersection Conserved Elements 0 6 80 145 105 167 200 180 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,145,105\ longLabel MLAGAN PhastCons/BinCons/GERP Intersection Conserved Elements\ parent encodeMlaganElements\ priority 6\ shortLabel MLAGAN Intersect\ track encodeMlaganIntersectEl\ denisovaModernHumans Modern Human Seq bam Alignments of Sequence Reads from 7 Humans 0 6 0 0 0 127 127 127 0 0 0 \

Description

\

\ The Modern Human Seq track shows human sequence reads of seven individuals mapped to the human genome. The purpose of this track is to put the divergence of the Denisova genome into perspective with regard to present-day humans.\

\ \

Methods

\

\ DNA was obtained for each of seven individuals from the CEPH-Human Genome Diversity Panel (HGDP): HGDP00456 (Mbuti), HGDP00998 (Karitiana Native American), HGDP00665 (Sardinian), HGDP00491 (Bougainville Melanesian), HGDP00711 (Cambodian), HGDP01224 (Mongolian) and HGDP00551 (Papuan). Each library was sequenced on the Illumina Genome Analyzer IIx using 2x101 + 7 cycles on one flow cell according to the manufacturer's instructions for multiplex sequencing. The paired-end reads were aligned to the human reference sequence (NCBI36/hg18) using the Burrows-Wheeler Aligner (BWA).\

\

\ Download the Modern Human Seq track data sets from the Genome Browser downloads server.\

\ \

References

\

\ Briggs AW, Stenzel U, Meyer M, Krause J, Kircher M, Pääbo S. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 2010;38(6):e87. Epub 2009 Dec 22.\

\

\ Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, Viola B, Briggs AW, Stenzel U, Johnson PL et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 2010 Dec 23;468:1053-60.\

\ \

Credits

\

\ This track was produced at UCSC using data generated by the Max Planck Institute for Evolutionary Anthropology.\

\ denisova 1 aliQualRange 0:60\ allButtonPair on\ bamColorMode gray\ bamGrayMode aliQual\ baseColorDefault diffBases\ baseColorUseSequence lfExtra\ compositeTrack on\ dimensions dimensionX=sample\ dragAndDrop subTracks\ group denisova\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Alignments of Sequence Reads from 7 Humans\ maxWindowToDraw 1000000\ noColorTag .\ pairEndsByName on\ priority 6\ shortLabel Modern Human Seq\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 100\ showNames off\ sortOrder sample=+\ subGroup1 sample Sample b2MPygmy=Mbuti_Pygmy c3Mel=Melanesian d4Papuan=Papuan e5Sar=Sardinian f6Cam=Cambodian g7NativeAm=Native_Americans h8Mon=Mongolian\ track denisovaModernHumans\ type bam\ visibility hide\ ntModernHumans Modern Human Seq bam Alignments of Sequence Reads from 5 Modern Humans 0 6 0 0 0 127 127 127 0 0 0

Description

\

\ The Modern Human Seq track shows human sequence reads of five individuals mapped to the human genome. The purpose of this track is to put the divergence of the Neandertal genomes into perspective with regard to present-day humans.

\ \

Display Conventions and Configuration

\

\ The sequence reads (query sequences) from each of the five individuals are contained in separate subtracks. Use the checkboxes to select which individuals are displayed in the browser. Click and drag a sample name to reorder the subtracks. The order in which the subtracks appear in the subtrack list is the order in which they are displayed in the browser.
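The default subtrack order can be illustrated with this track's `subGroup1` and `sortOrder sample=+` settings. The sketch below assumes that `sample=+` means an ascending sort on the subgroup tags (so the letter-digit prefixes `a1`, `b2`, ... fix the default order) and that a drag-and-drop order chosen by the user replaces it; both are assumptions, and `parse_subgroup` and `display_order` are hypothetical helpers, not browser code.

```python
from typing import Dict, List, Optional

def parse_subgroup(setting: str) -> Dict[str, str]:
    """Parse 'name Title tag=Label ...' into {tag: label} (hypothetical helper)."""
    _name, _title, *pairs = setting.split()  # first two words: subgroup name and title
    return dict(pair.split("=", 1) for pair in pairs)

def display_order(tags: Dict[str, str], user_order: Optional[List[str]] = None) -> List[str]:
    """Labels in display order: the user's dragged order if given, else ascending by tag."""
    if user_order:
        return list(user_order)
    return [tags[t] for t in sorted(tags)]

subgroup = "sample Sample a1San=San b2Yoruba=Yoruba c3Han=Han d4Papuan=Papuan e5French=French"
tags = parse_subgroup(subgroup)
print(display_order(tags))  # ['San', 'Yoruba', 'Han', 'Papuan', 'French']
```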

\

\ The query sequences in the SAM/BAM alignment representation are normalized to the + strand of the reference genome (see the SAM Format Specification for more information on the SAM/BAM file format). If a query sequence was originally the reverse complement of what has been stored and aligned, it has the following flag set:\

\
(0x10) Read is on '-' strand.\
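In the SAM specification, bit 0x10 (decimal 16) of the FLAG field indicates that the stored SEQ is the reverse complement of the read as sequenced. A minimal sketch of testing the bit and recovering the original read follows; the function names are hypothetical, and the complement table covers only A, C, G, T and N (other IUPAC codes are left unchanged).

```python
REVERSE_FLAG = 0x10  # SAM FLAG bit: SEQ is stored reverse complemented

# Complement table for A/C/G/T/N only; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def is_reverse(flag: int) -> bool:
    """True if the 0x10 bit is set in a SAM FLAG value."""
    return bool(flag & REVERSE_FLAG)

def original_read(seq: str, flag: int) -> str:
    """Undo the + strand normalization to recover the read as sequenced."""
    if is_reverse(flag):
        return seq.translate(_COMPLEMENT)[::-1]
    return seq

print(is_reverse(16), is_reverse(99), is_reverse(83))  # True False True
print(original_read("AACG", 16))  # CGTT
```

Because FLAG is a bit field, other bits (paired, first in pair, and so on) may be set at the same time, which is why the check uses a bitwise AND rather than equality with 16.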

\

\ BAM/SAM alignment representations also have tags. Some tags are predefined, and others (those beginning with X, Y or Z) are defined by the aligner or data submitter. The following is a list of the tags associated with this track; those starting with X are specific to the Burrows-Wheeler Aligner (BWA).\

\ \ \
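Each tag in a SAM record is stored as an optional field of the form `TAG:TYPE:VALUE`. The sketch below parses such fields and flags the aligner-defined ones using the X/Y/Z rule given above. The helper names are hypothetical, and the example fields (including `XT:A:U`) are illustrative values, not taken from this track's data.

```python
from typing import Dict, Iterable, Tuple, Union

Value = Union[str, int, float]

def parse_tags(fields: Iterable[str]) -> Dict[str, Tuple[str, Value]]:
    """Parse SAM optional fields TAG:TYPE:VALUE into {tag: (type, value)}."""
    tags: Dict[str, Tuple[str, Value]] = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)  # VALUE itself may contain ':'
        parsed: Value
        if typ == "i":
            parsed = int(value)
        elif typ == "f":
            parsed = float(value)
        else:  # A, Z, H and B values are kept as raw strings in this sketch
            parsed = value
        tags[tag] = (typ, parsed)
    return tags

def is_user_defined(tag: str) -> bool:
    """Tags beginning with X, Y or Z are defined by the aligner or submitter."""
    return tag[0] in "XYZ"

tags = parse_tags(["NM:i:1", "MD:Z:35A64", "XT:A:U"])
print({t: v for t, (_, v) in tags.items() if is_user_defined(t)})  # {'XT': 'U'}
```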

\

\ The item labels and display colors of features within this track can be configured through the controls at the top of the track description page.\

\ \

\ \

Methods

\

\ The genomes of a San individual from Southern Africa (HGDP01029), a Yoruba individual from West Africa (HGDP00927), a Han Chinese individual (HGDP00778), an individual from Papua New Guinea (HGDP00542), and a French individual from Western Europe (HGDP00521) were sequenced to 4- to 6-fold coverage on the Illumina GAII platform. These sequences were aligned to the human reference genome (NCBI36/hg18) using the Burrows-Wheeler Aligner (BWA). Reads with an alignment quality below 30 were excluded from these data; those with an alignment quality of 30 or greater were analyzed using an approach similar to that used for the Neandertal data.\
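The quality cutoff described above can be sketched as a filter over SAM text lines. This assumes that "alignment quality" corresponds to the SAM MAPQ field (column 5), which is an assumption about how the cutoff was applied; `passing_reads` and the example records are hypothetical.

```python
from typing import Iterable, Iterator

MIN_QUALITY = 30  # threshold stated in the Methods text

def passing_reads(sam_lines: Iterable[str], min_quality: int = MIN_QUALITY) -> Iterator[str]:
    """Yield SAM alignment lines whose MAPQ (column 5) is >= min_quality."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines carry no alignment
            continue
        columns = line.rstrip("\n").split("\t")
        # MAPQ 255 ("unavailable" in the SAM spec) is not treated specially here.
        if int(columns[4]) >= min_quality:
            yield line

sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t29\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr2\t300\t30\t4M\t*\t0\t0\tACGT\tIIII",
]
print([line.split("\t")[0] for line in passing_reads(sam)])  # ['r1', 'r3']
```

Using a generator keeps memory use constant, which matters for whole-genome read sets at 4- to 6-fold coverage.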

\ \

Credits

\

\ This track was produced at UCSC using data generated by Ed Green.\

\ \

References

\

\ Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A draft sequence of the Neandertal genome. Science. 2010 May 7;328(5979):710-22.\

\

\ Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754-60.

\ neandertal 1 aliQualRange 0:60\ allButtonPair on\ bamColorMode gray\ bamGrayMode aliQual\ baseColorDefault diffBases\ baseColorUseSequence lfExtra\ compositeTrack on\ dimensions dimensionX=sample\ dragAndDrop subTracks\ group neandertal\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Alignments of Sequence Reads from 5 Modern Humans\ maxWindowToDraw 1000000\ noColorTag .\ pairEndsByName on\ priority 6\ shortLabel Modern Human Seq\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 100\ showNames off\ sortOrder sample=+\ subGroup1 sample Sample a1San=San b2Yoruba=Yoruba c3Han=Han d4Papuan=Papuan e5French=French\ track ntModernHumans\ type bam\ visibility hide\ encodeAllOtherESTs Other ESTs bed 4 Consensus Other ESTs 0 6 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Consensus Other ESTs\ parent encodeWorkshopSelections\ priority 6\ shortLabel Other ESTs\ track encodeAllOtherESTs\ encodeGencodeRaceFragsLiver RACEfrags Liver genePred Gencode RACEfrags from Liver 0 6 236 0 20 245 127 137 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 236,0,20\ longLabel Gencode RACEfrags from Liver\ parent encodeGencodeRaceFrags\ priority 6\ shortLabel RACEfrags Liver\ track encodeGencodeRaceFragsLiver\ encodeSangerChipH3K4me2K562 SI H3K4me2 K562 bedGraph 4 Sanger Institute ChIP/Chip (H3K4me2 ab, K562 cells) 0 6 10 10 10 132 132 132 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 10,10,10\ longLabel Sanger Institute ChIP/Chip (H3K4me2 ab, K562 cells)\ parent encodeSangerChipH3H4\ priority 6\ shortLabel SI H3K4me2 K562\ track encodeSangerChipH3K4me2K562\ stanfordChipHeLaSRF Stan HeLa SRF bedGraph 4 Stanford ChIP-chip (HeLa cells, SRF ChIP) 0 6 120 0 20 150 0 25 0 0 22 
chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (HeLa cells, SRF ChIP)\ parent stanfordChip\ priority 6\ shortLabel Stan HeLa SRF\ track stanfordChipHeLaSRF\ encodeStanfordChipHeLaSRF Stan HeLa SRF bedGraph 4 Stanford ChIP-chip (HeLa cells, SRF ChIP) 0 6 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (HeLa cells, SRF ChIP)\ parent encodeStanfordChipJohnson\ priority 6\ shortLabel Stan HeLa SRF\ track encodeStanfordChipHeLaSRF\ encodeStanfordChipK562Sp3 Stan K562 Sp3 bedGraph 4 Stanford ChIP-chip (K562 cells, Sp3 ChIP) 0 6 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (K562 cells, Sp3 ChIP)\ parent encodeStanfordChip\ priority 6\ shortLabel Stan K562 Sp3\ track encodeStanfordChipK562Sp3\ encodeStanfordMethJEG3 Stan Meth JEG3 bedGraph 4 Stanford Methylation Digest (JEG3 cells) 0 6 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest (JEG3 cells)\ parent encodeStanfordMeth\ priority 6\ shortLabel Stan Meth JEG3\ track encodeStanfordMethJEG3\ encodeStanfordMethSmoothedJEG3 Stan Meth Sc JEG3 bedGraph 4 Stanford Methylation Digest Smoothed Score (JEG3 cells) 0 6 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest Smoothed Score (JEG3 cells)\ parent encodeStanfordMethSmoothed\ priority 6\ shortLabel Stan Meth Sc JEG3\ track encodeStanfordMethSmoothedJEG3\ encodeStanfordPromotersHMCB Stan Pro HMCB bed 9 + 
Stanford Promoter Activity (HMCB cells) 0 6 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (HMCB cells)\ parent encodeStanfordPromoters\ priority 6\ shortLabel Stan Pro HMCB\ track encodeStanfordPromotersHMCB\ encodeStanfordChipSmoothedK562Sp3 Stan Sc K562 Sp3 bedGraph 4 Stanford ChIP-chip Smoothed Score (K562 cells, Sp3 ChIP) 0 6 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip Smoothed Score (K562 cells, Sp3 ChIP)\ parent encodeStanfordChipSmoothed\ priority 6\ shortLabel Stan Sc K562 Sp3\ track encodeStanfordChipSmoothedK562Sp3\ tajdXd Tajima's D XD bedGraph 4 Tajima's D from Chinese Descent 0 6 200 100 0 0 100 200 0 0 0 varRep 0 altColor 0,100,200\ color 200,100,0\ longLabel Tajima's D from Chinese Descent\ parent tajD\ priority 6\ shortLabel Tajima's D XD\ track tajdXd\ encodeTbaIntersectEl TBA Intersect bed 5 . 
TBA PhastCons/BinCons/GERP Intersection Conserved Elements 0 6 80 145 105 167 200 180 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,145,105\ longLabel TBA PhastCons/BinCons/GERP Intersection Conserved Elements\ parent encodeTbaElements\ priority 6\ shortLabel TBA Intersect\ track encodeTbaIntersectEl\ encodeUtexChip2091fibMycPeaks UT Myc Fb Pk bedGraph 4 University of Texas, Austin ChIP-chip (c-Myc, 2091 fibroblasts) Peaks 0 6 50 0 0 152 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,0,0\ longLabel University of Texas, Austin ChIP-chip (c-Myc, 2091 fibroblasts) Peaks\ parent encodeUtexChip\ priority 6\ shortLabel UT Myc Fb Pk\ subGroups dataType=peaks\ track encodeUtexChip2091fibMycPeaks\ kiddEichlerValidAbc9 Validated ABC9 bed 9 HGSV Individual ABC9 (Japan) Validated Sites of Structural Variation 0 6 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HGSV Individual ABC9 (Japan) Validated Sites of Structural Variation\ parent kiddEichlerValid\ priority 6\ shortLabel Validated ABC9\ track kiddEichlerValidAbc9\ hgdpXpehhOceania XP-EHH Oceania bedGraph 4 Human Genome Diversity Project XP-EHH (Oceania) 0 6 0 200 200 127 227 227 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 0,200,200\ longLabel Human Genome Diversity Project XP-EHH (Oceania)\ parent hgdpXpehh\ priority 6\ shortLabel XP-EHH Oceania\ track hgdpXpehhOceania\ encodeTransFragsYaleIntronicDistal Yale Intron Dist bed 4 Yale Intronic Distal 0 6 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Yale Intronic Distal\ parent encodeTransFrags\ priority 6\ shortLabel Yale Intron Dist\ track 
encodeTransFragsYaleIntronicDistal\ encodeYaleMASPlacRNATransMapRevMless36mer36bp Yale Plc BtR RNA bedGraph 4 Yale Placenta RNA Trans Map, MAS Array, Reverse Direction, Bertone Protocol 0 6 50 50 200 200 50 50 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 0 altColor 200,50,50\ color 50,50,200\ longLabel Yale Placenta RNA Trans Map, MAS Array, Reverse Direction, Bertone Protocol\ parent encodeYaleMASPlacRNATransMap\ priority 6\ shortLabel Yale Plc BtR RNA\ track encodeYaleMASPlacRNATransMapRevMless36mer36bp\ encodeYaleMASPlacRNATarsRevMless36mer36bp Yale Plc BtR TAR bed 6 . Yale Placenta RNA TARs, MAS array, Reverse Direction, Bertone Protocol 0 6 50 50 200 200 50 50 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22, encodeTxLevels 1 altColor 200,50,50\ color 50,50,200\ longLabel Yale Placenta RNA TARs, MAS array, Reverse Direction, Bertone Protocol\ parent encodeYaleMASPlacRNATars\ priority 6\ shortLabel Yale Plc BtR TAR\ track encodeYaleMASPlacRNATarsRevMless36mer36bp\ encodeYaleAffyNeutRNATransMap05 Yale RNA Neu 5 wig -2730 3394 Yale Neutrophil RNA Transcript Map, Sample 5 0 6 50 145 50 152 200 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,145,50\ longLabel Yale Neutrophil RNA Transcript Map, Sample 5\ parent encodeYaleAffyRNATransMap\ priority 6\ shortLabel Yale RNA Neu 5\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATransMap05\ encodeYaleAffyNeutRNATars05 Yale TAR Neu 5 bed 3 . 
Yale Neutrophil RNA Transcriptionally Active Region, Sample 5 0 6 50 145 50 152 200 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,145,50\ longLabel Yale Neutrophil RNA Transcriptionally Active Region, Sample 5\ parent encodeYaleAffyRNATars\ priority 6\ shortLabel Yale TAR Neu 5\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATars05\ encodeAffyChIpHl60PvalBrg1Hr32 Affy Brg1 RA 32h wig 0.0 534.54 Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 32hrs) P-Value 0 7 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 225,0,0\ longLabel Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 32hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 7\ shortLabel Affy Brg1 RA 32h\ subGroups factor=Brg1 time=32h\ track encodeAffyChIpHl60PvalBrg1Hr32\ encodeAffyChIpHl60SignalStrictHisH4Hr08 Affy H4Kac4 8h wig -2.78 3.97 Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict Signal 0 7 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 7\ shortLabel Affy H4Kac4 8h\ subGroups factor=H4Kac4 time=8h\ track encodeAffyChIpHl60SignalStrictHisH4Hr08\ encodeAffyChIpHl60SitesStrictHisH4Hr08 Affy H4Kac4 8h bed 3 . 
Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict Sites 0 7 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 7\ shortLabel Affy H4Kac4 8h\ subGroups factor=H4Kac4 time=8h\ track encodeAffyChIpHl60SitesStrictHisH4Hr08\ encodeAffyChIpHl60PvalStrictHisH4Hr08 Affy H4Kac4 8h wig 0 696.62 Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict P-Value 0 7 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 7\ shortLabel Affy H4Kac4 8h\ subGroups factor=H4Kac4 time=8h\ track encodeAffyChIpHl60PvalStrictHisH4Hr08\ encodeBuFirstExonSkMuscle BU Skel. Muscle bed 12 + Boston University First Exon Activity in Skeletal Muscle 0 7 0 0 0 127 127 127 0 0 10 chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX, encodeTxLevels 1 longLabel Boston University First Exon Activity in Skeletal Muscle\ parent encodeBuFirstExon\ priority 7\ shortLabel BU Skel. 
Muscle\ track encodeBuFirstExonSkMuscle\ cccTrendPvalT2d CCC T2 Diabetes chromGraph Case Control Consortium type 2 diabetes trend -log10 P-value 0 7 0 0 0 127 127 127 0 0 0 phenDis 0 longLabel Case Control Consortium type 2 diabetes trend -log10 P-value\ parent caseControl\ priority 7\ shortLabel CCC T2 Diabetes\ track cccTrendPvalT2d\ encodeTransFragsConservedDistal Cons Distal bed 4 Conserved Distal 0 7 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Conserved Distal\ parent encodeTransFrags\ priority 7\ shortLabel Cons Distal\ track encodeTransFragsConservedDistal\ kiddEichlerDiscAbc8 Discordant ABC8 bed 12 HGSV Individual ABC8 (Yoruba) Discordant Clone End Alignments 0 7 0 0 0 127 127 127 0 0 0 http://mrhgsv.gs.washington.edu/cgi-bin/hgc?i=$$&c=$S&l=$[&r=$]&db=$D&position=$S:$[-$] varRep 1 longLabel HGSV Individual ABC8 (Yoruba) Discordant Clone End Alignments\ parent kiddEichlerDisc\ priority 7\ shortLabel Discordant ABC8\ track kiddEichlerDiscAbc8\ encodeAffyEc1BrainHypothalamusSignal EC1 Sgnl BrainH wig 0 62385 Affy Ext Trans Signal (1-base window) (Brain Hypothalamus) 0 7 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 0 color 248,0,8\ longLabel Affy Ext Trans Signal (1-base window) (Brain Hypothalamus)\ parent encodeAffyEcSignal\ priority 7\ shortLabel EC1 Sgnl BrainH\ track encodeAffyEc1BrainHypothalamusSignal\ encodeAffyEc1BrainHypothalamusSites EC1 Sites BrainH bed 3 . Affy Ext Trans Sites (1-base window) (Brain Hypothalamus) 0 7 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 1 color 248,0,8\ longLabel Affy Ext Trans Sites (1-base window) (Brain Hypothalamus)\ parent encodeAffyEcSites\ priority 7\ shortLabel EC1 Sites BrainH\ track encodeAffyEc1BrainHypothalamusSites\ encodeRegulomeQualityERY ERY bed 5 . 
Adult Erythroblast Quality 0 7 240 50 60 247 152 157 1 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 1 color 240,50,60\ longLabel Adult Erythroblast Quality\ parent encodeRegulomeQuality\ priority 7\ shortLabel ERY\ track encodeRegulomeQualityERY\ encodeRegulomeProbERY ERY bedGraph 4 Adult Erythroblast DNaseI HSs 0 7 240 50 60 247 152 157 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 240,50,60\ longLabel Adult Erythroblast DNaseI HSs\ parent encodeRegulomeProb\ priority 7\ shortLabel ERY\ track encodeRegulomeProbERY\ encodeRegulomeBaseERY ERY wig 0.0 3.0 Adult Erythroblast DNaseI Sensitivity 0 7 240 50 60 247 152 157 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX, encodeChrom 0 color 240,50,60\ longLabel Adult Erythroblast DNaseI Sensitivity\ parent encodeRegulomeBase\ priority 7\ shortLabel ERY\ track encodeRegulomeBaseERY\ encodeEgaspFullSoftberryPseudo Fgenesh Pseudo genePred Fgenesh Pseudogene Predictions 0 7 130 130 130 192 192 192 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 130,130,130\ longLabel Fgenesh Pseudogene Predictions\ parent encodeEgaspFull\ priority 7\ shortLabel Fgenesh Pseudo\ track encodeEgaspFullSoftberryPseudo\ encodeEgaspUpdGeneId GeneID Update genePred GeneID Gene Predictions 0 7 100 12 100 177 133 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 100,12,100\ longLabel GeneID Gene Predictions\ parent encodeEgaspUpdate\ priority 7\ shortLabel GeneID Update\ track encodeEgaspUpdGeneId\ encodeEgaspPartGenezilla GeneZilla genePred GeneZilla Gene Predictions 0 7 22 150 20 138 202 137 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 22,150,20\ longLabel GeneZilla Gene Predictions\ 
parent encodeEgaspPartial\ priority 7\ shortLabel GeneZilla\ track encodeEgaspPartGenezilla\ genMapDb GenMapDB Clones bed 6 + GenMapDB BAC Clones 0 7 0 0 0 127 127 127 0 0 0

Description

\

BAC clones from GenMapDB were placed on the draft sequence by Vivian Cheung's lab at the Department of Pediatrics, University of Pennsylvania, using BAC end sequence information, and the placements were confirmed using STS markers. Further information about each clone can be obtained by clicking on the clone name on the track details page.\

Credits

\ Thanks to Vivian Cheung's lab \ and GenMapDB at the University of Pennsylvania for providing the data used to create this track.\ map 1 group map\ longLabel GenMapDB BAC Clones\ priority 7\ shortLabel GenMapDB Clones\ track genMapDb\ type bed 6 +\ visibility hide\ hapmapSnpsLWK HapMap SNPs LWK bed 6 + HapMap SNPs from the LWK Population (Luhya in Webuye, Kenya) 0 7 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HapMap SNPs from the LWK Population (Luhya in Webuye, Kenya)\ parent hapmapSnps\ priority 7\ shortLabel HapMap SNPs LWK\ track hapmapSnpsLWK\ hgdpHzyOceania Hetzgty Oceania bedGraph 4 Human Genome Diversity Proj Smoothd Expec Heterozygosity (Oceania) 0 7 0 200 200 127 227 227 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 0,200,200\ longLabel Human Genome Diversity Proj Smoothd Expec Heterozygosity (Oceania)\ parent hgdpHzy\ priority 7\ shortLabel Hetzgty Oceania\ track hgdpHzyOceania\ delHinds Hinds Dels bed 4 . Deletions from Haploid Hybridization Analysis (Hinds) 0 7 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Deletions from Haploid Hybridization Analysis (Hinds)\ noInherit on\ parent cnp\ priority 7\ shortLabel Hinds Dels\ track delHinds\ type bed 4 .\ netSyntenyEquCab1 Horse Syn Net netAlign equCab1 chainEquCab1 Horse (Jan. 
2007 (Broad/equCab1)) Syntenic Alignment Net 0 7 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb equCab1\ parent syntenicNet\ priority 7\ shortLabel Horse Syn Net\ spectrum on\ track netSyntenyEquCab1\ type netAlign equCab1 chainEquCab1\ visibility hide\ hgdpIhsAmericas iHS Americas bedGraph 4 Human Genome Diversity Project iHS (Americas) 0 7 224 192 0 239 223 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 color 224,192,0\ longLabel Human Genome Diversity Project iHS (Americas)\ parent hgdpIhs\ priority 7\ shortLabel iHS Americas\ track hgdpIhsAmericas\ snpArrayIllumina550 Illumina 550 bed 6 + Illumina Human Hap 550v3 0 7 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Illumina Human Hap 550v3\ parent snpArray off\ priority 7\ shortLabel Illumina 550\ track snpArrayIllumina550\ type bed 6 +\ encodeUcsdChipHeLaH3H4acH4_p0 LI H4ac -gIF bedGraph 4 Ludwig Institute ChIP-chip: H4ac ab, HeLa cells, no gamma interferon 0 7 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: H4ac ab, HeLa cells, no gamma interferon\ parent encodeLIChIPgIF\ priority 7\ shortLabel LI H4ac -gIF\ track encodeUcsdChipHeLaH3H4acH4_p0\ encodeUcsdChipTaf250Imr90_f LI TAF1 IMR90 bedGraph 4 Ludwig Institute ChIP-chip: TAF1 ab, IMR90 cells 0 7 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: TAF1 ab, IMR90 cells\ parent encodeLIChIP\ priority 7\ shortLabel LI TAF1 IMR90\ track encodeUcsdChipTaf250Imr90_f\ decodeMale Male bigWig 0.0 144.958 deCODE recombination map, male 0 7 0 81 200 127 168 227 0 0 22 
chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 0,81,200\ configurable on\ longLabel deCODE recombination map, male\ parent maleView\ priority 7\ shortLabel Male\ subGroups view=male\ track decodeMale\ type bigWig 0.0 144.958\ encodeMlaganNcIntersectEl MLAGAN NC Intersect bed 5 . MLAGAN PhastCons/BinCons/GERP Intersection NonCoding Conserved Elements 0 7 80 180 80 167 217 167 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,180,80\ longLabel MLAGAN PhastCons/BinCons/GERP Intersection NonCoding Conserved Elements\ parent encodeMlaganElements\ priority 7\ shortLabel MLAGAN NC Intersect\ track encodeMlaganNcIntersectEl\ ntMito Neandertal Mito psl Neandertal Mitochondrial Sequence (Vi33.16, 2008) 0 7 0 0 0 127 127 127 0 0 1 chrM,

Description

\

\ This track shows the alignment of a complete Neandertal mitochondrial sequence to a modern human mitochondrial sequence.\

\

\ Note: the mitochondrial sequence used as the Genome Browser reference sequence "chrM" in hg18 and hg19 is NC_001807, which has been deprecated. Future human genome browsers will use the revised Cambridge Reference Sequence (rCRS), NC_012920.\

\ \

Display Conventions and Configuration

This track follows the display conventions for PSL alignment tracks. Mismatching bases are highlighted, and several types of alignment gap may also be colored; see the Genome Browser documentation on base and gap coloring for details.
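The gap types mentioned above come from the block structure of each PSL record, and this track's settings turn on coloring for query insertions and double-sided insertions. The following is a minimal sketch, assuming a standard 21-column tab-separated PSL line, of how gaps between adjacent aligned blocks can be classified. The function names and the gap labels ("query-only", "target-only", "double-sided") are hypothetical and illustrative; they are not the browser's internal names, and the sketch is not how the browser itself draws gaps. For this track the query is the Neandertal sequence and the target is chrM.

```python
from typing import List, Tuple

# Column indices of the list fields in a standard 21-column PSL line.
BLOCK_SIZES, Q_STARTS, T_STARTS = 18, 19, 20


def parse_int_list(field: str) -> List[int]:
    """Parse a comma-terminated PSL list such as '100,50,30,'."""
    return [int(x) for x in field.split(",") if x]


def classify_gaps(psl_line: str) -> List[Tuple[str, int, int]]:
    """Return (kind, query_gap, target_gap) for each gap between adjacent blocks.

    The kind labels are hypothetical, chosen only for illustration.
    """
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) < 21:
        raise ValueError("expected at least 21 tab-separated PSL fields")
    sizes = parse_int_list(fields[BLOCK_SIZES])
    q_starts = parse_int_list(fields[Q_STARTS])
    t_starts = parse_int_list(fields[T_STARTS])
    if not (len(sizes) == len(q_starts) == len(t_starts)):
        raise ValueError("blockSizes, qStarts and tStarts differ in length")

    gaps = []
    for i in range(len(sizes) - 1):
        # Bases skipped on each sequence between the end of block i
        # and the start of block i + 1.
        q_gap = q_starts[i + 1] - (q_starts[i] + sizes[i])
        t_gap = t_starts[i + 1] - (t_starts[i] + sizes[i])
        if q_gap > 0 and t_gap > 0:
            kind = "double-sided"  # unaligned bases on both sequences
        elif q_gap > 0:
            kind = "query-only"    # extra bases in the query only
        elif t_gap > 0:
            kind = "target-only"   # extra bases in the target (chrM) only
        else:
            continue               # blocks abut on both sequences
        gaps.append((kind, q_gap, t_gap))
    return gaps
```

For a record with blockSizes `100,50,30,20,`, qStarts `0,105,155,190,` and tStarts `1000,1100,1160,1195,`, the sketch reports a 5-base query-only gap, a 10-base target-only gap, and a 5-base double-sided gap, in that order.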

Methods

DNA was extracted from a 38,000-year-old bone and sequenced using the methods described in Green et al. (2008). The Neandertal mitochondrial sequence (NC_011137) was downloaded from GenBank and aligned to chrM (NC_001807) using BLAT.

Reference

Green RE, Malaspinas AS, Krause J, Briggs AW, Johnson PL, Uhler C, Meyer M, Good JM, Maricic T, Stenzel U et al. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell. 2008 Aug 8;134(3):416-26.

\ neandertal 1 baseColorDefault diffBases\ baseColorUseSequence seq\ chromosomes chrM\ group neandertal\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Neandertal Mitochondrial Sequence (Vi33.16, 2008)\ priority 7\ shortLabel Neandertal Mito\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 100\ track ntMito\ type psl\ visibility hide\ encodeGencodeRaceFragsLung RACEfrags Lung genePred Gencode RACEfrags from Lung 0 7 200 0 56 227 127 155 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 200,0,56\ longLabel Gencode RACEfrags from Lung\ parent encodeGencodeRaceFrags\ priority 7\ shortLabel RACEfrags Lung\ track encodeGencodeRaceFragsLung\ encodeSangerChipH3K4me3K562 SI H3K4me3 K562 bedGraph 4 Sanger Institute ChIP/Chip (H3K4me3 ab, K562 cells) 0 7 10 10 10 132 132 132 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 10,10,10\ longLabel Sanger Institute ChIP/Chip (H3K4me3 ab, K562 cells)\ parent encodeSangerChipH3H4\ priority 7\ shortLabel SI H3K4me3 K562\ track encodeSangerChipH3K4me3K562\ stanfordChipHeLaTAF Stan HeLa TAF bedGraph 4 Stanford ChIP-chip (HeLa cells, TAF ChIP) 0 7 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (HeLa cells, TAF ChIP)\ parent stanfordChip\ priority 7\ shortLabel Stan HeLa TAF\ track stanfordChipHeLaTAF\ encodeStanfordChipHeLaTAF Stan HeLa TAF bedGraph 4 Stanford ChIP-chip (HeLa cells, TAF ChIP) 0 7 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (HeLa cells, TAF ChIP)\ parent encodeStanfordChipJohnson\ priority 7\ shortLabel Stan HeLa TAF\ track encodeStanfordChipHeLaTAF\ 
encodeStanfordMethSmoothedSnu182 Stan Meth Sc Snu182 bedGraph 4 Stanford Methylation Digest Smoothed Score (Snu182 cells) 0 7 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest Smoothed Score (Snu182 cells)\ parent encodeStanfordMethSmoothed\ priority 7\ shortLabel Stan Meth Sc Snu182\ track encodeStanfordMethSmoothedSnu182\ encodeStanfordMethSnu182 Stan Meth Snu182 bedGraph 4 Stanford Methylation Digest (Snu182 cells) 0 7 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest (Snu182 cells)\ parent encodeStanfordMeth\ priority 7\ shortLabel Stan Meth Snu182\ track encodeStanfordMethSnu182\ encodeStanfordPromotersHT1080 Stan Pro HT1080 bed 9 + Stanford Promoter Activity (HT1080 cells) 0 7 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (HT1080 cells)\ parent encodeStanfordPromoters\ priority 7\ shortLabel Stan Pro HT1080\ track encodeStanfordPromotersHT1080\ encodeTbaNcIntersectEl TBA NC Intersect bed 5 . 
TBA PhastCons/BinCons/GERP Intersection NonCoding Conserved Elements 0 7 80 180 80 167 217 167 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 color 80,180,80\ longLabel TBA PhastCons/BinCons/GERP Intersection NonCoding Conserved Elements\ parent encodeTbaElements\ priority 7\ shortLabel TBA NC Intersect\ track encodeTbaNcIntersectEl\ encodeUtexChip2091fibMycStimPeaks UT Myc st-Fb Pk bedGraph 4 University of Texas, Austin ChIP-chip (c-Myc, FBS-stimulated 2091 fibroblasts) Peaks 0 7 50 0 0 152 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,0,0\ longLabel University of Texas, Austin ChIP-chip (c-Myc, FBS-stimulated 2091 fibroblasts) Peaks\ parent encodeUtexChip\ priority 7\ shortLabel UT Myc st-Fb Pk\ subGroups dataType=peaks\ track encodeUtexChip2091fibMycStimPeaks\ kiddEichlerValidAbc8 Validated ABC8 bed 9 HGSV Individual ABC8 (Yoruba) Validated Sites of Structural Variation 0 7 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HGSV Individual ABC8 (Yoruba) Validated Sites of Structural Variation\ parent kiddEichlerValid\ priority 7\ shortLabel Validated ABC8\ track kiddEichlerValidAbc8\ hgdpXpehhAmericas XP-EHH Americas bedGraph 4 Human Genome Diversity Project XP-EHH (Americas) 0 7 224 192 0 239 223 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 224,192,0\ longLabel Human Genome Diversity Project XP-EHH (Americas)\ parent hgdpXpehh\ priority 7\ shortLabel XP-EHH Americas\ track hgdpXpehhAmericas\ encodeYaleAffyNB4RARNATarsIntronsProximal Yale In Prx NB4 RA bed 4 . 
Yale Intronic Proximal NB4 Retinoic TARs 0 7 176 0 80 215 127 167 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 176,0,80\ longLabel Yale Intronic Proximal NB4 Retinoic TARs\ parent encodeNoncodingTransFrags\ priority 7\ shortLabel Yale In Prx NB4 RA\ subGroups region=intronicProximal celltype=nb4 source=yale\ track encodeYaleAffyNB4RARNATarsIntronsProximal\ encodeYaleAffyNeutRNATransMap06 Yale RNA Neu 6 wig -2730 3394 Yale Neutrophil RNA Transcript Map, Sample 6 0 7 50 130 50 152 192 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,130,50\ longLabel Yale Neutrophil RNA Transcript Map, Sample 6\ parent encodeYaleAffyRNATransMap\ priority 7\ shortLabel Yale RNA Neu 6\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATransMap06\ encodeYaleAffyNeutRNATars06 Yale TAR Neu 6 bed 3 . Yale Neutrophil RNA Transcriptionally Active Region, Sample 6 0 7 50 130 50 152 192 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,130,50\ longLabel Yale Neutrophil RNA Transcriptionally Active Region, Sample 6\ parent encodeYaleAffyRNATars\ priority 7\ shortLabel Yale TAR Neu 6\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATars06\ encodeAffyChIpHl60SitesBrg1Hr32 Affy Brg1 RA 32h bed 3 . 
Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 32hrs) Sites 0 8 225 0 0 240 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 225,0,0\ longLabel Affymetrix ChIP/Chip (Brg1 retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 8\ shortLabel Affy Brg1 RA 32h\ subGroups factor=Brg1 time=32h\ track encodeAffyChIpHl60SitesBrg1Hr32\ encodeAffyChIpHl60SignalStrictHisH4Hr32 Affy H4Kac4 32h wig -2.78 3.97 Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 32hrs) Strict Signal 0 8 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 32hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 8\ shortLabel Affy H4Kac4 32h\ subGroups factor=H4Kac4 time=32h\ track encodeAffyChIpHl60SignalStrictHisH4Hr32\ encodeAffyChIpHl60SitesStrictHisH4Hr32 Affy H4Kac4 32h bed 3 . 
Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 32hrs) Strict Sites 0 8 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 32hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 8\ shortLabel Affy H4Kac4 32h\ subGroups factor=H4Kac4 time=32h\ track encodeAffyChIpHl60SitesStrictHisH4Hr32\ encodeAffyChIpHl60PvalStrictHisH4Hr32 Affy H4Kac4 32h wig 0 696.62 Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 32hrs) Strict P-Value 0 8 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 32hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 8\ shortLabel Affy H4Kac4 32h\ subGroups factor=H4Kac4 time=32h\ track encodeAffyChIpHl60PvalStrictHisH4Hr32\ encodeBuFirstExonSpleen BU Spleen bed 12 + Boston University First Exon Activity in Spleen 0 8 0 0 0 127 127 127 0 0 10 chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX, encodeTxLevels 1 longLabel Boston University First Exon Activity in Spleen\ parent encodeBuFirstExon\ priority 8\ shortLabel BU Spleen\ track encodeBuFirstExonSpleen\ encodeTransFragsConservedProximal Cons Prox bed 4 Conserved Proximal 0 8 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Conserved Proximal\ parent encodeTransFrags\ priority 8\ shortLabel Cons Prox\ track encodeTransFragsConservedProximal\ netSyntenyBosTau2 Cow Syn Net netAlign bosTau3 chainBosTau2 Cow (Aug. 
2006 (Baylor 3.1/bosTau3)) Syntenic Alignment Net 0 8 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb bosTau3\ parent syntenicNet\ priority 8\ shortLabel Cow Syn Net\ spectrum on\ track netSyntenyBosTau2\ type netAlign bosTau3 chainBosTau2\ visibility hide\ kiddEichlerDiscAbc7 Discordant ABC7 bed 12 HGSV Individual ABC7 (Yoruba) Discordant Clone End Alignments 0 8 0 0 0 127 127 127 0 0 0 http://mrhgsv.gs.washington.edu/cgi-bin/hgc?i=$$&c=$S&l=$[&r=$]&db=$D&position=$S:$[-$] varRep 1 longLabel HGSV Individual ABC7 (Yoruba) Discordant Clone End Alignments\ parent kiddEichlerDisc\ priority 8\ shortLabel Discordant ABC7\ track kiddEichlerDiscAbc7\ encodeAffyEc51BrainHypothalamusSignal EC51 Sgnl BrainH wig 0 62385 Affy Ext Trans Signal (51-base window) (Brain Hypothalamus) 0 8 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 0 color 248,0,8\ longLabel Affy Ext Trans Signal (51-base window) (Brain Hypothalamus)\ parent encodeAffyEcSignal\ priority 8\ shortLabel EC51 Sgnl BrainH\ track encodeAffyEc51BrainHypothalamusSignal\ encodeAffyEc51BrainHypothalamusSites EC51 Sites BrainH bed 3 . 
Affy Ext Trans Sites (51-base window) (Brain Hypothalamus) 0 8 248 0 8 251 127 131 0 0 2 chr21,chr22, encodeTxLevels 1 color 248,0,8\ longLabel Affy Ext Trans Sites (51-base window) (Brain Hypothalamus)\ parent encodeAffyEcSites\ priority 8\ shortLabel EC51 Sites BrainH\ track encodeAffyEc51BrainHypothalamusSites\ encodeEgaspFullGeneId GeneID genePred GeneID Gene Predictions 0 8 100 12 100 177 133 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 100,12,100\ longLabel GeneID Gene Predictions\ parent encodeEgaspFull\ priority 8\ shortLabel GeneID\ track encodeEgaspFullGeneId\ encodeEgaspUpdGeneIdU12 GeneID U12 Upd genePred GeneID U12 Intron Predictions 0 8 200 132 12 227 193 133 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 200,132,12\ longLabel GeneID U12 Intron Predictions\ parent encodeEgaspUpdate\ priority 8\ shortLabel GeneID U12 Upd\ track encodeEgaspUpdGeneIdU12\ hapmapSnpsMEX HapMap SNPs MEX bed 6 + HapMap SNPs from the MEX Population (Mexican Ancestry in Los Angeles, CA, US) 0 8 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HapMap SNPs from the MEX Population (Mexican Ancestry in Los Angeles, CA, US)\ parent hapmapSnps\ priority 8\ shortLabel HapMap SNPs MEX\ track hapmapSnpsMEX\ hgdpHzyAmericas Hetzgty Americas bedGraph 4 Human Genome Diversity Proj Smoothd Expec Heterozygosity (Americas) 0 8 224 192 0 239 223 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22, varRep 0 color 224,192,0\ longLabel Human Genome Diversity Proj Smoothd Expec Heterozygosity (Americas)\ parent hgdpHzy\ priority 8\ shortLabel Hetzgty Americas\ track hgdpHzyAmericas\ snpArrayIllumina300 Illumina 300 bed 6 + Illumina Human Hap 300v3 0 8 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Illumina Human Hap 300v3\ 
parent snpArray off\ priority 8\ shortLabel Illumina 300\ track snpArrayIllumina300\ type bed 6 +\ encodeUcsdChipHeLaH3H4acH4_p30 LI H4ac +gIF bedGraph 4 Ludwig Institute ChIP-chip: H4ac ab, HeLa cells, 30 min. after gamma interferon 0 8 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: H4ac ab, HeLa cells, 30 min. after gamma interferon\ parent encodeLIChIPgIF\ priority 8\ shortLabel LI H4ac +gIF\ track encodeUcsdChipHeLaH3H4acH4_p30\ encodeUcsdChipTaf250Hct116_f LI TAF1 HCT116 bedGraph 4 Ludwig Institute ChIP-chip: TAF1 ab, HCT116 cells 0 8 58 119 40 156 187 147 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 58,119,40\ longLabel Ludwig Institute ChIP-chip: TAF1 ab, HCT116 cells\ parent encodeLIChIP\ priority 8\ shortLabel LI TAF1 HCT116\ track encodeUcsdChipTaf250Hct116_f\ decodeMaleCarrier Male Carrier bigWig 0.0 204.214 deCODE recombination map, male carrier 0 8 0 100 180 127 177 217 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 0,100,180\ configurable on\ longLabel deCODE recombination map, male carrier\ parent maleView\ priority 8\ shortLabel Male Carrier\ subGroups view=male\ track decodeMaleCarrier\ type bigWig 0.0 204.214\ encodeGencodeRaceFragsMuscle RACEfrags Muscle genePred Gencode RACEfrags from Muscle 0 8 188 0 68 221 127 161 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 188,0,68\ longLabel Gencode RACEfrags from Muscle\ parent encodeGencodeRaceFrags\ priority 8\ shortLabel RACEfrags Muscle\ track encodeGencodeRaceFragsMuscle\ recombRate Recomb Rate bed 4 + Recombination Rate from deCODE, Marshfield, or Genethon 
Maps (deCODE default) 0 8 0 0 0 127 127 127 0 0 0

Description

The recombination rate track shows calculated sex-averaged rates of recombination based on the deCODE, Marshfield, or Genethon genetic maps. By default, the deCODE map rates are displayed. Female- and male-specific recombination rates, as well as rates from the Marshfield and Genethon maps, can also be displayed by choosing the appropriate filter option on the track description page.

Methods

The deCODE genetic map was created at deCODE Genetics and is based on 5,136 microsatellite markers for 146 families, with a total of 1,257 meiotic events. For more information on this map, see Kong, A. et al. (2002).

The Marshfield genetic map was created at the Center for Medical Genetics and is based on 8,325 short tandem repeat polymorphisms (STRPs) for 8 CEPH families consisting of 134 individuals with 186 meioses. For more information on this map, see Broman, K.W. et al. (1998).

The Genethon genetic map was created at Genethon and is based on 5,264 microsatellites for 8 CEPH families consisting of 134 individuals with 186 meioses. For more information on this map, see Dib, C. et al. (1996).

Each base is assigned the recombination rate calculated by assuming that genetic distance increases linearly between the immediately flanking genetic markers. The recombination rate assigned to each 1 Mb window is the average recombination rate of the bases contained within the window.
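The per-base and per-window calculation described above can be sketched as follows. This is a minimal illustration, not the code used to build the track, and it rests on several assumptions: markers are given as (physical position in bp, genetic position in cM) pairs sorted by position, rates are expressed in cM per Mb, each marker interval covers the half-open base range [p0, p1), and bases outside the span of the markers are simply ignored. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Marker = Tuple[int, float]  # (physical position in bp, genetic position in cM)


def interval_rates(markers: Sequence[Marker]) -> List[Tuple[int, int, float]]:
    """Rate in cM/Mb for the bases [p0, p1) between each pair of adjacent markers.

    If genetic distance grows linearly between two markers, every base in
    between carries the same rate: (g1 - g0) / (p1 - p0), scaled to cM/Mb.
    """
    rates = []
    for (p0, g0), (p1, g1) in zip(markers, markers[1:]):
        if p1 <= p0:
            raise ValueError("markers must be sorted by strictly increasing position")
        rates.append((p0, p1, (g1 - g0) * 1e6 / (p1 - p0)))
    return rates


def window_rate(markers: Sequence[Marker], start: int, end: int) -> Optional[float]:
    """Average per-base rate over the window [start, end).

    Only bases lying between two markers have a rate; returns None if the
    window contains no such bases.
    """
    total = 0.0
    covered = 0
    for p0, p1, rate in interval_rates(markers):
        overlap = min(p1, end) - max(p0, start)
        if overlap > 0:
            total += rate * overlap  # each overlapping base carries this rate
            covered += overlap
    return total / covered if covered else None


def window_rates(markers: Sequence[Marker], chrom_size: int,
                 window: int = 1_000_000) -> List[Optional[float]]:
    """Average rate for consecutive windows tiling [0, chrom_size)."""
    return [window_rate(markers, s, min(s + window, chrom_size))
            for s in range(0, chrom_size, window)]
```

With markers `[(0, 0.0), (500_000, 1.0), (1_000_000, 3.0)]`, the two marker intervals have rates of 2.0 and 4.0 cM/Mb, so `window_rate(markers, 0, 1_000_000)` is their base-weighted average, 3.0 cM/Mb. Averaging per-base rates this way is equivalent to weighting each interval's rate by how many of its bases fall in the window.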

Using the Filter

This track has a filter that can be used to change the map or sex-specific rate displayed. The filter is located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu. To view a particular map or sex-specific rate, select the corresponding option from the "Map Distances" pulldown list. By default, the browser displays the deCODE sex-averaged distances.

When you have finished configuring the filter, click the Submit button.

Credits

This track was produced at UCSC using data that are freely available for the Genethon, Marshfield, and deCODE genetic maps. Thanks to all who played a part in the creation of these maps.

References

Broman, K.W., Murray, J.C., Sheffield, V.C., White, R.L., and Weber, J.L. Comprehensive human genetic maps: Individual and sex-specific variation in recombination. American Journal of Human Genetics, 63, 861-869 (1998).

Dib, C., Faure, S., Fizames, C., Samson, D., Drouot, N., Vignal, A., Millasseau, P., Marc, S., Hazan, J., Seboun, E., Lathrop, M., Gyapay, G., Morissette, J., and Weissenbach, J. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature, 380(6570), 152-154 (1996).

Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., Shlien, A., Palsson, S.T., Frigge, M.L., Thorgeirsson, T.E., Gulcher, J.R., and Stefansson, K. A high-resolution recombination map of the human genome. Nature Genetics, 31(3), 241-247 (2002).

\ map 1 exonArrows off\ group map\ longLabel Recombination Rate from deCODE, Marshfield, or Genethon Maps (deCODE default)\ priority 8\ shortLabel Recomb Rate\ track recombRate\ type bed 4 +\ visibility hide\ encodeEgaspPartSaga SAGA genePred SAGA Gene Predictions 0 8 80 20 80 167 137 167 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 80,20,80\ longLabel SAGA Gene Predictions\ parent encodeEgaspPartial\ priority 8\ shortLabel SAGA\ track encodeEgaspPartSaga\ encodeSangerChipH3acK562 SI H3ac K562 bedGraph 4 Sanger Institute ChIP/Chip (H3ac ab, K562 cells) 0 8 10 10 10 132 132 132 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 10,10,10\ longLabel Sanger Institute ChIP/Chip (H3ac ab, K562 cells)\ parent encodeSangerChipH3H4\ priority 8\ shortLabel SI H3ac K562\ track encodeSangerChipH3acK562\ stanfordChipK562GABP Stan K562 GABP bedGraph 4 Stanford ChIP-chip (K562 cells, GABP ChIP) 0 8 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (K562 cells, GABP ChIP)\ parent stanfordChip\ priority 8\ shortLabel Stan K562 GABP\ track stanfordChipK562GABP\ encodeStanfordChipK562GABP Stan K562 GABP bedGraph 4 Stanford ChIP-chip (K562 cells, GABP ChIP) 0 8 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (K562 cells, GABP ChIP)\ parent encodeStanfordChipJohnson\ priority 8\ shortLabel Stan K562 GABP\ track encodeStanfordChipK562GABP\ encodeStanfordMethSmoothedU87 Stan Meth Sc U87 bedGraph 4 Stanford Methylation Digest Smoothed Score (U87 cells) 0 8 120 0 20 150 0 25 0 0 21 
chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest Smoothed Score (U87 cells)\ parent encodeStanfordMethSmoothed\ priority 8\ shortLabel Stan Meth Sc U87\ track encodeStanfordMethSmoothedU87\ encodeStanfordMethU87 Stan Meth U87 bedGraph 4 Stanford Methylation Digest (U87 cells) 0 8 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChrom 0 longLabel Stanford Methylation Digest (U87 cells)\ parent encodeStanfordMeth\ priority 8\ shortLabel Stan Meth U87\ track encodeStanfordMethU87\ encodeStanfordPromotersHTB11 Stan Pro HTB11 bed 9 + Stanford Promoter Activity (HTB11 cells) 0 8 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (HTB11 cells)\ parent encodeStanfordPromoters\ priority 8\ shortLabel Stan Pro HTB11\ track encodeStanfordPromotersHTB11\ encodeUtexChip2091fibE2F4Peaks UT E2F4 st-Fb Pk bedGraph 4 University of Texas, Austin ChIP-chip (E2F4, 2091 fibroblasts) Peaks 0 8 50 0 0 152 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,0,0\ longLabel University of Texas, Austin ChIP-chip (E2F4, 2091 fibroblasts) Peaks\ parent encodeUtexChip\ priority 8\ shortLabel UT E2F4 st-Fb Pk\ subGroups dataType=peaks\ track encodeUtexChip2091fibE2F4Peaks\ kiddEichlerValidAbc7 Validated ABC7 bed 9 HGSV Individual ABC7 (Yoruba) Validated Sites of Structural Variation 0 8 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HGSV Individual ABC7 (Yoruba) Validated Sites of Structural Variation\ parent kiddEichlerValid\ priority 8\ shortLabel Validated ABC7\ track kiddEichlerValidAbc7\ encodeYaleAffyNB4TPARNATarsIntronsProximal Yale In Prx 
NB4 TPA bed 4 . Yale Intronic Proximal TPA-Treated NB4 TARs 0 8 164 0 92 209 127 173 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 164,0,92\ longLabel Yale Intronic Proximal TPA-Treated NB4 TARs\ parent encodeNoncodingTransFrags\ priority 8\ shortLabel Yale In Prx NB4 TPA\ subGroups region=intronicProximal celltype=nb4 source=yale\ track encodeYaleAffyNB4TPARNATarsIntronsProximal\ encodeYaleAffyNeutRNATransMap07 Yale RNA Neu 7 wig -2730 3394 Yale Neutrophil RNA Transcript Map, Sample 7 0 8 50 115 50 152 185 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,115,50\ longLabel Yale Neutrophil RNA Transcript Map, Sample 7\ parent encodeYaleAffyRNATransMap\ priority 8\ shortLabel Yale RNA Neu 7\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATransMap07\ encodeYaleAffyNeutRNATars07 Yale TAR Neu 7 bed 3 . Yale Neutrophil RNA Transcriptionally Active Region, Sample 7 0 8 50 115 50 152 185 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,115,50\ longLabel Yale Neutrophil RNA Transcriptionally Active Region, Sample 7\ parent encodeYaleAffyRNATars\ priority 8\ shortLabel Yale TAR Neu 7\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATars07\ decodeRmap deCODE Recomb bed 3 deCODE Recombination maps, 10Kb bin size, October 2010 0 8.5 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21,

Description

The deCODE recombination rate track shows calculated rates of recombination based on the October 2010 deCODE recombination maps, in 10 Kb bins. Sex-averaged, female-specific, and male-specific recombination rates can be displayed by choosing the appropriate options on the track description page.

For each of these tracks, separate tracks for carriers and non-carriers of the PRDM9 14/15 composite allele can also be displayed. There are also tracks showing the difference between male and female recombination rates, and a track showing recombination hot spots, that is, bins with a standardized recombination rate of 10 or higher.

In addition to the deCODE display, three data tracks from the HapMap project are included: the CEU, YRI, and combined maps from HapMap Release 24. These can be turned on with the track visibility controls.

Methods

The deCODE genetic map was created at deCODE Genetics and is based on 289,658 SNPs on the autosomes and 8,411 SNPs on the X chromosome, for 15,257 parent-offspring pairs. For more information on this map, see Kong, A. et al. (2010).

Each base is assigned the recombination rate calculated by assuming that genetic distance increases linearly between the immediately flanking genetic markers. The recombination rate assigned to each 10 Kb window is the average recombination rate of the bases contained within the window. The rates are then standardized so that their average over all bins used for the standardization is 1.
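The standardization step, together with the hot-spot threshold used by the Hot Spots subtrack, can be sketched as below. This is a minimal illustration under stated assumptions: the input is a list of per-bin average rates (with None for bins that have no rate), "all bins used for the standardization" is taken to mean every bin with a defined rate, and bins are 10 Kb and tile the chromosome from position 0. The comparison is inclusive because the Hot Spots subtrack's label specifies ">= 10.0". Function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def standardize(rates: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Divide each defined bin rate by the mean of the defined bins, so the mean becomes 1."""
    used = [r for r in rates if r is not None]
    if not used:
        raise ValueError("no bins with a defined rate")
    mean = sum(used) / len(used)
    if mean <= 0:
        raise ValueError("mean rate must be positive to standardize")
    return [None if r is None else r / mean for r in rates]


def hot_spots(std_rates: Sequence[Optional[float]], bin_size: int = 10_000,
              threshold: float = 10.0) -> List[Tuple[int, int, float]]:
    """Return (start, end, standardized rate) for bins at or above the threshold."""
    return [
        (i * bin_size, (i + 1) * bin_size, r)
        for i, r in enumerate(std_rates)
        if r is not None and r >= threshold
    ]
```

For example, with 19 bins at rate 0.5, one bin with no rate, and one bin at 10.5, the mean of the defined bins is 1.0, so standardization leaves the values unchanged and `hot_spots` reports the single bin covering positions 200,000 to 210,000.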

Credits

This track was produced at UCSC using data that are freely available for the deCODE genetic maps. Thanks to all who played a part in the creation of these maps.

References

Kong, A., Thorleifsson, G., Gudbjartsson, D.F., Masson, G., Sigurdsson, A., Jonasdottir, A., Walters, G.B., Jonasdottir, A., Gylfason, A., Kristinsson, K.T., Gudjonsson, S.A., Frigge, M.L., Helgason, A., Thorsteinsdottir, U., and Stefansson, K. Fine-scale recombination rate differences between sexes, populations and individuals. Nature, 467(7319), 1099-1103 (2010), and supplementary data nature09525-s1.pdf.

Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., Shlien, A., Palsson, S.T., Frigge, M.L., Thorgeirsson, T.E., Gulcher, J.R., and Stefansson, K. A high-resolution recombination map of the human genome. Nature Genetics, 31(3), 241-247 (2002).

\ map 1 chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21\ compositeTrack on\ group map\ longLabel deCODE Recombination maps, 10Kb bin size, October 2010\ maxHeightPixels 100:36:11\ noInherit on\ priority 8.5\ shortLabel deCODE Recomb\ subGroup1 view Views male=Male female=Female avg=Sex_Average diff=Sex_Difference hot=Hot_Spots other=Other_maps\ track decodeRmap\ type bed 3\ viewLimits 0:10\ visibility hide\ femaleView Female bed 3 deCODE Recombination maps, 10Kb bin size, October 2010 0 8.5 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 1 parent decodeRmap\ shortLabel Female\ track femaleView\ view female\ visibility hide\ otherMaps HapMap bigWig -1.0 111.0 HapMap Release 24 recombination maps 0 8.5 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 longLabel HapMap Release 24 recombination maps\ parent decodeRmap\ shortLabel HapMap\ track otherMaps\ type bigWig -1.0 111.0\ view other\ visibility hide\ hotView Hot Spots bed 4 deCODE recombination map, Female and Male hot spots, >= 10.0 1 8.5 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 1 longLabel deCODE recombination map, Female and Male hot spots, >= 10.0\ parent decodeRmap\ shortLabel Hot Spots\ track hotView\ type bed 4\ view hot\ visibility dense\ maleView Male bed 3 deCODE Recombination maps, 10Kb bin size, October 2010 0 8.5 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 1 parent decodeRmap\ shortLabel Male\ track maleView\ view male\ visibility hide\ diffView Male-Female bed 3 deCODE Recombination maps, 
10Kb bin size, October 2010 2 8.5 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 1 parent decodeRmap\ shortLabel Male-Female\ track diffView\ view diff\ visibility full\ avgView Sex Average bed 3 deCODE Recombination maps, 10Kb bin size, October 2010 2 8.5 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 1 parent decodeRmap\ shortLabel Sex Average\ track avgView\ view avg\ visibility full\ encodeAffyChIpHl60PvalCebpeHr00 Affy CEBPe RA 0h wig 0.0 534.54 Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 0hrs) P-Value 0 9 200 25 0 227 140 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 200,25,0\ longLabel Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 0hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 9\ shortLabel Affy CEBPe RA 0h\ subGroups factor=CEBPe time=0h\ track encodeAffyChIpHl60PvalCebpeHr00\ encodeAffyChIpHl60SignalStrictPol2Hr00 Affy Pol2 0h wig -2.78 3.97 Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict Signal 0 9 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 9\ shortLabel Affy Pol2 0h\ subGroups factor=Pol2 time=0h\ track encodeAffyChIpHl60SignalStrictPol2Hr00\ encodeAffyChIpHl60SitesStrictRnapHr00 Affy Pol2 0h bed 3 . 
Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict Sites 0 9 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 9\ shortLabel Affy Pol2 0h\ subGroups factor=Pol2 time=0h\ track encodeAffyChIpHl60SitesStrictRnapHr00\ encodeAffyChIpHl60PvalStrictPol2Hr00 Affy Pol2 0h wig 0 696.62 Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict P-Value 0 9 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 9\ shortLabel Affy Pol2 0h\ subGroups factor=Pol2 time=0h\ track encodeAffyChIpHl60PvalStrictPol2Hr00\ encodeBuFirstExonStomach BU Stomach bed 12 + Boston University First Exon Activity in Stomach 0 9 0 0 0 127 127 127 0 0 10 chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX, encodeTxLevels 1 longLabel Boston University First Exon Activity in Stomach\ parent encodeBuFirstExon\ priority 9\ shortLabel BU Stomach\ track encodeBuFirstExonStomach\ encodeTransFragsConservedIntronicDistal Cons Intron Dist bed 4 Conserved Intronic Distal 0 9 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 longLabel Conserved Intronic Distal\ parent encodeTransFrags\ priority 9\ shortLabel Cons Intron Dist\ track encodeTransFragsConservedIntronicDistal\ kiddEichlerDiscG248 Discordant G248 bed 12 HGSV Individual G248 Discordant Clone End Alignments 0 9 0 0 0 127 127 127 0 0 0 
http://mrhgsv.gs.washington.edu/cgi-bin/hgc?i=$$&c=$S&l=$[&r=$]&db=$D&position=$S:$[-$] varRep 1 longLabel HGSV Individual G248 Discordant Clone End Alignments\ parent kiddEichlerDisc\ priority 9\ shortLabel Discordant G248\ track kiddEichlerDiscG248\ encodeAffyEc1FetalKidneySignal EC1 Sgnl FetalK wig 0 62385 Affy Ext Trans Signal (1-base window) (Fetal Kidney) 0 9 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 0 color 176,0,80\ longLabel Affy Ext Trans Signal (1-base window) (Fetal Kidney)\ parent encodeAffyEcSignal\ priority 9\ shortLabel EC1 Sgnl FetalK\ track encodeAffyEc1FetalKidneySignal\ encodeAffyEc1FetalKidneySites EC1 Sites FetalK bed 3 . Affy Ext Trans Sites (1-base window) (Fetal Kidney) 0 9 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 1 color 176,0,80\ longLabel Affy Ext Trans Sites (1-base window) (Fetal Kidney)\ parent encodeAffyEcSites\ priority 9\ shortLabel EC1 Sites FetalK\ track encodeAffyEc1FetalKidneySites\ encodeRegions ENCODE Regions bed 4 . Encyclopedia of DNA Elements (ENCODE) Regions 0 9 150 100 30 202 177 142 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

This track depicts target regions for the NHGRI ENCODE project. The long-term goal of this project is to identify all functional elements in the human genome sequence to facilitate a better understanding of human biology and disease.

During the pilot phase, 44 regions comprising 30 Mb (approximately 1% of the human genome) have been selected for intensive study to identify, locate and analyze functional elements within the regions. These targets are being studied by a diverse public research consortium to test and evaluate the efficacy of various methods, technologies, and strategies for locating genomic features. The outcome of this initial phase will form the basis for a larger-scale effort to analyze the entire human genome.

See the NHGRI target selection process web page for a description of how the target regions were selected.

To open a UCSC Genome Browser with a menu for selecting ENCODE regions on the human genome, use ENCODE Regions in the UCSC Browser. The UCSC resources provided for the ENCODE project are described on the UCSC ENCODE Portal.

Credits

Thanks to the NHGRI ENCODE project for providing this initial set of data.

\ \ encodeGenes 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 150,100,30\ dataVersion ENCODE June 2005 Freeze\ group encodeGenes\ longLabel Encyclopedia of DNA Elements (ENCODE) Regions\ origAssembly hg16\ priority 9.0\ shortLabel ENCODE Regions\ track encodeRegions\ type bed 4 .\ visibility hide\ encodeEgaspFullGeneIdU12 GeneID U12 genePred GeneID U12 Intron Predictions 0 9 200 132 12 227 193 133 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 200,132,12\ longLabel GeneID U12 Intron Predictions\ parent encodeEgaspFull\ priority 9\ shortLabel GeneID U12\ track encodeEgaspFullGeneIdU12\ hapmapSnpsMKK HapMap SNPs MKK bed 6 + HapMap SNPs from the MKK Population (Masai in Kinyawa, Kenya) 0 9 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HapMap SNPs from the MKK Population (Masai in Kinyawa, Kenya)\ parent hapmapSnps\ priority 9\ shortLabel HapMap SNPs MKK\ track hapmapSnpsMKK\ snpArrayIllumina1M Illumina 1M-Duo bed 6 + Illumina Human1M-Duo 0 9 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Illumina Human1M-Duo\ parent snpArray\ priority 9\ shortLabel Illumina 1M-Duo\ track snpArrayIllumina1M\ type bed 6 +\ encodeEgaspUpdJigsaw Jigsaw Update genePred Jigsaw Gene Predictions 0 9 22 150 20 138 202 137 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 22,150,20\ longLabel Jigsaw Gene Predictions\ parent encodeEgaspUpdate\ priority 9\ shortLabel Jigsaw Update\ track encodeEgaspUpdJigsaw\ encodeUcsdChipAch3Imr90_f LI H3ac IMR90 bedGraph 4 Ludwig Institute ChIP-chip: H3ac ab, IMR90 cells 0 9 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: 
H3ac ab, IMR90 cells\ parent encodeLIChIP\ priority 9\ shortLabel LI H3ac IMR90\ track encodeUcsdChipAch3Imr90_f\ encodeUcsdChipHeLaH3H4stat1_p0 LI STAT1 -gIF bedGraph 4 Ludwig Institute ChIP-chip: STAT1 ab, HeLa cells, no gamma interferon 0 9 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: STAT1 ab, HeLa cells, no gamma interferon\ parent encodeLIChIPgIF\ priority 9\ shortLabel LI STAT1 -gIF\ track encodeUcsdChipHeLaH3H4stat1_p0\ decodeMaleNonCarrier Male Non-carrier bigWig 0.0 151.353 deCODE recombination map, male non-carrier 0 9 0 128 140 127 191 197 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 0,128,140\ configurable on\ longLabel deCODE recombination map, male non-carrier\ parent maleView\ priority 9\ shortLabel Male Non-carrier\ subGroups view=male\ track decodeMaleNonCarrier\ type bigWig 0.0 151.353\ ctgPos Map Contigs ctgPos Physical Map Contigs 0 9 150 0 0 202 127 127 0 0 0

Description

This track shows the locations of human contigs on the physical map. The underlying data is derived from the NCBI seq_contig.md file that accompanies this assembly. All contigs are "+" oriented in the assembly.

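Because every contig is "+" oriented, placing a position from a contig onto its chromosome needs only an offset, with no reverse-complementing. The minimal Python sketch below assumes a placement record with the fields contig, size, chrom, chromStart and chromEnd, using 0-based, half-open coordinates as in other UCSC tables; the record layout, the class and the function name are assumptions for illustration, not the track's actual schema or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtgPos:
    """One physical-map contig placement (hypothetical field layout)."""
    contig: str
    size: int
    chrom: str
    chrom_start: int  # assumed 0-based, inclusive
    chrom_end: int    # assumed 0-based, exclusive


def contig_to_chrom(row: CtgPos, offset: int) -> int:
    """Map a 0-based offset within a "+"-oriented contig to a chromosome position."""
    if not 0 <= offset < row.size:
        raise ValueError(f"offset {offset} is outside {row.contig} (size {row.size})")
    # "+" orientation: no reverse complement, so the mapping is a plain shift.
    return row.chrom_start + offset


# Hypothetical placement of a 1,000 bp contig at chr1:5,000-6,000.
ctg = CtgPos("NT_hypothetical", 1000, "chr1", 5000, 6000)
print(contig_to_chrom(ctg, 0), contig_to_chrom(ctg, 999))  # 5000 5999
```

A contig in "-" orientation would instead need `chrom_end - 1 - offset`, which is why the uniform "+" orientation keeps this mapping trivial.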

Method

For human genome reference sequences dated April 2003 and later, the individual chromosome sequencing centers are responsible for preparing the assembly of their chromosomes in AGP format. The files provided by these centers are checked and validated at NCBI, and form the basis for the seq_contig.md file that defines the physical map contigs.

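As an illustration of the AGP input described above, the following minimal sketch reads AGP lines into sequence components and gaps. It assumes the tab-delimited column layout of the NCBI AGP specification: columns 1 to 5 give the object, its 1-based start and end, the part number and the component type; for gap types N and U, columns 6 to 8 give the gap length, gap type and linkage, while for all other types columns 6 to 9 give the component accession, its start and end, and its orientation. The class names, the parse_agp function and the sample records are hypothetical, not part of any NCBI or UCSC tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Union

GAP_TYPES = {"N", "U"}  # AGP gap lines; every other component type is sequence


@dataclass
class Component:
    obj: str
    obj_beg: int       # 1-based, inclusive
    obj_end: int
    comp_type: str     # e.g. "F" (finished) or "D" (draft)
    comp_id: str
    comp_beg: int
    comp_end: int
    orientation: str   # "+", "-", or another value allowed by the spec


@dataclass
class Gap:
    obj: str
    obj_beg: int
    obj_end: int
    gap_length: int
    gap_type: str      # e.g. "clone" or "contig"
    linkage: str       # "yes" when the flanking sequences are bridged


def parse_agp(lines: Iterable[str]) -> List[Union[Component, Gap]]:
    records: List[Union[Component, Gap]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.split("\t")
        obj, beg, end, comp_type = f[0], int(f[1]), int(f[2]), f[4]
        if comp_type in GAP_TYPES:
            records.append(Gap(obj, beg, end, int(f[5]), f[6], f[7]))
        else:
            records.append(Component(obj, beg, end, comp_type,
                                     f[5], int(f[6]), int(f[7]), f[8]))
    return records


# Hypothetical three-line AGP fragment: finished clone, bridged gap, draft clone.
sample = [
    "chr1\t1\t1000\t1\tF\tcloneA.1\t1\t1000\t+",
    "chr1\t1001\t1100\t2\tN\t100\tclone\tyes",
    "chr1\t1101\t2100\t3\tD\tcloneB.1\t501\t1500\t-",
]
for rec in parse_agp(sample):
    print(rec)
```

Keeping gaps as explicit records, rather than discarding them, preserves the linkage flag that distinguishes bridged from unbridged gaps.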

For more information on the human genome assembly process, see The NCBI Handbook.

\ map 0 color 150,0,0\ group map\ longLabel Physical Map Contigs\ priority 9\ shortLabel Map Contigs\ track ctgPos\ type ctgPos\ visibility hide\ netSyntenyMonDom4 Opossum Syn Net netAlign monDom4 chainMonDom4 Opossum (Jan. 2006 (Broad/monDom4)) Syntenic Alignment Net 0 9 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb monDom4\ parent syntenicNet\ priority 9\ shortLabel Opossum Syn Net\ spectrum on\ track netSyntenyMonDom4\ type netAlign monDom4 chainMonDom4\ visibility hide\ encodeGencodeRaceFragsPlacenta RACEfrags Placenta genePred Gencode RACEfrags from Placenta 0 9 176 0 80 215 127 167 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 176,0,80\ longLabel Gencode RACEfrags from Placenta\ parent encodeGencodeRaceFrags\ priority 9\ shortLabel RACEfrags Placenta\ track encodeGencodeRaceFragsPlacenta\ encodeSangerChipH4acK562 SI H4ac K562 bedGraph 4 Sanger Institute ChIP/Chip (H4ac ab, K562 cells) 0 9 10 10 10 132 132 132 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 10,10,10\ longLabel Sanger Institute ChIP/Chip (H4ac ab, K562 cells)\ parent encodeSangerChipH3H4\ priority 9\ shortLabel SI H4ac K562\ track encodeSangerChipH4acK562\ stanfordChipK562SRF Stan K562 SRF bedGraph 4 Stanford ChIP-chip (K562 cells, SRF ChIP) 0 9 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (K562 cells, SRF ChIP)\ parent stanfordChip\ priority 9\ shortLabel Stan K562 SRF\ track stanfordChipK562SRF\ encodeStanfordChipK562SRF Stan K562 SRF bedGraph 4 Stanford ChIP-chip (K562 cells, SRF ChIP) 0 9 120 0 20 150 0 25 0 0 21 
chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (K562 cells, SRF ChIP)\ parent encodeStanfordChipJohnson\ priority 9\ shortLabel Stan K562 SRF\ track encodeStanfordChipK562SRF\ encodeStanfordPromotersHela Stan Pro Hela bed 9 + Stanford Promoter Activity (HeLa cells) 0 9 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (HeLa cells)\ parent encodeStanfordPromoters\ priority 9\ shortLabel Stan Pro Hela\ track encodeStanfordPromotersHela\ kiddEichlerValidG248 Validated G248 bed 9 HGSV Individual G248 Validated Sites of Structural Variation 0 9 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HGSV Individual G248 Validated Sites of Structural Variation\ parent kiddEichlerValid\ priority 9\ shortLabel Validated G248\ track kiddEichlerValidG248\ encodeYaleAffyNB4UntrRNATarsIntronsProximal Yale In Prx NB4 Un bed 4 . 
Yale Intronic Proximal NB4 TARs 0 9 152 0 104 203 127 179 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 152,0,104\ longLabel Yale Intronic Proximal NB4 TARs\ parent encodeNoncodingTransFrags\ priority 9\ shortLabel Yale In Prx NB4 Un\ subGroups region=intronicProximal celltype=nb4 source=yale\ track encodeYaleAffyNB4UntrRNATarsIntronsProximal\ encodeYaleAffyNeutRNATransMap08 Yale RNA Neu 8 wig -2730 3394 Yale Neutrophil RNA Transcript Map, Sample 8 0 9 50 100 50 152 177 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,100,50\ longLabel Yale Neutrophil RNA Transcript Map, Sample 8\ parent encodeYaleAffyRNATransMap\ priority 9\ shortLabel Yale RNA Neu 8\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATransMap08\ encodeYaleAffyNeutRNATars08 Yale TAR Neu 8 bed 3 . Yale Neutrophil RNA Transcriptionally Active Region, Sample 8 0 9 50 100 50 152 177 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,100,50\ longLabel Yale Neutrophil RNA Transcriptionally Active Region, Sample 8\ parent encodeYaleAffyRNATars\ priority 9\ shortLabel Yale TAR Neu 8\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATars08\ encodeAffyChIpHl60SitesCebpeHr00 Affy CEBPe RA 0h bed 3 . 
Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 0hrs) Sites 0 10 200 25 0 227 140 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 200,25,0\ longLabel Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 0hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 10\ shortLabel Affy CEBPe RA 0h\ subGroups factor=CEBPe time=0h\ track encodeAffyChIpHl60SitesCebpeHr00\ encodeAffyChIpHl60SignalStrictPol2Hr02 Affy Pol2 2h wig -2.78 3.97 Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict Signal 0 10 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 10\ shortLabel Affy Pol2 2h\ subGroups factor=Pol2 time=2h\ track encodeAffyChIpHl60SignalStrictPol2Hr02\ encodeAffyChIpHl60SitesStrictRnapHr02 Affy Pol2 2h bed 3 . 
Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict Sites 0 10 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 10\ shortLabel Affy Pol2 2h\ subGroups factor=Pol2 time=2h\ track encodeAffyChIpHl60SitesStrictRnapHr02\ encodeAffyChIpHl60PvalStrictPol2Hr02 Affy Pol2 2h wig 0 696.62 Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict P-Value 0 10 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 10\ shortLabel Affy Pol2 2h\ subGroups factor=Pol2 time=2h\ track encodeAffyChIpHl60PvalStrictPol2Hr02\ gold Assembly bed 3 + Assembly from Fragments 0 10 150 100 30 230 170 40 0 0 0

Description

This track shows the draft assembly of the human genome. This assembly merges contigs from overlapping draft and finished clones into longer sequence contigs. The sequence contigs are ordered and oriented when possible by mRNA, EST, paired plasmid reads (from the SNP Consortium) and BAC end sequence pairs.

In dense mode, this track depicts the path through the draft and finished clones (also known as the golden path) used to create the assembled sequence. Clone boundaries are distinguished by alternating gold and brown coloration. Where gaps exist in the path, spaces are shown between the gold and brown blocks. If the relative order and orientation of the contigs between the two blocks is known, a line is drawn to bridge the blocks.

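The dense-mode drawing rules can be sketched as follows. The two colors are taken from this track's color (150,100,30) and altColor (230,170,40) settings; the path encoding, the choice to start with gold, and the assumption that colors alternate per fragment and continue across gaps are illustrative guesses, not the browser's actual drawing code.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
BROWN: RGB = (150, 100, 30)  # this track's "color" setting
GOLD: RGB = (230, 170, 40)   # this track's "altColor" setting


def layout_golden_path(path: List[tuple]) -> List[Tuple[str, int, int, Optional[RGB]]]:
    """Turn an ordered golden path into dense-mode drawing items (sketch).

    Hypothetical path encoding: ("frag", start, end) for a clone fragment and
    ("gap", start, end, bridged) for a gap.
    """
    items: List[Tuple[str, int, int, Optional[RGB]]] = []
    use_gold = True
    for elem in path:
        if elem[0] == "frag":
            _, start, end = elem
            items.append(("block", start, end, GOLD if use_gold else BROWN))
            use_gold = not use_gold  # alternate so that adjacent clones look different
        else:
            _, start, end, bridged = elem
            # Gaps are always drawn as empty space; a connecting line is added
            # only when the order and orientation across the gap are known.
            items.append(("bridge" if bridged else "space", start, end, None))
    return items


demo = [("frag", 0, 100), ("gap", 100, 150, True), ("frag", 150, 300),
        ("gap", 300, 350, False), ("frag", 350, 400)]
for item in layout_golden_path(demo):
    print(item)
```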

Clone Type Key:

\ \ map 1 altColor 230,170,40\ color 150,100,30\ group map\ longLabel Assembly from Fragments\ priority 10\ shortLabel Assembly\ track gold\ type bed 3 +\ visibility hide\ encodeBuFirstExonTestis BU Testis bed 12 + Boston University First Exon Activity in Testis 0 10 0 0 0 127 127 127 0 0 10 chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX, encodeTxLevels 1 longLabel Boston University First Exon Activity in Testis\ parent encodeBuFirstExon\ priority 10\ shortLabel BU Testis\ track encodeBuFirstExonTestis\ encodeAffyEc51FetalKidneySignal EC51 Sgnl FetalK wig 0 62385 Affy Ext Trans Signal (51-base window) (Fetal Kidney) 0 10 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 0 color 176,0,80\ longLabel Affy Ext Trans Signal (51-base window) (Fetal Kidney)\ parent encodeAffyEcSignal\ priority 10\ shortLabel EC51 Sgnl FetalK\ track encodeAffyEc51FetalKidneySignal\ encodeAffyEc51FetalKidneySites EC51 Site FetalK bed 3 . Affy Ext Trans Sites (51-base window) (Fetal Kidney) 0 10 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 1 color 176,0,80\ longLabel Affy Ext Trans Sites (51-base window) (Fetal Kidney)\ parent encodeAffyEcSites\ priority 10\ shortLabel EC51 Site FetalK\ track encodeAffyEc51FetalKidneySites\ encodeEgaspFullGenemark GeneMark genePred GeneMark Gene Predictions 0 10 100 12 100 177 133 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 100,12,100\ longLabel GeneMark Gene Predictions\ parent encodeEgaspFull\ priority 10\ shortLabel GeneMark\ track encodeEgaspFullGenemark\ hapmapSnpsTSI HapMap SNPs TSI bed 6 + HapMap SNPs from the TSI Population (Toscani in Italia) 0 10 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HapMap SNPs from the TSI Population (Toscani in Italia)\ parent hapmapSnps\ priority 10\ shortLabel HapMap SNPs TSI\ track hapmapSnpsTSI\ snpArrayIlluminaHumanCytoSNP_12 Illumina Cyto-12 bed 6 + Illumina Human CytoSNP-12 0 10 0 0 0 127 
127 127 0 0 0 varRep 1 longLabel Illumina Human CytoSNP-12\ parent snpArray\ priority 10\ shortLabel Illumina Cyto-12\ track snpArrayIlluminaHumanCytoSNP_12\ type bed 6 +\ encodeUcsdChipMeh3k4Imr90_f LI H3K4me2 IMR90 bedGraph 4 Ludwig Institute ChIP-chip: H3K4me2 ab, IMR90 cells 0 10 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: H3K4me2 ab, IMR90 cells\ parent encodeLIChIP\ priority 10\ shortLabel LI H3K4me2 IMR90\ track encodeUcsdChipMeh3k4Imr90_f\ encodeUcsdChipHeLaH3H4stat1_p30 LI STAT1 +gIF bedGraph 4 Ludwig Institute ChIP-chip: STAT1 ab, HeLa cells, 30 min. after gamma interferon 0 10 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: STAT1 ab, HeLa cells, 30 min. 
after gamma interferon\ parent encodeLIChIPgIF\ priority 10\ shortLabel LI STAT1 +gIF\ track encodeUcsdChipHeLaH3H4stat1_p30\ encodeGencodeRaceFragsSmallIntest RACEfrags Sm Int genePred Gencode RACEfrags from Small Intestine 0 10 164 0 92 209 127 173 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 164,0,92\ longLabel Gencode RACEfrags from Small Intestine\ parent encodeGencodeRaceFrags\ priority 10\ shortLabel RACEfrags Sm Int\ track encodeGencodeRaceFragsSmallIntest\ decodeMaleFemaleDifference Sex Difference bigWig -65 94 deCODE recombination map, male minus female difference 2 10 0 0 128 128 0 0 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 altColor 128,0,0\ color 0,0,128\ configurable on\ longLabel deCODE recombination map, male minus female difference\ parent diffView\ priority 10\ shortLabel Sex Difference\ subGroups view=diff\ track decodeMaleFemaleDifference\ type bigWig -65 94\ viewLimits -20:20\ encodeEgaspUpdSgp2 SGP2 Update genePred SGP2 Gene Predictions 0 10 100 12 100 177 133 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 100,12,100\ longLabel SGP2 Gene Predictions\ parent encodeEgaspUpdate\ priority 10\ shortLabel SGP2 Update\ track encodeEgaspUpdSgp2\ stanfordChipK562TAF Stan K562 TAF bedGraph 4 Stanford ChIP-chip (K562 cells, TAF ChIP) 0 10 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (K562 cells, TAF ChIP)\ parent stanfordChip\ priority 10\ shortLabel Stan K562 TAF\ track stanfordChipK562TAF\ encodeStanfordChipK562TAF Stan K562 TAF bedGraph 4 Stanford ChIP-chip (K562 cells, TAF ChIP) 0 10 120 0 20 150 0 25 0 0 21 
chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (K562 cells, TAF ChIP)\ parent encodeStanfordChipJohnson\ priority 10\ shortLabel Stan K562 TAF\ track encodeStanfordChipK562TAF\ encodeStanfordPromotersHepG2 Stan Pro HepG2 bed 9 + Stanford Promoter Activity (HepG2 cells) 0 10 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (HepG2 cells)\ parent encodeStanfordPromoters\ priority 10\ shortLabel Stan Pro HepG2\ track encodeStanfordPromotersHepG2\ encodeYaleAffyNeutRNATarsAllIntronsProximal Yale In Prx Neu bed 4 . Yale Intronic Proximal Neutrophil TARs 0 10 140 0 116 197 127 185 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 140,0,116\ longLabel Yale Intronic Proximal Neutrophil TARs\ parent encodeNoncodingTransFrags\ priority 10\ shortLabel Yale In Prx Neu\ subGroups region=intronicProximal celltype=neut source=yale\ track encodeYaleAffyNeutRNATarsAllIntronsProximal\ encodeYaleAffyNeutRNATransMap09 Yale RNA Neu 9 wig -2730 3394 Yale Neutrophil RNA Transcript Map, Sample 9 0 10 50 85 50 152 170 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,85,50\ longLabel Yale Neutrophil RNA Transcript Map, Sample 9\ parent encodeYaleAffyRNATransMap\ priority 10\ shortLabel Yale RNA Neu 9\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATransMap09\ encodeYaleAffyNeutRNATars09 Yale TAR Neu 9 bed 3 . 
Yale Neutrophil RNA Transcriptionally Active Region, Sample 9 0 10 50 85 50 152 170 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,85,50\ longLabel Yale Neutrophil RNA Transcriptionally Active Region, Sample 9\ parent encodeYaleAffyRNATars\ priority 10\ shortLabel Yale TAR Neu 9\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATars09\ encodeGencodeGene Gencode Genes genePred Gencode Gene Annotations 0 10.1 73 76 73 164 165 164 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

The Gencode Gene track shows high-quality manual annotations in the ENCODE regions generated by the GENCODE project. A companion track, Gencode Introns, shows experimental gene structure validations for these annotations.

The gene annotations are colored based on the Havana annotation type. Known and validated transcripts are colored dark green, putative and unconfirmed transcripts are light green, pseudogenes are blue, and artifacts are grey. The transcript types are defined in more detail in the accompanying table.

The Gencode project recommends that the annotations with known and validated transcripts, i.e., the types Known, Novel_CDS, Novel_transcript_gencode_conf, and Putative_gencode_conf (colored dark green in the track display), be used as the reference annotation.

Type | Color | Description
Known | dark green | Known protein coding genes (referenced in Entrez Gene, NCBI)
Novel_CDS | dark green | Novel protein coding genes annotated by Havana (not referenced in Entrez Gene, NCBI)
Novel_transcript_gencode_conf | dark green | Novel transcripts annotated by Havana (no ORF assigned) with at least one junction validated by RT-PCR
Putative_gencode_conf | dark green | Putative transcripts (similar to "novel transcripts", EST supported, short, no viable ORF) with at least one junction validated by RT-PCR
Novel_transcript | light green | Novel transcripts annotated by Havana (no ORF assigned) not validated by RT-PCR
Putative | light green | Putative transcripts (similar to "novel transcripts", EST supported, short, no viable ORF) not validated by RT-PCR
TEC | light green | Single exon objects (supported by multiple ESTs with polyA sites and signals) undergoing experimental validation/extension
Processed_pseudogene | blue | Pseudogenes arising via retrotransposition (exon structure of parent gene lost)
Unprocessed_pseudogene | blue | Pseudogenes arising via gene duplication (exon structure of parent gene retained)
Artifact | grey | Transcript evidence and/or its translation equivocal

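The class-to-color mapping and the recommended reference set can be expressed compactly. The RGB values below are copied from this track's gClass_* settings, and the reference set is the four types named in the Gencode recommendation; the transcript names in the example and the reference_set helper are hypothetical. Every reference type maps to the same dark green, which is why the dark-green items in the display are exactly the recommended reference annotation.

```python
from typing import Dict, FrozenSet, List, Tuple

# RGB values copied from this track's gClass_* settings.
GENCODE_CLASS_COLORS: Dict[str, Tuple[int, int, int]] = {
    "Known": (33, 91, 51),
    "Novel_CDS": (33, 91, 51),
    "Novel_transcript_gencode_conf": (33, 91, 51),
    "Putative_gencode_conf": (33, 91, 51),
    "Novel_transcript": (84, 188, 0),
    "Putative": (84, 188, 0),
    "TEC": (84, 188, 0),
    "Processed_pseudogene": (0, 91, 191),
    "Unprocessed_pseudogene": (0, 91, 191),
    "Artifact": (163, 168, 163),
}

# Types the Gencode project recommends as the reference annotation.
REFERENCE_TYPES: FrozenSet[str] = frozenset(
    {"Known", "Novel_CDS", "Novel_transcript_gencode_conf", "Putative_gencode_conf"}
)


def reference_set(transcripts: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep (name, gencode_type) pairs whose type is in the recommended reference set."""
    return [(name, gtype) for name, gtype in transcripts if gtype in REFERENCE_TYPES]


# Every reference type shares the same dark green.
assert {GENCODE_CLASS_COLORS[t] for t in REFERENCE_TYPES} == {(33, 91, 51)}

example = [("tx1", "Known"), ("tx2", "Putative"), ("tx3", "Putative_gencode_conf"), ("tx4", "Artifact")]
print(reference_set(example))  # [('tx1', 'Known'), ('tx3', 'Putative_gencode_conf')]
```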

Methods

The Human and Vertebrate Analysis and Annotation (HAVANA) manual curation process was used to produce these annotations.

Finished genomic sequence was analyzed on a clone-by-clone basis using a combination of similarity searches against DNA and protein databases, as well as a series of ab initio gene predictions. Nucleotide sequence databases were searched with WUBLASTN, and significant hits were realigned to the unmasked genomic sequence by EST2GENOME. WUBLASTX was used to search the UniProt protein database, and the accession numbers of significant hits were retrieved from the Pfam database. Hidden Markov models for Pfam protein domains were aligned against the genomic sequence using Genewise to provide annotation of protein domains.

A number of ab initio prediction algorithms were also run: Genscan and Fgenesh for genes, tRNAscan to find tRNA genes, and Eponine TSS for transcription start site predictions.

The annotators used the AceDB-based Otterlace interface to create and edit gene objects, which were then stored in a local database named Otter. Where predicted transcript structures from Ensembl are available, these can be viewed from within the Otterlace interface and may be used as starting templates for gene curation. Annotation in the Otter database is submitted to the EMBL/GenBank/DDBJ nucleotide database.

Verification

The gene objects selected for verification came from various computational prediction methods and HAVANA annotations.

RT-PCR and RACE experiments were performed on them, using a variety of human tissues, to confirm their structure. Human cDNAs from 24 different tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta, skin, peripheral blood leucocytes, bone marrow, fetal brain, fetal liver, fetal kidney, fetal heart, fetal lung, thymus, pancreas, mammary gland, prostate) were synthesized using 12 poly(A)+ RNAs from Origene, eight from Clemente Associates/Quantum Magnetics and four from BD Biosciences as described in [Reymond et al., 2002a,b]. The relative amount of each cDNA was normalized by quantitative PCR using SYBR Green as the intercalating dye and an ABI Prism 7700 Sequence Detection System.

Predicted human gene junctions were assayed experimentally by RT-PCR as previously described, with modifications [Reymond, 2002b; Mouse Genome Sequencing Consortium, 2002; Guigo, 2003].

Similar amounts of Homo sapiens cDNAs were mixed with JumpStart REDTaq ReadyMix (Sigma) and 4 ng/µl primers (Sigma-Genosys) using a BioMek 2000 robot (Beckman). The first ten cycles of PCR amplification used touchdown annealing temperatures decreasing from 60 to 50°C; the annealing temperature for the next 30 cycles was 50°C. Amplimers were separated on "Ready to Run" precast gels (Pharmacia) and sequenced. RACE experiments were performed with the BD SMART RACE cDNA Amplification Kit following the manufacturer's instructions (BD Biosciences).

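The touchdown program can be written out as a per-cycle annealing schedule. The text gives only the endpoints, so this sketch assumes a linear decrement spread evenly over the ten touchdown cycles (1°C per cycle with the defaults, i.e. 60 down to 51°C), followed by 30 cycles at 50°C; the function name and parameters are hypothetical.

```python
from typing import List


def touchdown_schedule(start_c: float = 60.0, end_c: float = 50.0,
                       touchdown_cycles: int = 10, plateau_cycles: int = 30) -> List[float]:
    """Annealing temperature (°C) for each PCR cycle, in order."""
    step = (start_c - end_c) / touchdown_cycles  # 1.0 °C per cycle with the defaults
    touchdown = [start_c - i * step for i in range(touchdown_cycles)]
    return touchdown + [end_c] * plateau_cycles


schedule = touchdown_schedule()
print(len(schedule), schedule[0], schedule[9], schedule[10])  # 40 60.0 51.0 50.0
```

Starting high and stepping down favors the most specific primer binding in the early cycles; those specific products then dominate the template pool for the later, lower-temperature cycles.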

Credits

See the GENCODE project for a complete list of people who participated in the project.

References

Ashurst, J.L. et al. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res 33 (Database Issue), D459-65 (2005).

Guigo, R. et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A 100(3), 1140-5 (2003).

Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915), 520-62 (2002).

Reymond, A. et al. Human chromosome 21 gene expression atlas in the mouse. Nature 420(6915), 582-6 (2002).

Reymond, A. et al. Nineteen additional unpredicted transcripts from human chromosome 21. Genomics 79(6), 824-32 (2002).

\ encodeGenes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 73,76,73\ dataVersion ENCODE June 2005 Freeze\ gClass_Artifact 163,168,163\ gClass_Known 33,91,51\ gClass_Novel_CDS 33,91,51\ gClass_Novel_transcript 84,188,0\ gClass_Novel_transcript_gencode_conf 33,91,51\ gClass_Processed_pseudogene 0,91,191\ gClass_Putative 84,188,0\ gClass_Putative_gencode_conf 33,91,51\ gClass_TEC 84,188,0\ gClass_Unprocessed_pseudogene 0,91,191\ geneClasses Artifact Known Novel_CDS Novel_transcript Novel_transcript_gencode_conf Putative Putative_gencode_conf TEC Processed_pseudogene Unprocessed_pseudogene\ group encodeGenes\ itemClassTbl gencodeGeneClass\ longLabel Gencode Gene Annotations\ origAssembly hg17\ priority 10.1\ shortLabel Gencode Genes\ track encodeGencodeGene\ type genePred\ visibility hide\ encodeGencodeIntron Gencode Introns bed 6 + Gencode Intron Validation 0 10.2 0 0 0 127 127 127 0 0 20 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr8,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr9,chrX,

Description

The Gencode Intron Validation track shows gene structure validations generated by the GENCODE project. This track serves as a companion to the Gencode Genes track.

The items in this track are colored based on the validation status determined via RT-PCR of exons flanking the intron:

Status | Color | Validation Result
RT_positive | green | Intron validated (PCR product corresponds to the expected junction)
RT_negative | red | Intron not validated (no PCR product was obtained)
RT_wrong_junction | gold | Intron not validated, but another junction exists between the two exons (PCR product does not correspond to the expected junction)

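The three statuses in the table amount to a simple decision on the RT-PCR outcome. In this sketch, an intron's expected junction and the junction implied by a sequenced PCR product are each reduced to a hypothetical (donor, acceptor) coordinate pair, with None standing for "no PCR product"; the enum and function names are illustrative, while the status strings and colors come from the table.

```python
from enum import Enum
from typing import Optional, Tuple

Junction = Tuple[int, int]  # hypothetical (donor, acceptor) coordinates of an intron


class RtStatus(Enum):
    RT_POSITIVE = "RT_positive"              # green: product matches the expected junction
    RT_NEGATIVE = "RT_negative"              # red: no PCR product was obtained
    RT_WRONG_JUNCTION = "RT_wrong_junction"  # gold: product shows a different junction


def classify_intron(expected: Junction, observed: Optional[Junction]) -> RtStatus:
    """Assign a validation status; `observed` is None when no product was obtained."""
    if observed is None:
        return RtStatus.RT_NEGATIVE
    if observed == expected:
        return RtStatus.RT_POSITIVE
    return RtStatus.RT_WRONG_JUNCTION


print(classify_intron((1200, 3400), (1200, 3400)).value)  # RT_positive
print(classify_intron((1200, 3400), None).value)          # RT_negative
print(classify_intron((1200, 3400), (1200, 2900)).value)  # RT_wrong_junction
```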

Methods

Selected gene models from the Gencode Genes track were verified by RT-PCR and RACE experiments in a variety of human tissues to confirm their structure. Human cDNAs from 24 different tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta, skin, peripheral blood leucocytes, bone marrow, fetal brain, fetal liver, fetal kidney, fetal heart, fetal lung, thymus, pancreas, mammary gland, prostate) were synthesized using twelve poly(A)+ RNAs from Origene, eight from Clemente Associates/Quantum Magnetics and four from BD Biosciences as described in [Reymond et al., 2002a,b]. The relative amount of each cDNA was normalized to glyceraldehyde-3-phosphate dehydrogenase (GAPDH) by quantitative PCR, using SYBR Green as the intercalating dye and an ABI Prism 7700 Sequence Detection System.

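The text does not say how the GAPDH measurements were turned into a normalization. One common approach, shown here only as a hedged sketch, assumes ideal amplification (template doubles every cycle, so template amount is proportional to 2^-Ct) and dilutes each cDNA so that its GAPDH level matches the sample with the least GAPDH template. The tissue Ct values in the example are invented for illustration, and the function name is hypothetical.

```python
from typing import Dict


def gapdh_dilution_factors(gapdh_ct: Dict[str, float]) -> Dict[str, float]:
    """Fold-dilution per cDNA so that every sample carries equal GAPDH template.

    Assumes 100% PCR efficiency: template amount is proportional to 2 ** -Ct.
    The sample with the highest Ct (least template) is left undiluted (1.0);
    every other sample is diluted by 2 ** (ref_ct - ct) to match it.
    """
    ref_ct = max(gapdh_ct.values())
    return {tissue: 2.0 ** (ref_ct - ct) for tissue, ct in gapdh_ct.items()}


# Hypothetical GAPDH threshold cycles for three tissue cDNAs.
print(gapdh_dilution_factors({"brain": 18.0, "liver": 20.0, "testis": 19.0}))
# {'brain': 4.0, 'liver': 1.0, 'testis': 2.0}
```

Real amplification efficiencies are usually somewhat below 100%, so in practice the base of the exponent would be calibrated per assay rather than fixed at 2.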

Predicted human gene junctions were assayed experimentally by RT-PCR as previously described, with modifications [Reymond, 2002b; Mouse Genome Sequencing Consortium, 2002; Guigo, 2003].

Similar amounts of Homo sapiens cDNAs were mixed with JumpStart REDTaq ReadyMix (Sigma) and 4 ng/µl primers (Sigma-Genosys) using a BioMek 2000 robot (Beckman). The first ten cycles of PCR amplification used touchdown annealing temperatures decreasing from 60 to 50°C; the annealing temperature for the next 30 cycles was 50°C. Amplimers were separated on "Ready to Run" precast gels (Pharmacia) and sequenced. RACE experiments were performed with the BD SMART RACE cDNA Amplification Kit following the manufacturer's instructions (BD Biosciences).

Credits

See the GENCODE project for a complete list of people who participated in the project.

References

Ashurst, J.L. et al. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res 33 (Database Issue), D459-65 (2005).

Guigo, R. et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A 100(3), 1140-5 (2003).

Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915), 520-62 (2002).

Reymond, A. et al. Human chromosome 21 gene expression atlas in the mouse. Nature 420(6915), 582-6 (2002).

Reymond, A. et al. Nineteen additional unpredicted transcripts from human chromosome 21. Genomics 79(6), 824-32 (2002).

\ encodeGenes 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr8,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr9,chrX\ dataVersion ENCODE June 2005 Freeze\ group encodeGenes\ longLabel Gencode Intron Validation\ origAssembly hg17\ priority 10.2\ shortLabel Gencode Introns\ track encodeGencodeIntron\ type bed 6 +\ visibility hide\ encodeGencodeSuper Gencode Genes Gencode Gene Annotation 0 10.3 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Overview

This super-track combines related tracks from the GENCODE project. The goal of this project is to identify all protein-coding genes in the ENCODE regions using a pipeline that combines computational predictions, experimental verification, and manual annotation based on the Sanger HAVANA process.
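Super-track membership is declared by each child track rather than by the super-track itself: the encodeGencodeSuper entry only sets `superTrack on`, while each GENCODE subtrack carries a `superTrack encodeGencodeSuper` setting in its configuration. A minimal sketch that rebuilds the grouping from those settings follows; `group_by_super_track` is a hypothetical helper, not Genome Browser code, and reading the optional second word ("dense") as a default visibility is an assumption.

```python
from collections import defaultdict
from typing import Dict, List

# superTrack settings copied from the child tracks' configuration entries.
# The first word names the parent super-track; an optional second word
# (e.g. "dense") is assumed here to be that child's default visibility.
SUPER_TRACK_SETTINGS: Dict[str, str] = {
    "encodeGencodeGeneMar07": "encodeGencodeSuper dense",
    "encodeGencodeRaceFrags": "encodeGencodeSuper",
    "encodeGencodeGeneOct05": "encodeGencodeSuper",
    "encodeGencodeIntronOct05": "encodeGencodeSuper",
}


def group_by_super_track(settings: Dict[str, str]) -> Dict[str, List[str]]:
    """Return {super-track name: [child track names]} in input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for track, value in settings.items():
        parent = value.split()[0]  # first word is the super-track name
        groups[parent].append(track)
    return dict(groups)


print(group_by_super_track(SUPER_TRACK_SETTINGS))
```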

Gencode Genes Mar07

This track shows gene annotations from GENCODE release v3.1 (March 2007). These annotations contain updates and corrections to the GENCODE October 2005 annotations, based on validation data from 5' RACE and RT-PCR experiments, which are displayed in the Gencode RACEfrags and Gencode Introns Oct05 tracks.

Gencode RACEfrags

This track shows the products of 5' RACE reactions performed on GENCODE genes in 12 tissues and 3 cell lines, as assayed on Affymetrix ENCODE tiling arrays at 20-nucleotide resolution. The results were used to annotate 5' transcription start sites and internal exons of all annotated protein-coding loci in the Oct. 2005 GENCODE freeze.

Gencode Genes Oct05

This track shows gene annotations from GENCODE release v2.2 (Oct. 2005), which was published as part of the ENCODE October 2005 data freeze.

Gencode Introns Oct05

This track shows the validation status of introns in selected gene models from the Gencode Oct05 gene annotations, as determined by RT-PCR and RACE experiments in 24 human tissues.

Credits

This GENCODE release is the result of a collaborative effort among the following laboratories:

Lab/Institution | Contributors
HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK | Adam Frankish, Jonathan Mudge, James Gilbert, Tim Hubbard, Jennifer Harrow
Genome Bioinformatics Lab, CRG, Barcelona, Spain | France Denoeud, Julien Lagarde, Sylvain Foissac, Robert Castelo, Roderic Guigó (GENCODE Principal Investigator)
Department of Genetic Medicine and Development, University of Geneva, Switzerland | Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonarakis
Center for Integrative Genomics, University of Lausanne, Switzerland | Jacqueline Chrast, Charlotte N. Henrichsen, Alexandre Reymond
Affymetrix, Inc., Santa Clara, CA, USA | Philipp Kapranov, Thomas R. Gingeras

The RACEfrags result from a collaborative effort among the following laboratories:

Lab/Institution | Contributors
Genome Bioinformatics Lab, CRG, Barcelona, Spain | France Denoeud, Julien Lagarde, Tyler Alioto, Sylvain Foissac, Robert Castelo, Roderic Guigó
Department of Genetic Medicine and Development, University of Geneva, Switzerland | Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonarakis
Center for Integrative Genomics, University of Lausanne, Switzerland | Jacqueline Chrast, Charlotte N. Henrichsen, Alexandre Reymond
Affymetrix, Inc., Santa Clara, CA, USA | Philipp Kapranov, Jorg Drenkow, Sujit Dike, Jill Cheng, Thomas R. Gingeras
HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK | Adam Frankish, James Gilbert, Tim Hubbard, Jennifer Harrow

References

Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto TS, Manzano C, Chrast J et al. Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007 Jun;17(6):746-59.

Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9.

\ encodeGenes 0 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ group encodeGenes\ longLabel Gencode Gene Annotation\ priority 10.3\ shortLabel Gencode Genes\ superTrack on\ track encodeGencodeSuper\ encodeGencodeGeneMar07 Gencode Genes Mar07 genePred Gencode Gene Annotations (March 2007) 0 10.4 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

The Gencode Genes track (v3.1, March 2007) shows high-quality manual annotations in the ENCODE regions generated by the GENCODE project.

The gene annotations are colored based on the HAVANA annotation type. See the table below for the color key and for more detail about the transcript and feature types. The GENCODE project recommends that the annotations with known and validated transcripts, i.e., the types Known and Novel_CDS (colored dark green in the track display), be used as the reference gene annotation.

The v3.1 release includes the following updates and enhancements to v2.2 (Oct. 2005):

Type | Color | Description
Known | dark green | Known protein-coding genes (i.e., referenced in Entrez Gene).
Novel_CDS | dark green | Have an open reading frame (ORF) and are identical or homologous to cDNAs or proteins, but do not fall into the above category. These can be known in the sense that they are represented by mRNA sequences in the public databases but are not yet represented in Entrez Gene or have not received an official gene name. They can also be novel in that they are not yet represented by an mRNA sequence in human.
Novel_transcript | light green | Similar to Novel_CDS, but cannot be assigned an unambiguous ORF.
Putative | light green | Identical or homologous to spliced ESTs, but devoid of a significant ORF and polyA features. These are generally short (two- or three-exon) genes or gene fragments.
TEC | light green | (To Experimentally Confirm) Single-exon objects supported by multiple unspliced ESTs with polyA sites and signals.
Polymorphic | purple | Have functional transcripts in one haplotype and "pseudo" (non-functional) transcripts in another.
Processed_pseudogene | blue | Pseudogenes that lack introns and are thought to arise from reverse transcription of mRNA followed by reinsertion of DNA into the genome.
Unprocessed_pseudogene | blue | Pseudogenes that can contain introns, as they are produced by gene duplication.
Artifact | grey | Transcript evidence and/or its translation is equivocal. These usually arise from high-throughput cDNA sequencing projects that submit automatic annotation, sometimes resulting in erroneous CDSs in what turns out to be, for example, 3' UTRs. In addition, HAVANA has extended this category to include cDNAs with non-canonical splice sites due to deletion/sequencing errors.
PolyA_signal | brown | Polyadenylation signal.
PolyA_site | orange | Polyadenylation site.
Pseudo_polyA | pink | "Pseudo"-polyadenylation signal detected in the sequence of a processed pseudogene.

Warning: Pseudo_polyA features and processed pseudogenes generally do not overlap. This is because pseudogene annotations are based solely on protein evidence, whereas pseudo_polyA signals are identified from transcript evidence; since these signals lie at the end of the 3' UTR, they can be several kb downstream of the 3' end of the pseudogene.
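The gene-class colors in the table come from the `gClass_*` settings in this track's configuration, each of which maps a class name to an R,G,B triple. As an illustration, the sketch below parses those settings into a lookup table; `parse_gene_class_colors` is a hypothetical helper, and the feature types PolyA_signal, PolyA_site, and Pseudo_polyA have no `gClass_*` entry, so they are not covered.

```python
from typing import Dict, Tuple

# gClass_* settings copied from the encodeGencodeGeneMar07 configuration.
SETTINGS = """\
gClass_Artifact 163,168,163
gClass_Known 33,91,51
gClass_Novel_CDS 33,91,51
gClass_Novel_Transcript 84,188,0
gClass_Polymorphic 160,32,240
gClass_Processed_pseudogene 0,91,191
gClass_Putative 84,188,0
gClass_TEC 84,188,0
gClass_Unprocessed_pseudogene 0,91,191
"""


def parse_gene_class_colors(text: str) -> Dict[str, Tuple[int, int, int]]:
    """Map each gene class name to its (R, G, B) display color."""
    colors: Dict[str, Tuple[int, int, int]] = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if not key.startswith("gClass_"):
            continue  # ignore any non-color settings
        r, g, b = (int(x) for x in value.split(","))
        colors[key[len("gClass_"):]] = (r, g, b)
    return colors


colors = parse_gene_class_colors(SETTINGS)
print(colors["Known"])                         # (33, 91, 51)
print(colors["Known"] == colors["Novel_CDS"])  # True: both "dark green"
```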

The current full set of GENCODE annotations is available for download here.

Methods

For a detailed description of the methods and references used, see Harrow et al., 2006 and Denoeud et al., 2007.

5' RACE/array experiments

A combination of 5' RACE and high-density tiling microarrays was used to empirically annotate 5' transcription start sites (TSSs) and internal exons of all 410 annotated protein-coding loci across the 44 ENCODE regions (Oct. 2005 GENCODE freeze). The 5' RACE reactions were performed with oligonucleotides mapping to a coding exon (the index exon) common to most of the transcripts of each protein-coding locus annotated by GENCODE (Oct. 2005 freeze), on polyA+ RNA from twelve adult human tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta) and three cell lines: GM06990 (lymphoblastoid), HL60 (acute promyelocytic leukemia), and HeLaS3 (cervical carcinoma).

The RACE reactions were then hybridized to 20-nucleotide-resolution Affymetrix tiling arrays covering the non-repeated regions of the 44 ENCODE regions. The resulting "RACEfrags" (array-detected fragments of RACE products) were assessed for novelty by comparing their genomic coordinates to those of GENCODE-annotated exons. Connectivity between novel RACEfrags and their respective index exons was further investigated by RT-PCR, cloning, and sequencing. The resulting cDNA sequences (deposited in GenBank under accession numbers DQ655905-DQ656069 and EF070113-EF070122) were then fed into the HAVANA annotation pipeline as mRNA evidence (see "HAVANA manual annotations" below).

HAVANA manual annotations

The HAVANA process was used to produce these annotations.

Before the manual annotation process begins, an automated analysis pipeline for similarity searches and ab initio predictions is run on a computer farm, and the results are stored in an Ensembl MySQL database using a modified Ensembl analysis pipeline system. All searches and prediction algorithms, except CpG island prediction (see cpgreport in the EMBOSS application suite), are run on repeat-masked sequence. RepeatMasker is used to mask interspersed repeats, followed by Tandem Repeats Finder to mask tandem repeats.

Nucleotide sequence databases are searched with wuBLASTN, and significant hits are realigned to the unmasked genomic sequence using est2genome. The UniProt protein database is searched with wuBLASTX, and the accession numbers of significant hits are looked up in the Pfam database. The hidden Markov models for Pfam protein domains are aligned against the genomic sequence using Genewise to provide annotation of protein domains.

Several ab initio prediction algorithms are also run: Genscan and Fgenesh for genes, tRNAscan to find tRNA genes, and Eponine TSS to predict transcription start sites.

Once the automated analysis is complete, the annotator uses a Perl/Tk-based graphical interface, "otterlace", developed in-house at the Wellcome Trust Sanger Institute, to edit annotation data held in a separate MySQL database system. The interface displays a rich, interactive graphical view of the genomic region, showing features such as database matches, gene predictions, and transcripts created by the annotators. Gapped alignments of nucleotide and protein BLAST hits to the genomic sequence are viewed and explored using the "Blixem" alignment viewer.

Additionally, the "Dotter" dot plot tool is used to show pairwise alignments of unmasked sequence, revealing the location of exons that are occasionally missed by the automated BLAST searches because of their small size and/or match to repeat-masked sequence.

The interface provides a number of tools that the annotator uses to build genes and edit annotations: adding transcripts, exon coordinates, translation regions, gene names and descriptions, remarks, and polyadenylation signals and sites.

Verification

See Harrow et al., 2006 for information on verification techniques.

Credits

This GENCODE release is the result of a collaborative effort among the following laboratories:

Lab/Institution | Contributors
HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK | Adam Frankish, Jonathan Mudge, James Gilbert, Tim Hubbard, Jennifer Harrow
Genome Bioinformatics Lab, CRG, Barcelona, Spain | France Denoeud, Julien Lagarde, Sylvain Foissac, Robert Castelo, Roderic Guigó (GENCODE Principal Investigator)
Department of Genetic Medicine and Development, University of Geneva, Switzerland | Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonarakis
Center for Integrative Genomics, University of Lausanne, Switzerland | Jacqueline Chrast, Charlotte N. Henrichsen, Alexandre Reymond
Affymetrix, Inc., Santa Clara, CA, USA | Philipp Kapranov, Thomas R. Gingeras

References

Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto TS, Manzano C, Chrast J et al. Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007 Jun;17(6):746-59.

Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9.

The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816.

\ encodeGenes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion Mar 2007\ gClass_Artifact 163,168,163\ gClass_Known 33,91,51\ gClass_Novel_CDS 33,91,51\ gClass_Novel_Transcript 84,188,0\ gClass_Polymorphic 160,32,240\ gClass_Processed_pseudogene 0,91,191\ gClass_Putative 84,188,0\ gClass_TEC 84,188,0\ gClass_Unprocessed_pseudogene 0,91,191\ geneClasses Artifact Known Novel_CDS Novel_Transcript Putative TEC Polymorphic Processed_pseudogene Unprocessed_pseudogene\ group encodeGenes\ itemClassTbl encodeGencodeGeneClassMar07\ longLabel Gencode Gene Annotations (March 2007)\ origAssembly hg17\ priority 10.4\ shortLabel Gencode Genes Mar07\ superTrack encodeGencodeSuper dense\ track encodeGencodeGeneMar07\ type genePred\ visibility hide\ encodeGencodeRaceFrags Gencode RACEfrags genePred 5' RACE-Array experiments on Gencode loci 0 10.5 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX,

Description

RACEfrags are the products of 5' RACE reactions performed on GENCODE genes (using the primers displayed in the subtrack "Gencode 5' RACE primer") in 12 tissues and 3 cell lines (15 subtracks), followed by hybridization to ENCODE tiling arrays. Each RACEfrag is linked to its 5' RACE primer, but no other connectivity information is available from this experiment.

Methods

For a detailed description of the methods and references used, see Denoeud et al., 2007.

A combination of 5' RACE and high-density tiling microarrays was used to empirically annotate 5' transcription start sites (TSSs) and internal exons of all 410 annotated protein-coding loci across the 44 ENCODE regions (Oct. 2005 GENCODE freeze; Harrow et al., 2006). Oligonucleotides for the 5' RACE experiments were chosen to map to a coding exon (the index exon) common to most of the transcripts of each protein-coding locus annotated by GENCODE (Oct. 2005 freeze). The 5' RACE reactions were performed with these oligonucleotides on polyA+ RNA from twelve adult human tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta) and three cell lines: GM06990 (lymphoblastoid), HL60 (acute promyelocytic leukemia), and HeLaS3 (cervical carcinoma).

The RACE reactions were then hybridized to 20-nucleotide-resolution Affymetrix tiling arrays covering the non-repeated regions of the 44 ENCODE regions. The resulting "RACEfrags" (array-detected fragments of RACE products) were assessed for novelty by comparing their genomic coordinates to those of GENCODE-annotated exons.
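The novelty check is, at heart, an interval-overlap test between RACEfrags and annotated exons. The sketch below assumes the simplest rule, that a RACEfrag is novel when it overlaps no annotated exon on the same chromosome; the published analysis may apply additional criteria, and all names and coordinates here are hypothetical. Coordinates are half-open, as in BED.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open, as in BED)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def novel_racefrags(frags: List[Interval],
                    exons: List[Interval]) -> List[Interval]:
    """Keep RACEfrags that overlap no annotated exon (assumed novelty rule)."""
    return [f for f in frags if not any(overlaps(f, e) for e in exons)]


# Hypothetical coordinates, for illustration only.
exons = [Interval("chr21", 1000, 1200), Interval("chr21", 5000, 5150)]
frags = [
    Interval("chr21", 1150, 1180),  # inside an annotated exon: not novel
    Interval("chr21", 3000, 3040),  # between exons: novel
    Interval("chr22", 1000, 1100),  # no exon on this chromosome: novel
]
novel = novel_racefrags(frags, exons)
print(novel)
```

The all-pairs comparison is quadratic; for genome-scale inputs, sorting exons per chromosome and using binary search or an interval tree would be the usual design choice.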

Verification

Connectivity between novel RACEfrags and their respective index exons was investigated by RT-PCR, using the 5' RACE primer as one of the primers, followed by hybridization on tiling arrays. A total of 385 RT-PCR reactions corresponding to 199 GENCODE loci were positive after hybridization on tiling arrays (244 RACE reactions). All positive RT-PCR reactions, and a subset of those that were negative in the hybridization experiments, were further verified by cloning and sequencing of the RT-PCR products. In most cases, eight clones were selected from each set of RT-PCR products for sequencing. To be retained in the dataset, these sequences had to map unambiguously to the correct location, show splicing, and pass manual inspection by the HAVANA team. By these criteria, 89 of these RT-PCR reactions (69 GENCODE loci) were positive after cloning and sequencing (see Denoeud et al., 2007 for further details). The resulting cDNA sequences were deposited in GenBank under accession numbers DQ655905-DQ656069 and EF070113-EF070122. See additional information about the sequences here.
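The verification counts above can be turned into simple retention rates, and the two accession ranges into a count of accession numbers. This arithmetic sketch assumes that the 89 reactions and 69 loci are subsets of the 385 array-positive reactions and 199 loci (the text says "89 of these"), and that every number inside each accession range was issued; `accession_range_size` is a hypothetical helper.

```python
def accession_range_size(first: str, last: str) -> int:
    """Number of accessions in an inclusive range such as DQ655905-DQ656069.

    Assumes both ends share the same letter prefix and that every number in
    between was issued.
    """
    prefix = first.rstrip("0123456789")
    if last.rstrip("0123456789") != prefix:
        raise ValueError("range ends must share the same prefix")
    return int(last[len(prefix):]) - int(first[len(prefix):]) + 1


n_accessions = (accession_range_size("DQ655905", "DQ656069")
                + accession_range_size("EF070113", "EF070122"))

# Counts reported in the verification text.
rtpcr_array_positive, loci_array_positive = 385, 199
rtpcr_sequenced_positive, loci_sequenced_positive = 89, 69

print(n_accessions)                                               # 175
print(f"{rtpcr_sequenced_positive / rtpcr_array_positive:.1%}")  # 23.1%
print(f"{loci_sequenced_positive / loci_array_positive:.1%}")    # 34.7%
```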

Credits

The RACEfrags result from a collaborative effort among the following laboratories:

Lab/Institution | Contributors
Genome Bioinformatics Lab, CRG, Barcelona, Spain | France Denoeud, Julien Lagarde, Tyler Alioto, Sylvain Foissac, Robert Castelo, Roderic Guigó
Department of Genetic Medicine and Development, University of Geneva, Switzerland | Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonarakis
Center for Integrative Genomics, University of Lausanne, Switzerland | Jacqueline Chrast, Charlotte N. Henrichsen, Alexandre Reymond
Affymetrix, Inc., Santa Clara, CA, USA | Philipp Kapranov, Jorg Drenkow, Sujit Dike, Jill Cheng, Thomas R. Gingeras
HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK | Adam Frankish, James Gilbert, Tim Hubbard, Jennifer Harrow

References

Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto TS, Manzano C, Chrast J et al. Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007 Jun;17(6):746-59.

Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9.

The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816.

\ encodeGenes 1 autoTranslate 0\ chromosomes chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion Mar 2007\ group encodeGenes\ longLabel 5' RACE-Array experiments on Gencode loci\ origAssembly hg17\ priority 10.5\ shortLabel Gencode RACEfrags\ superTrack encodeGencodeSuper\ track encodeGencodeRaceFrags\ type genePred\ visibility hide\ encodeGencodeGeneOct05 Gencode Genes Oct05 genePred Gencode Gene Annotations (October 2005) 0 10.6 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

The Gencode Genes track shows high-quality manual annotations in the ENCODE regions generated by the GENCODE project. A companion track, Gencode Introns, shows experimental gene structure validations for these annotations.

The gene annotations are colored based on the HAVANA annotation type. Known and validated transcripts are colored dark green, putative and unconfirmed transcripts are light green, pseudogenes are blue, and artifacts are grey. The transcript types are defined in more detail in the accompanying table.

The GENCODE project recommends that the annotations with known and validated transcripts, i.e., the types Known, Novel_CDS, Novel_transcript_gencode_conf, and Putative_gencode_conf (colored dark green in the track display), be used as the reference annotation.

Type | Color | Description
Known | dark green | Known protein-coding genes (referenced in Entrez Gene, NCBI)
Novel_CDS | dark green | Novel protein-coding genes annotated by HAVANA (not referenced in Entrez Gene, NCBI)
Novel_transcript_gencode_conf | dark green | Novel transcripts annotated by HAVANA (no ORF assigned) with at least one junction validated by RT-PCR
Putative_gencode_conf | dark green | Putative transcripts (similar to novel transcripts; EST-supported, short, no viable ORF) with at least one junction validated by RT-PCR
Novel_transcript | light green | Novel transcripts annotated by HAVANA (no ORF assigned) not validated by RT-PCR
Putative | light green | Putative transcripts (similar to novel transcripts; EST-supported, short, no viable ORF) not validated by RT-PCR
TEC | light green | Single-exon objects (supported by multiple ESTs with polyA sites and signals) undergoing experimental validation/extension
Processed_pseudogene | blue | Pseudogenes arising via retrotransposition (exon structure of parent gene lost)
Unprocessed_pseudogene | blue | Pseudogenes arising via gene duplication (exon structure of parent gene retained)
Artifact | grey | Transcript evidence and/or its translation equivocal
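Selecting the recommended reference annotation amounts to filtering transcripts by type. In this release the four recommended types are also exactly the classes that share the dark-green `gClass_*` color (33,91,51) in the track configuration. The sketch below assumes a hypothetical input of (transcript name, type) pairs; the configuration names an item-class table (`encodeGencodeGeneClassOct05`), but its column layout is not described here.

```python
from typing import Iterable, List, Tuple

# Types recommended as the reference annotation for the Oct. 2005 release
# (the dark-green classes listed in the table above).
REFERENCE_TYPES = frozenset({
    "Known",
    "Novel_CDS",
    "Novel_transcript_gencode_conf",
    "Putative_gencode_conf",
})


def reference_transcripts(records: Iterable[Tuple[str, str]]) -> List[str]:
    """Return names of transcripts whose annotation type is recommended.

    `records` is a hypothetical iterable of (transcript_name, type) pairs.
    """
    return [name for name, gene_type in records if gene_type in REFERENCE_TYPES]


records = [
    ("tx1", "Known"),
    ("tx2", "Putative"),               # light green: not validated by RT-PCR
    ("tx3", "Putative_gencode_conf"),  # junction validated by RT-PCR
    ("tx4", "Artifact"),
]
print(reference_transcripts(records))  # ['tx1', 'tx3']
```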

Methods

The Human and Vertebrate Analysis and Annotation (HAVANA) manual curation process was used to produce these annotations.

Finished genomic sequence was analyzed on a clone-by-clone basis using a combination of similarity searches against DNA and protein databases and a series of ab initio gene predictions. Nucleotide sequence databases were searched with WUBLASTN, and significant hits were realigned to the unmasked genomic sequence by EST2GENOME. WUBLASTX was used to search the UniProt protein database, and the accession numbers of significant hits were retrieved from the Pfam database. Hidden Markov models for Pfam protein domains were aligned against the genomic sequence using Genewise to provide annotation of protein domains.

A number of ab initio prediction algorithms were also run: Genscan and Fgenesh for genes, tRNAscan to find tRNA genes, and Eponine TSS for transcription start site predictions.

The annotators used the (AceDB-based) Otterlace interface to create and edit gene objects, which were then stored in a local database named Otter. Where predicted transcript structures from Ensembl are available, these can be viewed from within the Otterlace interface and may be used as starting templates for gene curation. Annotation in the Otter database is submitted to the EMBL/GenBank/DDBJ nucleotide databases.

Verification

The gene objects selected for verification came from various computational prediction methods and HAVANA annotations.

RT-PCR and RACE experiments were performed on them, using a variety of human tissues, to confirm their structure. Human cDNAs from 24 different tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta, skin, peripheral blood leucocytes, bone marrow, fetal brain, fetal liver, fetal kidney, fetal heart, fetal lung, thymus, pancreas, mammary gland, prostate) were synthesized using 12 poly(A)+ RNAs from Origene, eight from Clemente Associates/Quantum Magnetics, and four from BD Biosciences, as described in [Reymond et al., 2002a,b]. The relative amount of each cDNA was normalized by quantitative PCR using SYBR Green as the intercalating dye and an ABI Prism 7700 Sequence Detection System.

Predicted human gene junctions were assayed experimentally by RT-PCR as previously described, with modifications [Reymond, 2002b; Mouse Genome Sequencing Consortium, 2002; Guigo, 2003].

Similar amounts of Homo sapiens cDNAs were mixed with JumpStart REDTaq ReadyMix (Sigma) and 4 ng/µl primers (Sigma-Genosys) using a BioMek 2000 robot (Beckman). The first ten cycles of PCR amplification used a touchdown protocol, with annealing temperatures decreasing from 60 to 50°C; the annealing temperature for the next 30 cycles was 50°C. Amplimers were separated on "Ready to Run" precast gels (Pharmacia) and sequenced. RACE experiments were performed with the BD SMART RACE cDNA Amplification Kit (BD Biosciences) following the manufacturer's instructions.

Credits

Click here for a complete list of people who participated in the GENCODE project.

References

Ashurst, J.L. et al. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res 33 (Database Issue), D459-65 (2005).

Guigo, R. et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A 100(3), 1140-5 (2003).

Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915), 520-62 (2002).

Reymond, A. et al. Human chromosome 21 gene expression atlas in the mouse. Nature 420(6915), 582-6 (2002).

Reymond, A. et al. Nineteen additional unpredicted transcripts from human chromosome 21. Genomics 79(6), 824-32 (2002).

\ encodeGenes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE Oct 2005 Freeze\ gClass_Artifact 163,168,163\ gClass_Known 33,91,51\ gClass_Novel_CDS 33,91,51\ gClass_Novel_transcript 84,188,0\ gClass_Novel_transcript_gencode_conf 33,91,51\ gClass_Processed_pseudogene 0,91,191\ gClass_Putative 84,188,0\ gClass_Putative_gencode_conf 33,91,51\ gClass_TEC 84,188,0\ gClass_Unprocessed_pseudogene 0,91,191\ geneClasses Artifact Known Novel_CDS Novel_transcript Novel_transcript_gencode_conf Putative Putative_gencode_conf TEC Processed_pseudogene Unprocessed_pseudogene\ group encodeGenes\ itemClassTbl encodeGencodeGeneClassOct05\ longLabel Gencode Gene Annotations (October 2005)\ origAssembly hg17\ priority 10.6\ shortLabel Gencode Genes Oct05\ superTrack encodeGencodeSuper\ track encodeGencodeGeneOct05\ type genePred\ visibility hide\ encodeGencodeIntronOct05 Gencode Introns Oct05 bed 6 + Gencode Intron Validation (October 2005) 0 10.7 0 0 0 127 127 127 0 0 20 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr8,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr9,chrX,

Description

The Gencode Intron Validation track shows gene structure validations generated by the GENCODE project. This track serves as a companion to the Gencode Genes track.

The items in this track are colored based on the validation status, as determined by RT-PCR across the exons flanking each intron:

Status | Color | Validation result
RT_positive | green | Intron validated (RT-PCR product corresponds to the expected junction)
RACE_validated | green | Intron validated (RACE product corresponds to the expected junction)
RT_negative | red | Intron not validated (no RT-PCR product was obtained)
RT_wrong_junction | gold | Intron not validated, but another junction exists between the two flanking exons (RT-PCR product does not correspond to the expected junction)
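Only two of the four statuses count as a validated intron. A small sketch that encodes the status table and tallies validated versus non-validated introns for one gene model follows; `STATUS_KEY` and `summarize` are hypothetical names, and the input statuses are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Status -> (intron validated?, display color), transcribed from the table above.
STATUS_KEY: Dict[str, Tuple[bool, str]] = {
    "RT_positive": (True, "green"),
    "RACE_validated": (True, "green"),
    "RT_negative": (False, "red"),
    "RT_wrong_junction": (False, "gold"),
}


def summarize(statuses: Iterable[str]) -> Counter:
    """Count validated vs. not-validated introns; unknown statuses raise KeyError."""
    counts: Counter = Counter()
    for status in statuses:
        validated, _color = STATUS_KEY[status]
        counts["validated" if validated else "not_validated"] += 1
    return counts


# Hypothetical statuses for five introns of one gene model.
result = summarize(["RT_positive", "RT_negative", "RACE_validated",
                    "RT_wrong_junction", "RT_positive"])
print(result["validated"], result["not_validated"])  # 3 2
```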

Methods

Selected gene models from the Gencode Genes track were picked for RT-PCR and RACE verification experiments. RT-PCR and RACE experiments were performed on these objects, using a variety of human tissues, to confirm their structure. Human cDNAs from 24 different tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta, skin, peripheral blood leucocytes, bone marrow, fetal brain, fetal liver, fetal kidney, fetal heart, fetal lung, thymus, pancreas, mammary gland, prostate) were synthesized using twelve poly(A)+ RNAs from Origene, eight from Clemente Associates/Quantum Magnetics, and four from BD Biosciences, as described in [Reymond et al., 2002a,b]. The relative amount of each cDNA was normalized against glyceraldehyde-3-phosphate dehydrogenase (GAPDH) by quantitative PCR using SYBR Green as the intercalating dye and an ABI Prism 7700 Sequence Detection System.
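The text does not say how the GAPDH qPCR readings were turned into equalized cDNA amounts. One common simplification, assumed here, treats PCR efficiency as 100%, so a sample whose GAPDH threshold cycle (Ct) is one cycle earlier holds twice as much template. Under that assumption, the sketch below computes how many-fold each cDNA would need to be diluted to match the least concentrated sample; the function name, tissue set, and Ct values are hypothetical.

```python
from typing import Dict


def dilution_factors(gapdh_ct: Dict[str, float]) -> Dict[str, float]:
    """Fold dilution bringing each cDNA down to the least concentrated one.

    Assumes 100% PCR efficiency: template doubles every cycle, so a Ct that
    is d cycles lower means 2**d times more GAPDH template.
    """
    ct_max = max(gapdh_ct.values())  # highest Ct = least template
    return {tissue: 2 ** (ct_max - ct) for tissue, ct in gapdh_ct.items()}


# Hypothetical GAPDH Ct values.
factors = dilution_factors({"brain": 18.0, "liver": 20.0, "testis": 19.0})
print(factors)  # {'brain': 4.0, 'liver': 1.0, 'testis': 2.0}
```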

Predicted human gene junctions were assayed experimentally by RT-PCR as previously described, with modifications [Reymond, 2002b; Mouse Genome Sequencing Consortium, 2002; Guigo, 2003].

Similar amounts of Homo sapiens cDNAs were mixed with JumpStart REDTaq ReadyMix (Sigma) and 4 ng/µl primers (Sigma-Genosys) using a BioMek 2000 robot (Beckman). The first ten cycles of PCR amplification used a touchdown protocol, with annealing temperatures decreasing from 60 to 50°C; the annealing temperature for the next 30 cycles was 50°C. Amplimers were separated on "Ready to Run" precast gels (Pharmacia) and sequenced. RACE experiments were performed with the BD SMART RACE cDNA Amplification Kit (BD Biosciences) following the manufacturer's instructions.

Credits

Click here for a complete list of people who participated in the GENCODE project.

References

Ashurst, J.L. et al. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res 33 (Database Issue), D459-65 (2005).

Guigo, R. et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A 100(3), 1140-5 (2003).

Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915), 520-62 (2002).

Reymond, A. et al. Human chromosome 21 gene expression atlas in the mouse. Nature 420(6915), 582-6 (2002).

Reymond, A. et al. Nineteen additional unpredicted transcripts from human chromosome 21. Genomics 79(6), 824-32 (2002).

\ encodeGenes 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr8,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr9,chrX\ dataVersion ENCODE Oct 2005 Freeze\ group encodeGenes\ longLabel Gencode Intron Validation (October 2005)\ origAssembly hg17\ priority 10.7\ shortLabel Gencode Introns Oct05\ superTrack encodeGencodeSuper\ track encodeGencodeIntronOct05\ type bed 6 +\ visibility hide\ encodeAffyChIpHl60PvalCebpeHr02 Affy CEBPe RA 2h wig 0.0 534.54 Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 2hrs) P-Value 0 11 200 25 0 227 140 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 200,25,0\ longLabel Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 2hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 11\ shortLabel Affy CEBPe RA 2h\ subGroups factor=CEBPe time=2h\ track encodeAffyChIpHl60PvalCebpeHr02\ encodeAffyChIpHl60SignalStrictPol2Hr08 Affy Pol2 8h wig -2.78 3.97 Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict Signal 0 11 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 11\ shortLabel Affy Pol2 8h\ subGroups factor=Pol2 time=8h\ track encodeAffyChIpHl60SignalStrictPol2Hr08\ encodeAffyChIpHl60SitesStrictRnapHr08 Affy Pol2 8h bed 3 . 
Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict Sites 0 11 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 11\ shortLabel Affy Pol2 8h\ subGroups factor=Pol2 time=8h\ track encodeAffyChIpHl60SitesStrictRnapHr08\ encodeAffyChIpHl60PvalStrictPol2Hr08 Affy Pol2 8h wig 0 696.62 Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict P-Value 0 11 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 11\ shortLabel Affy Pol2 8h\ subGroups factor=Pol2 time=8h\ track encodeAffyChIpHl60PvalStrictPol2Hr08\ encodeAffyEc1FetalSpleenSignal EC1 Sgnl Spleen wig 0 62385 Affy Ext Trans Signal (1-base window) (Fetal Spleen) 0 11 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 0 color 176,0,80\ longLabel Affy Ext Trans Signal (1-base window) (Fetal Spleen)\ parent encodeAffyEcSignal\ priority 11\ shortLabel EC1 Sgnl Spleen\ track encodeAffyEc1FetalSpleenSignal\ encodeAffyEc1FetalSpleenSites EC1 Sites Spleen bed 3 . Affy Ext Trans Sites (1-base window) (Fetal Spleen) 0 11 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 1 color 176,0,80\ longLabel Affy Ext Trans Sites (1-base window) (Fetal Spleen)\ parent encodeAffyEcSites\ priority 11\ shortLabel EC1 Sites Spleen\ track encodeAffyEc1FetalSpleenSites\ gap Gap bed 3 + Gap Locations 1 11 0 0 0 127 127 127 0 0 0

Description

\

\ This track depicts gaps in the assembly. Apart from intractable heterochromatic, centromeric, telomeric, and short-arm gaps, most gaps have been closed during the finishing process, although a small number still remain. \

\ Gaps are represented as black boxes in this track. If the relative order and orientation of the contigs on either side of a gap are known, the gap is bridged. In this case, a white line is drawn through the black box representing the gap, and the gap is labeled "yes". \
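The bridged/unbridged distinction can be tallied from the gap records themselves. The sketch below is illustrative only: it assumes a hypothetical tab-separated layout of `chrom, chromStart, chromEnd, ix, n, size, type, bridge` (the track is declared only as `bed 3 +`, so check the actual column order before relying on it) and treats a `bridge` value of `yes` as a bridged gap.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Gap:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    gap_type: str
    bridged: bool


def parse_gap_line(line: str) -> Gap:
    """Parse one tab-separated gap record.

    Assumed (hypothetical) column order: chrom, chromStart, chromEnd,
    ix, n, size, type, bridge.
    """
    fields = line.rstrip("\n").split("\t")
    return Gap(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        gap_type=fields[6],
        bridged=fields[7] == "yes",  # "yes" marks a bridged gap
    )


def summarize_gaps(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return (count, total bases) for bridged and unbridged gaps."""
    totals = {"bridged": [0, 0], "unbridged": [0, 0]}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        gap = parse_gap_line(line)
        key = "bridged" if gap.bridged else "unbridged"
        totals[key][0] += 1
        totals[key][1] += gap.end - gap.start
    return {key: (count, size) for key, (count, size) in totals.items()}
```

For example, a telomere row ending in `no` and a contig row ending in `yes` are counted in separate bins, with lengths taken from the end minus start coordinates.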

\

This assembly contains the following types of gaps:\

\ map 1 group map\ longLabel Gap Locations\ priority 11\ shortLabel Gap\ track gap\ type bed 3 +\ visibility dense\ hapmapSnpsYRI HapMap SNPs YRI bed 6 + HapMap SNPs from the YRI Population (Yoruba in Ibadan, Nigeria) 0 11 0 0 0 127 127 127 0 0 0 varRep 1 longLabel HapMap SNPs from the YRI Population (Yoruba in Ibadan, Nigeria)\ parent hapmapSnps\ priority 11\ shortLabel HapMap SNPs YRI\ track hapmapSnpsYRI\ decodeHotSpotMale Hot Spot Male bed 4 deCODE recombination map, male >= 10.0 1 11 0 81 200 127 168 227 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 1 color 0,81,200\ configurable on\ longLabel deCODE recombination map, male >= 10.0\ parent hotView\ priority 11\ shortLabel Hot Spot Male\ subGroups view=hot\ track decodeHotSpotMale\ snpArrayIlluminaHuman660W_Quad Illumina 660W-Q bed 6 + Illumina Human 660W-Quad 0 11 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Illumina Human 660W-Quad\ parent snpArray\ priority 11\ shortLabel Illumina 660W-Q\ track snpArrayIlluminaHuman660W_Quad\ type bed 6 +\ encodeEgaspFullJigsaw Jigsaw genePred Jigsaw Gene Predictions 0 11 22 150 20 138 202 137 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 22,150,20\ longLabel Jigsaw Gene Predictions\ parent encodeEgaspFull\ priority 11\ shortLabel Jigsaw\ track encodeEgaspFullJigsaw\ encodeUcsdChipHeLaH3H4RNAP_p0 LI Pol2 -gIF bedGraph 4 Ludwig Institute ChIP-chip: RNA Pol2, HeLa cells, no gamma interferon 0 11 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: RNA Pol2, HeLa cells, no gamma interferon\ parent encodeLIChIPgIF\ priority 11\ shortLabel LI Pol2 -gIF\ track encodeUcsdChipHeLaH3H4RNAP_p0\ encodeUcsdChipH3K27me3Suz12 LI SUZ12 HeLa 
bedGraph 4 Ludwig Institute ChIP-chip: SUZ12 protein ab, HeLa cells 0 11 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: SUZ12 protein ab, HeLa cells\ parent encodeLIChIP\ priority 11\ shortLabel LI SUZ12 HeLa\ track encodeUcsdChipH3K27me3Suz12\ encodeGencodeRaceFragsSpleen RACEfrags Spleen genePred Gencode RACEfrags from Spleen 0 11 152 0 104 203 127 179 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 152,0,104\ longLabel Gencode RACEfrags from Spleen\ parent encodeGencodeRaceFrags\ priority 11\ shortLabel RACEfrags Spleen\ track encodeGencodeRaceFragsSpleen\ encodeEgaspUpdSgp2U12 SGP2 U12 Update genePred SGP2 U12 Intron Predictions 0 11 200 132 12 227 193 133 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 200,132,12\ longLabel SGP2 U12 Intron Predictions\ parent encodeEgaspUpdate\ priority 11\ shortLabel SGP2 U12 Update\ track encodeEgaspUpdSgp2U12\ stanfordChipNRSFMono Stan Jurkat NRSF/REST/Mono bedGraph 4 Stanford ChIP-chip (Jurkat cells, NRSF/REST/Mono ChIP) 0 11 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (Jurkat cells, NRSF/REST/Mono ChIP)\ parent stanfordChip\ priority 11\ shortLabel Stan Jurkat NRSF/REST/Mono\ track stanfordChipNRSFMono\ encodeStanfordChipNRSFMono Stan Jurkat NRSF/REST/Mono bedGraph 4 Stanford ChIP-chip (Jurkat cells, NRSF/REST/Mono ChIP) 0 11 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (Jurkat cells, 
NRSF/REST/Mono ChIP)\ parent encodeStanfordChipJohnson\ priority 11\ shortLabel Stan Jurkat NRSF/REST/Mono\ track encodeStanfordChipNRSFMono\ encodeStanfordPromotersJEG3 Stan Pro JEG3 bed 9 + Stanford Promoter Activity (JEG3 cells) 0 11 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (JEG3 cells)\ parent encodeStanfordPromoters\ priority 11\ shortLabel Stan Pro JEG3\ track encodeStanfordPromotersJEG3\ encodeYaleAffyPlacRNATarsIntronsProximal Yale In Prx Plac bed 4 . Yale Intronic Proximal Placental TARs 0 11 128 0 128 191 127 191 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 128,0,128\ longLabel Yale Intronic Proximal Placental TARs\ parent encodeNoncodingTransFrags\ priority 11\ shortLabel Yale In Prx Plac\ subGroups region=intronicProximal celltype=plac source=yale\ track encodeYaleAffyPlacRNATarsIntronsProximal\ encodeYaleAffyNeutRNATransMap10 Yale RNA Neu 10 wig -2730 3394 Yale Neutrophil RNA Transcript Map, Sample 10 0 11 50 70 50 152 162 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,70,50\ longLabel Yale Neutrophil RNA Transcript Map, Sample 10\ parent encodeYaleAffyRNATransMap\ priority 11\ shortLabel Yale RNA Neu 10\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATransMap10\ encodeYaleAffyNeutRNATars10 Yale TAR Neu 10 bed 3 . 
Yale Neutrophil RNA Transcriptionally Active Region, Sample 10 0 11 50 70 50 152 162 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,70,50\ longLabel Yale Neutrophil RNA Transcriptionally Active Region, Sample 10\ parent encodeYaleAffyRNATars\ priority 11\ shortLabel Yale TAR Neu 10\ subGroups celltype=neutro samples=samples\ track encodeYaleAffyNeutRNATars10\ encodeAffyChIpHl60SitesCebpeHr02 Affy CEBPe RA 2h bed 3 . Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 2hrs) Sites 0 12 200 25 0 227 140 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 200,25,0\ longLabel Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 2hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 12\ shortLabel Affy CEBPe RA 2h\ subGroups factor=CEBPe time=2h\ track encodeAffyChIpHl60SitesCebpeHr02\ encodeAffyRnaGm06990SitesIntronsDistal Affy In Dst GM06990 bed 4 . 
Affy Intronic Distal GM06990 Transfrags 0 12 250 5 255 252 130 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 250,5,255\ longLabel Affy Intronic Distal GM06990 Transfrags\ parent encodeNoncodingTransFrags\ priority 12\ shortLabel Affy In Dst GM06990\ subGroups region=intronicDistal celltype=gm06990 source=affy\ track encodeAffyRnaGm06990SitesIntronsDistal\ encodeAffyChIpHl60SignalStrictPol2Hr32 Affy Pol2 32h wig -2.78 3.97 Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict Signal 0 12 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 12\ shortLabel Affy Pol2 32h\ subGroups factor=Pol2 time=32h\ track encodeAffyChIpHl60SignalStrictPol2Hr32\ encodeAffyChIpHl60SitesStrictRnapHr32 Affy Pol2 32h bed 3 . 
Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict Sites 0 12 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 12\ shortLabel Affy Pol2 32h\ subGroups factor=Pol2 time=32h\ track encodeAffyChIpHl60SitesStrictRnapHr32\ encodeAffyChIpHl60PvalStrictPol2Hr32 Affy Pol2 32h wig 0 696.62 Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict P-Value 0 12 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 12\ shortLabel Affy Pol2 32h\ subGroups factor=Pol2 time=32h\ track encodeAffyChIpHl60PvalStrictPol2Hr32\ encodeAffyEc51FetalSpleenSignal EC51 Sgnl Spleen wig 0 62385 Affy Ext Trans Signal (51-base window) (Fetal Spleen) 0 12 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 0 color 176,0,80\ longLabel Affy Ext Trans Signal (51-base window) (Fetal Spleen)\ parent encodeAffyEcSignal\ priority 12\ shortLabel EC51 Sgnl Spleen\ track encodeAffyEc51FetalSpleenSignal\ encodeAffyEc51FetalSpleenSites EC51 Site Spleen bed 3 . Affy Ext Trans Sites (51-base window) (Fetal Spleen) 0 12 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 1 color 176,0,80\ longLabel Affy Ext Trans Sites (51-base window) (Fetal Spleen)\ parent encodeAffyEcSites\ priority 12\ shortLabel EC51 Site Spleen\ track encodeAffyEc51FetalSpleenSites\ encodeEgaspSuper EGASP ENCODE Gene Prediction Workshop (EGASP) 0 12 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Overview

\

\ This super-track combines related tracks from the ENCODE Gene Annotation Assessment Project (EGASP) 2005 Gene Prediction Workshop. The goal of the workshop was to evaluate automatic methods for annotating genes in the human genome, with a focus on protein-coding genes. Predictions were evaluated on their ability to reproduce the high-quality, manually assisted GENCODE gene annotations and to predict novel transcripts.\
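One common way to measure how well predictions reproduce a reference annotation is exon-level sensitivity and specificity, using the gene-prediction convention in which specificity is TP / (TP + FP). The sketch below assumes exact boundary matching and represents exons as hypothetical `(chrom, start, end, strand)` tuples; it is not necessarily the scoring protocol EGASP itself used.

```python
from typing import Iterable, Tuple

Exon = Tuple[str, int, int, str]  # (chrom, start, end, strand)


def exon_accuracy(predicted: Iterable[Exon], reference: Iterable[Exon]) -> Tuple[float, float]:
    """Exact-match exon-level sensitivity and specificity.

    Sn = TP / number of reference exons; Sp = TP / number of predicted exons.
    An exon is a true positive only if both boundaries and the strand
    match a reference exon.
    """
    pred, ref = set(predicted), set(reference)
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0
    specificity = true_pos / len(pred) if pred else 0.0
    return sensitivity, specificity
```

Exact matching is the strictest criterion; an exon that is off by a few bases at one boundary counts as both a false positive and a false negative.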

\ The EGASP Full track shows gene predictions covering all 44 ENCODE regions that were submitted before the GENCODE annotations were released. The EGASP Partial track shows gene predictions covering some of the ENCODE regions, submitted before the GENCODE release. The EGASP Update track shows gene predictions covering all ENCODE regions, submitted after the GENCODE release.\

\ These annotations were originally produced using the hg17 assembly.

\

\ The following gene predictions are included:\

\

\ \

Credits

\

\ Click here for a complete list of people who participated in the\ GENCODE project.

\

\ The following individuals and institutions provided the data for the subtracks\ in this annotation:\

\ \

References

\

\ Ashurst JL, Chen CK, Gilbert JG, Jekosch K, Keenan S, Meidl P, Searle SM,\ Stalker J, Storey R, Trevanion S et al.\ The Vertebrate Genome Annotation (Vega) database.\ Nucleic Acids Res. 2005 Jan 1;33(Database issue):D459-65.

\

\ Guigó R, Dermitzakis ET, Agarwal P, Ponting CP, Parra G, Reymond A, Abril JF, Keibler E, Lyle R, Ucla C et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A. 2003 Feb 4;100(3):1140-5.\

\

\ Mouse Genome Sequencing Consortium.\ Initial sequencing and comparative analysis of the mouse\ genome. Nature. 2002 Dec 5;420(6915):520-62.

\

\ Reymond A, Marigo V, Yaylaoglu MB, Leoni A, Ucla C, Scamuffa N, Caccioppoli C, Dermitzakis ET, Lyle R, Banfi S et al.\ Human chromosome 21 gene expression atlas in the mouse.\ Nature. 2002 Dec 5;420(6915):582-6.

\

\ Reymond A, Camargo AA, Deutsch S, Stevenson BJ, Parmigiani RB, Ucla C, Bettoni F, Rossier C, Lyle R, Guipponi M et al.\ Nineteen additional unpredicted transcripts from human\ chromosome 21. Genomics. 2002 Jun;79(6):824-32.

\

\ Chatterji S, Pachter L.\ Multiple organism gene finding by collapsed Gibbs sampling.\ J Comput Biol. 2005 Jul-Aug;12(6):599-608.

\

\ Siepel A, Haussler D.\ Computational identification of evolutionarily conserved\ exons.\ Proc. 8th Int'l Conf. on Research in Computational Molecular Biology.\ 2004;177-186.

\ \

Augustus

\

\ Stanke M, Waack S.\ Gene prediction with a hidden Markov model and a new intron\ submodel.\ Bioinformatics. 2003;19(Suppl. 2):ii215-ii225.

\

\ Stanke M, Steinkamp R, Waack S, Morgenstern B.\ AUGUSTUS: a web server for gene finding in eukaryotes.\ Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W309-12.

\ \

FGenesh++

\

\ Solovyev VV.\ "Statistical approaches in Eukaryotic gene prediction".\ In Handbook of Statistical Genetics (eds. Balding D et al.)\ (John Wiley & Sons, Inc., 2001). p. 83-127.

\ \

GeneID

\

\ Blanco E, Parra G, Guigó R.\ "Using geneid to identify genes".\ In Current Protocols in Bioinformatics, Unit 4.3. (eds. Baxevanis AD.)\ (John Wiley & Sons, Inc., 2002).

\

\ Guigó R.\ Assembling genes from predicted exons in linear time with\ dynamic programming.\ J Comput Biol. 1998 Winter;5(4):681-702.

\

\ Guigó R, Knudsen S, Drake N, Smith T.\ Prediction of gene structure.\ J Mol Biol. 1992 Jul 5;226(1):141-57.

\

\ Parra G, Blanco E, Guigó R.\ GeneID in Drosophila.\ Genome Res. 2000 Apr;10(4):511-5.

\ \

Jigsaw

\

\ Allen JE, Pertea M, Salzberg SL.\ Computational gene prediction using multiple sources of\ evidence.\ Genome Res. 2004 Jan;14(1):142-8.

\

\ Allen JE, Salzberg SL.\ JIGSAW: integration of multiple sources of evidence for gene\ prediction.\ Bioinformatics. 2005 Sep 15;21(18):3596-603.

\ \

SGP2

\

\ Guigó R, Dermitzakis ET, Agarwal P, Ponting CP, Parra G,\ Reymond A, Abril JF, Keibler E, Lyle R, Ucla C et al.\ Comparison of mouse and human genomes followed by experimental\ verification yields an estimated 1,019 additional genes.\ Proc Natl Acad Sci U S A. 2003 Feb 4;100(3):1140-5.

\

\ Parra G, Agarwal P, Abril JF, Wiehe T, Fickett JW, Guigó R.\ Comparative gene prediction in human and mouse.\ Genome Res. 2003 Jan;13(1):108-17.

\ \ encodeGenes 0 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ group encodeGenes\ longLabel ENCODE Gene Prediction Workshop (EGASP)\ priority 12.0\ shortLabel EGASP\ superTrack on\ track encodeEgaspSuper\ encodeEgaspFull EGASP Full genePred ENCODE Gene Prediction Workshop (EGASP) All ENCODE Regions 0 12 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows full sets of gene predictions covering all 44 ENCODE regions \ originally submitted for the ENCODE Gene Annotation Assessment Project \ (EGASP) Gene Prediction Workshop 2005. \ The following gene predictions are included:\

\ The EGASP Partial companion track shows original gene prediction submissions \ for a partial set of the 44 ENCODE regions; the EGASP Update track \ shows updated versions of the submitted predictions. These annotations\ were originally produced using the hg17 assembly.

\ \

Display Conventions and Configuration

\

\ Data for each gene prediction method within this composite annotation track \ are displayed in a separate subtrack. See the top of the track description \ page for configuration options allowing display of selected subsets of gene\ predictions. To remove a subtrack from the display,\ uncheck the appropriate box.\

\ The individual subtracks within this annotation follow the display conventions \ for gene prediction\ tracks. Display characteristics specific to individual subtracks are \ described in the Methods section. The track description page offers the option \ to color and label codons in a zoomed-in display of the subtracks to facilitate \ validation and comparison of gene predictions. To enable this feature, select \ the genomic codons option from the "Color track by codons"\ menu. Click the\ Help on codon coloring\ link for more information about this feature.
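Coloring by genomic codons requires grouping the CDS bases of a gene prediction into codons, some of which span exon boundaries. The sketch below assumes genePred-style inputs (ascending exon starts and ends as integer lists, 0-based half-open coordinates, plus `cdsStart`, `cdsEnd`, and strand); the function name is hypothetical, and the browser's own implementation may differ.

```python
from typing import List


def cds_codons(exon_starts: List[int], exon_ends: List[int],
               cds_start: int, cds_end: int, strand: str) -> List[List[int]]:
    """Group CDS bases into codons of genomic positions.

    Exons are 0-based, half-open and sorted by position. Positions are
    returned in transcription order, so on the '-' strand they run from
    high to low coordinates. A trailing partial codon is dropped.
    """
    bases: List[int] = []
    for start, end in zip(exon_starts, exon_ends):
        lo, hi = max(start, cds_start), min(end, cds_end)
        bases.extend(range(lo, hi))  # empty when the exon lies outside the CDS
    if strand == "-":
        bases.reverse()
    full = len(bases) - len(bases) % 3
    return [bases[i:i + 3] for i in range(0, full, 3)]
```

With exons `[0, 10)` and `[20, 30)` and a CDS of `[5, 25)` on the plus strand, the second codon is split across the intron as positions 8, 9 and 20.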

\

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for distinguishing the different gene prediction methods.

\ \

Methods

\ \

AceView

\

\ These annotations were generated using AceView. All mRNAs and cDNAs available in GenBank, excluding RefSeq NM entries, were co-aligned to the GENCODE regions. The results were then examined and filtered to resemble the HAVANA annotations. HAVANA's very restrictive approach to CDS annotation was not reproduced because of a lack of experimental data.

\ \

DOGFISH-C

\

\ Candidate splice sites and coding starts/stops were evaluated using DNA alignments between the human assembly and seven other vertebrate species (the UCSC multiz alignments, with frog added and chimp removed). Genes (single transcripts only) were then predicted using dynamic programming.
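The final dynamic-programming step can be illustrated with a simplified chaining problem: choose the highest-scoring set of non-overlapping candidate exons. This sketch is an assumption-laden stand-in, not DOGFISH-C's algorithm. It ignores reading frame, splice-site types, and start/stop requirements, and uses a hypothetical `min_intron` spacing parameter.

```python
from typing import List, Tuple

Exon = Tuple[int, int, float]  # (start, end, score); end is exclusive


def best_exon_chain(exons: List[Exon], min_intron: int = 1) -> Tuple[float, List[Exon]]:
    """Highest-scoring chain of non-overlapping exons (single transcript).

    Exons in the chain must be separated by at least min_intron bases.
    """
    if not exons:
        return 0.0, []
    exons = sorted(exons, key=lambda e: e[1])  # every valid predecessor sorts earlier
    best = [0.0] * len(exons)
    back = [-1] * len(exons)
    for j, (start, _end, score) in enumerate(exons):
        best[j] = score  # chain that starts with exon j
        for i in range(j):
            if exons[i][1] + min_intron <= start and best[i] + score > best[j]:
                best[j], back[j] = best[i] + score, i
    j = max(range(len(exons)), key=best.__getitem__)
    total, chain = best[j], []
    while j != -1:  # trace back through predecessors
        chain.append(exons[j])
        j = back[j]
    return total, chain[::-1]
```

The quadratic loop keeps the sketch short; gene finders such as GeneID use formulations that run in linear time.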

\ \

Ensembl

\

\ The Ensembl annotation includes two types of predictions: protein-coding genes (the Ensembl Gene Predictions subtrack) and pseudogenes of protein-coding genes (the Ensembl Pseudogene Predictions subtrack). The Ensembl Pseudo subtrack is not intended as a comprehensive annotation of pseudogenes; rather, it identifies and labels those gene predictions made by the Ensembl pipeline that have pseudogene characteristics. Exons that lie partially outside the ENCODE region are not included in the data set. The "Alternate Name" field on the subtrack details page shows the Ensembl ID for the selected gene or transcript.
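The exclusion rule for exons that extend past an ENCODE region can be expressed as a simple containment filter. In this sketch the transcript dictionary layout and the function name are hypothetical, and dropping transcripts that are left with no exons is an added assumption.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open


def clip_to_region(transcripts: Dict[str, Tuple[str, List[Interval]]],
                   chrom: str, region_start: int, region_end: int) -> Dict[str, List[Interval]]:
    """Keep only exons lying entirely inside [region_start, region_end) on chrom.

    Exons that lie partially outside the region are discarded; transcripts
    left with no exons are dropped (an assumption of this sketch).
    """
    kept: Dict[str, List[Interval]] = {}
    for name, (tx_chrom, exons) in transcripts.items():
        if tx_chrom != chrom:
            continue
        inside = [(s, e) for s, e in exons if region_start <= s and e <= region_end]
        if inside:
            kept[name] = inside
    return kept
```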

\ \

ExonHunter

\

\ ExonHunter is a comprehensive gene finder based on hidden Markov models (HMMs) that can incorporate a variety of additional sources of information (ESTs, proteins, and genome-genome comparisons).

\ \

Exogean

\

\ Exogean annotates protein coding genes by combining mRNA and cross-species\ protein alignments in directed acyclic colored multigraphs where nodes and\ edges respectively represent biological objects and human expertise.\ Additional predictions and methods for this subtrack are available in the\ EGASP Updates track.

\ \

Fgenesh Pseudogenes

\

\ Fgenesh is an HMM-based gene structure prediction program. This data set shows predictions of potential pseudogenes.

\ \

Fgenesh++

\

\ These gene predictions were generated by Fgenesh++, a gene-finding program that\ uses both HMMs and protein similarity to find \ genes in a completely automated manner.

\ \

GeneID-U12

\

\ The GeneID-U12 gene prediction set was generated with a single-genome ab initio method, using a version of GeneID modified to detect U12-dependent introns (both GT-AG and AT-AC subtypes) when present. This modified version of GeneID uses matrices for U12 donor, acceptor, and branch sites constructed from examples of published U12 intron splice junctions (both experimentally confirmed and expressed-sequence-validated predictions). Two GeneID-U12 subtracks are included: GeneID Gene Predictions and GeneID U12 Intron Predictions. The U12 splice sites for features in the U12 Intron Predictions subtrack are displayed on the track details pages. Additional predictions and methods for this subtrack are available in the EGASP Updates track.
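Splice-site matrices of this kind are typically log-odds position weight matrices built from aligned example sites. The sketch below shows that construction under stated assumptions: a uniform background, a hypothetical pseudocount of 0.5, and toy donor-like sequences invented for illustration. They are not the published U12 junctions used by GeneID-U12.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 0.5,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score_site(pwm: List[Dict[str, float]], seq: str) -> float:
    """Sum of per-position log-odds scores; higher means more site-like."""
    return sum(column[base] for column, base in zip(pwm, seq))


# Toy, hypothetical training sites (not real U12 donor data).
toy_sites = ["GTATCCTT", "ATATCCTT", "GTATCCTC", "GTATCCTT"]
toy_pwm = build_pwm(toy_sites)
```

A candidate window scanned along the genome would be reported as a site when its score exceeds a chosen threshold; the threshold itself is another tunable assumption.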

\ \

GeneMark

\

\ The eukaryotic version of the GeneMark.hmm (release 2.2) gene prediction program uses an HMM with explicit state durations, i.e., a hidden semi-Markov model (HSMM). The model includes hidden states for initial, internal, and terminal exons, introns, intergenic regions, and single-exon genes. It also includes "border" states, such as the start site (initiation codon), stop site (termination codon), and donor and acceptor splice sites. Protein-coding sequences were modeled by three-periodic inhomogeneous Markov chains, and non-coding sequences by homogeneous Markov chains. Nucleotide sequences corresponding to the site states were modeled by position-specific inhomogeneous Markov chains. Parameters of the gene models were derived from a set of genes obtained by mapping cDNAs to genomic DNA. To reflect variation in the G+C composition of the genome, the gene model parameters were estimated separately for three G+C content classes.
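The contrast between a three-periodic chain for coding sequence and a homogeneous chain for non-coding sequence can be sketched with a log-odds score. For brevity this uses first-order chains with additive smoothing. Real gene finders typically use higher-order chains, and this sketch omits the HSMM durations and the G+C classes entirely. Function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

BASES = "ACGT"
Model = List[Dict[str, Dict[str, float]]]  # one transition table per phase


def train_chain(seqs: Sequence[str], period: int, pseudocount: float = 1.0) -> Model:
    """Estimate P(x_i | x_{i-1}, i mod period) with additive smoothing.

    period=3 gives a three-periodic (codon-position-dependent) chain;
    period=1 gives an ordinary homogeneous chain.
    """
    counts = [{p: {b: pseudocount for b in BASES} for p in BASES} for _ in range(period)]
    for seq in seqs:
        for i in range(1, len(seq)):
            counts[i % period][seq[i - 1]][seq[i]] += 1
    return [
        {prev: {b: n / sum(row.values()) for b, n in row.items()} for prev, row in phase.items()}
        for phase in counts
    ]


def log_likelihood(model: Model, seq: str) -> float:
    # Assumes seq[0] sits at codon position 0 when the model is periodic.
    period = len(model)
    return sum(math.log(model[i % period][seq[i - 1]][seq[i]]) for i in range(1, len(seq)))


def coding_log_odds(coding: Model, noncoding: Model, seq: str) -> float:
    """Positive values favour the coding (three-periodic) model."""
    return log_likelihood(coding, seq) - log_likelihood(noncoding, seq)
```

Training the coding model on in-frame sequence is what lets the phase index `i % 3` line up with codon positions; in an HMM the frame is instead tracked by the hidden state.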

\ \

Jigsaw

\

\ Jigsaw uses the output from gene finders, splice-site prediction programs, and sequence alignments to predict gene models. Annotation data downloaded from the UCSC Genome Browser and TIGR gene-finder output were used as input for these predictions. Jigsaw predicts both partial and complete genes. Additional predictions and methods for this subtrack are available in the EGASP Updates track.

\ \

Pairagon/N-SCAN

\

\ The pair-HMM-based alignment program Pairagon was used to align high-quality mRNA sequences to the ENCODE regions. These alignments were supplemented with N-SCAN EST predictions, which are displayed in the Pairgn/NSCAN-E subtrack, and were extended further with additional transcripts from the Brent Lab to produce the predictions displayed in the Pairgn/NSCAN-E/+ subtrack. The NSCAN subtrack contains only predictions from the N-SCAN program. \

\ \

SGP2-U12

\

\ The SGP2-U12 gene prediction set was generated with a dual-genome method (SGP2), using a version of GeneID modified to detect U12-dependent introns (both AT-AC and GT-AG subtypes) when present. SGP2 uses similarity (tblastx) to mouse genomic sequence syntenic to the ENCODE regions (Oct. 2004 MSA freeze). This modified version of GeneID uses matrices for U12 donor, acceptor, and branch sites constructed from examples of published U12 intron splice junctions (both experimentally confirmed and expressed-sequence-validated predictions). Two SGP2-U12 subtracks are included: SGP2 Gene Predictions and SGP2 U12 Intron Predictions. The U12 splice sites for features in the U12 Intron Predictions subtrack are displayed on the track details pages. Additional predictions and methods for this subtrack are available in the EGASP Updates track.

\ \

SPIDA

\

\ This exon-only prediction set was produced using SPIDA (Substitution Periodicity Index and Domain Analysis). Exons derived by mapping ESTs to the genome were validated by seeking periodic substitution patterns in aligned informant DNA sequences. First, all available ESTs were mapped to the genome using Exonerate, and the resulting transcript structures were "flattened" to remove redundancy. Each exon of the flattened transcripts was then subjected to SPI analysis, which identifies periodicity in the pattern of mutations between the human sequence and an informant species DNA sequence (the informant sequences and their TBA alignments were provided by Elliott Margulies). SPI was calculated for all available human-informant pairs, both for whole exons and in a sliding 48 bp window. An exon is accepted only if a threshold level of periodicity is identified in at least two of the informant species. For an accepted exon, SPI provides the correct frame for translation, and the exon was used as a starting point for extending the ORF coding region of the flattened transcript from which it came. This gave a full or partial CDS; different exons may give different CDSs. The CDSs were translated and searched for domains using hmmpfam and Pfam_fs. Only transcripts with a domain hit with e > 1.0 were retained. Heuristics were then applied to the retained CDSs to identify problems with the transcript structure, particularly frame-shifts. Many transcripts may identify the same exon, but only a single instance of each exon has been retained.
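The acceptance rule (periodicity in a sliding 48 bp window, required in at least two informants) can be illustrated with a deliberately simplified stand-in for SPI. This is not SPIDA's actual index. Here a window's "periodicity" is the largest fraction of its substitutions that share one position modulo 3, and the 0.6 threshold is a hypothetical value chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def substitution_positions(human: str, informant: str) -> List[int]:
    """Aligned columns where the bases differ (alignment gaps '-' ignored)."""
    return [i for i, (h, o) in enumerate(zip(human, informant))
            if h != o and "-" not in (h, o)]


def periodicity_index(human: str, informant: str, window: int = 48) -> Tuple[float, Optional[int]]:
    """Simplified SPI stand-in: best (fraction, phase) over sliding windows.

    fraction = share of a window's substitutions at one position mod 3;
    phase = that position mod 3 (in coding exons, usually the codon third position).
    """
    subs = substitution_positions(human, informant)
    best: Tuple[float, Optional[int]] = (0.0, None)
    for w_start in range(max(1, len(human) - window + 1)):
        in_window = [p for p in subs if w_start <= p < w_start + window]
        if len(in_window) < 3:
            continue  # too few substitutions to judge periodicity
        counts = [0, 0, 0]
        for p in in_window:
            counts[p % 3] += 1
        phase = max(range(3), key=counts.__getitem__)
        fraction = counts[phase] / len(in_window)
        if fraction > best[0]:
            best = (fraction, phase)
    return best


def accept_exon(human: str, informants: Sequence[str], threshold: float = 0.6,
                window: int = 48, min_informants: int = 2) -> bool:
    """Accept the exon if at least min_informants informants pass the threshold."""
    passing = sum(periodicity_index(human, inf, window)[0] >= threshold for inf in informants)
    return passing >= min_informants
```

The returned phase is what, in the described pipeline, would fix the translation frame for extending the ORF.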

\ \

Twinscan-MARS

\

\ This gene prediction set was produced by a version of Twinscan that employs \ multiple pairwise genome comparisons to identify protein-coding genes (including\ alternative splices) using nucleotide homology information. No expression or \ protein data were used.

\ \

Credits

\

\ The following individuals and institutions provided the data for the subtracks \ in this annotation:\

\ \ encodeGenes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeGenes\ longLabel ENCODE Gene Prediction Workshop (EGASP) All ENCODE Regions\ origAssembly hg17\ priority 12.0\ shortLabel EGASP Full\ superTrack encodeEgaspSuper dense\ track encodeEgaspFull\ type genePred\ visibility hide\ decodeHotSpotFemale Hot Spot Female bed 4 deCODE recombination map, female >= 10.0 1 12 255 0 255 255 127 255 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 1 color 255,0,255\ configurable on\ longLabel deCODE recombination map, female >= 10.0\ parent hotView\ priority 12\ shortLabel Hot Spot Female\ subGroups view=hot\ track decodeHotSpotFemale\ snpArrayIlluminaHumanOmni1_Quad Illumina Omni1-Q bed 6 + Illumina Human Omni1-Quad 0 12 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Illumina Human Omni1-Quad\ parent snpArray\ priority 12\ shortLabel Illumina Omni1-Q\ track snpArrayIlluminaHumanOmni1_Quad\ type bed 6 +\ encodeUcsdChipH3K27me3 LI H3K27me3 HeLa bedGraph 4 Ludwig Institute ChIP-chip: H3K27me3 ab, HeLa cells 0 12 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: H3K27me3 ab, HeLa cells\ parent encodeLIChIP\ priority 12\ shortLabel LI H3K27me3 HeLa\ track encodeUcsdChipH3K27me3\ encodeUcsdChipHeLaH3H4RNAP_p30 LI Pol2 +gIF bedGraph 4 Ludwig Institute ChIP-chip: RNA Pol2, HeLa cells, 30 min. 
after gamma interferon 0 12 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: RNA Pol2, HeLa cells, 30 min. after gamma interferon\ parent encodeLIChIPgIF\ priority 12\ shortLabel LI Pol2 +gIF\ track encodeUcsdChipHeLaH3H4RNAP_p30\ encodeEgaspFullPairagonMrna Pairgn/NSCAN-E genePred Pairagon/NSCAN-EST Gene Predictions 0 12 12 50 200 133 152 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,50,200\ longLabel Pairagon/NSCAN-EST Gene Predictions\ parent encodeEgaspFull\ priority 12\ shortLabel Pairgn/NSCAN-E\ track encodeEgaspFullPairagonMrna\ par PAR bed 4 Pseudoautosomal regions 0 12 160 0 50 207 127 152 0 0 2 chrX,chrY,

Description

\

\ The pseudoautosomal regions (PARs) are sections of the X and Y chromosomes that undergo homologous recombination. Genes in these regions are inherited in the same manner as autosomal genes. Two PARs, PAR1 and PAR2, have been identified in humans. The pseudoautosomal regions allow the X and Y chromosomes to pair and segregate during meiosis in males.\
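Because this track is a `bed 4` track (chrom, start, end, name), a common downstream task is checking whether a chrX or chrY position falls within PAR1 or PAR2. The sketch below reads such records and answers that query. The coordinates used in any example must come from the track itself; none are assumed here, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Region = Tuple[int, int, str]  # (start, end, name), 0-based half-open


def load_par(bed_lines: Iterable[str]) -> Dict[str, List[Region]]:
    """Read bed 4 records (chrom, start, end, name) into per-chromosome lists."""
    regions: Dict[str, List[Region]] = {}
    for line in bed_lines:
        if not line.strip():
            continue
        chrom, start, end, name = line.split()[:4]
        regions.setdefault(chrom, []).append((int(start), int(end), name))
    for intervals in regions.values():
        intervals.sort()
    return regions


def par_name(regions: Dict[str, List[Region]], chrom: str, pos: int) -> Optional[str]:
    """Return the name of the PAR containing 0-based pos, or None."""
    for start, end, name in regions.get(chrom, []):
        if start <= pos < end:
            return name
    return None
```

A linear scan is enough here because each chromosome carries only two regions.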

\

Methods

\ The homologous pseudoautosomal regions on human X and Y chromosomes are\ assembled using the same clones and are thus identical, haploid sequences.\ The coordinates of these regions are provided as part of the assembly.\ map 1 chromosomes chrX,chrY\ color 160,0,50\ group map\ longLabel Pseudoautosomal regions\ priority 12\ shortLabel PAR\ track par\ type bed 4\ visibility hide\ partMrnas Partially Found mRNAs psl . Partially Found RefSeq and MGC mRNAs 0 12 0 0 0 127 127 127 0 0 0 map 1 group map\ longLabel Partially Found RefSeq and MGC mRNAs\ priority 12\ shortLabel Partially Found mRNAs\ track partMrnas\ type psl .\ visibility hide\ encodeGencodeRaceFragsStomach RACEfrags Stomach genePred Gencode RACEfrags from Stomach 0 12 140 0 116 197 127 185 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 140,0,116\ longLabel Gencode RACEfrags from Stomach\ parent encodeGencodeRaceFrags\ priority 12\ shortLabel RACEfrags Stomach\ track encodeGencodeRaceFragsStomach\ stanfordChipNRSFUpstate Stan Jurkat NRSF/REST/Upstate bedGraph 4 Stanford ChIP-chip (Jurkat cells, NRSF/REST/Upstate ChIP) 0 12 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 0 longLabel Stanford ChIP-chip (Jurkat cells, NRSF/REST/Upstate ChIP)\ parent stanfordChip\ priority 12\ shortLabel Stan Jurkat NRSF/REST/Upstate\ track stanfordChipNRSFUpstate\ encodeStanfordChipNRSFUpstate Stan Jurkat NRSF/REST/Upstate bedGraph 4 Stanford ChIP-chip (Jurkat cells, NRSF/REST/Upstate ChIP) 0 12 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 longLabel Stanford ChIP-chip (Jurkat cells, NRSF/REST/Upstate ChIP)\ parent encodeStanfordChipJohnson\ priority 12\ shortLabel Stan Jurkat NRSF/REST/Upstate\ track 
encodeStanfordChipNRSFUpstate\ encodeStanfordPromotersMG63 Stan Pro MG63 bed 9 + Stanford Promoter Activity (MG63 cells) 0 12 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (MG63 cells)\ parent encodeStanfordPromoters\ priority 12\ shortLabel Stan Pro MG63\ track encodeStanfordPromotersMG63\ encodeEgaspUpdYalePseudo Yale Pseudo Upd genePred Yale Pseudogene Predictions 0 12 130 130 130 192 192 192 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 130,130,130\ longLabel Yale Pseudogene Predictions\ parent encodeEgaspUpdate\ priority 12\ shortLabel Yale Pseudo Upd\ track encodeEgaspUpdYalePseudo\ encodeYaleAffyPlacRNATransMap Yale RNA Plcnta wig -2730 3394 Yale Placenta RNA Transcript Map 0 12 50 50 150 152 152 202 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 50,50,150\ longLabel Yale Placenta RNA Transcript Map\ parent encodeYaleAffyRNATransMap\ priority 12\ shortLabel Yale RNA Plcnta\ subGroups celltype=plac samples=samples\ track encodeYaleAffyPlacRNATransMap\ encodeYaleAffyPlacRNATars Yale RNA Plcnta bed 3 . 
Yale Placenta RNA Transcriptionally Active Region 0 12 50 50 150 152 152 202 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 50,50,150\ longLabel Yale Placenta RNA Transcriptionally Active Region\ parent encodeYaleAffyRNATars\ priority 12\ shortLabel Yale RNA Plcnta\ subGroups celltype=plac samples=samples\ track encodeYaleAffyPlacRNATars\ encodeEgaspPartial EGASP Partial genePred ENCODE Gene Prediction Workshop (EGASP) for Partial ENCODE Regions 0 12.5 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows gene predictions submitted for the ENCODE Gene Annotation \ Assessment Project \ (EGASP) Gene Prediction Workshop 2005 that cover only\ a partial set of the 44 ENCODE regions. The partial set excludes\ the 13 ENCODE regions for which high-quality annotations were released in late\ 2004.\ The following gene predictions are included:\

\ The EGASP Full companion track shows original gene prediction submissions for \ the full set of 44 ENCODE regions using gene prediction algorithms other than \ those used here; the EGASP Update track shows updated versions\ of some of the submitted predictions.

\ \

Display Conventions and Configuration

\

\ Data for each gene prediction method within this composite annotation track \ is displayed in a separate subtrack. See the top of the track description page \ for a complete list of the subtracks available for this annotation. To display\ only selected subtracks, uncheck the boxes next to the tracks you wish to\ hide. \

\ The individual subtracks within this annotation follow the display conventions \ for gene prediction\ tracks. The track description page offers the option \ to color and label codons in a zoomed-in display of the subtracks to facilitate \ validation and comparison of gene predictions. To enable this feature, select \ the genomic codons option from the "Color track by codons"\ menu. Click the\ Help on codon coloring\ link for more information about this feature.

\

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for distinguishing the different gene prediction methods.

\ \

Methods

\ \

ACEScan

\

\ ACEScan (Alternative Conserved Exons Scan) \ identifies alternative splicing that is evolutionarily conserved between human and \ mouse/rat. The Conserved Alternative Exon Predictions subtrack shows\ predicted alternative conserved exons. The Unconserved Alternative and \ Constitutive Exon Predictions subtrack shows exons that \ are predicted to be constitutive or may have species-specific alternative \ splicing.

\ \

Augustus

\

\ Augustus uses a generalized hidden Markov model (GHMM) that\ models coding and non-coding sequence, splice sites, the branch point region, \ translation start and end, and lengths of exons and introns. The track \ contains four different sets of predictions. Ab initio\ single genome predictions are based solely on the input sequence. EST and\ protein evidence predictions were generated using AGRIPPA hints based on \ alignments of human sequence from the dbEST and nr databases. Mouse homology \ gene predictions were produced using mouse genomic sequence only; BLAST, CHAOS and \ DIALIGN were used to generate the hints for Augustus. The combined \ EST/protein evidence and mouse homology gene predictions were created using \ human sequence from the dbEST and nr databases and mouse genomic sequence to \ generate hints for Augustus.\ Additional predictions and methods for this subtrack are available in the\ EGASP Update track.

\ \

GeneZilla

\

\ GeneZilla is a program for the computational prediction of protein-coding genes \ in eukaryotic DNA, based on the generalized hidden Markov model (GHMM) \ framework. These predictions were generated using GeneZilla and \ IsoScan, which uses a four-state hidden Markov model to \ predict isochores (regions of homogeneous G+C content) in genomic DNA.
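The idea of segmenting DNA into isochores with a four-state hidden Markov model can be illustrated with a toy Viterbi decoder. This is not IsoScan: the state labels (L1, L2, H1, H2), the G+C bin thresholds, and the transition and emission probabilities below are all hypothetical values chosen only to show how a sticky four-state HMM groups windows of similar G+C content into long homogeneous regions.

```python
import math
from typing import List

# Hypothetical isochore classes, ordered by increasing G+C content.
STATES = ["L1", "L2", "H1", "H2"]


def gc_bin(window: str) -> int:
    """Map a DNA window to one of four G+C bins (cut points are illustrative only)."""
    seq = window.upper()
    gc = sum(seq.count(base) for base in "GC") / max(len(seq), 1)
    thresholds = [0.37, 0.41, 0.46]  # assumed bin boundaries, not IsoScan's
    return sum(gc >= t for t in thresholds)


def viterbi(obs: List[int], stay: float = 0.99, match: float = 0.7) -> List[str]:
    """Most likely isochore state path for a sequence of G+C bin observations."""
    n = len(STATES)
    switch = (1.0 - stay) / (n - 1)      # probability of jumping to another class
    mismatch = (1.0 - match) / (n - 1)   # probability of emitting a non-matching bin
    log_t = [[math.log(stay if i == j else switch) for j in range(n)] for i in range(n)]
    log_e = [[math.log(match if s == o else mismatch) for o in range(n)] for s in range(n)]

    # Uniform start distribution, then dynamic programming over the windows.
    score = [math.log(1.0 / n) + log_e[s][obs[0]] for s in range(n)]
    back: List[List[int]] = []
    for o in obs[1:]:
        ptr, new = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log_t[i][j])
            ptr.append(best_i)
            new.append(score[best_i] + log_t[best_i][j] + log_e[j][o])
        back.append(ptr)
        score = new

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]
```

Because staying in a state is much more likely than switching, a single outlying window does not break a region, while a sustained change in G+C content does.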

\ \

SAGA

\

\ SAGA is an ab initio multiple-species gene-finding program based on the\ Gibbs sampling-based method described in Chatterji and Pachter (2004). In \ addition to sampling parameters, SAGA also uses a phyloHMM-based model to \ boost the scores, similar to the method described in Siepel and Haussler\ (2004).

\ \

Credits

\

\ The gene prediction data sets were submitted by the following individuals and \ institutions:\

\ \

References

\

\ Chatterji, S. and Pachter, L. \ Multiple organism gene finding by collapsed Gibbs sampling. \ Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, \ 187-193 (2004).

\

\ Siepel, A. and Haussler, D. \ Computational identification of evolutionarily conserved \ exons. \ Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, \ 177-186 (2004).

\ encodeGenes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeGenes\ longLabel ENCODE Gene Prediction Workshop (EGASP) for Partial ENCODE Regions\ origAssembly hg17\ priority 12.5\ shortLabel EGASP Partial\ superTrack encodeEgaspSuper\ track encodeEgaspPartial\ type genePred\ visibility hide\ encodeEgaspUpdate EGASP Update genePred ENCODE Gene Prediction Workshop (EGASP) Updates 0 12.7 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows updated versions of gene predictions submitted for the \ ENCODE Gene Annotation Assessment Project \ (EGASP) Gene Prediction Workshop 2005.\ The following gene predictions are included:\

\ The original EGASP submissions are displayed in the companion tracks, \ EGASP Full and EGASP Partial.

\ \

Display Conventions and Configuration

\

\ Data for each gene prediction method within this composite annotation track \ are displayed in separate subtracks. See the top of the track description page \ for a complete list of the subtracks available for this annotation. To display\ only selected subtracks, uncheck the boxes next to the tracks you wish to\ hide. \

\ The individual subtracks within this annotation follow the display conventions \ for gene prediction\ tracks. Display characteristics specific to individual subtracks are \ described in the Methods section. The track description page offers the option \ to color and label codons in a zoomed-in display of the subtracks to facilitate \ validation and comparison of gene predictions. To enable this feature, select \ the genomic codons option from the "Color track by codons"\ menu. Click the\ Help on codon coloring\ link for more information about this feature.

\

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for distinguishing the different gene prediction methods.

\ \

Methods

\ \

Augustus

\

\ Augustus uses a generalized hidden Markov model (GHMM) that models \ coding and non-coding sequence, splice sites, the branch point region, \ the translation start and end, and the lengths of exons and introns. \ This version has been trained on a set of 1284 human genes.\ The track contains four sets of predictions: ab initio,\ EST and protein-based, mouse homology-based, and combined predictions that use both\ EST/protein and mouse homology evidence as additional input to Augustus.

\

\ The EST and protein evidence was generated by aligning sequences from the dbEST \ and nr databases to the ENCODE regions using wublastn and wublastx.\ The resulting alignments were used to generate hints about putative splice \ sites, exons, coding regions, introns, translation start and \ translation stop.
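To show how a spliced alignment can be turned into hints, the sketch below converts the aligned blocks of one EST or protein alignment into exon-part hints and treats sufficiently long gaps between consecutive blocks as intron hints. The tuple layout, the hint names, and the `min_intron` cutoff are assumptions for illustration; this is not the actual Augustus hint file format or the pipeline used for these predictions.

```python
from typing import List, Tuple

Block = Tuple[int, int]                  # 1-based inclusive genomic block coordinates
Hint = Tuple[str, str, int, int, str]    # (seqname, hint type, start, end, strand)


def hints_from_blocks(seqname: str, strand: str, blocks: List[Block],
                      min_intron: int = 40) -> List[Hint]:
    """Derive exon-part and intron hints from one spliced alignment (hypothetical scheme)."""
    blocks = sorted(blocks)
    hints: List[Hint] = []
    # Every aligned block supports part of an exon.
    for start, end in blocks:
        hints.append((seqname, "exonpart", start, end, strand))
    # Gaps long enough to be introns (rather than small indels) become intron hints.
    for (_, end1), (start2, _) in zip(blocks, blocks[1:]):
        gap_start, gap_end = end1 + 1, start2 - 1
        if gap_end - gap_start + 1 >= min_intron:
            hints.append((seqname, "intron", gap_start, gap_end, strand))
    return hints
```

For example, blocks at 100-200 and 300-400 yield two exon-part hints and an intron hint spanning 201-299, whereas a 9-base gap is ignored as a likely alignment indel.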

\

\ The mouse homology evidence was generated by aligning pairs of human and\ mouse genomic sequences using the program \ DIALIGN. Regions conserved at the peptide level were used to \ generate hints about coding regions.

\ \

Exogean

\

\ Exogean produces alternative transcripts by combining mRNA and cross-species \ sequence alignments using heuristic rules. The program implements a generic \ framework based on directed acyclic colored multigraphs (DACMs). In Exogean, \ DACM nodes represent biological objects (mRNA or protein HSPs/transcripts) and\ multiple edges between nodes represent known relationships between these \ objects derived from human expertise. Exogean DACMs are successively built and \ reduced, leading to increasingly complex objects. This process\ enables the production of alternative transcripts from initial HSPs.

\ \

FGenesh++

\

\ FGenesh++ predictions are based on hidden Markov models and protein similarity to\ the nr database. For more information, see the reference below.\ \

GeneID-U12

\

\ GeneID, a program with a hierarchical design, predicts genes in anonymous \ genomic sequences.\ In the first step, splice sites and start and stop codons are predicted and scored \ along the sequence using position weight arrays (PWAs).\ Next, exons are built from the sites. Each exon is scored as the sum of the scores \ of its defining sites plus the log-likelihood ratio of a Markov model for \ coding DNA.\ Finally, the gene structure is assembled from the set of predicted exons, \ maximizing the sum of the scores of the assembled exons.\ The modified version of GeneID used to generate the predictions in this track \ incorporates models for U12-dependent splice signals in addition to U2 splice \ signals.
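The exon scoring and assembly steps described above can be sketched as follows: each exon's score is the sum of its site scores plus a coding log-likelihood ratio, and dynamic programming picks the chain of non-overlapping exons with the highest total score. This is a simplified illustration, not GeneID's implementation: it ignores exon types (first, internal, terminal), reading-frame compatibility, and intron length limits, and it uses a quadratic loop rather than GeneID's linear-time algorithm. The `Exon` fields and example scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exon:
    start: int
    end: int
    site_scores: float   # sum of the defining site scores (e.g. acceptor + donor PWAs)
    coding_llr: float    # log-likelihood ratio from a coding Markov model

    @property
    def score(self) -> float:
        return self.site_scores + self.coding_llr


def assemble(exons: List[Exon]) -> List[Exon]:
    """Return the highest-scoring chain of non-overlapping exons (simplified)."""
    order = sorted(exons, key=lambda e: e.end)
    if not order:
        return []
    best = [0.0] * len(order)               # best chain score ending at exon i
    prev: List[Optional[int]] = [None] * len(order)
    for i, ex in enumerate(order):
        best[i] = ex.score
        for j in range(i):
            # Extend a chain only if the earlier exon ends before this one starts.
            if order[j].end < ex.start and best[j] + ex.score > best[i]:
                best[i] = best[j] + ex.score
                prev[i] = j
    # Trace back from the best-scoring chain end.
    i: Optional[int] = max(range(len(order)), key=lambda k: best[k])
    chain: List[Exon] = []
    while i is not None:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]
```

Given exons at 100-200 (score 5), 150-250 (score 2, overlapping the first), 260-350 (score -0.5) and 300-400 (score 3), the sketch selects 100-200 and 300-400 for a total score of 8.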

\

\ The GeneID subtrack shows all GeneID genes; the GeneID U12 subtrack displays only U12 introns\ and their flanking exons. Each exon flanking a predicted U12-dependent intron is assigned a type\ attribute that reflects its splice sites. On\ the details page of the GeneID U12 subtrack, this attribute is shown as the "Alternate Name" \ of the item, which consists of the intron plus its flanking exons.

\ \

Jigsaw

\

\ Jigsaw is a gene prediction program that determines genes based on \ target genomic sequence and output from a gene structure annotation database.\ Data downloaded from UCSC's annotation database is \ used as input and includes the following tracks of evidence:\ Known Genes, Ensembl, RefSeq, GeneID, Genscan, SGP, Twinscan, Human mRNAs,\ TIGR Gene Index, UniGene, Most Conserved Elements and Non-human RefSeq Genes.\ GlimmerHMM and GeneZilla, two open source ab initio gene-finding \ programs based on GHMMs, are also used.

\ \

SGP2-U12

\

\ To predict genes in a genomic query, SGP2 combines GeneID predictions with \ tblastx comparisons of the genomic query against other genomic sequences.\ This modified version of SGP2 uses models for U12-dependent splice signals \ in addition to U2 splice signals. The reference genomic sequence for this data \ set is the Oct. 2004 release of mouse sequence syntenic to ENCODE regions.

\

\ The SGP2 and SGP2 U12 subtracks follow the same display conventions as the \ GeneID and GeneID U12 subtracks described above.

\ \

Yale Pseudogenes

\

\ For this analysis, pseudogenes were defined as genomic sequences similar \ to known human genes and with various disablements (premature stop codons or\ frameshifts) in their "putative" protein-coding regions.

\

\ The protein sequences of known human genes (as annotated by Ensembl) were used\ to search for similar nongenic sequences in ENCODE regions. The matching\ sequences were assessed as disabled copies of genes based on the occurrences of\ premature stop codons or frameshifts. The intron-exon structure of the\ functional gene was further used to infer whether a pseudogene was duplicated\ or processed (a duplicated pseudogene keeps the intron-exon structure of its\ parent functional gene). Small pseudogene sequences were labeled as fragments or\ other types.
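The disablement test and the duplicated-versus-processed distinction can be illustrated with a small rule-based sketch. It scans a putative coding region codon by codon for premature in-frame stops and then labels the copy using the parent's intron-exon structure. Unlike the Yale pipeline, which worked from protein similarity searches and was manually curated, this works on nucleotide codons; the frameshift count is taken as an input, and the `min_len` fragment cutoff and the label names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> int:
    """Count in-frame stop codons before the final codon of a putative coding region."""
    seq = cds.upper()
    full = len(seq) - len(seq) % 3                  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, full, 3)]
    return sum(codon in STOP_CODONS for codon in codons[:-1])


def classify(cds: str, frameshifts: int, keeps_parent_introns: bool,
             min_len: int = 150) -> str:
    """Hypothetical labeling rule loosely following the criteria described above."""
    if premature_stops(cds) == 0 and frameshifts == 0:
        return "not disabled"
    if len(cds) < min_len:
        return "fragment"
    # A duplicated pseudogene retains the intron-exon structure of its parent gene.
    return "duplicated" if keeps_parent_introns else "processed"
```

For example, `ATG TAA GGG TGA` contains one premature stop (TAA); the final TGA is treated as the normal terminator.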

\

\ All pseudogenes in this track were manually curated.\ In the browser, the track details page shows the pseudogene type.

\ \

Credits

\

\ Augustus was written by Mario Stanke at the\ Department of \ Bioinformatics of the University of Göttingen in Germany.

\

\ Exogean was developed by Sarah Djebali and Hugues Roest Crollius from the\ Dyogen Lab, Ecole \ Normale Supérieure (Paris, France) and Franck Delaplace\ from the Laboratoire de Méthodes Informatiques \ (LaMI) in Evry, \ France.

\

\ The FGenesh++ gene predictions were provided by Victor Solovyev of\ Softberry Inc.\

\ The GeneID-U12 and SGP2-U12 programs were developed by the\ Grup de Recerca en Informàtica Biomèdica \ (GRIB) at \ the Institut Municipal d'Investigació Mèdica (IMIM) in Barcelona.\ The version of GeneID on which GeneID-U12 is based (geneid_v1.2) was written by \ Enrique Blanco and Roderic Guigó.\ The parameter files were constructed by Genis Parra and Francisco Camara.\ Additional contributions were made by Josep F. Abril, Moises Burset and Xavier \ Messeguer. Modifications to GeneID that allow for the prediction of \ U12-dependent splice sites and incorporation of U12 introns into gene models \ were made by Tyler Alioto.

\

\ Jigsaw was developed at The Institute for Genomic Research \ (TIGR)\ by Jonathan Allen and Steven Salzberg,\ with computational gene-finder contributions from Mihaela Pertea and William \ Majoros. Continued maintenance and development of Jigsaw will\ be provided by the Salzberg group at the Center for Bioinformatics \ and Computational Biology \ (CBCB) at the \ University of Maryland, College Park.

\

\ The Yale Pseudogenes were generated by the pseudogene annotation group of \ Mark Gerstein at Yale \ University.

\ \

References

\ \

Augustus

\

\ Stanke, M. \ Gene prediction with a hidden Markov model.\ Ph.D. thesis, Universität Göttingen, Germany (2004).

\

\ Stanke, M. and Waack, S. \ Gene prediction with a hidden Markov model and a new intron \ submodel.\ Bioinformatics, 19(Suppl. 2), ii215-ii225 (2003).

\

\ Stanke, M., Steinkamp, R., Waack, S. and Morgenstern, B. \ AUGUSTUS: a web server for gene finding in eukaryotes.\ Nucl. Acids Res., 32, W309-W312 (2004).

\ \

FGenesh++

\

\ Solovyev, V.V. \ Statistical approaches in eukaryotic gene prediction.\ In Handbook of Statistical Genetics (eds. Balding, D. et al.), 83-127\ (John Wiley & Sons, Inc., 2001).

\ \

GeneID

\

\ Blanco, E., Parra, G. and Guigó, R. \ Using geneid to identify genes. \ In Current Protocols in Bioinformatics, Unit 4.3 (ed. Baxevanis, A.D.)\ (John Wiley & Sons, Inc., 2002).

\

\ Guigó, R. \ Assembling genes from predicted exons in linear time with \ dynamic programming. \ J Comput Biol. 5(4), 681-702 (1998).

\

\ Guigó, R., Knudsen, S., Drake, N. and Smith, T.\ Prediction of gene structure. \ J Mol Biol. 226(1), 141-57 (1992).

\

\ Parra, G., Blanco, E. and Guigó, R. \ GeneID in Drosophila. \ Genome Research 10(4), 511-515 (2000).

\ \

Jigsaw

\

\ Allen, J.E., Pertea, M. and Salzberg, S.L.\ Computational gene prediction using multiple sources of \ evidence. \ Genome Res., 14(1), 142-8 (2004).

\

\ Allen, J.E. and Salzberg, S.L.\ JIGSAW: integration of multiple sources of evidence for gene \ prediction.\ Bioinformatics 21(18), 3596-3603 (2005).

\ \

SGP2

\

\ Guigó, R., Dermitzakis, E.T., Agarwal, P., Ponting, C.P., Parra, G., \ Reymond, A., Abril, J.F., Keibler, E., Lyle, R., Ucla, C. et al. \ Comparison of mouse and human genomes followed by experimental \ verification yields an estimated 1,019 additional genes. \ Proc Natl Acad Sci U S A 100(3), 1140-5 (2003).

\

\ Parra, G., Agarwal, P., Abril, J.F., Wiehe, T., Fickett, J.W. and Guigó, R. \ Comparative gene prediction in human and mouse. \ Genome Res. 13(1), 108-17 (2003).

\ encodeGenes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeGenes\ longLabel ENCODE Gene Prediction Workshop (EGASP) Updates\ origAssembly hg17\ priority 12.7\ shortLabel EGASP Update\ superTrack encodeEgaspSuper dense\ track encodeEgaspUpdate\ type genePred\ visibility hide\ encodeAffyChIpHl60PvalCebpeHr08 Affy CEBPe RA 8h wig 0.0 534.54 Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 8hrs) P-Value 0 13 200 25 0 227 140 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 200,25,0\ longLabel Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 8hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 13\ shortLabel Affy CEBPe RA 8h\ subGroups factor=CEBPe time=8h\ track encodeAffyChIpHl60PvalCebpeHr08\ encodeAffyRnaHeLaSitesIntronsDistal Affy In Dst HeLa bed 4 . 
Affy Intronic Distal HeLa Transfrags 0 13 225 30 255 240 142 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 225,30,255\ longLabel Affy Intronic Distal HeLa Transfrags\ parent encodeNoncodingTransFrags\ priority 13\ shortLabel Affy In Dst HeLa\ subGroups region=intronicDistal celltype=hela source=affy\ track encodeAffyRnaHeLaSitesIntronsDistal\ encodeAffyChIpHl60SignalStrictp63_ActD Affy p63 ME-180+ wig -2.78 3.97 Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict Signal 0 13 0 0 200 127 127 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 0,0,200\ longLabel Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 13\ shortLabel Affy p63 ME-180+\ subGroups factor=actd time=0h\ track encodeAffyChIpHl60SignalStrictp63_ActD\ encodeAffyChIpHl60SitesStrictP63_ActD Affy p63 ME-180+ bed 3 . 
Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict Sites 0 13 0 0 200 127 127 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 0,0,200\ longLabel Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 13\ shortLabel Affy p63 ME-180+\ subGroups factor=actd time=0h\ track encodeAffyChIpHl60SitesStrictP63_ActD\ encodeAffyChIpHl60PvalStrictp63_ActD Affy p63 ME-180+ wig 0 696.62 Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict P-Value 0 13 0 0 200 127 127 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 0,0,200\ longLabel Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 13\ shortLabel Affy p63 ME-180+\ subGroups factor=actd time=0h\ track encodeAffyChIpHl60PvalStrictp63_ActD\ encodeAffyEc1PlacentaSignal EC1 Sgnl Placen wig 0 62385 Affy Ext Trans Signal (1-base window) (Placenta) 0 13 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 0 color 176,0,80\ longLabel Affy Ext Trans Signal (1-base window) (Placenta)\ parent encodeAffyEcSignal\ priority 13\ shortLabel EC1 Sgnl Placen\ track encodeAffyEc1PlacentaSignal\ encodeAffyEc1PlacentaSites EC1 Sites Placen bed 3 . 
Affy Ext Trans Sites (1-base window) (Placenta) 0 13 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 1 color 176,0,80\ longLabel Affy Ext Trans Sites (1-base window) (Placenta)\ parent encodeAffyEcSites\ priority 13\ shortLabel EC1 Sites Placen\ track encodeAffyEc1PlacentaSites\ hapMapRelease24CombinedRecombMap HapMap bigWig -1.0 91.6 HapMap Release 24 combined recombination map 0 13 50 50 50 152 152 152 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 50,50,50\ configurable on\ longLabel HapMap Release 24 combined recombination map\ parent otherMaps\ priority 13\ shortLabel HapMap\ subGroups view=other\ track hapMapRelease24CombinedRecombMap\ type bigWig -1.0 91.6\ encodeUcsdChipHeLaH3H4TAF250_p0 LI TAF1 -gIF bedGraph 4 Ludwig Institute ChIP-chip: TAF1, HeLa cells, no gamma interferon 0 13 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: TAF1, HeLa cells, no gamma interferon\ parent encodeLIChIPgIF\ priority 13\ shortLabel LI TAF1 -gIF\ track encodeUcsdChipHeLaH3H4TAF250_p0\ missingHg Missing Human psl . 
Unplaced Human RefSeq Genes Blatted against Mouse Translated 0 13 0 100 0 255 240 200 0 0 0 map 1 altColor 255,240,200\ color 0,100,0\ group map\ longLabel Unplaced Human RefSeq Genes Blatted against Mouse Translated\ priority 13\ shortLabel Missing Human\ track missingHg\ type psl .\ visibility hide\ encodeEgaspFullPairagonAny Pairgn/NSCAN-E/+ genePred Pairagon/NSCAN Any Evidence Gene Predictions 0 13 12 65 165 133 160 210 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,65,165\ longLabel Pairagon/NSCAN Any Evidence Gene Predictions\ parent encodeEgaspFull\ priority 13\ shortLabel Pairgn/NSCAN-E/+\ track encodeEgaspFullPairagonAny\ encodeGencodeRaceFragsTestis RACEfrags Testis genePred Gencode RACEfrags from Testis 0 13 128 0 128 191 127 191 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 128,0,128\ longLabel Gencode RACEfrags from Testis\ parent encodeGencodeRaceFrags\ priority 13\ shortLabel RACEfrags Testis\ track encodeGencodeRaceFragsTestis\ encodeStanfordPromotersMRC5 Stan Pro MRC5 bed 9 + Stanford Promoter Activity (MRC5 cells) 0 13 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (MRC5 cells)\ parent encodeStanfordPromoters\ priority 13\ shortLabel Stan Pro MRC5\ track encodeStanfordPromotersMRC5\ encodeYaleAffyNB4RARNATransMap Yale RNA NB4 RA wig -2730 3394 Yale NB4 RNA Transcript Map, Treated with Retinoic Acid 0 13 150 50 50 202 152 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 150,50,50\ longLabel Yale NB4 RNA Transcript Map, Treated with Retinoic Acid\ parent encodeYaleAffyRNATransMap\ priority 13\ shortLabel Yale RNA NB4 RA\ 
subGroups celltype=nb4 samples=samples\ track encodeYaleAffyNB4RARNATransMap\ encodeYaleAffyNB4RARNATars Yale TAR NB4 RA bed 3 . Yale NB4 RNA, TAR, Treated with Retinoic Acid 0 13 150 50 50 202 152 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 150,50,50\ longLabel Yale NB4 RNA, TAR, Treated with Retinoic Acid\ parent encodeYaleAffyRNATars\ priority 13\ shortLabel Yale TAR NB4 RA\ subGroups celltype=nb4 samples=samples\ track encodeYaleAffyNB4RARNATars\ encodeAffyChIpHl60SitesCebpeHr08 Affy CEBPe RA 8h bed 3 . Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 8hrs) Sites 0 14 200 25 0 227 140 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 200,25,0\ longLabel Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 8hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 14\ shortLabel Affy CEBPe RA 8h\ subGroups factor=CEBPe time=8h\ track encodeAffyChIpHl60SitesCebpeHr08\ encodeAffyRnaHl60SitesHr00IntronsDistal Affy In Dst HL60 bed 4 . 
Affy Intronic Distal Hl60 Transfrags 0 14 200 55 255 227 155 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 200,55,255\ longLabel Affy Intronic Distal Hl60 Transfrags\ parent encodeNoncodingTransFrags\ priority 14\ shortLabel Affy In Dst HL60\ subGroups region=intronicDistal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr00IntronsDistal\ encodeAffyChIpHl60SignalStrictp63_mActD Affy p63 ME-180 wig -2.78 3.97 Affymetrix ChIP-chip (p63, ME-180) Strict Signal 0 14 0 0 200 127 127 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 0,0,200\ longLabel Affymetrix ChIP-chip (p63, ME-180) Strict Signal\ parent encodeAffyChIpHl60SignalStrict\ priority 14\ shortLabel Affy p63 ME-180\ subGroups factor=mactd time=0h\ track encodeAffyChIpHl60SignalStrictp63_mActD\ encodeAffyChIpHl60SitesStrictP63_mActD Affy p63 ME-180 bed 3 . 
Affymetrix ChIP-chip (p63, ME-180) Strict Sites 0 14 0 0 200 127 127 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 0,0,200\ longLabel Affymetrix ChIP-chip (p63, ME-180) Strict Sites\ parent encodeAffyChIpHl60SitesStrict\ priority 14\ shortLabel Affy p63 ME-180\ subGroups factor=mactd time=0h\ track encodeAffyChIpHl60SitesStrictP63_mActD\ encodeAffyChIpHl60PvalStrictp63_mActD Affy p63 ME-180 wig 0 696.62 Affymetrix ChIP-chip (p63, ME-180) Strict P-Value 0 14 0 0 200 127 127 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 0,0,200\ longLabel Affymetrix ChIP-chip (p63, ME-180) Strict P-Value\ parent encodeAffyChIpHl60PvalStrict\ priority 14\ shortLabel Affy p63 ME-180\ subGroups factor=mactd time=0h\ track encodeAffyChIpHl60PvalStrictp63_mActD\ clonePos Coverage clonePos Clone Coverage 0 14 0 0 0 180 180 180 0 0 0

Description

\

\ In dense display mode, this track shows the coverage level of \ the genome. Finished regions are depicted in black. Draft regions \ are shown in various shades of gray that correspond\ to the level of coverage. \

\ In full display mode, this track shows the position of each clone that aligns\ to the genome sequence. Finished clones are depicted in black, and unfinished\ clones are colored gray. NOTE: Fragment positions in unfinished clones are no \ longer delineated.\

\ map 0 altColor 180,180,180\ group map\ longLabel Clone Coverage\ priority 14\ shortLabel Coverage\ track clonePos\ type clonePos\ visibility hide\ encodeAffyEc51PlacentaSignal EC51 Sgnl Placen wig 0 62385 Affy Ext Trans Signal (51-base window) (Placenta) 0 14 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 0 color 176,0,80\ longLabel Affy Ext Trans Signal (51-base window) (Placenta)\ parent encodeAffyEcSignal\ priority 14\ shortLabel EC51 Sgnl Placen\ track encodeAffyEc51PlacentaSignal\ encodeAffyEc51PlacentaSites EC51 Site Placen bed 3 . Affy Ext Trans Sites (51-base window) (Placenta) 0 14 176 0 80 215 127 167 0 0 2 chr21,chr22, encodeTxLevels 1 color 176,0,80\ longLabel Affy Ext Trans Sites (51-base window) (Placenta)\ parent encodeAffyEcSites\ priority 14\ shortLabel EC51 Site Placen\ track encodeAffyEc51PlacentaSites\ hapMapRelease24CEURecombMap HapMap CEU bigWig 0.0 111.0 HapMap Release 24 CEU recombination map 0 14 80 80 80 167 167 167 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 80,80,80\ configurable on\ longLabel HapMap Release 24 CEU recombination map\ parent otherMaps\ priority 14\ shortLabel HapMap CEU\ subGroups view=other\ track hapMapRelease24CEURecombMap\ type bigWig 0.0 111.0\ encodeUcsdChipHeLaH3H4TAF250_p30 LI TAF1 +gIF bedGraph 4 Ludwig Institute ChIP-chip: TAF1, HeLa cells, 30 min. after gamma interferon 0 14 109 51 43 182 153 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 109,51,43\ longLabel Ludwig Institute ChIP-chip: TAF1, HeLa cells, 30 min. 
after gamma interferon\ parent encodeLIChIPgIF\ priority 14\ shortLabel LI TAF1 +gIF\ track encodeUcsdChipHeLaH3H4TAF250_p30\ encodeEgaspFullPairagonMultiple NSCAN genePred N-SCAN Gene Predictions 0 14 12 85 135 133 170 195 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,85,135\ longLabel N-SCAN Gene Predictions\ parent encodeEgaspFull\ priority 14\ shortLabel NSCAN\ track encodeEgaspFullPairagonMultiple\ encodeGencodeRaceFragsGM06990 RACEfrags GM06990 genePred Gencode RACEfrags from GM06990 cells 0 14 0 0 205 127 127 230 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 0,0,205\ longLabel Gencode RACEfrags from GM06990 cells\ parent encodeGencodeRaceFrags\ priority 14\ shortLabel RACEfrags GM06990\ track encodeGencodeRaceFragsGM06990\ encodeStanfordPromotersPanc1 Stan Pro Panc1 bed 9 + Stanford Promoter Activity (Panc1 cells) 0 14 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (Panc1 cells)\ parent encodeStanfordPromoters\ priority 14\ shortLabel Stan Pro Panc1\ track encodeStanfordPromotersPanc1\ encodeYaleAffyNB4TPARNATransMap Yale RNA NB4 TPA wig -2730 3394 Yale NB4 RNA Transcript Map, Treated with 12-O-tetradecanoylphorbol-13 Acetate (TPA) 0 14 120 50 50 187 152 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 120,50,50\ longLabel Yale NB4 RNA Transcript Map, Treated with 12-O-tetradecanoylphorbol-13 Acetate (TPA)\ parent encodeYaleAffyRNATransMap\ priority 14\ shortLabel Yale RNA NB4 TPA\ subGroups celltype=nb4 samples=samples\ track encodeYaleAffyNB4TPARNATransMap\ encodeYaleAffyNB4TPARNATars Yale TAR NB4 TPA bed 3 . 
Yale NB4 RNA, TAR, Treated with 12-O-tetradecanoylphorbol-13 Acetate (TPA) 0 14 120 50 50 187 152 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 120,50,50\ longLabel Yale NB4 RNA, TAR, Treated with 12-O-tetradecanoylphorbol-13 Acetate (TPA)\ parent encodeYaleAffyRNATars\ priority 14\ shortLabel Yale TAR NB4 TPA\ subGroups celltype=nb4 samples=samples\ track encodeYaleAffyNB4TPARNATars\ encodeAffyChIpHl60PvalCebpeHr32 Affy CEBPe RA 32h wig 0.0 534.54 Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 32hrs) P-Value 0 15 200 25 0 227 140 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 200,25,0\ longLabel Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 32hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 15\ shortLabel Affy CEBPe RA 32h\ subGroups factor=CEBPe time=32h\ track encodeAffyChIpHl60PvalCebpeHr32\ encodeAffyRnaHl60SitesHr02IntronsDistal Affy In Dst HL60 2h bed 4 . Affy Intronic Distal Hl60 2hr Transfrags 0 15 175 80 255 215 167 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 175,80,255\ longLabel Affy Intronic Distal Hl60 2hr Transfrags\ parent encodeNoncodingTransFrags\ priority 15\ shortLabel Affy In Dst HL60 2h\ subGroups region=intronicDistal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr02IntronsDistal\ bacEndPairs BAC End Pairs bed 6 + BAC End Pairs 0 15 0 0 0 80 80 80 0 0 0

Description

\

\ Bacterial artificial chromosomes (BACs) are a key part of many \ large-scale sequencing projects. A BAC typically consists of 25-350 kb of\ DNA. During the early phase of a sequencing project, it is common\ to sequence a single read (approximately 500 bases) off each end of\ a large number of BACs. Later in the project, these BAC end reads\ can be mapped to the genome sequence.

\

\ This track shows these mappings\ in cases where both ends could be mapped. These BAC end pairs can\ be useful for validating the assembly over relatively long ranges. In some\ cases, the BACs are useful biological reagents. This track can also be\ used for determining which BAC contains a given gene, useful information\ for certain wet lab experiments.

\

\ A valid pair of BAC end sequences must be\ at least 25 kb but no more than 350 kb away from each other. \ The orientation of the first BAC end sequence must be "+" and\ the orientation of the second BAC end sequence must be "-".
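The pairing rule above can be expressed as a small predicate. The sketch assumes that the 25-350 kb distance is measured as the outer span from the start of the first end's alignment to the end of the second end's alignment, and that both ends must map to the same chromosome; the `EndHit` record and its field names are hypothetical, not the format used by the UCSC pipeline.

```python
from dataclasses import dataclass

MIN_SPAN = 25_000    # 25 kb
MAX_SPAN = 350_000   # 350 kb


@dataclass
class EndHit:
    chrom: str
    start: int    # 0-based alignment start
    end: int      # alignment end (exclusive)
    strand: str   # "+" or "-"


def is_valid_pair(first: EndHit, second: EndHit) -> bool:
    """Check chromosome, orientation and span for a candidate BAC end pair (assumed rules)."""
    if first.chrom != second.chrom:
        return False
    # The first end must align on "+" and the second on "-".
    if first.strand != "+" or second.strand != "-":
        return False
    span = second.end - first.start   # assumed: outer span covered by the clone
    return MIN_SPAN <= span <= MAX_SPAN
```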

\

\ The scoring scheme used for this annotation assigns 1000 to an alignment \ when the BAC end pair aligns to only one location in the genome (after \ filtering). When a BAC end pair or clone aligns to multiple locations, the \ score is calculated as 1500/(number of alignments).
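The scoring rule can be written as a short function. This sketch returns a floating-point value; whether the stored score is rounded to an integer is not stated in the description, so that detail is left open.

```python
def bac_pair_score(num_alignments: int) -> float:
    """Score for a BAC end pair given how many locations it aligns to after filtering."""
    if num_alignments < 1:
        raise ValueError("a placed BAC end pair has at least one alignment")
    if num_alignments == 1:
        return 1000.0                      # unique placement
    return 1500.0 / num_alignments         # multiple placements share a lower score
```

Under this rule, a pair placed at two locations scores 750 at each location, and a pair placed at three locations scores 500.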

\ \

Methods

\

\ BAC end sequences are placed on the assembled sequence using\ Jim Kent's BLAT program.

\ \

Credits

\

\ Additional information about the clone, including how it\ can be obtained, may be found at the \ NCBI Clone Registry. To view the registry entry for a \ specific clone, open the details page for the clone and click on its name at \ the top of the page.

\ map 1 altColor 80,80,80\ color 0,0,0\ exonArrows off\ group map\ longLabel BAC End Pairs\ priority 15\ shortLabel BAC End Pairs\ track bacEndPairs\ type bed 6 +\ visibility hide\ encodeAffyEc1TestisSignal EC1 Sgnl Testis wig 0 62385 Affy Ext Trans Signal (1-base window) (Testis) 0 15 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 0 color 128,0,128\ longLabel Affy Ext Trans Signal (1-base window) (Testis)\ parent encodeAffyEcSignal\ priority 15\ shortLabel EC1 Sgnl Testis\ track encodeAffyEc1TestisSignal\ encodeAffyEc1TestisSites EC1 Sites Testis bed 3 . Affy Ext Trans Sites (1-base window) (Testis) 0 15 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 1 color 128,0,128\ longLabel Affy Ext Trans Sites (1-base window) (Testis)\ parent encodeAffyEcSites\ priority 15\ shortLabel EC1 Sites Testis\ track encodeAffyEc1TestisSites\ hapMapRelease24YRIRecombMap HapMap YRI bigWig 0.0 72.21 HapMap Release 24 YRI recombination map 0 15 110 110 110 182 182 182 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr16,chr14,chr15,chr17,chr18,chr19,chr20,chr22,chr21, map 0 color 110,110,110\ configurable on\ longLabel HapMap Release 24 YRI recombination map\ parent otherMaps\ priority 15\ shortLabel HapMap YRI\ subGroups view=other\ track hapMapRelease24YRIRecombMap\ type bigWig 0.0 72.21\ encodePseudogene Pseudogenes genePred ENCODE Pseudogene Predictions - All ENCODE Regions 0 15 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows pseudogenes in the ENCODE regions predicted by\ five different methods: the Yale Pipeline, GenCode manual annotation, two \ different UCSC methods, and Gene Identification Signature (GIS). It also \ includes a consensus pseudogene subtrack based on the \ pseudogenes from all five methods. Datasets are displayed in separate\ subtracks within the annotation and are individually described below.

\

\ The annotations are colored as follows:

\

Processed_pseudogene (pink): pseudogenes arising via retrotransposition (exon structure of parent gene lost)
Unprocessed_pseudogene (blue): pseudogenes arising via gene duplication (exon structure of parent gene retained)
Pseudogene_fragment (light blue): single-exon pseudogene sequences that cannot be confidently assigned to either the processed or the duplicated category
Undefined (gray)

\

\
\ \

Consensus Pseudogenes

\

Description

\

\ This subtrack shows pseudogenes derived from a consensus of the five \ methods listed above. In the pseudogene.org data freeze dated 6 Jan. 2006, \ 201 consensus pseudogenes were found.\ Here, pseudogenes are defined as genomic sequences that are similar to known \ genes but exhibit various inactivating disablements (e.g., premature \ stop codons or frameshifts) in their putative protein-coding regions and are \ flagged as either recently processed or non-processed.

\ \

Methods

\

\ The pseudogene sets were processed as follows:\

\ \

Verification of the Consensus Pseudogenes

\

\ All pseudogenes in the list have been extensively curated by Adam Frankish and\ Jennifer Harrow at the Wellcome Trust Sanger Institute.

\ \

References

\

\ More information about this data set is available from pseudogene.org/ENCODE.\

\

\
\ \

Havana-Gencode Annotated Pseudogenes and Immunoglobulin Segments

\

Description

\

\ This track shows pseudogenes annotated by the \ HAVANA group \ at the Wellcome Trust Sanger Institute. Pseudogenes have homology to protein\ sequences but generally have a disrupted CDS. For all annotated\ pseudogenes, an active homologous gene (the parent) can be identified\ elsewhere in the genome. Pseudogenes are classified as processed or\ unprocessed.\ \

Methods

\

\ Prior to manual annotation, finished sequence is submitted to an\ automated analysis pipeline for similarity searches and ab initio gene\ predictions. The searches are run on a computer farm, and the results are stored in an\ Ensembl MySQL database using the Ensembl analysis pipeline system\ (Searle et al., 2004; Harrow et al., 2006).

\

\ A pseudogene is annotated\ where the total length of the protein homology to the genomic sequence\ is >20% of the length of the parent protein or >100 aa,\ whichever is shorter. If a gene structure has an ORF but has lost\ the structure of the parent gene, a pseudogene is annotated provided there\ is no evidence of transcription from the pseudogene locus. When an\ open but truncated reading frame is present, other evidence (for example,\ a 3' genomic polyA tract) is used to support classification as a\ pseudogene. When a parent gene has only a single coding exon (e.g.,\ olfactory receptors), a small 5' or 3' truncation of the CDS at the\ pseudogene locus (compared to other family members) is sufficient to\ confirm pseudogene status, provided that the literature and/or the\ expert community predict that the truncation significantly affects\ secondary structure.
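The length criterion at the start of this paragraph amounts to comparing the homology length with the shorter of two thresholds. A minimal sketch, assuming lengths are given in amino acids; the function name is hypothetical, and the other criteria (ORF structure, transcription evidence, truncations) are not modeled.

```python
def passes_homology_threshold(homology_aa: float, parent_aa: float) -> bool:
    """Return True if protein homology exceeds the pseudogene length threshold.

    Threshold = the shorter of 20% of the parent protein length and 100 aa;
    the homology length must be strictly greater than it.
    """
    threshold = min(0.20 * parent_aa, 100)
    return homology_aa > threshold


if __name__ == "__main__":
    # Parent of 1000 aa: threshold is min(200, 100) = 100 aa.
    print(passes_homology_threshold(150, 1000))
    # Parent of 300 aa: threshold is min(60, 100) = 60 aa.
    print(passes_homology_threshold(50, 300))
```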

\

\ Processed and unprocessed pseudogenes are\ distinguished on the basis of structure and genomic context. \ Processed pseudogenes, which arise via retrotransposition, lose the\ intron-exon structure of the parent gene, often have an A-rich tract\ indicative of the insertion site at their 3' end, and are flanked by\ genomic sequence that differs from the sequence flanking the parent gene. Unprocessed\ pseudogenes, which arise via gene duplication, share both the\ intron-exon structure and flanking genomic sequence with the parent\ gene. Transcribed pseudogenes are indicated by annotating a\ pseudogene and a transcript variant alongside each other.

\ \

References

\

\ Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, \ Gilbert JG, Storey R, Swarbreck D, et al.\ GENCODE: Producing a reference annotation for ENCODE. \ Genome Biol. 2006;7 Suppl 1:S4.1-9.

\

\ Searle SM, Gilbert J, Iyer V, Clamp M.\ The otter annotation system.\ Genome Res. 2004 May;14(5):963-70.

\

\
\ \

Yale Pseudogenes

\

Description

\

\ This subtrack shows pseudogenes in the ENCODE regions identified by the Yale \ Pseudogene Pipeline. In this analysis, pseudogenes are defined as genomic \ sequences that are similar to known genes with various inactivating \ disablements (e.g. premature stop codons or frameshifts) in their \ putative protein-coding regions. Pseudogenes are flagged as \ recently processed, recently duplicated, or of uncertain origin (either \ ancient fragments or resulting from a single-exon parent).

\ \

Methods

\

\

\ \

Verification of Yale Pseudogenes

\

\ All pseudogenes in the list have been manually checked.

\ \

References

\

\ Zhang Z, Harrison PM, Liu Y, Gerstein M. \ Millions of years of evolution preserved: a comprehensive catalog \ of the processed pseudogenes in the human genome.\ Genome Res. 2003 Dec;13(12):2541-58.

\

\ Zheng D, Zhang Z, Harrison PM, Karro J, Carriero N, Gerstein M. \ Integrated pseudogene annotation for human chromosome 22: evidence\ for transcription.\ J Mol Biol. 2005 May 27;349(1):27-45.

\

\
\ \

UCSC Retrogene Predictions

\

Description

\

\ The Retrogene subtrack shows processed mRNAs that have been inserted back \ into the genome since the mouse/human split. Retrogenes can be \ functional genes that have acquired a promoter from a neighboring gene, \ non-functional pseudogenes, or transcribed pseudogenes.

\ \

Methods

\

\

\

\ The "type" field has four possible values: \

\

\ These features can be downloaded from the table pseudoGeneLink in many \ formats using the Table Browser option on the menubar.

\ \

References

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: \ Duplication, deletion, and rearrangement in the mouse and human genomes. \ Proc Natl Acad Sci USA. 2003 Sep 30;100(20):11484-9.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, \ Haussler D, Miller W.\ Human-mouse alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

\

\
\ \

UCSC Pseudogene Predictions

\

Methods

\

\

\

\


\ \

GIS-PET Pseudogene Predictions

\

Description

\

\ This subtrack shows retrotransposed pseudogenes predicted by multiply mapped \ GIS-PETs (gene identification signature paired-end ditags) collected from two \ cancer cell lines, HCT116 and MCF7. A total of 49 non-redundant \ processed pseudogenes predicted in the ENCODE regions are presented in this \ dataset. Each pseudogene is labeled with an ID of the format \ AAA-GISPgene-XX, \ where "AAA" is the parent gene name, "GISPgene" marks the item as a GIS pseudogene, and "XX" is the unique ID of each pseudogene.
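Because parent gene names can themselves contain hyphens, splitting a label on "-" is fragile. The sketch below, an illustration rather than a tool distributed with this dataset, anchors on the literal "-GISPgene-" tag instead; the function name and the example label are hypothetical.

```python
import re

# AAA-GISPgene-XX: parent gene name, the literal "GISPgene" tag, and a unique ID.
# The parent name is matched greedily so that names containing hyphens still parse.
_GIS_ID = re.compile(r"^(?P<parent>.+)-GISPgene-(?P<uid>[^-]+)$")


def parse_gis_pseudogene_id(label: str) -> tuple[str, str]:
    """Split a GIS pseudogene label into (parent gene name, unique ID)."""
    m = _GIS_ID.match(label)
    if m is None:
        raise ValueError(f"not a GIS pseudogene label: {label!r}")
    return m.group("parent"), m.group("uid")


if __name__ == "__main__":
    print(parse_gis_pseudogene_id("RPL21-GISPgene-03"))  # hypothetical label
```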

\ \

Methods

\

\ PETs were generated from full-length transcripts and \ computationally mapped onto the human genome to demarcate the transcript start \ and end positions. PETs that mapped to multiple genome locations were \ grouped into PET-based gene families, each comprising a parent gene and its \ pseudogenes. A representative member (the shortest PET as defined by \ genomic coordinates) was selected from each family. This representative\ PET was aligned to the hg17 genome using in order to identify all the \ putative pseudogenes at the whole-genome level. All hits with an \ identity >=70% and coverage >=50% within ENCODE regions were \ reported. In this context, "coverage" refers to alignment coverage of \ the query sequence, i.e., a measure of how complete the predicted pseudogene \ is relative to the query sequence.
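The representative-selection and hit-filtering steps can be sketched as follows. This is an assumption-laden illustration, not the GIS pipeline. The record types, field names, and the rule that a hit must lie entirely inside an ENCODE region are hypothetical. Coverage is computed as aligned query bases divided by query length, following the definition in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pet:
    """A mapped PET (hypothetical record)."""
    name: str
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Hit:
    """One alignment of a representative PET (hypothetical record)."""
    chrom: str
    start: int
    end: int
    identity: float        # fraction of aligned bases that match, 0..1
    aligned_query_bp: int  # query bases covered by the alignment
    query_bp: int          # full length of the query sequence


def representative(family: list[Pet]) -> Pet:
    """Pick the shortest PET by genomic span (end - start)."""
    return min(family, key=lambda p: p.end - p.start)


def keep_hit(hit: Hit, regions: list[tuple[str, int, int]],
             min_identity: float = 0.70, min_coverage: float = 0.50) -> bool:
    """Keep hits with identity >= 70% and query coverage >= 50% in a region."""
    coverage = hit.aligned_query_bp / hit.query_bp
    if hit.identity < min_identity or coverage < min_coverage:
        return False
    # Assumed: a hit counts as "within" a region only if it lies entirely inside it.
    return any(hit.chrom == c and s <= hit.start and hit.end <= e
               for c, s, e in regions)
```

Each family would be reduced with `representative`, the result aligned genome-wide, and the resulting hits passed through `keep_hit` with the ENCODE region coordinates.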

\ \

Verification of GIS-PET Pseudogene Predictions

\

\ Pseudogenes were verified by manual examination.

\ \

Credits

\

\ These data were generated by the ENCODE Pseudogene Annotation group:\ \ Jennifer Harrow,\ \ \ Wei Chia-Lin,\ \ \ Siew Woh Choo\ \ \ Adam Frankish,\ \ \ Robert Baertsch,\ \ \ France Denoeud,\ \ \ Deyou Zheng,\ \ \ Yontao Lu,\ \ \ Alexandre Reymond,\ \ \ Roderic Guigo Serra,\ \ \ Tom Gingeras,\ \ \ Suganthi Balasubramanian and\ \ \ Mark Gerstein.\ \ encodeGenes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ cdsDrawDefault genomic codons\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ gClass_Known 33,91,51\ gClass_Novel_CDS 33,91,51\ gClass_Novel_transcript 84,188,0\ gClass_Novel_transcript_gencode_conf 33,91,51\ gClass_Processed_pseudogene 200,91,191\ gClass_Pseudogene_fragment 100,91,191\ gClass_Putative 84,188,0\ gClass_Putative_gencode_conf 33,91,51\ gClass_TEC 84,188,0\ gClass_Undefined 163,168,163\ gClass_Unprocessed_pseudogene 0,91,191\ geneClasses Artifact Known Novel_CDS Novel_transcript Novel_transcript_gencode_conf Putative Putative_gencode_conf TEC Processed_pseudogene Unprocessed_pseudogene Pseudogene_fragment Undefined\ group encodeGenes\ itemClassTbl encodePseudogeneClass\ longLabel ENCODE Pseudogene Predictions - All ENCODE Regions\ origAssembly hg17\ priority 15.0\ shortLabel Pseudogenes\ track encodePseudogene\ type genePred\ visibility hide\ encodeGencodeRaceFragsHL60 RACEfrags HL60 genePred Gencode RACEfrags from HL60 cells 0 15 0 0 255 127 127 255 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 0,0,255\ longLabel Gencode RACEfrags from HL60 cells\ parent encodeGencodeRaceFrags\ priority 15\ shortLabel RACEfrags HL60\ track encodeGencodeRaceFragsHL60\ encodeStanfordPromotersSnu182 Stan Pro Snu182 bed 9 + Stanford Promoter Activity (Snu182 cells) 0 15 0 0 0 127 127 127 0 0 19 
chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (Snu182 cells)\ parent encodeStanfordPromoters\ priority 15\ shortLabel Stan Pro Snu182\ track encodeStanfordPromotersSnu182\ encodeYaleAffyNB4UntrRNATransMap Yale RNA NB4 Un wig -2730 3394 Yale NB4 RNA Transcript Map, Untreated 0 15 90 50 50 172 152 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 0 color 90,50,50\ longLabel Yale NB4 RNA Transcript Map, Untreated\ parent encodeYaleAffyRNATransMap\ priority 15\ shortLabel Yale RNA NB4 Un\ subGroups celltype=nb4 samples=samples\ track encodeYaleAffyNB4UntrRNATransMap\ encodeYaleAffyNB4UntrRNATars Yale TAR NB4 Un bed 3 . Yale NB4 RNA, TAR, Untreated 0 15 90 50 50 172 152 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 color 90,50,50\ longLabel Yale NB4 RNA, TAR, Untreated\ parent encodeYaleAffyRNATars\ priority 15\ shortLabel Yale TAR NB4 Un\ subGroups celltype=nb4 samples=samples\ track encodeYaleAffyNB4UntrRNATars\ encodeAffyChIpHl60SitesCebpeHr32 Affy CEBPe RA 32h bed 3 . Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 32hrs) Sites 0 16 200 25 0 227 140 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 200,25,0\ longLabel Affymetrix ChIP/Chip (CEBPe retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 16\ shortLabel Affy CEBPe RA 32h\ subGroups factor=CEBPe time=32h\ track encodeAffyChIpHl60SitesCebpeHr32\ encodeAffyRnaHl60SitesHr08IntronsDistal Affy In Dst HL60 8h bed 4 . 
Affy Intronic Distal Hl60 8hr Transfrags 0 16 150 105 255 202 180 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 150,105,255\ longLabel Affy Intronic Distal Hl60 8hr Transfrags\ parent encodeNoncodingTransFrags\ priority 16\ shortLabel Affy In Dst HL60 8h\ subGroups region=intronicDistal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr08IntronsDistal\ encodeAffyEc51TestisSignal EC51 Sgnl Testis wig 0 62385 Affy Ext Trans Signal (51-base window) (Testis) 0 16 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 0 color 128,0,128\ longLabel Affy Ext Trans Signal (51-base window) (Testis)\ parent encodeAffyEcSignal\ priority 16\ shortLabel EC51 Sgnl Testis\ track encodeAffyEc51TestisSignal\ encodeAffyEc51TestisSites EC51 Site Testis bed 3 . Affy Ext Trans Sites (51-base window) (Testis) 0 16 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 1 color 128,0,128\ longLabel Affy Ext Trans Sites (51-base window) (Testis)\ parent encodeAffyEcSites\ priority 16\ shortLabel EC51 Site Testis\ track encodeAffyEc51TestisSites\ bacEndPairsBad Incorrect BAC End Pairs bed 6 + Orphan, Short and Incorrectly Oriented BAC End Pairs 0 16 0 0 0 90 90 90 0 0 0 map 1 altColor 90,90,90\ color 0,0,0\ exonArrows off\ group map\ longLabel Orphan, Short and Incorrectly Oriented BAC End Pairs\ priority 16\ shortLabel Incorrect BAC End Pairs\ track bacEndPairsBad\ type bed 6 +\ visibility hide\ encodeGencodeRaceFragsHela RACEfrags HeLaS3 genePred Gencode RACEfrags from HeLaS3 cells 0 16 125 130 255 190 192 255 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 125,130,255\ longLabel Gencode RACEfrags from HeLaS3 cells\ parent encodeGencodeRaceFrags\ priority 16\ shortLabel RACEfrags HeLaS3\ track encodeGencodeRaceFragsHela\ encodeEgaspFullSgp2 SGP2 genePred SGP2 Gene Predictions 0 16 100 12 100 
177 133 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 100,12,100\ longLabel SGP2 Gene Predictions\ parent encodeEgaspFull\ priority 16\ shortLabel SGP2\ track encodeEgaspFullSgp2\ encodeStanfordPromotersU87 Stan Pro U87 bed 9 + Stanford Promoter Activity (U87 cells) 0 16 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (U87 cells)\ parent encodeStanfordPromoters\ priority 16\ shortLabel Stan Pro U87\ track encodeStanfordPromotersU87\ encodeAffyChIpHl60PvalCtcfHr00 Affy CTCF RA 0h wig 0.0 534.54 Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 0hrs) P-Value 0 17 175 50 0 215 152 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 175,50,0\ longLabel Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 0hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 17\ shortLabel Affy CTCF RA 0h\ subGroups factor=CTCF time=0h\ track encodeAffyChIpHl60PvalCtcfHr00\ encodeAffyRnaHl60SitesHr32IntronsDistal Affy In Dst HL60 32h bed 4 . 
Affy Intronic Distal Hl60 32hr Transfrags 0 17 125 130 255 190 192 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 125,130,255\ longLabel Affy Intronic Distal Hl60 32hr Transfrags\ parent encodeNoncodingTransFrags\ priority 17\ shortLabel Affy In Dst HL60 32h\ subGroups region=intronicDistal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr32IntronsDistal\ encodeAffyEc1FetalTestisSignal EC1 Sgnl FetalT wig 0 62385 Affy Ext Trans Signal (1-base window) (Fetal Testis) 0 17 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 0 color 128,0,128\ longLabel Affy Ext Trans Signal (1-base window) (Fetal Testis)\ parent encodeAffyEcSignal\ priority 17\ shortLabel EC1 Sgnl FetalT\ track encodeAffyEc1FetalTestisSignal\ encodeAffyEc1FetalTestisSites EC1 Sites FetalT bed 3 . Affy Ext Trans Sites (1-base window) (Fetal Testis) 0 17 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 1 color 128,0,128\ longLabel Affy Ext Trans Sites (1-base window) (Fetal Testis)\ parent encodeAffyEcSites\ priority 17\ shortLabel EC1 Sites FetalT\ track encodeAffyEc1FetalTestisSites\ bacEndPairsLong Long BAC End Pairs bed 6 + Long BAC End Pairs 0 17 0 0 0 90 90 90 0 0 0 map 1 altColor 90,90,90\ color 0,0,0\ exonArrows off\ group map\ longLabel Long BAC End Pairs\ priority 17\ shortLabel Long BAC End Pairs\ track bacEndPairsLong\ type bed 6 +\ visibility hide\ encodeEgaspFullSgp2U12 SGP2 U12 genePred SGP2 U12 Intron Predictions 0 17 200 132 12 227 193 133 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 200,132,12\ longLabel SGP2 U12 Intron Predictions\ parent encodeEgaspFull\ priority 17\ shortLabel SGP2 U12\ track encodeEgaspFullSgp2U12\ encodeStanfordPromotersAverage Stan Pro Average bed 9 + Stanford Promoter Activity (Average) 0 17 0 0 0 127 127 127 0 0 19 
chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX, encodeTxLevels 1 longLabel Stanford Promoter Activity (Average)\ parent encodeStanfordPromoters\ priority 17\ shortLabel Stan Pro Average\ track encodeStanfordPromotersAverage\ encodeAffyChIpHl60SitesCtcfHr00 Affy CTCF RA 0h bed 3 . Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 0hrs) Sites 0 18 175 50 0 215 152 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 175,50,0\ longLabel Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 0hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 18\ shortLabel Affy CTCF RA 0h\ subGroups factor=CTCF time=0h\ track encodeAffyChIpHl60SitesCtcfHr00\ encodeAffyEc51FetalTestisSignal EC51 Sgnl FetalT wig 0 62385 Affy Ext Trans Signal (51-base window) (Fetal Testis) 0 18 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 0 color 128,0,128\ longLabel Affy Ext Trans Signal (51-base window) (Fetal Testis)\ parent encodeAffyEcSignal\ priority 18\ shortLabel EC51 Sgnl FetalT\ track encodeAffyEc51FetalTestisSignal\ encodeAffyEc51FetalTestisSites EC51 Site FetalT bed 3 . Affy Ext Trans Sites (51-base window) (Fetal Testis) 0 18 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 1 color 128,0,128\ longLabel Affy Ext Trans Sites (51-base window) (Fetal Testis)\ parent encodeAffyEcSites\ priority 18\ shortLabel EC51 Site FetalT\ track encodeAffyEc51FetalTestisSites\ fosEndPairs Fosmid End Pairs bed 6 + Fosmid End Pairs 0 18 0 0 0 90 90 90 0 0 0

Description

\

A valid pair of fosmid end sequences must be\ at least 30 kb but no more than 50 kb away from each other. \ The orientation of the first fosmid end sequence must be "+" and\ the orientation of the second fosmid end sequence must be "-".

\ \

Methods

End sequences were trimmed at the NCBI using\ ssahaCLIP written by Jim Mullikin. Trimmed fosmid end sequences were\ placed on the assembled sequence using Jim Kent's \ blat \ program.

\ \

Credits

\

Sequencing of the fosmid ends was done at the \ Eli & Edythe L. Broad\ Institute of MIT and Harvard University. Clones are available through the\ BACPAC Resources\ Center at Children's Hospital Oakland Research Institute (CHORI).\

\ map 1 altColor 90,90,90\ color 0,0,0\ exonArrows off\ group map\ longLabel Fosmid End Pairs\ priority 18\ shortLabel Fosmid End Pairs\ track fosEndPairs\ type bed 6 +\ visibility hide\ encodeEgaspFullSpida SPIDA Exons genePred SPIDA Exon Predictions 0 18 100 12 100 177 133 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 100,12,100\ longLabel SPIDA Exon Predictions\ parent encodeEgaspFull\ priority 18\ shortLabel SPIDA Exons\ track encodeEgaspFullSpida\ encodeYaleAffyNB4RARNATarsIntronsDistal Yale In Dst NB4 bed 4 . Yale Intronic Distal NB4 Retinoic TARs 0 18 100 155 255 177 205 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 100,155,255\ longLabel Yale Intronic Distal NB4 Retinoic TARs\ parent encodeNoncodingTransFrags\ priority 18\ shortLabel Yale In Dst NB4\ subGroups region=intronicDistal celltype=nb4 source=yale\ track encodeYaleAffyNB4RARNATarsIntronsDistal\ encodeAffyChIpHl60PvalCtcfHr02 Affy CTCF RA 2h wig 0.0 534.54 Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 2hrs) P-Value 0 19 175 50 0 215 152 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 175,50,0\ longLabel Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 2hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 19\ shortLabel Affy CTCF RA 2h\ subGroups factor=CTCF time=2h\ track encodeAffyChIpHl60PvalCtcfHr02\ fosEndPairsBad Bad Fosmid End Pairs bed 6 + Orphan, Short and Incorrectly Oriented Fosmid End Pairs 0 19 0 0 0 90 90 90 0 0 0 map 1 altColor 90,90,90\ color 0,0,0\ exonArrows off\ group map\ longLabel Orphan, Short and Incorrectly Oriented Fosmid End Pairs\ priority 19\ shortLabel Bad Fosmid End Pairs\ track fosEndPairsBad\ type bed 6 +\ visibility hide\ 
encodeAffyEc1ProstateSignal EC1 Sgnl Prost wig 0 62385 Affy Ext Trans Signal (1-base window) (Prostate) 0 19 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 0 color 128,0,128\ longLabel Affy Ext Trans Signal (1-base window) (Prostate)\ parent encodeAffyEcSignal\ priority 19\ shortLabel EC1 Sgnl Prost\ track encodeAffyEc1ProstateSignal\ encodeAffyEc1ProstateSites EC1 Sites Prost bed 3 . Affy Ext Trans Sites (1-base window) (Prostate) 0 19 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 1 color 128,0,128\ longLabel Affy Ext Trans Sites (1-base window) (Prostate)\ parent encodeAffyEcSites\ priority 19\ shortLabel EC1 Sites Prost\ track encodeAffyEc1ProstateSites\ encodeEgaspFullTwinscan Twinscan genePred Twinscan Gene Predictions 0 19 12 20 150 133 137 202 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeGenes 1 color 12,20,150\ longLabel Twinscan Gene Predictions\ parent encodeEgaspFull\ priority 19\ shortLabel Twinscan\ track encodeEgaspFullTwinscan\ encodeYaleAffyNB4TPARNATarsIntronsDistal Yale In Dst NB4 TPA bed 4 . Yale Intronic Distal NB4 TPA-Treated TARs 0 19 75 180 255 165 217 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 75,180,255\ longLabel Yale Intronic Distal NB4 TPA-Treated TARs\ parent encodeNoncodingTransFrags\ priority 19\ shortLabel Yale In Dst NB4 TPA\ subGroups region=intronicDistal celltype=nb4 source=yale\ track encodeYaleAffyNB4TPARNATarsIntronsDistal\ encodeAffyRnaSignal Affy RNA Signal wig -1168.00 1686.5 Affymetrix PolyA+ RNA Signal 0 19.02 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows an estimate of RNA abundance (transcription) for\ all ENCODE regions. Retinoic acid-stimulated HL-60 cells were\ harvested after 0, 2, 8, and 32 hours. \ Purified cytosolic polyA+ RNA from unstimulated GM06990 and HeLa cells, \ as well as purified polyA+ RNA from the RA-stimulated HL-60 samples, \ was hybridized to Affymetrix ENCODE oligonucleotide\ tiling arrays, which have 25-mer probes tiled every 22 bp on\ average in the non-repetitive ENCODE regions. \ Composite signals are shown in\ separate subtracks for each cell type and for each of the four \ timepoints for RA-stimulated HL-60.

\

\ Data for all biological replicates can be downloaded from Affymetrix in \ wiggle,\ cel, and\ soft formats.

\ \

Display Conventions and Configuration

\

\ The subtracks within this composite annotation track\ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options for the subtracks \ are shown at the top of the track description page, followed by a list of \ subtracks. To show only selected subtracks, uncheck the boxes next to \ the tracks that you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for distinguishing between the different cell types and \ timepoints.

\ \

Methods

\

\ The data from replicate arrays were quantile-normalized (Bolstad\ et al., 2003) and all arrays were scaled to a median array intensity\ of 22. Within a sliding 101 bp window centered on each probe, \ an estimate of RNA abundance (signal) was found by calculating the median \ of all pairwise average PM-MM values, where PM is a perfect match and MM is \ a mismatch. Both Kapranov et al. (2002) and Cawley \ et al. (2004) are good references for the experimental methods; \ Cawley et al. also describes the analytical methods.
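Quantile normalization followed by median scaling can be sketched in plain Python as below. It is a simplified illustration of the technique in Bolstad et al. (2003), not Affymetrix's implementation. Ties are broken by probe order, and applying the median-22 scaling to each array after normalization is an assumption about the order of operations; the function names are hypothetical.

```python
import statistics


def quantile_normalize(arrays: list[list[float]]) -> list[list[float]]:
    """Give every array the same distribution: the rank-wise mean of the sorted arrays.

    Ties are broken by original probe order, a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays need the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean intensity at each rank across arrays.
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result


def scale_to_median(values: list[float], target: float = 22.0) -> list[float]:
    """Multiply values so that their median equals `target`."""
    factor = target / statistics.median(values)
    return [v * factor for v in values]


if __name__ == "__main__":
    normalized = quantile_normalize([[1.0, 3.0, 2.0], [4.0, 6.0, 5.0]])
    print(normalized)
    print([scale_to_median(a) for a in normalized])
```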

\ \

Verification

\

\ Three independent biological replicates were generated and hybridized\ to duplicate arrays (two technical replicates). Transcribed regions\ were generated from the composite signal track by merging genomic positions\ to which probes are mapped. This merging was based on a 5% false\ positive rate cutoff in negative bacterial controls, a maximum\ gap (MaxGap) of 50 base-pairs and minimum run (MinRun) of 50 base-pairs (see\ the Affy TransFrags track for the merged regions).\

\ \

Credits

\

\ These data were generated and analyzed by the Gingeras/Struhl\ collaboration with the Tom Gingeras group at \ Affymetrix and the \ Kevin Struhl group at Harvard Medical School.

\ \

References

\

\ Please see the \ Affymetrix Transcriptome site for a project overview and\ additional references to Affymetrix tiling array publications.

\

\ Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. \ A comparison of normalization methods for high density \ oligonucleotide array data based on variance and bias. \ Bioinformatics 19(2), 185-193 (2003).

\

\ Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger,\ E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J.,\ Williams, A. J., et al. \ Unbiased mapping of transcription factor binding sites along \ human chromosomes 21 and 22 points to widespread regulation of noncoding \ RNAs. \ Cell 116(4), 499-509 (2004).

\

\ Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg,\ R. L., Fodor, S. P., and Gingeras, T. R. \ Large-scale transcriptional activity in chromosomes 21 and \ 22. \ Science 296(5569), 916-919 (2002).

\ encodeTxLevels 0 autoScale off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ longLabel Affymetrix PolyA+ RNA Signal\ maxHeightPixels 128:16:16\ origAssembly hg16\ priority 19.02\ shortLabel Affy RNA Signal\ spanList 1\ track encodeAffyRnaSignal\ type wig -1168.00 1686.5\ viewLimits 0:25\ visibility hide\ encodeAffyRnaTransfrags Affy Transfrags bed 3 . Affymetrix PolyA+ RNA Transfrags 0 19.03 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows an estimate of RNA abundance (transcription) for\ all ENCODE regions. Retinoic acid-stimulated HL-60 cells were\ harvested after 0, 2, 8, and 32 hours. \ Purified cytosolic polyA+ RNA from unstimulated GM06990 and HeLa cells, \ as well as purified polyA+ RNA from the RA-stimulated HL-60 samples, \ was hybridized to Affymetrix ENCODE oligonucleotide\ tiling arrays, which have 25-mer probes tiled every 22 bp on\ average in the non-repetitive ENCODE regions. \ Clustered sites are shown in\ separate subtracks for each cell type and for each of the four \ timepoints for RA-stimulated HL-60.

\ \

Display Conventions and Configuration

\

\ To show only selected subtracks, uncheck the boxes next to the tracks \ that you wish to hide. \

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for distinguishing between the different cell types and \ timepoints.

\ \

Methods

\

\ The data from replicate arrays were quantile-normalized (Bolstad\ et al., 2003) and all arrays were scaled to a median array intensity\ of 22. Within a sliding 101 bp window centered on each probe, \ an estimate of RNA abundance (signal) was found by calculating the median \ of all pairwise average PM-MM values, where PM is a perfect match and MM is \ a mismatch. Both Kapranov et al. (2002) and Cawley \ et al. (2004) are good references for the experimental methods; \ Cawley et al. also describes the analytical methods.
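The windowed signal estimate can be sketched as below. We read "the median of all pairwise average PM-MM values" as the Hodges-Lehmann estimator, that is, the median of all averages (d_i + d_j)/2 with i <= j, taken over the PM-MM differences pooled from every probe and array in the 101 bp window. That reading, the input layout, and the function names are assumptions, not the published analysis code.

```python
import statistics


def walsh_median(diffs: list[float]) -> float:
    """Median of all pairwise averages (d_i + d_j) / 2 with i <= j.

    This is the Hodges-Lehmann estimator; applying it here is an interpretation.
    """
    n = len(diffs)
    averages = [(diffs[i] + diffs[j]) / 2 for i in range(n) for j in range(i, n)]
    return statistics.median(averages)


def window_signal(probes: list[tuple[int, list[float]]],
                  window_bp: int = 101) -> list[tuple[int, float]]:
    """Signal at each probe from the PM-MM values of probes in a centered window.

    `probes` holds (genomic position, [PM-MM for each array]) pairs; this input
    layout is hypothetical. Runs in quadratic time, which is fine for a sketch.
    """
    half = window_bp // 2  # 50 bp on each side of the probe
    out = []
    for pos, _ in probes:
        pooled = [d for q, ds in probes if abs(q - pos) <= half for d in ds]
        out.append((pos, walsh_median(pooled)))
    return out


if __name__ == "__main__":
    probes = [(100, [1.0]), (122, [2.0]), (144, [3.0]), (300, [10.0])]
    print(window_signal(probes))
```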

\ \

Verification

\

\ Three independent biological replicates were generated and hybridized\ to duplicate arrays (two technical replicates). Transcribed regions were\ generated from the composite signal track (see the Affy RNA Signal track) by \ merging genomic positions to which probes are mapped. This merging was based \ on a 5% false positive rate cutoff in negative bacterial controls, a maximum\ gap (MaxGap) of 50 base-pairs and minimum run (MinRun) of 50 base-pairs.\
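The merging step can be sketched as follows. The cutoff is taken as the signal value exceeded by at most 5% of negative-control probes, and a run's length is measured as the distance between its first and last probe positions; both are assumptions, as are the function names. This is not the Affymetrix transfrag-calling software.

```python
import math


def negative_control_cutoff(neg_signals: list[float], fpr: float = 0.05) -> float:
    """Signal cutoff that at most `fpr` of negative-control probes exceed."""
    s = sorted(neg_signals)
    allowed_above = math.floor(fpr * len(s))
    return s[len(s) - 1 - allowed_above]


def call_transfrags(signal: list[tuple[int, float]], cutoff: float,
                    max_gap: int = 50, min_run: int = 50) -> list[tuple[int, int]]:
    """Merge above-cutoff probe positions into (first, last) transcribed regions."""
    positive = sorted(pos for pos, value in signal if value > cutoff)
    regions = []
    run_start = run_end = None
    for pos in positive:
        if run_end is not None and pos - run_end <= max_gap:
            run_end = pos  # gap within MaxGap: extend the current run
            continue
        # Gap too large: keep the finished run only if it reaches MinRun.
        if run_start is not None and run_end - run_start >= min_run:
            regions.append((run_start, run_end))
        run_start = run_end = pos  # start a new run
    if run_start is not None and run_end - run_start >= min_run:
        regions.append((run_start, run_end))
    return regions


if __name__ == "__main__":
    signal = [(0, 10.0), (22, 10.0), (44, 10.0), (66, 10.0), (88, 1.0),
              (200, 10.0), (222, 10.0), (400, 10.0)]
    print(call_transfrags(signal, cutoff=5.0))
```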

\ \

Credits

\

\ These data were generated and analyzed by the Gingeras/Struhl\ collaboration with the Tom Gingeras group at \ Affymetrix and the \ Kevin Struhl group at Harvard Medical School.

\ \

References

\

\ Please see the \ Affymetrix Transcriptome site for a project overview and\ additional references to Affymetrix tiling array publications.

\

\ Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. \ A comparison of normalization methods for high density \ oligonucleotide array data based on variance and bias. \ Bioinformatics 19(2), 185-193 (2003).

\

\ Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger,\ E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J.,\ Williams, A. J., et al. \ Unbiased mapping of transcription factor binding sites along \ human chromosomes 21 and 22 points to widespread regulation of noncoding \ RNAs. \ Cell 116(4), 499-509 (2004).

\

\ Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg,\ R. L., Fodor, S. P., and Gingeras, T. R. \ Large-scale transcriptional activity in chromosomes 21 and \ 22. \ Science 296(5569), 916-919 (2002).

\ encodeTxLevels 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ longLabel Affymetrix PolyA+ RNA Transfrags\ origAssembly hg16\ priority 19.03\ shortLabel Affy Transfrags\ track encodeAffyRnaTransfrags\ type bed 3 .\ visibility hide\ encodeAffyChIpHl60SitesCtcfHr02 Affy CTCF RA 2h bed 3 . Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 2hrs) Sites 0 20 175 50 0 215 152 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 175,50,0\ longLabel Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 2hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 20\ shortLabel Affy CTCF RA 2h\ subGroups factor=CTCF time=2h\ track encodeAffyChIpHl60SitesCtcfHr02\ encodeAffyEc51ProstateSignal EC51 Sgnl Prost wig 0 62385 Affy Ext Trans Signal (51-base window) (Prostate) 0 20 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 0 color 128,0,128\ longLabel Affy Ext Trans Signal (51-base window) (Prostate)\ parent encodeAffyEcSignal\ priority 20\ shortLabel EC51 Sgnl Prost\ track encodeAffyEc51ProstateSignal\ encodeAffyEc51ProstateSites EC51 Site Prost bed 3 . Affy Ext Trans Sites (51-base window) (Prostate) 0 20 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 1 color 128,0,128\ longLabel Affy Ext Trans Sites (51-base window) (Prostate)\ parent encodeAffyEcSites\ priority 20\ shortLabel EC51 Site Prost\ track encodeAffyEc51ProstateSites\ fosEndPairsLong Long Fosmid End Pairs bed 6 + Long Fosmid End Pairs 0 20 0 0 0 90 90 90 0 0 0 map 1 altColor 90,90,90\ color 0,0,0\ exonArrows off\ group map\ longLabel Long Fosmid End Pairs\ priority 20\ shortLabel Long Fosmid End Pairs\ track fosEndPairsLong\ type bed 6 +\ visibility hide\ encodeYaleAffyNB4UntrRNATarsIntronsDistal Yale In Dst NB4 Un bed 4 . 
Yale Intronic Distal Untreated NB4 TARs 0 20 50 205 255 152 230 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 50,205,255\ longLabel Yale Intronic Distal Untreated NB4 TARs\ parent encodeNoncodingTransFrags\ priority 20\ shortLabel Yale In Dst NB4 Un\ subGroups region=intronicDistal celltype=nb4 source=yale\ track encodeYaleAffyNB4UntrRNATarsIntronsDistal\ encodeAffyChIpHl60PvalCtcfHr08 Affy CTCF RA 8h wig 0.0 534.54 Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 8hrs) P-Value 0 21 175 50 0 215 152 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 175,50,0\ longLabel Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 8hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 21\ shortLabel Affy CTCF RA 8h\ subGroups factor=CTCF time=8h\ track encodeAffyChIpHl60PvalCtcfHr08\ chr18deletions Chr18 Deletions bed 6 + Chromosome 18 Deletions 0 21 0 0 0 127 127 127 0 0 0 map 1 group map\ longLabel Chromosome 18 Deletions\ priority 21\ shortLabel Chr18 Deletions\ track chr18deletions\ type bed 6 +\ visibility hide\ encodeAffyEc1OvarySignal EC1 Sgnl Ovary wig 0 62385 Affy Ext Trans Signal (1-base window) (Ovary) 0 21 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 0 color 128,0,128\ longLabel Affy Ext Trans Signal (1-base window) (Ovary)\ parent encodeAffyEcSignal\ priority 21\ shortLabel EC1 Sgnl Ovary\ track encodeAffyEc1OvarySignal\ encodeAffyEc1OvarySites EC1 Sites Ovary bed 3 . Affy Ext Trans Sites (1-base window) (Ovary) 0 21 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 1 color 128,0,128\ longLabel Affy Ext Trans Sites (1-base window) (Ovary)\ parent encodeAffyEcSites\ priority 21\ shortLabel EC1 Sites Ovary\ track encodeAffyEc1OvarySites\ encodeYaleAffyNeutRNATarsAllIntronsDistal Yale In Dst Neu bed 4 . 
Yale Intronic Distal Neutrophil TARs 0 21 25 230 255 140 242 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 25,230,255\ longLabel Yale Intronic Distal Neutrophil TARs\ parent encodeNoncodingTransFrags\ priority 21\ shortLabel Yale In Dst Neu\ subGroups region=intronicDistal celltype=neut source=yale\ track encodeYaleAffyNeutRNATarsAllIntronsDistal\ encodeAffyChIpHl60SitesCtcfHr08 Affy CTCF RA 8h bed 3 . Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 8hrs) Sites 0 22 175 50 0 215 152 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 175,50,0\ longLabel Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 8hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 22\ shortLabel Affy CTCF RA 8h\ subGroups factor=CTCF time=8h\ track encodeAffyChIpHl60SitesCtcfHr08\ encodeBuFirstExon BU First Exon bed 12 + Boston University First Exon Activity 0 22 0 0 0 127 127 127 0 0 10 chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX,

Description

\

\ This track displays expression levels of computationally identified\ first exons and a constitutive exon of genes in ENCODE regions,\ based on the real competitive Polymerase Chain\ Reaction (rcPCR) technique described in Ding\ et al. (2003). \ Expression levels\ are indicated by color, ranging from black (no expression) to red (high\ expression).

\

\ Experiments were performed on total RNA samples of ten\ normal human tissues purchased from Clontech (Palo Alto, CA): \ cerebral cortex, colon, heart, kidney, liver, lung,\ skeletal muscle, spleen, stomach, and testis.

\

\ The name for each alternative transcript starts with the gene name,\ followed by an identifier for the alternative first exon or the\ constitutive exon. For example, for gene CAV1, there are three\ alternative first exons (CAV1-E1A, CAV1-E1B, and CAV1-E1C) and the\ third exon is chosen as the constitutively expressed exon (CAV1-E3).

\ \

Methods

\

\ Alternative transcription start sites (TSS) for 20 ENCODE genes were predicted\ using PromoSer, an in-house computational tool.\ PromoSer computationally identifies the TSS by considering alignments\ of a large number of partial and full-length mRNA sequences and ESTs to\ genomic DNA, with provision for alternative promoters. In PromoSer, the\ treatment of alternative first exons (or the resulting TSSs) is as\ follows: \

\

\ For each gene, all alternative first exons were identified based on manual\ selection of PromoSer predictions. An exon that is\ shared by all transcripts (called the constitutive exon) was also selected. \ The selection process involved visually\ examining the structure of the cluster, preferably using the latest\ data available on UCSC, to identify distinct first exons that were well\ formed (having multiple supporting sequences) and had no evidence\ (especially from newer sequences) of additional sequence that made\ them internal exons. After the first exon was identified, a subsequence \ (100-300 bases) was selected for use in the experiment. The\ selection process avoided repeat sequences as much as possible, and if\ two first exons partially overlapped, the non-overlapping region was\ selected. If those conditions caused the remaining sequence to be too\ short (or the first exon itself was too short), a junction with the\ second exon was used. A constitutive exon that was\ included in all (or most) of the alternative transcripts was also selected, and \ suitable sequences were then extracted as above (no exon junctions were used).\

\

\ The absolute expression level of each exon was quantified individually\ by rcPCR, using four assays with PCR amplicons corresponding to\ each exon. \ Amplicons were designed according to transcript sequences\ and can span a large distance on the genomic sequence. In addition,\ some amplicons were designed across the junctions between first exons\ and the constitutive second exons; these amplicons may therefore overlap\ with the amplicons that correspond to the constitutive second exons. \

\

\ The rcPCR technique combined competitive PCR and matrix-assisted laser\ desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS)\ for gene expression analysis. To measure the expression level of a\ gene, an oligonucleotide standard (60-80 bases) of known concentration,\ complementary to the target sequence with a single base\ mismatch in the middle, was added as the competitor for PCR. The gene of\ interest and the oligonucleotide standard resembled two alleles of a\ heterozygous locus in an allele frequency analysis experiment, and thus\ could be quantified by the high-throughput MALDI-TOF MS\ based MassARRAY system (Sequenom Inc.).

\

\ After PCR, a base extension\ reaction was carried out with an extension primer, ThermoSequenase, and\ a mixture of ddNTPs/dNTP (for example, a mixture of\ ddA, ddC, ddT, and dG). The extension primer annealed to the sequence immediately\ upstream (5’) of the mismatch position. Depending on the nature\ of the mismatch and the composition of the ddNTP/dNTP mixture, one or two\ bases were added to the extension primer, producing two extension\ products that differ in length by one base. These two extension products\ were then detected and quantified by MALDI-TOF MS.
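The arithmetic behind competitive PCR can be shown in a short sketch. It assumes that the target and the competitor standard amplify with equal efficiency, so the ratio of their two extension-product peaks equals their starting ratio. Because the competitor concentration is known, this gives the target amount. The function names and the simple averaging across reactions are illustrative assumptions, not the MassARRAY software.

```python
import statistics

def target_amount(competitor_amount, target_peak, competitor_peak):
    """Starting amount of target cDNA inferred from one competitive PCR.

    Assumes target and competitor amplify with equal efficiency, so the
    ratio of their extension-product peaks equals their starting ratio.
    """
    return competitor_amount * target_peak / competitor_peak

def exon_level(reactions):
    """Average estimate over several competitor concentrations or replicates.

    reactions: list of (competitor_amount, target_peak, competitor_peak).
    """
    return statistics.mean([target_amount(*r) for r in reactions])
```

Running several competitor concentrations, as described under Verification, gives independent estimates that should agree if the equal-efficiency assumption holds.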

\

\ Expression ratios (e.g. CAV1-E1A/CAV1-E3, CAV1-E1B/CAV1-E3,\ CAV1-E1C/CAV1-E3) indicate the relative abundance of \ alternative first exons. \ 18S rRNA was used for exon absolute expression\ normalization among different tissues.

\

\ Values shown on this track represent the relative abundance of the\ alternative first exons with respect to the 18S rRNA. The raw values have\ been log10 transformed and scaled to show graded colors on the browser.
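The ratio and display steps can be sketched as below. The log10 transform and the black-to-red scale come from the text. The display bounds `log_min` and `log_max`, and the linear mapping between them, are hypothetical choices made for illustration.

```python
import math

def first_exon_ratios(levels, constitutive):
    """Ratio of each alternative first exon to the constitutive exon."""
    return {name: value / levels[constitutive]
            for name, value in levels.items() if name != constitutive}

def track_color(exon_level, rrna_18s_level, log_min=-6.0, log_max=0.0):
    """Map log10(exon / 18S rRNA) onto a black-to-red (R, G, B) color.

    log_min and log_max are hypothetical display bounds; values outside
    them are clamped to black or full red.
    """
    x = math.log10(exon_level / rrna_18s_level)
    frac = min(1.0, max(0.0, (x - log_min) / (log_max - log_min)))
    return (round(255 * frac), 0, 0)
```

For example, with CAV1-E1A at 2.0 and CAV1-E3 at 4.0 (arbitrary units), the E1A/E3 ratio is 0.5.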

\ \

Verification

\

\ One biological replicate was performed for each gene. Two to four\ competitor concentrations were used to detect the expression level\ of each exon. Two to six technical replicates were performed for\ each competitor concentration. One more biological replicate will be\ performed in the future.

\ \

Credits

\

\ Data generation and analysis for this track were performed by \ ZLAB\ at Boston University. The following people contributed: Shengnan Jin,\ Anason Halees, Heather Burden, Yutao Fu, Ulas Karaoz, Yong Yu, Chunming\ Ding, Charles R. Cantor, and Zhiping Weng.

\ \

References

\

\ Ding, C. and Cantor, C.R. \ A\ high-throughput gene expression analysis technique using competitive PCR and \ matrix-assisted laser desorption ionization time-of-flight MS.\ Proc Natl Acad Sci U S A 100(6), 3059-64 (2003).

\

Ding, C. and Cantor, C.R. \ Direct molecular haplotyping of long-range genomic DNA with\ M1-PCR. \ Proc Natl Acad Sci U S A 100(13), 7449-53 (2003).

\

Halees, A.S., Leyfer, D. and Weng, Z. \ PromoSer: A large-scale mammalian promoter and transcription \ start site identification service. \ Nucleic Acids Res. 31(13), 3554-9 (2003).

\

Halees, A.S. and Weng, Z. \ PromoSer: improvements to the algorithm, visualization and\ accessibility.\ Nucleic Acids Res., 32, W191-W194 (2004).

\ encodeTxLevels 1 chromosomes chr11,chr13,chr15,chr16,chr19,chr2,chr5,chr7,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ itemRgb on\ longLabel Boston University First Exon Activity\ origAssembly hg16\ priority 22.0\ shortLabel BU First Exon\ track encodeBuFirstExon\ type bed 12 +\ visibility hide\ encodeAffyEc51OvarySignal EC51 Sgnl Ovary wig 0 62385 Affy Ext Trans Signal (51-base window) (Ovary) 0 22 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 0 color 128,0,128\ longLabel Affy Ext Trans Signal (51-base window) (Ovary)\ parent encodeAffyEcSignal\ priority 22\ shortLabel EC51 Sgnl Ovary\ track encodeAffyEc51OvarySignal\ encodeAffyEc51OvarySites EC51 Site Ovary bed 3 . Affy Ext Trans Sites (51-base window) (Ovary) 0 22 128 0 128 191 127 191 0 0 2 chr21,chr22, encodeTxLevels 1 color 128,0,128\ longLabel Affy Ext Trans Sites (51-base window) (Ovary)\ parent encodeAffyEcSites\ priority 22\ shortLabel EC51 Site Ovary\ track encodeAffyEc51OvarySites\ isochores Isochores bed 4 + GC-Rich (dark) and AT-Rich (light) Isochores 0 22 0 0 0 127 127 127 1 0 0

What's an Isochore

\

An isochore is a region of a chromosome in which the GC content is\ either higher or lower than the whole-genome average (42%). A GC-rich\ isochore is drawn in a dark color, while a GC-poor (AT-rich) isochore is drawn in a light\ color.

\

Isochores were determined by first calculating the GC content of 100,000 bp\ windows across the genome. Each window was labeled H or L\ depending on whether its GC content was higher or lower\ than average. A two-state HMM was created in which one state represented\ GC-rich regions and the other GC-poor regions. It was trained using the first 12\ chromosomes. The trained HMM was then used to generate traces over all chromosomes.\ These traces define the boundaries of the isochores\ and their type (GC-rich or AT-rich).
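A minimal sketch of the labeling and HMM tracing steps follows, using Viterbi decoding to produce the trace. The HMM here is not trained. Its parameters (`stay`, `emit_match`, and a uniform starting distribution) are hypothetical values chosen by hand, whereas the track's HMM was trained on the first 12 chromosomes.

```python
import math

def label_windows(gc_fractions, genome_mean=0.42):
    """Label each window 'H' or 'L' relative to the genome-wide GC mean."""
    return ["H" if gc > genome_mean else "L" for gc in gc_fractions]

def viterbi_isochores(labels, stay=0.99, emit_match=0.8):
    """Most likely GC-rich/AT-rich state path for a sequence of H/L labels.

    stay and emit_match are hypothetical, hand-set parameters.
    """
    states = ("GC-rich", "AT-rich")

    def emit(state, obs):
        # A GC-rich state "matches" an H window; an AT-rich state, an L window.
        match = (state == "GC-rich") == (obs == "H")
        return math.log(emit_match if match else 1.0 - emit_match)

    def trans(prev, cur):
        return math.log(stay if prev == cur else 1.0 - stay)

    score = {s: math.log(0.5) + emit(s, labels[0]) for s in states}
    backpointers = []
    for obs in labels[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + trans(p, s))
            ptr[s] = best_prev
            new_score[s] = score[best_prev] + trans(best_prev, s) + emit(s, obs)
        backpointers.append(ptr)
        score = new_score

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def isochore_segments(path):
    """Collapse a per-window state path into (first_window, end_window, state) runs."""
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            segments.append((start, i, path[start]))
            start = i
    return segments
```

The high `stay` probability makes the trace ignore isolated windows that go against their neighbors. A lone L inside a run of H windows remains GC-rich, so the boundaries describe broad isochores rather than window-level noise.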

\ map 1 group map\ longLabel GC-Rich (dark) and AT-Rich (light) Isochores\ priority 22\ shortLabel Isochores\ spectrum on\ track isochores\ type bed 4 +\ visibility hide\ encodeYaleAffyPlacRNATarsIntronsDistal Yale In Dst Plac bed 4 . Yale Intronic Distal Placental TARs 0 22 0 255 255 127 255 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 0,255,255\ longLabel Yale Intronic Distal Placental TARs\ parent encodeNoncodingTransFrags\ priority 22\ shortLabel Yale In Dst Plac\ subGroups region=intronicDistal celltype=plac source=yale\ track encodeYaleAffyPlacRNATarsIntronsDistal\ encodeAffyChIpHl60PvalCtcfHr32 Affy CTCF RA 32h wig 0.0 534.54 Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 32hrs) P-Value 0 23 175 50 0 215 152 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 175,50,0\ longLabel Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 32hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 23\ shortLabel Affy CTCF RA 32h\ subGroups factor=CTCF time=32h\ track encodeAffyChIpHl60PvalCtcfHr32\ encodeAffyRnaGm06990SitesIntergenicProximal Affy Ig Prx GM06990 bed 4 . 
Affymetrix Intergenic Proximal GM06990 Transfrags 0 23 131 191 7 193 223 131 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 131,191,7\ longLabel Affymetrix Intergenic Proximal GM06990 Transfrags\ parent encodeNoncodingTransFrags\ priority 23\ shortLabel Affy Ig Prx GM06990\ subGroups region=intergenicProximal celltype=gm06990 source=affy\ track encodeAffyRnaGm06990SitesIntergenicProximal\ encodeAffyEc1HeLaC1S3Signal EC1 Sgnl HeLa wig 0 62385 Affy Ext Trans Signal (1-base window) (HeLa C1S3) 0 23 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 0 color 0,0,205\ longLabel Affy Ext Trans Signal (1-base window) (HeLa C1S3)\ parent encodeAffyEcSignal\ priority 23\ shortLabel EC1 Sgnl HeLa\ track encodeAffyEc1HeLaC1S3Signal\ encodeAffyEc1HeLaC1S3Sites EC1 Sites HeLa bed 3 . Affy Ext Trans Sites (1-base window) (HeLa C1S3) 0 23 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 1 color 0,0,205\ longLabel Affy Ext Trans Sites (1-base window) (HeLa C1S3)\ parent encodeAffyEcSites\ priority 23\ shortLabel EC1 Sites HeLa\ track encodeAffyEc1HeLaC1S3Sites\ gcPercent GC% 20K bed 4 + Percentage GC in 20,000-Base Windows 0 23 0 0 0 127 127 127 1 0 0

Description

\

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in a 20,000 base window. Windows with high GC content are drawn more darkly \ than windows with low GC content. High GC content is typically associated with \ gene-rich areas.\
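The per-window calculation can be sketched as below. Leaving N (unsequenced) bases out of the denominator is an assumption of this sketch; windows consisting only of N get no value.

```python
def gc_percent_windows(seq, window=20000):
    """(start, end, GC%) for consecutive non-overlapping windows of `seq`.

    N and other non-ACGT characters are excluded from the denominator
    (an assumption); a window with no A/C/G/T bases gets None.
    """
    out = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        acgt = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        out.append((start, start + len(chunk), 100.0 * gc / acgt if acgt else None))
    return out
```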

\

Credits

\

\ This track was generated at UCSC.\ map 1 group map\ longLabel Percentage GC in 20,000-Base Windows\ priority 23\ shortLabel GC% 20K\ spectrum on\ track gcPercent\ type bed 4 +\ visibility hide\ encodeRikenCage Riken CAGE bedGraph 4 Riken CAGE - Predicted Gene Start Sites 0 23 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows the number of 5' cap analysis gene expression (CAGE) tags \ that map to the genome on the "plus" and "minus" strands at \ a specific location. For clarity, only the first 5' nucleotide in the tag \ (relative to the transcript direction) is considered. Areas in which many tags \ map to the same region may indicate a significant transcription start site.
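Counting only the first 5' nucleotide of each tag can be sketched as follows. Coordinates are assumed to be 0-based and half-open, as in BED. Under that convention, the 5' end of a minus-strand tag is its last aligned base (`end - 1`).

```python
from collections import Counter

def cage_tss_counts(alignments):
    """Count tag 5' ends per (chrom, strand, position).

    alignments: iterable of (chrom, start, end, strand), 0-based half-open.
    """
    counts = Counter()
    for chrom, start, end, strand in alignments:
        five_prime = start if strand == "+" else end - 1
        counts[(chrom, strand, five_prime)] += 1
    return counts
```

Two plus-strand tags of different lengths that start at the same base add to a single count at that base. This is why stacks of tags point to a candidate transcription start site.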

\ \

Display Conventions and Configuration

\

\ The position of the first 5' nucleotide in the tag is represented by a solid\ block. The height of the block indicates the number of 5' cDNA starts that map\ at that location.

\

\ This composite annotation track contains multiple subtracks that \ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link. To display only selected subtracks, uncheck the \ boxes next to the tracks you wish to hide.

\ \

Methods

\

\ The CAGE tags are sequenced from the 5' ends of full-length cDNAs produced using RIKEN full-length cDNA technology. To create the tags, a linker was \ attached to the 5' ends of full-length cDNAs that had been selected by cap \ trapping. The first 20 bp of each cDNA were cleaved using class II restriction \ enzymes and amplified by PCR, and concatemers of the resulting\ 32 bp tags were then formed for more efficient sequencing. For more information on \ CAGE analysis, see Shiraki et al. (2003) below. Refer to the \ RIKEN website\ for information about RIKEN full-length cDNA technologies. The mapping \ methodology employed in this annotation will be described in upcoming \ publications.

\ \

Verification

\

\ The techniques used to verify these data will be described in upcoming \ publications.

\ \

Credits

\

\ These data were contributed by the Functional Annotation of Mouse \ (FANTOM) \ Consortium, RIKEN Genome Science Laboratory and \ RIKEN Genome Exploration Research Group \ (Genome Network Project Core Group).

\

\ FANTOM Consortium: P. Carninci, T. Kasukawa, S. Katayama, Gough, \ M. Frith, N. Maeda, R. Oyama, T. Ravasi, B. Lenhard, C. Wells, R. \ Kodzius, K. Shimokawa, V. B. Bajic, S. E. Brenner, S. Batalov, A. R. R. \ Forrest, M. Zavolan, M. J. Davis, L. G. Wilming, V. Aidinis, J. Allen, \ A. Ambesi-Impiombato, R. Apweiler, R. N. Aturaliya, T. L. Bailey, M. \ Bansal, K. W. Beisel, T. Bersano, H. Bono, A. M. Chalk, K. P. Chiu, V. \ Choudhary, A. Christoffels, D. R. Clutterbuck, M. L. Crowe, E. Dalla, \ B. P. Dalrymple, B. de Bono, G. Della Gatta, D. di Bernardo, T. Down, \ P. Engstrom, M. Fagiolini, G. Faulkner, C. F. Fletcher, T. Fukushima, \ M. Furuno, S. Futaki, M. Gariboldi, P. Georgii-Hemming, T. R. Gingeras, \ T. Gojobori, R. E. Green, S. Gustincich, M. Harbers, V. Harokopos, Y. \ Hayashi, S. Henning, T. K. Hensch, N. Hirokawa, D. Hill, L. Huminiecki, \ M. Iacono, K. Ikeo, A. Iwama, T. Ishikawa, M. Jakt, A. Kanapin, M. \ Katoh, Y. Kawasawa, J. Kelso, H. Kitamura, H. Kitano, G. Kollias, S. \ P. T. Krishnan, A.F. Kruger, K. Kummerfeld, I. V. Kurochkin, \ L. F. Lareau, L. Lipovich, J. Liu, S. Liuni, S. McWilliam, M. Madan \ Babu, M. Madera, L. Marchionni, H. Matsuda, S. Matsuzawa, H. Miki, F. \ Mignone, S. Miyake, K. Morris, S. Mottagui-Tabar, N. Mulder, N. Nakano, \ H. Nakauchi, P. Ng, R. Nilsson, S. Nishiguchi, S. Nishikawa, F. Nori, \ O. Ohara, Y. Okazaki, V. Orlando, K. C. Pang, W. J. Pavan, G. Pavesi, \ G. Pesole, N. Petrovsky, S. Piazza, W. Qu, J. Reed, J. F. Reid, B. Z. \ Ring, M. Ringwald, B. Rost, Y. Ruan, S. Salzberg, A. Sandelin, C. \ Schneider, C. Schoenbach, K. Sekiguchi, C. A. M. Semple, S. Seno, \ L. Sessa, Y. Sheng, Y. Shibata, H. Shimada, K. Shimada, B. Sinclair, S. \ Sperling, E. Stupka, K. Sugiura, R. Sultana, Y. Takenaka, K. Taki, K. \ Tammoja, S. L. Tan, S. Tang, M. S. Taylor, J. Tegner, S. A. Teichmann, \ H. R. Ueda, E. van Nimwegen, R. Verardo, C. L. Wei, K. Yagi, H. \ Yamanishi, E. Zabarovsky, S. Zhu, A. Zimmer, W. Hide, C. Bult, S. M. 
\ Grimmond, R. D. Teasdale, E. T. Liu, V. Brusic, J. Quackenbush, C. \ Wahlestedt, J. Mattick, D. Hume.

\

\ RIKEN Genome Exploration Research Group: C. Kai, D. Sasaki, Y. \ Tomaru, S. Fukuda, M. Kanamori-Katayama, M. Suzuki, J. Aoki, T. \ Arakawa, J. Iida, K. Imamura, M. Itoh, T. Kato, H. Kawaji, N. \ Kawagashira, T. Kawashima, M. Kojima, S. Kondo, H. Konno, K. Nakano, N. \ Ninomiya, T. Nishio, M. Okada, C. Plessy, K. Shibata, T. Shiraki, S. \ Suzuki, M. Tagami, K. Waki, A. Watahiki, Y. Okamura-Oho, H. Suzuki, J. \ Kawai.

\

\ General Organizer: Y. Hayashizaki\ \

References

\

\ Shiraki, T., Kondo, S., Katayama, S., Waki, K., Kasukawa, T., Kawaji, H., \ Kodzius, R., Watahiki, A., Nakamura, M. et al.\ Cap analysis gene expression for high-throughput analysis of \ transcriptional starting point and identification of promoter usage.\ Proc Natl Acad Sci U S A. 100(26), 15776-81 (2003).

\ encodeTxLevels 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ longLabel Riken CAGE - Predicted Gene Start Sites\ maxHeightPixels 128:16:16\ maxLimit 4316\ minLimit 1\ origAssembly hg16\ priority 23.0\ shortLabel Riken CAGE\ track encodeRikenCage\ type bedGraph 4\ viewLimits 1.0:10.0\ visibility hide\ windowingFunction mean\ gc5BaseBw GC Percent bigWig 0 100 GC Percent in 5-Base Windows 0 23.5 0 0 0 128 128 128 0 0 0

Description

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in 5-base windows. High GC content is typically associated with\ gene-rich areas.\

\

\ This track may be configured in a variety of ways to highlight different aspects \ of the displayed information. Click the "Graph configuration help" link\ for an explanation of the configuration options.\ \

Credits

\

The data and presentation of this graph were prepared by\ Hiram Clawson.\ \ map 0 altColor 128,128,128\ autoScale Off\ color 0,0,0\ graphTypeDefault Bar\ gridDefault OFF\ group map\ html gc5Base\ longLabel GC Percent in 5-Base Windows\ maxHeightPixels 128:36:16\ priority 23.5\ shortLabel GC Percent\ track gc5BaseBw\ type bigWig 0 100\ viewLimits 30:70\ visibility hide\ windowingFunction Mean\ gc5Base GC Percent wig 0 100 Percentage GC in 5-Base Windows 0 23.5 0 0 0 128 128 128 0 0 0

Description

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in 5-base windows. High GC content is typically associated with\ gene-rich areas.\

\

\ This track may be configured in a variety of ways to highlight different aspects \ of the displayed information. Click the "Graph configuration help" link\ for an explanation of the configuration options.\ \

Credits

\

The data and presentation of this graph were prepared by\ Hiram Clawson.\ \ map 0 altColor 128,128,128\ autoScale Off\ color 0,0,0\ graphTypeDefault Bar\ gridDefault OFF\ group map\ longLabel Percentage GC in 5-Base Windows\ maxHeightPixels 128:36:16\ priority 23.5\ shortLabel GC Percent\ spanList 5,1000\ track gc5Base\ type wig 0 100\ viewLimits 30:70\ visibility hide\ windowingFunction Mean\ gc5Win20K GC% Win20K wig 0 100 GC Percent in 5 Bases Smoothed to 20,000-Base Windows 0 23.6 0 128 255 255 128 0 0 0 0

Description

\

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in 5-base windows. High GC content is typically associated with\ gene-rich areas.\

\

\ This track was produced by measuring GC percent in 5 bases, then running\ a 20,000 base smoothing window over that data to indicate the average\ GC percent in the 20,000 base window, at each 5 base interval.\ Thus, each 5 base point in the graph represents the average GC percent\ in the next 20,000 bases.\
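The forward-looking smoothing described above can be sketched with a running prefix sum. Each 5-base value is replaced by the mean of the 20,000 / 5 = 4,000 values starting at that point. Truncating the window at the end of the sequence is an assumption of this sketch.

```python
def smooth_forward(values, span=5, window=20000):
    """Average of the next `window` bases at each `span`-base data point.

    values: GC percent per `span`-base step. Near the end of the data the
    window is truncated to whatever values remain (an assumption).
    """
    n = window // span  # number of data points per smoothing window
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(len(values)):
        j = min(i + n, len(values))
        out.append((prefix[j] - prefix[i]) / (j - i))
    return out
```

With the prefix sum, each window average takes constant time, so the whole pass is linear in the number of points instead of 4,000 times longer.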

\

\ This track may be configured in a variety of ways to highlight different aspects \ of the displayed information. Click the \ Graph \ configuration help link for an explanation of the configuration options.

\ \

Credits

\

The data and presentation of this graph were prepared by\ Hiram Clawson (hiram@soe.ucsc.edu).\ map 0 altColor 255,128,0\ autoScale Off\ color 0,128,255\ graphTypeDefault Bar\ gridDefault OFF\ group map\ longLabel GC Percent in 5 Bases Smoothed to 20,000-Base Windows\ maxHeightPixels 128:36:16\ priority 23.6\ shortLabel GC% Win20K\ spanList 5\ track gc5Win20K\ type wig 0 100\ viewLimits 30:70\ visibility hide\ windowingFunction Mean\ qualityBw Quality Scores bigWig 0 100 Human Sequencing Quality Scores 0 23.6 0 128 255 255 128 0 0 0 0

Description

\

\ The Quality Scores track shows the sequencing quality score \ (range: 0 to 99) of each base in the assembly. \ The height at each position of the track \ indicates the quality of the base. \ When zoomed out to a large range, the heights reflect the averaged scores. \ Scores of 40 or higher reflect high confidence in the sequence (with an error rate of less than \ 1/10,000); scores of 20 or higher reflect reasonable confidence (of working draft \ quality).\
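The thresholds quoted above match Phred-scaled scores, where a score Q corresponds to an error probability of 10^(-Q/10), so Q40 gives 1/10,000. The sketch below assumes the assembly's scores use that scale. It also approximates the zoomed-out display with simple block averages; the browser's actual summarization may differ.

```python
def phred_error_rate(q):
    """Error probability for a Phred-scaled quality score."""
    return 10 ** (-q / 10)

def zoomed_out_scores(scores, bases_per_pixel):
    """Mean score in each block of `bases_per_pixel` consecutive bases."""
    averages = []
    for i in range(0, len(scores), bases_per_pixel):
        chunk = scores[i:i + bases_per_pixel]
        averages.append(sum(chunk) / len(chunk))
    return averages
```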

\

\ This track may be configured in a variety of ways to highlight different aspects \ of the displayed information. Click the \ Graph \ configuration help link for an explanation of the configuration options.

\ \

Credits

\

\ The quality scores were provided as part of the human assembly. \ The database representation and graphical display code were written by\ Hiram Clawson.\ map 0 altColor 255,128,0\ autoScale Off\ color 0,128,255\ graphTypeDefault Bar\ gridDefault OFF\ group map\ html quality\ longLabel $Organism Sequencing Quality Scores\ maxHeightPixels 128:36:16\ priority 23.6\ shortLabel Quality Scores\ track qualityBw\ type bigWig 0 100\ visibility hide\ windowingFunction Mean\ quality Quality Scores wig 0 100 Human Sequencing Quality Scores 0 23.6 0 128 255 255 128 0 0 0 0

Description

\

\ The Quality Scores track shows the sequencing quality score \ (range: 0 to 99) of each base in the assembly. \ The height at each position of the track \ indicates the quality of the base. \ When zoomed out to a large range, the heights reflect the averaged scores. \ Scores of 40 or higher reflect high confidence in the sequence (with an error rate of less than \ 1/10,000); scores of 20 or higher reflect reasonable confidence (of working draft \ quality).\

\

\ This track may be configured in a variety of ways to highlight different aspects \ of the displayed information. Click the \ Graph \ configuration help link for an explanation of the configuration options.

\ \

Credits

\

\ The quality scores were provided as part of the human assembly. \ The database representation and graphical display code were written by\ Hiram Clawson.\ map 0 altColor 255,128,0\ autoScale Off\ color 0,128,255\ graphTypeDefault Bar\ gridDefault OFF\ group map\ longLabel $Organism Sequencing Quality Scores\ maxHeightPixels 128:36:16\ priority 23.6\ shortLabel Quality Scores\ spanList 1,1024\ track quality\ type wig 0 100\ visibility hide\ windowingFunction Mean\ gc5Win100K GC% Win100K wig 0 100 GC Percent in 5 Bases Smoothed to 100,000-Base Windows 0 23.7 0 128 255 255 128 0 0 0 0 map 0 altColor 255,128,0\ autoScale Off\ color 0,128,255\ graphTypeDefault Bar\ gridDefault OFF\ group map\ longLabel GC Percent in 5 Bases Smoothed to 100,000-Base Windows\ maxHeightPixels 128:36:16\ priority 23.7\ shortLabel GC% Win100K\ spanList 5\ track gc5Win100K\ type wig 0 100\ viewLimits 30:70\ visibility hide\ windowingFunction Mean\ encodeAffyChIpHl60SitesCtcfHr32 Affy CTCF RA 32h bed 3 . Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 32hrs) Sites 0 24 175 50 0 215 152 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 175,50,0\ longLabel Affymetrix ChIP/Chip (CTCF retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 24\ shortLabel Affy CTCF RA 32h\ subGroups factor=CTCF time=32h\ track encodeAffyChIpHl60SitesCtcfHr32\ encodeAffyRnaHeLaSitesIntergenicProximal Affy Ig Prx HeLa bed 4 . 
Affymetrix Intergenic Proximal Hela Transfrags 0 24 137 185 19 196 220 137 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 137,185,19\ longLabel Affymetrix Intergenic Proximal Hela Transfrags\ parent encodeNoncodingTransFrags\ priority 24\ shortLabel Affy Ig Prx HeLa\ subGroups region=intergenicProximal celltype=hela source=affy\ track encodeAffyRnaHeLaSitesIntergenicProximal\ encodeAffyEc51HeLaC1S3Signal EC51 Sgnl HeLa wig 0 62385 Affy Ext Trans Signal (51-base window) (HeLa C1S3) 0 24 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 0 color 0,0,205\ longLabel Affy Ext Trans Signal (51-base window) (HeLa C1S3)\ parent encodeAffyEcSignal\ priority 24\ shortLabel EC51 Sgnl HeLa\ track encodeAffyEc51HeLaC1S3Signal\ encodeAffyEc51HeLaC1S3Sites EC51 Site HeLa bed 3 . Affy Ext Trans Sites (51-base window) (HeLa C1S3) 0 24 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 1 color 0,0,205\ longLabel Affy Ext Trans Sites (51-base window) (HeLa C1S3)\ parent encodeAffyEcSites\ priority 24\ shortLabel EC51 Site HeLa\ track encodeAffyEc51HeLaC1S3Sites\ gcPercentSmall GC % 100b bed 4 + Percentage GC in 100-Base Windows 0 24 0 0 0 127 127 127 1 0 0 map 1 group map\ longLabel Percentage GC in 100-Base Windows\ priority 24\ shortLabel GC % 100b\ spectrum on\ track gcPercentSmall\ type bed 4 +\ visibility hide\ encodeRikenCageMappedTagsScore Riken CAGE MT bedGraph 4 Riken CAGE Mapped Tags overlap count - TEST TRACK ONLY 0 24 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

TEST TRACK ONLY

\

\ This track shows the number of 5' CAGE tags mapped to the genome on the \ plus and minus strands. At each base in the genome, the count is the number\ of CAGE tags that overlap at this base.\
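Per-base overlap counts can be computed with a difference array, as sketched below for one strand. Run it separately for plus- and minus-strand tags. Tags are assumed to be 0-based half-open (start, end) intervals, and the function name is hypothetical.

```python
def overlap_counts(tags, region_start, region_end):
    """Number of tags covering each base in [region_start, region_end).

    tags: iterable of (start, end), 0-based half-open, all on one strand.
    """
    diff = [0] * (region_end - region_start + 1)
    for start, end in tags:
        s, e = max(start, region_start), min(end, region_end)
        if s < e:  # clip to the region and skip tags that miss it
            diff[s - region_start] += 1
            diff[e - region_start] -= 1
    counts, running = [], 0
    for d in diff[:-1]:
        running += d
        counts.append(running)
    return counts
```

Each tag touches only two array cells, so the cost is linear in the number of tags plus the region length, regardless of tag length.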

Methods

\

\ CAGE tags are sequenced from the 5' ends of full-length cDNAs (using RIKEN full-length \ cDNA technology). The mapping methodology will be described in upcoming \ publications.\ \

Verification

\

\ The verification methods will be described in upcoming publications.\ \

Credits

\

\ \ The FANTOM Consortium,\ Riken Genome Science Laboratory and Riken Genome \ Exploration Research Group (Genome Network Project Core Group)\ \ FANTOM Consortium: P. Carninci, T. Kasukawa, S. Katayama, Gough, \ M. Frith, N. Maeda, R. Oyama, T. Ravasi, B. Lenhard, C. Wells, R. \ Kodzius, K. Shimokawa, V. B. Bajic, S. E. Brenner, S. Batalov, A. R. R. \ Forrest, M. Zavolan, M. J. Davis, L. G. Wilming, V. Aidinis, J. Allen, \ A. Ambesi-Impiombato, R. Apweiler, R. N. Aturaliya, T. L. Bailey, M. \ Bansal, K. W. Beisel, T. Bersano, H. Bono, A. M. Chalk, K. P. Chiu, V. \ Choudhary, A. Christoffels, D. R. Clutterbuck, M. L. Crowe, E. Dalla, \ B. P. Dalrymple, B. de Bono, G. Della Gatta, D. di Bernardo, T. Down, \ P. Engstrom, M. Fagiolini, G. Faulkner, C. F. Fletcher, T. Fukushima, \ M. Furuno, S. Futaki, M. Gariboldi, P. Georgii-Hemming, T. R. Gingeras, \ T. Gojobori, R. E. Green, S. Gustincich, M. Harbers, V. Harokopos, Y. \ Hayashi, S. Henning, T. K. Hensch, N. Hirokawa, D. Hill, L. Huminiecki, \ M. Iacono, K. Ikeo, A. Iwama, T. Ishikawa, M. Jakt, A. Kanapin, M. \ Katoh, Y. Kawasawa, J. Kelso, H. Kitamura, H. Kitano, G. Kollias, S. \ P. T. Krishnan, A.F. Kruger, S.K. Kummerfeld, I. V. Kurochkin, \ L. F. Lareau, L. Lipovich, J. Liu, S. Liuni, S. McWilliam, M. Madan \ Babu, M. Madera, L. Marchionni, H. Matsuda, S. Matsuzawa, H. Miki, F. \ Mignone, S. Miyake, K. Morris, S. Mottagui-Tabar, N. Mulder, N. Nakano, \ H. Nakauchi, P. Ng, R. Nilsson, S. Nishiguchi, S. Nishikawa, F. Nori, \ O. Ohara, Y. Okazaki, V. Orlando, K. C. Pang, W. J. Pavan, G. Pavesi, \ G. Pesole, N. Petrovsky, S. Piazza, W. Qu, J. Reed, J. F. Reid, B. Z. \ Ring, M. Ringwald, B. Rost, Y. Ruan, S. Salzberg, A. Sandelin, C. \ Schneider, C. Schoenbach, K. Sekiguchi, C. A. M. Semple, S. Seno, \ L. Sessa, Y. Sheng, Y. Shibata, H. Shimada, K. Shimada, B. Sinclair, S. \ Sperling, E. Stupka, K. Sugiura, R. Sultana, Y. Takenaka, K. Taki, K. \ Tammoja, S. L. Tan, S. Tang, M. S. Taylor, J. Tegner, S. A. 
Teichmann, \ H. R. Ueda, E. van Nimwegen, R. Verardo, C. L. Wei, K. Yagi, H. \ Yamanishi, E. Zabarovsky, S. Zhu, A. Zimmer, W. Hide, C. Bult, S. M. \ Grimmond, R. D. Teasdale, E. T. Liu, V. Brusic, J. Quackenbush, C. \ Wahlestedt, J. Mattick, D. Hume. \ \ RIKEN Genome Exploration Research Group: C. Kai, D. Sasaki, Y. \ Tomaru, S. Fukuda, M. Kanamori-Katayama, M. Suzuki, J. Aoki, T. \ Arakawa, J. Iida, K. Imamura, M. Itoh, T. Kato, H. Kawaji, N. \ Kawagashira, T. Kawashima, M. Kojima, S. Kondo, H. Konno, K. Nakano, N. \ Ninomiya, T. Nishio, M. Okada, C. Plessy, K. Shibata, T. Shiraki, S. \ Suzuki, M. Tagami, K. Waki, A. Watahiki, Y. Okamura-Oho, H. Suzuki, J. \ Kawai. \ \ General Organizer: Y. Hayashizaki\ \

References

\

\ Cap analysis gene expression for high-throughput analysis of \ transcriptional starting point and identification of promoter usage.\ Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, Kawaji H, Kodzius \ R, Watahiki A, Nakamura M, Arakawa T, Fukuda S, Sasaki D, Podhajska A, \ Harbers M, Kawai J, Carninci P, Hayashizaki Y.\ Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15776-81\ encodeTxLevels 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ longLabel Riken CAGE Mapped Tags overlap count - TEST TRACK ONLY\ maxHeightPixels 128:16:16\ maxLimit 5945\ minLimit 1\ origAssembly hg16\ priority 24.0\ shortLabel Riken CAGE MT\ track encodeRikenCageMappedTagsScore\ type bedGraph 4\ viewLimits 0.0:10.0\ visibility hide\ windowingFunction mean\ encodeAffyChIpHl60PvalH3K27me3Hr00 Affy H3K27me3 RA 0h wig 0.0 534.54 Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 0hrs) P-Value 0 25 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 0hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 25\ shortLabel Affy H3K27me3 RA 0h\ subGroups factor=H3K27me3 time=0h\ track encodeAffyChIpHl60PvalH3K27me3Hr00\ encodeAffyRnaHl60SitesHr00IntergenicProximal Affy Ig Prx HL60 0h bed 4 . 
Affymetrix Intergenic Proximal HL60 Transfrags 0 25 143 179 31 199 217 143 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 143,179,31\ longLabel Affymetrix Intergenic Proximal HL60 Transfrags\ parent encodeNoncodingTransFrags\ priority 25\ shortLabel Affy Ig Prx HL60 0h\ subGroups region=intergenicProximal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr00IntergenicProximal\ encodeAffyEc1GM06990Signal EC1 Sgnl GM0699 wig 0 62385 Affy Ext Trans Signal (1-base window) (GM06990) 0 25 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 0 color 0,0,205\ longLabel Affy Ext Trans Signal (1-base window) (GM06990)\ parent encodeAffyEcSignal\ priority 25\ shortLabel EC1 Sgnl GM0699\ track encodeAffyEc1GM06990Signal\ encodeAffyEc1GM06990Sites EC1 Sites GM0699 bed 3 . Affy Ext Trans Sites (1-base window) (GM06990) 0 25 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 1 color 0,0,205\ longLabel Affy Ext Trans Sites (1-base window) (GM06990)\ parent encodeAffyEcSites\ priority 25\ shortLabel EC1 Sites GM0699\ track encodeAffyEc1GM06990Sites\ GCwiggle GC Samples sample GC Percent Sample Track (every 20,000 bases) 0 25 0 0 0 127 127 127 0 0 1 chr22, map 0 chromosomes chr22,\ group map\ longLabel GC Percent Sample Track (every 20,000 bases)\ priority 25\ shortLabel GC Samples\ track GCwiggle\ type sample\ visibility hide\ encodeRikenCageMappedTags Riken CAGE Tags bed 9 . Riken CAGE Mapped Tags - TEST TRACK ONLY 0 25 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows 5' CAGE tags mapped to the genome - TEST TRACK ONLY\ \

Methods

\

\ CAGE tags are the sequenced 5' ends of full-length cDNAs (prepared using RIKEN full-length \ cDNA technology). The mapping methodology will be described in upcoming \ publications.\ \
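Because each CAGE tag marks the 5' end of a capped transcript, the strand of a mapped tag determines which end of its alignment is the putative transcription start site. The sketch below is illustrative only and is not the (unpublished) RIKEN mapping pipeline: it assumes tags are already aligned and given as hypothetical (chrom, start, end, strand) tuples in 0-based, half-open coordinates, and counts how many tags share each 5' end.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record for one mapped tag: (chrom, start, end, strand),
# using 0-based, half-open genomic coordinates.
Tag = Tuple[str, int, int, str]


def five_prime_position(tag: Tag) -> Tuple[str, int, str]:
    """Return (chrom, 0-based position, strand) of the tag's 5' end."""
    chrom, start, end, strand = tag
    if strand == "+":
        # Forward strand: the 5' end is the first aligned base.
        return chrom, start, strand
    if strand == "-":
        # Reverse strand: the 5' end is the last aligned base.
        return chrom, end - 1, strand
    raise ValueError(f"unknown strand: {strand!r}")


def count_five_prime_ends(tags: Iterable[Tag]) -> Counter:
    """Count how many tags share each strand-specific 5' end position."""
    return Counter(five_prime_position(t) for t in tags)


tags = [("chr21", 100, 120, "+"), ("chr21", 100, 121, "+"), ("chr21", 80, 101, "-")]
print(count_five_prime_ends(tags))
```

Keeping the strand in the key prevents tags from opposite strands that happen to end at the same base from being pooled into one start site.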

Verification

\

\ Verification will be described in upcoming publications.

\ \

Credits

\

\ These data were contributed by the Functional Annotation of Mouse (FANTOM) \ Consortium, Riken Genome Science Laboratory and \ Riken Genome Exploration Research Group \ (Genome Network Project Core Group).

\

\ FANTOM Consortium: P. Carninci, T. Kasukawa, S. Katayama, Gough, \ M. Frith, N. Maeda, R. Oyama, T. Ravasi, B. Lenhard, C. Wells, R. \ Kodzius, K. Shimokawa, V. B. Bajic, S. E. Brenner, S. Batalov, A. R. R. \ Forrest, M. Zavolan, M. J. Davis, L. G. Wilming, V. Aidinis, J. Allen, \ A. Ambesi-Impiombato, R. Apweiler, R. N. Aturaliya, T. L. Bailey, M. \ Bansal, K. W. Beisel, T. Bersano, H. Bono, A. M. Chalk, K. P. Chiu, V. \ Choudhary, A. Christoffels, D. R. Clutterbuck, M. L. Crowe, E. Dalla, \ B. P. Dalrymple, B. de Bono, G. Della Gatta, D. di Bernardo, T. Down, \ P. Engstrom, M. Fagiolini, G. Faulkner, C. F. Fletcher, T. Fukushima, \ M. Furuno, S. Futaki, M. Gariboldi, P. Georgii-Hemming, T. R. Gingeras, \ T. Gojobori, R. E. Green, S. Gustincich, M. Harbers, V. Harokopos, Y. \ Hayashi, S. Henning, T. K. Hensch, N. Hirokawa, D. Hill, L. Huminiecki, \ M. Iacono, K. Ikeo, A. Iwama, T. Ishikawa, M. Jakt, A. Kanapin, M. \ Katoh, Y. Kawasawa, J. Kelso, H. Kitamura, H. Kitano, G. Kollias, S. \ P. T. Krishnan, A.F. Kruger, S.K. Kummerfeld, I. V. Kurochkin, \ L. F. Lareau, L. Lipovich, J. Liu, S. Liuni, S. McWilliam, M. Madan \ Babu, M. Madera, L. Marchionni, H. Matsuda, S. Matsuzawa, H. Miki, F. \ Mignone, S. Miyake, K. Morris, S. Mottagui-Tabar, N. Mulder, N. Nakano, \ H. Nakauchi, P. Ng, R. Nilsson, S. Nishiguchi, S. Nishikawa, F. Nori, \ O. Ohara, Y. Okazaki, V. Orlando, K. C. Pang, W. J. Pavan, G. Pavesi, \ G. Pesole, N. Petrovsky, S. Piazza, W. Qu, J. Reed, J. F. Reid, B. Z. \ Ring, M. Ringwald, B. Rost, Y. Ruan, S. Salzberg, A. Sandelin, C. \ Schneider, C. Schönbach, K. Sekiguchi, C. A. M. Semple, S. Seno, \ L. Sessa, Y. Sheng, Y. Shibata, H. Shimada, K. Shimada, B. Sinclair, S. \ Sperling, E. Stupka, K. Sugiura, R. Sultana, Y. Takenaka, K. Taki, K. \ Tammoja, S. L. Tan, S. Tang, M. S. Taylor, J. Tegner, S. A. Teichmann, \ H. R. Ueda, E. van Nimwegen, R. Verardo, C. L. Wei, K. Yagi, H. \ Yamanishi, E. Zabarovsky, S. Zhu, A. Zimmer, W. Hide, C. Bult, S. M. 
\ Grimmond, R. D. Teasdale, E. T. Liu, V. Brusic, J. Quackenbush, C. \ Wahlestedt, J. Mattick, D. Hume.

\

\ RIKEN Genome Exploration Research Group: C. Kai, D. Sasaki, Y. \ Tomaru, S. Fukuda, M. Kanamori-Katayama, M. Suzuki, J. Aoki, T. \ Arakawa, J. Iida, K. Imamura, M. Itoh, T. Kato, H. Kawaji, N. \ Kawagashira, T. Kawashima, M. Kojima, S. Kondo, H. Konno, K. Nakano, N. \ Ninomiya, T. Nishio, M. Okada, C. Plessy, K. Shibata, T. Shiraki, S. \ Suzuki, M. Tagami, K. Waki, A. Watahiki, Y. Okamura-Oho, H. Suzuki, J. \ Kawai.

\

\ General Organizer: Y. Hayashizaki\ \

References

\

\ Shiraki, T., Kondo, S., Katayama, S., Waki, K., Kasukawa, T., Kawaji, H., \ Kodzius, R., Watahiki, A., Nakamura, M. et al.\ Cap analysis gene expression for high-throughput analysis of \ transcriptional starting point and identification of promoter usage.\ Proc Natl Acad Sci U S A. 100(26), 15776-81 (2003).

\ encodeTxLevels 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ itemRgb On\ longLabel Riken CAGE Mapped Tags - TEST TRACK ONLY\ origAssembly hg16\ priority 25.0\ shortLabel Riken CAGE Tags\ track encodeRikenCageMappedTags\ type bed 9 .\ visibility hide\ encodeAffyChIpHl60SitesH3K27me3Hr00 Affy H3K27me3 RA 0h bed 3 . Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 0hrs) Sites 0 26 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 150,75,0\ longLabel Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 0hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 26\ shortLabel Affy H3K27me3 RA 0h\ subGroups factor=H3K27me3 time=0h\ track encodeAffyChIpHl60SitesH3K27me3Hr00\ encodeAffyRnaHl60SitesHr02IntergenicProximal Affy Ig Prx HL60 2h bed 4 . Affymetrix Intergenic Proximal HL60 Retinoic 2hr Transfrags 0 26 149 173 43 202 214 149 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 149,173,43\ longLabel Affymetrix Intergenic Proximal HL60 Retinoic 2hr Transfrags\ parent encodeNoncodingTransFrags\ priority 26\ shortLabel Affy Ig Prx HL60 2h\ subGroups region=intergenicProximal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr02IntergenicProximal\ encodeAffyEc51GM06990Signal EC51 Sgnl GM0699 wig 0 62385 Affy Ext Trans Signal (51-base window) (GM06990) 0 26 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 0 color 0,0,205\ longLabel Affy Ext Trans Signal (51-base window) (GM06990)\ parent encodeAffyEcSignal\ priority 26\ shortLabel EC51 Sgnl GM0699\ track encodeAffyEc51GM06990Signal\ encodeAffyEc51GM06990Sites EC51 Site GM0699 bed 3 . 
Affy Ext Trans Sites (51-base window) (GM06990) 0 26 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 1 color 0,0,205\ longLabel Affy Ext Trans Sites (51-base window) (GM06990)\ parent encodeAffyEcSites\ priority 26\ shortLabel EC51 Site GM0699\ track encodeAffyEc51GM06990Sites\ ensemblGeneScaffold Ensembl Assembly bed 6 + Ensembl Gene Scaffold assembly 0 26 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the mapping of Ensembl gene scaffold coordinates onto\ the underlying assembly contigs or scaffolds. Items are displayed in black \ and gray. Gray items indicate segments of the gene scaffold that do not map \ in linear order to the underlying contig or scaffold.\

\

\ These gene scaffolds were generated by Ensembl.

\ \

Methods

\

\ This track is created from the Ensembl MySQL tables: assembly.txt\ and seq_region.txt.\
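The assembly table records how each assembled region (here, a gene scaffold) is built from segments of a component region (a contig or scaffold). The sketch below assumes the column order of the Ensembl core `assembly` table (asm_seq_region_id, cmp_seq_region_id, asm_start, asm_end, cmp_start, cmp_end, ori) with 1-based, inclusive coordinates in a tab-separated dump. The out-of-order check is a hypothetical illustration of the black/gray distinction described above, not the code used to build this track.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AssemblyRow:
    asm_seq_region_id: int  # assembled region, e.g. a gene scaffold
    cmp_seq_region_id: int  # component region, e.g. a contig or scaffold
    asm_start: int          # 1-based, inclusive
    asm_end: int
    cmp_start: int
    cmp_end: int
    ori: int                # 1 = same orientation, -1 = reverse complement


def parse_assembly_line(line: str) -> AssemblyRow:
    """Parse one tab-separated row (assumed column order, see above)."""
    fields = [int(x) for x in line.rstrip("\n").split("\t")[:7]]
    return AssemblyRow(*fields)


def map_to_component(rows: List[AssemblyRow], asm_id: int,
                     pos: int) -> Optional[Tuple[int, int]]:
    """Map a 1-based position on an assembled region to (component id, position)."""
    for r in rows:
        if r.asm_seq_region_id == asm_id and r.asm_start <= pos <= r.asm_end:
            offset = pos - r.asm_start
            if r.ori == 1:
                return r.cmp_seq_region_id, r.cmp_start + offset
            return r.cmp_seq_region_id, r.cmp_end - offset
    return None  # position falls outside every segment


def out_of_order_segments(rows: List[AssemblyRow], asm_id: int) -> List[AssemblyRow]:
    """Flag segments that do not continue in linear order on the same component."""
    segs = sorted((r for r in rows if r.asm_seq_region_id == asm_id),
                  key=lambda r: r.asm_start)
    flagged = []
    for prev, cur in zip(segs, segs[1:]):
        if cur.cmp_seq_region_id != prev.cmp_seq_region_id:
            continue  # switching to another component is not checked here
        if cur.ori != prev.ori:
            flagged.append(cur)
        elif cur.ori == 1 and cur.cmp_start <= prev.cmp_end:
            flagged.append(cur)
        elif cur.ori == -1 and cur.cmp_end >= prev.cmp_start:
            flagged.append(cur)
    return flagged
```

For a reverse-oriented segment (ori = -1), the first base of the assembled region maps to the last base of the component segment, which is why the offset is subtracted from cmp_end.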

\

\ For a description of the methods used in Ensembl gene prediction, refer to \ Hubbard, T. et al. (2002) in the References section below.

\

\ \

Credits

\

\ Thanks to Ensembl for providing this annotation.

\ \

References

\

\ Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J,\ Curwen V, Down T, et al. \ The Ensembl genome database project.\ Nucleic Acids Res. 2002 Jan 1;30(1):38-41.

\ map 1 color 0,0,0\ group map\ longLabel Ensembl Gene Scaffold assembly\ priority 26\ shortLabel Ensembl Assembly\ track ensemblGeneScaffold\ type bed 6 +\ useScore 1\ visibility hide\ pGC GC Samples sample GC Percent Sample Track 0 26 0 0 0 127 127 127 0 0 0 map 0 group map\ longLabel GC Percent Sample Track\ priority 26\ shortLabel GC Samples\ track pGC\ type sample\ visibility hide\ encodeStanfordPromoters Stanf Promoter bed 9 + Stanford Promoter Activity 0 26 0 0 0 127 127 127 0 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays activity levels of 643 putative promoter fragments\ in the ENCODE regions, based on high-throughput transient transfection\ luciferase reporter assays. The activity of each putative promoter is\ indicated by color, ranging from black (no activity) to red (strong\ activity). Each of the fragments was tested in a panel of 16 cell\ lines:\

\

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
Cell Line | Classification | Isolated From
AGS | gastric adenocarcinoma | stomach
BE(2)-C | neuroblastoma | brain (metastatic, from bone marrow)
T98G (CRL-1690) | glioblastoma | brain
G-402 | renal leiomyoblastoma | kidney
HCT 116 | colorectal carcinoma | colon
HMCB | melanoma | skin
HT-1080 | fibrosarcoma | connective tissue
SK-N-SH (HTB-11) | neuroblastoma | brain (metastatic, from bone marrow)
HeLa | adenocarcinoma | cervix
HepG2 | hepatocellular carcinoma | liver
JEG-3 | choriocarcinoma | placenta
MG-63 | osteosarcoma | bone
MRC-5 | fibroblast | lung
PANC-1 | epithelioid carcinoma | pancreas (duct)
SNU-182 | hepatocellular carcinoma | liver
U-87 MG | glioblastoma-astrocytoma | brain
\
\

\ \

Methods

\

\ Promoters in the ENCODE regions were predicted using a variation on previously\ described methods (Trinklein et al., 2003; Trinklein et \ al., 2004). Human cDNAs from GenBank were aligned to the genome with BLAT, \ and cDNAs whose exons overlapped by at least 1 bp were merged to\ generate gene models. The 5' end of each gene model was taken as one\ transcription start site (TSS); alternative 5' ends that lay at least 500 bp\ downstream and were supported by full-length cDNAs were taken as additional\ start sites. Promoters were defined as the regions extending approximately\ 600 bp upstream and 100 bp downstream of each TSS. \
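The start-site selection and promoter windows described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: coordinates are 0-based, the 500 bp rule is measured from the gene model's 5' end, and each window covers 600 bp upstream, the TSS base, and 100 bp downstream. The function names are hypothetical, and primer design with Primer3 is not modeled.

```python
from typing import List, Tuple

UPSTREAM = 600          # bp upstream of the TSS
DOWNSTREAM = 100        # bp downstream of the TSS
MIN_ALT_DISTANCE = 500  # minimum distance of an alternative 5' end


def select_tss(model_tss: int, alt_tss: List[int], strand: str) -> List[int]:
    """Primary TSS plus alternative 5' ends at least 500 bp downstream of it."""
    sign = 1 if strand == "+" else -1  # "downstream" points left on the - strand
    chosen = [model_tss]
    for t in sorted(set(alt_tss)):
        if sign * (t - model_tss) >= MIN_ALT_DISTANCE:
            chosen.append(t)
    return chosen


def promoter_window(tss: int, strand: str) -> Tuple[int, int]:
    """0-based, half-open window: 600 bp upstream, the TSS, 100 bp downstream."""
    if strand == "+":
        return max(0, tss - UPSTREAM), tss + DOWNSTREAM + 1
    return max(0, tss - DOWNSTREAM), tss + UPSTREAM + 1


for tss in select_tss(1000, [1200, 1600, 2000], "+"):
    print(tss, promoter_window(tss, "+"))
```

On the reverse strand, "upstream" lies at higher genomic coordinates, so the 600 bp and 100 bp flanks swap sides.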

\

\ Primer3 was used to pick primers yielding amplicons of approximately 500 bp\ that contain the predicted transcription start site. Each\ fragment of DNA represented in this track was cloned into a\ luciferase reporter vector (pGL3-Basic, Promega) using the BD\ Clontech In-Fusion cloning system. Using the Dual-Luciferase system\ (Promega), each experimental construct was co-transfected with a\ control plasmid expressing Renilla luciferase, which controls for variation in \ transfection efficiency, in 96-well format into one of the sixteen \ cell lines using FuGENE Transfection Reagent (Roche). Each\ transfection was done in duplicate. \

\

\ Data are reported as normalized and log2 transformed averages of the\ Luciferase/Renilla ratio. This normalization was based on the\ activity of 102 random genomic fragments (negative controls) derived \ from exons and intergenic regions. Such a normalization allows\ for a meaningful comparison between cell types. The average log transformed \ Luciferase/Renilla ratio was scaled linearly to create a score where the\ maximum value is 1000 and the minimum value is 0. This score is arbitrary\ and for visualization purposes only; the raw ratio values should be used\ for all analyses. \
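The exact normalization formula is not given here, so the sketch below makes an assumption: duplicate Luciferase/Renilla ratios are averaged, divided by the mean ratio of the negative-control fragments measured in the same cell line, and log2 transformed. The normalized values are then rescaled linearly so that the minimum maps to 0 and the maximum to 1000. Function names are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List


def normalized_log2(replicate_ratios: List[float], control_ratios: List[float]) -> float:
    """log2 of the mean Luc/Renilla ratio relative to the mean negative-control ratio."""
    return math.log2(mean(replicate_ratios) / mean(control_ratios))


def display_scores(values: Dict[str, float]) -> Dict[str, int]:
    """Rescale normalized values linearly: minimum -> 0, maximum -> 1000."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {name: 0 for name in values}  # no spread to rescale
    return {name: round(1000 * (v - lo) / (hi - lo)) for name, v in values.items()}


controls = [1.0, 1.0]  # hypothetical negative-control ratios for one cell line
print(normalized_log2([4.0, 4.0], controls))
print(display_scores({"a": -1.0, "b": 0.0, "c": 3.0}))
```

Because the 0 to 1000 score depends on the minimum and maximum of whatever set is rescaled, it is only suitable for display; comparisons should use the normalized ratios themselves.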

\ \

Verification

\

\ Data were verified by repeating the preparation and measurement of\ 48 random fragments. No significant variation between the two\ preparations was detected.\

\

\ A spreadsheet containing the negative control data can be downloaded \ here.\

\ \

Credits

\

This\ work was done in the \ Myers Lab at Stanford University (now at HudsonAlpha Institute for Biotechnology). The following people contributed: Sara J. Cooper, Nathan D. Trinklein, Elizabeth D. Anton, Loan Nguyen, and Richard M. Myers. \

\ \

References

\ Cooper SJ, Trinklein ND, Anton ED, Nguyen L, Myers RM.\ Comprehensive analysis of transcriptional promoter structure \ and function in 1% of the human genome.\ Genome Res. 2006 Jan;16(1):1-10. Epub 2005 Dec 12.\

\ \

Trinklein ND, Aldred SF, Saldanha AJ, Myers RM.\ Identification and functional analysis of human transcriptional \ promoters.\ Genome Res. 2003 Feb;13(2):308-12.\

\ \

\ Trinklein ND, Aldred SF, Hartman SJ, Schroeder DI, Otillar RP,\ Myers RM.\ An abundance of bidirectional promoters in the human genome.\ Genome Res. 2004 Jan;14(1):62-6. \

\ encodeTxLevels 1 chromosomes chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ exonArrowsDense on\ group encodeTxLevels\ itemRgb on\ longLabel Stanford Promoter Activity\ origAssembly hg16\ priority 26.0\ shortLabel Stanf Promoter\ track encodeStanfordPromoters\ type bed 9 +\ visibility hide\ encodeAffyChIpHl60PvalH3K27me3Hr02 Affy H3K27me3 RA 2h wig 0.0 534.54 Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 2hrs) P-Value 0 27 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 2hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 27\ shortLabel Affy H3K27me3 RA 2h\ subGroups factor=H3K27me3 time=2h\ track encodeAffyChIpHl60PvalH3K27me3Hr02\ encodeAffyRnaHl60SitesHr08IntergenicProximal Affy Ig Prx HL60 8h bed 4 . Affymetrix Intergenic Proximal HL60 Retinoic 8hr Transfrags 0 27 155 167 55 205 211 155 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 155,167,55\ longLabel Affymetrix Intergenic Proximal HL60 Retinoic 8hr Transfrags\ parent encodeNoncodingTransFrags\ priority 27\ shortLabel Affy Ig Prx HL60 8h\ subGroups region=intergenicProximal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr08IntergenicProximal\ encodeAffyEc1HepG2Signal EC1 Sgnl HepG2 wig 0 62385 Affy Ext Trans Signal (1-base window) (HepG2) 0 27 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 0 color 0,0,205\ longLabel Affy Ext Trans Signal (1-base window) (HepG2)\ parent encodeAffyEcSignal\ priority 27\ shortLabel EC1 Sgnl HepG2\ track encodeAffyEc1HepG2Signal\ encodeAffyEc1HepG2Sites EC1 Sites HepG2 bed 3 . 
Affy Ext Trans Sites (1-base window) (HepG2) 0 27 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 1 color 0,0,205\ longLabel Affy Ext Trans Sites (1-base window) (HepG2)\ parent encodeAffyEcSites\ priority 27\ shortLabel EC1 Sites HepG2\ track encodeAffyEc1HepG2Sites\ hiSeqDepth Hi Seq Depth bed 3 Regions of Exceptionally High Depth of Aligned Short Reads 0 27 139 69 19 197 162 137 0 0 0

Description

\

\ This track displays regions of the reference genome that have exceptionally high\ sequence depth, inferred from alignments of short-read sequences from the\ 1000 Genomes Project.\ These regions may be caused by collapsed repetitive sequences\ in the reference genome assembly; they also have high read depth in assays such as\ ChIP-seq, and may trigger false positive calls from peak-calling algorithms.\ Excluding these regions from analysis of short-read alignments should reduce\ such false positive calls.\

\ \

Methods

\

\ Pickrell et al. downloaded sequencing reads for 57 Yoruba individuals\ from the 1000 Genomes Project's low-coverage pilot data, mapped them to the\ Mar. 2006 human genome assembly (NCBI36/hg18), computed the read depth at\ every base in the genome, and compiled a distribution of per-base read depths.\ They then identified contiguous regions where read depth exceeded thresholds\ corresponding to the top 0.1%, 0.5%, 1%, 5% and 10% of per-base \ read depths, merging regions that lie within 50 bases of each other.\ The regions are available for download from\ http://eqtl.uchicago.edu/Masking/\ (see the\ readme file).\
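A minimal sketch of the thresholding and merging steps for a single chromosome, assuming per-base depths are already available as a list indexed by 0-based position. Bases tied with the threshold are included, and "within 50 bases" is taken to mean a gap of at most 50 bases between regions; both are assumptions, and the function names are hypothetical rather than taken from Pickrell et al.

```python
from typing import List, Tuple


def depth_threshold(depths: List[int], top_fraction: float) -> int:
    """Depth value at the boundary of the top `top_fraction` of per-base depths."""
    ordered = sorted(depths, reverse=True)
    k = max(1, int(len(ordered) * top_fraction))
    return ordered[k - 1]


def high_depth_runs(depths: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Contiguous 0-based, half-open runs where depth >= threshold."""
    runs, start = [], None
    for i, d in enumerate(depths):
        if d >= threshold and start is None:
            start = i
        elif d < threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(depths)))
    return runs


def merge_nearby(runs: List[Tuple[int, int]], max_gap: int = 50) -> List[Tuple[int, int]]:
    """Merge position-sorted runs separated by at most `max_gap` bases."""
    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


depths = [0] * 10 + [9] * 5 + [0] * 30 + [9] * 5 + [0] * 100 + [9] * 3
for fraction in (0.001, 0.005, 0.01, 0.05, 0.1):
    t = depth_threshold(depths, fraction)
    print(fraction, merge_nearby(high_depth_runs(depths, t)))
```

Running the same merge for each threshold yields one nested set of regions per cutoff, with stricter cutoffs giving smaller regions.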

\ \

Credits

\

\ Thanks to Joseph Pickrell at the University of Chicago for these data.\

\ \

References

\

\ Pickrell JK, Gaffney DJ, Gilad Y, Pritchard JK.\ \ False positive peaks in ChIP-seq and other sequencing-based\ functional assays caused by unannotated high copy number regions.\ Bioinformatics. 2011 Aug 1;27(15):2144-6. Epub 2011 Jun 19.\

\ map 1 altColor 0,0,0\ color 139,69,19\ compositeTrack on\ group map\ longLabel Regions of Exceptionally High Depth of Aligned Short Reads\ priority 27\ shortLabel Hi Seq Depth\ track hiSeqDepth\ type bed 3\ visibility hide\ ncbiIncidentDb NCBI Incident bigBed 4 + NCBI Incident database 0 27 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=$$

Description

\

\ This track indicates locations in this genome assembly where assembly\ problems have been noted or resolved. The data is taken directly from the\ Genome Reference Consortium. This track updates\ once a day to catch up with new issues.\

\

\ If you would like to report an assembly issue with this genome assembly,\ please use the Genome Reference Consortium\ issue reporting system.\

\

Methods

\

\ Data for this track are extracted from the Genome Reference Consortium\ \ incident database, which is checked once a day; any new or updated issues\ are added to the track daily.\

\

Credits

\

The data and presentation of this track were prepared by\ Hiram Clawson.\

\ map 1 group map\ longLabel NCBI Incident database\ nonBedFieldsLabel Summary information from incident database:\ priority 27\ shortLabel NCBI Incident\ track ncbiIncidentDb\ type bigBed 4 +\ url http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=$$\ urlLabel NCBI Incident:\ visibility hide\ encodeStanfordRtPcr Stanf RTPCR bed 5 + Stanford Endogenous Transcript Levels in HCT116 Cells 0 27 0 0 0 127 127 127 1 0 17 chr1,chr11,chr13,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays absolute transcript copy numbers for 136 genes and\ 12 negative control intergenic regions, determined by real-time RT-PCR in HCT116 cells.\ \

Display Conventions and Configuration

\

\ The genomic regions are indicated by solid blocks. The shade of an item gives a \ rough indication of its count, ranging from light gray for zero to black for a \ count of 7000 or greater. To display only those items that exceed a specific \ unnormalized score, enter a minimum score between 0 and 1000 in the text box at \ the top of the track description page.

\ \

Methods

\

\ Total RNA was prepared in quadruplicate from HCT116 cells grown in\ culture. cDNA was prepared as described in Trinklein et\ al. (2004). Duplicate primer pairs were designed for each gene, and the\ absolute number of cDNA molecules containing each amplicon was\ determined by real-time PCR. The submitted data are the calculated\ number of molecules of each transcript containing the defined\ amplicon.

\ \

Verification

\

\ Four biological replicates were performed, and two primer pairs were\ used to measure the abundance of each transcript.

\ \

Credits

\

\ These data were generated in the Richard M. Myers lab \ at Stanford University (now at \ HudsonAlpha Institute for Biotechnology).\ \

References

\

\ Trinklein, N.D., Chen, W.C., Kingston, R.E. and Myers, R.M. \ Transcriptional regulation and binding of HSF1 and HSF2\ to 32 human heat shock genes during thermal stress and\ differentiation.\ Cell Stress Chaperones 9(1), 21-28 (2004).

\ encodeTxLevels 1 chromosomes chr1,chr11,chr13,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX\ dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ longLabel Stanford Endogenous Transcript Levels in HCT116 Cells\ origAssembly hg16\ priority 27.0\ shortLabel Stanf RTPCR\ track encodeStanfordRtPcr\ type bed 5 +\ useScore 1\ visibility hide\ nextNcbiIncidentDb TBD NCBIIncident bigBed 4 + In Progress NCBI Incident database 0 27 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=$$ map 1 group map\ longLabel In Progress NCBI Incident database\ nonBedFieldsLabel Summary information from incident database:\ priority 27\ shortLabel TBD NCBIIncident\ track nextNcbiIncidentDb\ type bigBed 4 +\ url http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=$$\ urlLabel NCBI Incident:\ visibility hide\ encodeAffyChIpHl60SitesH3K27me3Hr02 Affy H3K27me3 RA 2h bed 3 . Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 2hrs) Sites 0 28 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 150,75,0\ longLabel Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 2hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 28\ shortLabel Affy H3K27me3 RA 2h\ subGroups factor=H3K27me3 time=2h\ track encodeAffyChIpHl60SitesH3K27me3Hr02\ encodeAffyRnaHl60SitesHr32IntergenicProximal Affy Ig Prx HL60 32h bed 4 . 
Affymetrix Intergenic Proximal HL60 Retinoic 32hr Transfrags 0 28 161 161 67 208 208 161 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 161,161,67\ longLabel Affymetrix Intergenic Proximal HL60 Retinoic 32hr Transfrags\ parent encodeNoncodingTransFrags\ priority 28\ shortLabel Affy Ig Prx HL60 32h\ subGroups region=intergenicProximal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr32IntergenicProximal\ encodeAffyEc51HepG2Signal EC51 Sgnl HepG2 wig 0 62385 Affy Ext Trans Signal (51-base window) (HepG2) 0 28 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 0 color 0,0,205\ longLabel Affy Ext Trans Signal (51-base window) (HepG2)\ parent encodeAffyEcSignal\ priority 28\ shortLabel EC51 Sgnl HepG2\ track encodeAffyEc51HepG2Signal\ encodeAffyEc51HepG2Sites EC51 Site HepG2 bed 3 . Affy Ext Trans Sites (51-base window) (HepG2) 0 28 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 1 color 0,0,205\ longLabel Affy Ext Trans Sites (51-base window) (HepG2)\ parent encodeAffyEcSites\ priority 28\ shortLabel EC51 Site HepG2\ track encodeAffyEc51HepG2Sites\ humanParalog Human Paralog bed 5 + Human Paralogs Using Fgenesh++ Gene Predictions 0 28 0 100 0 255 240 200 1 0 0 map 1 altColor 255,240,200\ color 0,100,0\ group map\ longLabel Human Paralogs Using Fgenesh++ Gene Predictions\ priority 28\ shortLabel Human Paralog\ spectrum on\ track humanParalog\ type bed 5 +\ visibility hide\ encodeYaleMASPlacRNATransMap Yale MAS RNA bedGraph 4 Yale Maskless Array Synthesizer, RNA Transcript Map 0 28 0 0 0 127 127 127 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22,

Description

\

\ This track shows the forward (+) and reverse (-) strand transcript map of \ intensity scores (estimating RNA abundance) for human NB4 cell total RNA,\ and human placental Poly(A)+ RNA, hybridized \ to the Yale MAS (Maskless Array Synthesizer) ENCODE oligonucleotide \ microarray, transcription mapping design #1. This array has 36-mer \ oligonucleotide probes approximately every 36 bp (i.e. \ end-to-end) covering all the non-repetitive DNA sequence of the ENCODE \ regions ENm001-ENm012. See NCBI\ GEO GPL2105 for details of this array design.

\

\ This transcript map is a combined signal from three biological replicates, \ each with at least two technical replicates. Arrays were hybridized using \ either the standard NimbleGen protocol or the protocol described in Bertone \ et al. (2004). The label of each subtrack in this annotation \ indicates the specific protocol used for that particular data set.

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ tracks. The subtracks within this annotation \ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\ \

Methods

\

\ A score was assigned to each oligonucleotide probe position by combining \ two or more technical replicates and by using a sliding window \ approach. Within a sliding window of 160 bp (corresponding to 5 \ oligos), the hybridization intensities for all replicates of each \ oligonucleotide probe were compared to their respective array median \ score. Within the window and across all the replicates, the number of \ probes above and below their respective median were counted. Using the \ sign test, a one-sided P-value was then calculated and a score defined \ as score=-log(P-value) was assigned to the oligo in the center of \ the window.
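The sign-test score for one window can be sketched as below. The sketch assumes that probes exactly equal to their array median are discarded, computes the one-sided binomial tail P(X >= n_above) with success probability 0.5, and uses log base 10 because the base of the logarithm is not stated above; the function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def array_medians(arrays: Sequence[Sequence[float]]) -> List[float]:
    """Median hybridization intensity of each replicate array."""
    return [median(a) for a in arrays]


def sign_test_p(n_above: int, n_below: int) -> float:
    """One-sided sign test: P(X >= n_above) for X ~ Binomial(n_above + n_below, 0.5)."""
    n = n_above + n_below
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, k) for k in range(n_above, n + 1))
    return tail / 2 ** n


def window_score(window: Sequence[Sequence[float]], medians: Sequence[float]) -> float:
    """Score assigned to the oligo at the centre of the window.

    window[r] holds the intensities of the window's probes (about 5 oligos)
    on replicate array r; medians[r] is that array's median intensity.
    """
    above = below = 0
    for intensities, m in zip(window, medians):
        above += sum(1 for x in intensities if x > m)
        below += sum(1 for x in intensities if x < m)
    return -math.log10(sign_test_p(above, below))


# Two replicates, five probes each, all above their array medians.
print(window_score([[5, 6, 7, 8, 9], [5, 6, 7, 8, 9]], [1, 1]))
```

Pooling counts across replicates before testing means that a window scores highly only when the probes sit above the median consistently on every array, not just on one.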

\

\ Three independent biological replicates were generated and each was \ hybridized to at least 2 different arrays (technical replicates).

\ \

Verification

\

\ Correlation coefficients between replicates were checked to confirm that they were reasonable. \ Additionally, transcribed regions (TARs/transfrags) were called and \ compared between technical and biological replicates to ensure \ significant overlap.

\ \

Credits

\

\ These data were generated and analyzed by the labs of Michael Snyder, \ Mark Gerstein and Sherman Weissman at Yale University.

\ \

References

\

\ Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., \ Rinn, J.L., Tongprasit, W., Samanta, M. et al.\ Global identification of human transcribed sequences with \ genome tiling arrays. \ Science 306(5705), 2242-6 (2004).

\

\ Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., \ Long, J., Stern, D., Tammana, H. et al.\ Transcriptional maps of 10 human chromosomes at 5-nucleotide \ resolution. \ Science 308(5725), 1149-54 (2005).

\

\ Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., \ Fodor, S.P. and Gingeras, T.R.\ Large-scale transcriptional activity in chromosomes 21 and \ 22. \ Science 296(5569), 916-9 (2002).

\

\ Kluger, Y., Tuck, D.P., Chang, J.T., Nakayama, Y., Poddar, R., Kohya, N., \ Lian, Z., Ben Nasr, A., Halaban, H.R. et al.\ Lineage specificity of gene expression patterns. \ Proc Natl Acad Sci U S A 101(17), 6508-13 (2004).

\

\ Rinn, J.L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N.M., \ Hartman, S., Harrison, P.M., Nelson, F.K., Miller, P. et al.\ The transcriptional activity of human Chromosome 22. \ Genes Dev 17(4), 529-40 (2003).

\ encodeTxLevels 0 autoScale Off\ chromosomes chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ longLabel Yale Maskless Array Synthesizer, RNA Transcript Map\ maxHeightPixels 128:16:16\ maxLimit 10.536\ minLimit 0\ origAssembly hg16\ priority 28.0\ shortLabel Yale MAS RNA\ superTrack encodeYaleRnaSuper dense\ track encodeYaleMASPlacRNATransMap\ type bedGraph 4\ viewLimits 0:11\ visibility hide\ encodeYaleRnaSuper Yale RNA Yale RNA (Neutrophil, Placenta and NB4 cells) 0 28 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Overview

\

\ This super-track combines related tracks from Yale Transcript Map analysis.\ These tracks contain transcriptome data from different\ cell lines and biological samples as well as analysis of transcriptionally\ active regions (TARs).\

\ Experiments were performed with Yale MAS (Maskless Array Synthesizer)\ ENCODE oligonucleotide microarray (see NCBI\ GEO GPL2105 for details of this array design) as well as\ the Affymetrix ENCODE oligonucleotide microarray. Multiple biological samples \ were assayed, such as total RNA from human NB4 cells. Experiments also included \ chemical treatments such as retinoic acid (RA) treatments.\ \

Credits

\

Yale MAS RNA, Yale MAS TAR

\

\ These data were generated and analyzed by the labs of Michael Snyder,\ Mark Gerstein and Sherman Weissman at Yale University.

\ \

Yale RNA, Yale TAR

\

\ These data were generated and analyzed by the Yale/Affymetrix\ collaboration among the labs of Michael Snyder, Mark Gerstein and\ Sherman Weissman at Yale University and Tom Gingeras at Affymetrix.

\ \

Yale RACE

\

\ These data were generated and analyzed by the lab of Mark Gerstein \ at Yale University.

\ \

References

\

\ Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X,\ Rinn JL, Tongprasit W, Samanta M et al.\ Global identification of human transcribed sequences with\ genome tiling arrays.\ Science. 2004 Dec 24;306(5705):2242-6.

\

\ Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S,\ Long J, Stern D, Tammana H et al.\ Transcriptional maps of 10 human chromosomes at 5-nucleotide\ resolution.\ Science. 2005 May 20;308(5725):1149-54.

\

\ Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL,\ Fodor SP, Gingeras TR.\ Large-scale transcriptional activity in chromosomes 21 and 22.\ Science. 2002 May 3;296(5569):916-9.

\

\ Kluger Y, Tuck DP, Chang JT, Nakayama Y, Poddar R, Kohya N,\ Lian Z, Ben Nasr A, Halaban HR et al.\ Lineage specificity of gene expression patterns.\ Proc Natl Acad Sci U S A. 2004 Apr 27;101(17):6508-13.

\

\ Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM,\ Hartman S, Harrison PM, Nelson FK, Miller P et al.\ The transcriptional activity of human Chromosome 22.\ Genes Dev. 2003 Feb 15;17(4):529-40.

\ encodeTxLevels 0 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ group encodeTxLevels\ longLabel Yale RNA (Neutrophil, Placenta and NB4 cells)\ priority 28\ shortLabel Yale RNA\ superTrack on\ track encodeYaleRnaSuper\ encodeAffyChIpHl60PvalH3K27me3Hr08 Affy H3K27me3 RA 8h wig 0.0 534.54 Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 8hrs) P-Value 0 29 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 8hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 29\ shortLabel Affy H3K27me3 RA 8h\ subGroups factor=H3K27me3 time=8h\ track encodeAffyChIpHl60PvalH3K27me3Hr08\ encodeAffyEc1K562Signal EC1 Sgnl K562 wig 0 62385 Affy Ext Trans Signal (1-base window) (K562) 0 29 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 0 color 0,0,205\ longLabel Affy Ext Trans Signal (1-base window) (K562)\ parent encodeAffyEcSignal\ priority 29\ shortLabel EC1 Sgnl K562\ track encodeAffyEc1K562Signal\ encodeAffyEc1K562Sites EC1 Sites K562 bed 3 . Affy Ext Trans Sites (1-base window) (K562) 0 29 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 1 color 0,0,205\ longLabel Affy Ext Trans Sites (1-base window) (K562)\ parent encodeAffyEcSites\ priority 29\ shortLabel EC1 Sites K562\ track encodeAffyEc1K562Sites\ celeraCoverage WSSD Coverage bed 4 . Regions Assayed for SDD 0 29 0 0 0 127 127 127 0 0 0

Description

\

\ This track represents coverage of clones that were assayed for \ segmental duplications using high-depth Celera reads. Absent regions were \ not assessed by this version of the Segmental Duplication Database (SDD). \ For a description of the whole-genome shotgun sequence detection (WSSD)\ "fuguization" method, see Bailey, J.A. et al. (2001) in \ the References section below.

\ \

Credits

\

\ The data were provided by \ Xinwei She \ and Evan Eichler as part of their\ effort to map human paralogy at the \ University of Washington.

\ \

References

\

\ Bailey, J.A., et al., \ Recent segmental duplications in the human genome. \ Science 297(5583), 945-7 (2002).

\

\ Bailey, J.A., et al., \ Segmental duplications: organization and impact within the \ current human genome project assembly, Genome Res. 11(6), \ 1005-17 (2001).

\

\ She, X., et al., \ Shotgun sequence assembly and recent segmental duplications \ within the human genome. Nature 431(7011), 927-30 (2004).\

\ map 1 group map\ longLabel Regions Assayed for SDD\ priority 29\ shortLabel WSSD Coverage\ track celeraCoverage\ type bed 4 .\ visibility hide\ encodeYaleAffyNB4RARNATarsIntergenicProximal Yale Ig Prx NB4 RA bed 4 . Yale Intergenic Proximal NB4 Retinoic TARs 0 29 167 155 79 211 205 167 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 167,155,79\ longLabel Yale Intergenic Proximal NB4 Retinoic TARs\ parent encodeNoncodingTransFrags\ priority 29\ shortLabel Yale Ig Prx NB4 RA\ subGroups region=intergenicProximal celltype=nb4 source=yale\ track encodeYaleAffyNB4RARNATarsIntergenicProximal\ encodeYaleMASPlacRNATars Yale MAS TAR bed 6 . Yale Maskless Array Synthesizer, RNA Transcriptionally Active Regions 0 29 0 0 0 127 127 127 0 0 8 chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22,

Description

\

\ This track shows the locations of forward (+) and reverse (-) strand \ transcriptionally-active regions (TARs)/transcribed fragments \ (transfrags), for human NB4 cell total RNA and for\ human placenta Poly(A)+ RNA, hybridized to the Yale \ Maskless Array Synthesizer (MAS) ENCODE oligonucleotide microarray, \ transcription mapping design #1. This array has 36-mer oligonucleotide probes \ approximately every 36 bp (i.e. end-to-end) covering all the \ non-repetitive DNA sequence of the ENCODE regions ENm001 - ENm012. See \ NCBI GEO accession \ GPL2105 for details of this array design.

\

\ These TARs/transfrags are based on a transcript map combining \ hybridization intensities from three biological replicates, each with at \ least two technical replicates. Arrays were hybridized using either the\ NimbleGen standard protocol or the protocol described in Bertone \ et al. (2004). The label of each subtrack in this annotation \ indicates the specific protocol used for that particular data set.

\ \

Methods

\

\ A score was assigned to each oligonucleotide probe position by combining \ two or more technical replicates and by using a sliding window \ approach. Within a sliding window of 160 bp (corresponding to 5 \ oligos), the hybridization intensities for all replicates of each \ oligonucleotide probe were compared to their respective array median \ intensity. Within the window and across all the replicates, the number \ of probes above and below their respective median was counted. Using \ the sign test, a one-sided P-value was then calculated and a score \ defined as score=-log(p-value) was assigned to the oligo in the \ center of the window.
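The sign-test scoring step can be sketched as follows. This is a minimal illustration under stated assumptions, not the Yale pipeline: replicate intensities are assumed to arrive as equal-length lists in probe order, the 5-oligo window is expressed as two probes on either side of the center (truncated at the ends of the region), probes equal to their array median are treated as ties and dropped, and the log base, which the description does not state, is assumed to be 10. The function names are hypothetical.

```python
import math
from statistics import median


def sign_test_p(n_above, n_below):
    """One-sided sign-test P-value: P(X >= n_above) for X ~ Binomial(n, 0.5)."""
    n = n_above + n_below
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, k) for k in range(n_above, n + 1))
    return tail / 2 ** n


def window_scores(replicates, half_width=2):
    """Assign -log10(P) to the center probe of each sliding window.

    replicates: one list of probe intensities per array, all in the same probe order.
    half_width=2 gives a 5-probe window; windows are truncated at the region ends.
    """
    medians = [median(r) for r in replicates]  # each array's own median intensity
    n_probes = len(replicates[0])
    scores = []
    for i in range(n_probes):
        lo, hi = max(0, i - half_width), min(n_probes, i + half_width + 1)
        above = below = 0
        for values, med in zip(replicates, medians):
            for x in values[lo:hi]:
                if x > med:
                    above += 1
                elif x < med:
                    below += 1  # values equal to the median are ignored as ties
        scores.append(-math.log10(sign_test_p(above, below)))
    return scores
```

Pooling counts across all replicates, rather than scoring each replicate separately, follows the wording "within the window and across all the replicates".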

\

\ Three independent biological replicates were generated, and each was \ hybridized to at least two different arrays (technical replicates). \ Transcribed regions (TARs/transfrags) were then identified using a score \ threshold of 95th percentile as well as a maximum gap of 80 bp and a \ minimum run of 50 bp (between oligonucleotide positions), effectively \ allowing a gap of one oligo and requiring each TAR/transfrag to \ encompass at least 3 oligos.
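The region-calling rule can be illustrated with a short sketch. It assumes per-probe scores have already been computed, uses a nearest-rank percentile (the description does not say which percentile definition was used), treats a score equal to the threshold as passing, and measures both gap and run length between oligonucleotide positions. The names `call_tars` and `nearest_rank_percentile` are hypothetical.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (an assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def call_tars(positions, scores, pct=95.0, max_gap=80, min_run=50):
    """Return (first, last) oligo positions of runs passing the score threshold.

    Passing probes closer than max_gap bp are joined into one run; runs spanning
    fewer than min_run bp (first to last oligo position) are discarded.
    """
    threshold = nearest_rank_percentile(scores, pct)
    hits = [p for p, s in zip(positions, scores) if s >= threshold]
    tars = []
    run_start = run_end = None
    for p in hits:
        if run_start is not None and p - run_end <= max_gap:
            run_end = p  # bridge a small gap, e.g. one missing 36-bp oligo
        else:
            if run_start is not None and run_end - run_start >= min_run:
                tars.append((run_start, run_end))
            run_start = run_end = p
    if run_start is not None and run_end - run_start >= min_run:
        tars.append((run_start, run_end))
    return tars
```

With oligos every 36 bp, two adjacent oligos span only 36 bp, so the 50-bp minimum run admits a region only once it covers three oligo positions (72 bp), while the 80-bp gap tolerates exactly one missing oligo (72 bp) but not two (108 bp), matching the description.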

\ \

Verification

\

\ Transcribed regions (TARs/transfrags), as determined by individual biological \ samples, were compared to ensure significant overlap.

\ \

Credits

\

\ These data were generated and analyzed by the labs of Michael Snyder, \ Mark Gerstein and Sherman Weissman at Yale University.

\ \

References

\

\ Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, \ Gingeras TR,\ \ Large-scale transcriptional activity in chromosomes 21 and 22, \ Science. 2002 May 3;296(5569):916-9.\ \

\ Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S, \ Harrison PM, Nelson FK, Miller P, Gerstein M, Weissman S, Snyder M, \ \ The transcriptional activity of human Chromosome 22, \ Genes Dev, 2003 Feb 15;17(4):529-40.\ \

\ Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, \ Tongprasit W, Samanta M, Weissman S, Gerstein M, Snyder M,\ \ Global identification of human transcribed sequences with genome tiling arrays, \ Science. 2004 Dec 24;306(5705):2242-6. Epub 2004 Nov 11.\ \

\ Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, \ Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, \ Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR,\ \ Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution, \ Science. 2005 May 20;308(5725):1149-54. Epub 2005 Mar 24.\ encodeTxLevels 1 chromosomes chr5,chr7,chrX,chr11,chr16,chr19,chr21,chr22\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ longLabel Yale Maskless Array Synthesizer, RNA Transcriptionally Active Regions\ origAssembly hg16\ priority 29.0\ shortLabel Yale MAS TAR\ superTrack encodeYaleRnaSuper dense\ track encodeYaleMASPlacRNATars\ type bed 6 .\ visibility hide\ gad GAD View bed 4 Genetic Association Studies of Complex Diseases and Disorders 0 29.5 200 0 0 227 127 127 0 0 0 http://geneticassociationdb.nih.gov/cgi-bin/tableview.cgi?table=allview&cond=gene=

Disclaimer

\

\ The Genetic Association Database (GAD) is intended for use primarily by \ medical scientists and other professionals concerned with genetic disorders, \ by genetics researchers, and by advanced students in science \ and medicine. While the GAD database is open to the public, \ users seeking information about a personal medical or \ genetic condition are urged to consult with a qualified \ physician for diagnosis and for answers to personal questions.\ These data are provided by the GAD\ and do not represent any additional curation by UCSC.

\ \

Description

\

\ The \ Genetic Association Database is an archive of human genetic \ association studies of complex diseases and disorders. The goal \ of the database is to allow the user to rapidly identify medically \ relevant polymorphisms from the large volume of polymorphism and \ mutation data, in the context of standardized nomenclature.\

\

If the track is displayed in "pack" or "full" mode, \ mousing over an entry of this track will show a pop-up message listing all \ associated diseases. \ In "full" mode, each feature is labeled with the associated disease \ class code (as defined below). \

\ \

Methods

\

\ Study data are recorded in the context of official human gene \ nomenclature with additional molecular reference numbers and links. The data\ are gene-centered; that is, each record is based on a gene or marker. \ For example, if a study investigated six genes for a particular disorder, \ there will be six records. Gene information is standardized and annotated with \ molecular information, enabling integration with other molecular and genomic \ data resources.\
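The gene-centered record model can be pictured with a small sketch. The record fields and the helper `split_study` are hypothetical and do not reflect the actual GAD schema; the sketch only shows how one study covering several genes expands into one record per gene or marker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GadRecord:
    """Hypothetical gene-centered record: one per (study, gene) pair."""
    study_id: str
    disease: str
    gene_symbol: str


def split_study(study_id, disease, gene_symbols):
    """Expand one association study into one record per gene or marker investigated."""
    return [GadRecord(study_id, disease, gene) for gene in gene_symbols]
```

A study of six genes for one disorder therefore yields six records, each of which can be annotated and linked independently.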

\ \

Data

\

\ Data are added to GAD on a periodic\ basis by the curator or investigators. A majority of the records in GAD\ are extracted from the online \ HuGE Navigator \ database, which is sponsored by the Centers for Disease Control and \ Prevention. HuGE Navigator\ provides access to a continuously updated, curated knowledge base of\ gene-disease associations, meta-analyses, and related information on genes\ and diseases extracted from NCBI PubMed. A gene-centered view is available via\ Genopedia, which is also available as \ HuGETrack.

\ \ \

Contacts

\

\ For more information on this dataset, contact \ Kevin G. Becker, PhD,\ \ Yongqing Zhang, PhD, and John Garner, MS, \ from the DNA Array Unit, NIA, NIH.\

\ \

References

\

\ Becker KG, Barnes KC, Bright TJ, Wang AS. \ The Genetic Association Database. \ Nature Genetics 2004 May; 36(5):431-432.\

\ phenDis 1 color 200,0,0\ group phenDis\ longLabel Genetic Association Studies of Complex Diseases and Disorders\ priority 29.5\ shortLabel GAD View\ track gad\ type bed 4\ url http://geneticassociationdb.nih.gov/cgi-bin/tableview.cgi?table=allview&cond=gene=\ visibility hide\ decipher DECIPHER bed 4 DECIPHER: Chromosomal Imbalance and Phenotype in Humans 0 29.6 0 0 0 127 127 127 0 0 0

Description

\ \
\

NOTE:
\ While the DECIPHER database is \ open to the public, users seeking information about a personal medical or\ genetic condition are urged to consult with a qualified physician for\ diagnosis and for answers to personal questions.\

\

Because the UCSC Genes mappings are based on associations from RefSeq and\ UniProt, they are dependent on any interpretations from those sources.\ Furthermore, because many DECIPHER records refer to multiple gene names,\ or syndromes not tightly mapped to individual genes, the associations\ in this track should be treated with skepticism and any conclusions\ based on them should be carefully scrutinized using independent\ resources.\

\

UCSC is authorized to present these data only via display in the\ Browser, and not for bulk download. Access to bulk data may be obtained\ directly from DECIPHER and is subject to a Data Access Agreement, in which\ the user certifies that no attempt to identify individual patients will\ be undertaken. The same restrictions apply to the public data displayed\ at UCSC: No one is authorized to attempt to identify patients by any\ means.\

\
\ \

\ The \ DECIPHER database of submicroscopic chromosomal imbalance \ collects clinical information about chromosomal \ microdeletions/duplications/insertions, translocations and inversions, \ and displays this information on the human genome map.\

\ This track shows genomic regions of reported cases and their \ associated phenotype information. All data have passed the strict\ consent requirements of the DECIPHER project and are approved for\ unrestricted public release. Clicking the Patient View ID link\ brings up a more detailed informational page on the patient at the \ DECIPHER web site. \

Method

\

\ Data provided by the DECIPHER project group are imported and processed \ to create a simple BED track to annotate the genomic regions associated \ with individual patients. \

\

\ The entries are colored red for deletions \ (mean log ratio < 0) and blue for duplications \ (mean log ratio > 0).\
Note that the color scheme changed in March 2011.\ \
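The coloring rule can be expressed as a small function. The exact RGB shades are assumptions, since the description names only "red" and "blue", and an item whose mean log ratio is exactly 0 is not covered by the description, so the sketch returns `None` for it. The function names are hypothetical; the output string follows the comma-separated `R,G,B` form of the BED `itemRgb` field.

```python
RED = (255, 0, 0)   # assumed RGB for deletions; the exact shade is not stated
BLUE = (0, 0, 255)  # assumed RGB for duplications; the exact shade is not stated


def decipher_color(mean_log_ratio):
    """Red for deletions (mean log ratio < 0), blue for duplications (> 0)."""
    if mean_log_ratio < 0:
        return RED
    if mean_log_ratio > 0:
        return BLUE
    return None  # a mean log ratio of exactly 0 is not covered by the description


def item_rgb(color):
    """Format an (R, G, B) tuple as a BED itemRgb value such as '255,0,0'."""
    return ",".join(str(c) for c in color)
```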

Contact

\

\ For more information on DECIPHER, please contact\ \ decipher@sanger.\ ac.\ uk.\

\ \

References

\

\ Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, Rajan D, Van Vooren S, Moreau Y, Pettett RM, Carter NP.\ \ DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources.\ Am J Hum Genet. 2009 Apr;84(4):524-33.\ (Cambridge University Department of Medical Genetics, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK.\ \ hvf21@cam.\ ac.\ uk.)\

\ \ phenDis 1 color 0,0,0\ group phenDis\ longLabel DECIPHER: Chromosomal Imbalance and Phenotype in Humans\ nextExonText Right edge\ prevExonText Left edge\ priority 29.6\ shortLabel DECIPHER\ tableBrowser off decipherRaw knownToDecipher knownCanonToDecipher\ track decipher\ type bed 4\ visibility hide\ omimAvSnp OMIM AV SNPs bed 4 OMIM Allelic Variant SNPs 0 29.71 0 80 0 127 167 127 0 0 0 http://www.omim.org/entry/

Description

\ \
\

NOTE:
\ OMIM is intended for use primarily by physicians and other\ professionals concerned with genetic disorders, by genetics researchers, and\ by advanced students in science and medicine. While the OMIM database is\ open to the public, users seeking information about a personal medical or\ genetic condition are urged to consult with a qualified physician for\ diagnosis and for answers to personal questions. Further, please be\ sure to click through to omim.org for the most recent information, as OMIM \ data are continually updated.

\ \

NOTE ABOUT DOWNLOADS:
\ OMIM is the property \ of Johns Hopkins University and is not available for download or mirroring \ by any third party without their permission. Please see \ OMIM\ for downloads.

\
\ \ \

OMIM is a compendium of human genes and genetic phenotypes. The full-text,\ referenced overviews in OMIM contain information on all known Mendelian\ disorders and over 12,000 genes. OMIM is authored and edited at the\ McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University\ School of Medicine, under the direction of Dr. Ada Hamosh. This database\ was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog\ of Mendelian traits and disorders, entitled Mendelian Inheritance\ in Man (MIM).\

\ \

\ The OMIM data are split into three tracks:\

\ \

OMIM Allelic Variant SNPs\
    Variants in the OMIM database that have associated \ dbSNP identifiers.\ \

OMIM Genes\
    The genomic positions of gene entries in the OMIM \ database. The coloring indicates the associated OMIM phenotype map key.\

\ \

OMIM Phenotypes - Gene Unknown\
    Regions known to be associated with a phenotype, \ but for which no specific gene is known to be causative. This track \ also includes known multi-gene syndromes.\

\ \
\ \ \

\ This track shows the allelic variants in the Online Mendelian Inheritance in Man\ (OMIM) database that have associated\ dbSNP identifiers.\

\ \

Display Conventions and Configuration

\ \

Genomic positions of OMIM allelic variant SNPs are marked by solid blocks, which appear\ as tick marks when zoomed out. \

The details page for each variant displays the allelic variant description, the amino\ acid replacement, and the dbSNP identifier, with a link to that SNP's details page in the\ "All SNPs (132)" track.\

\

The descriptions of OMIM entries are shown on the main browser display when Full display\ mode is chosen. In Pack mode, the descriptions are shown when mousing over each entry.\

\ \

Methods

\

\ This track was constructed as follows: \

\ \

Credits

\

\ Thanks to OMIM and NCBI for the use of their data. This track was constructed by Fan Hsu,\ Robert Kuhn, and Brooke Rhead of the UCSC Genome Bioinformatics Group.

\ \

References

\

Amberger J, Bocchini CA, Scott AF, Hamosh A.\ McKusick's Online Mendelian Inheritance in Man (OMIM®).\ Nucleic Acids Res. 2009 Jan;37(Database issue):D793-6. Epub 2008 Oct 8.\

\

\ Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA.\ Online Mendelian Inheritance in Man (OMIM), a knowledgebase of\ human genes and genetic disorders.\ Nucleic Acids Res. 2005 Jan 1;33(Database issue):D514-7.\

\ phenDis 1 color 0, 80, 0\ group phenDis\ hgsid on\ longLabel OMIM Allelic Variant SNPs\ priority 29.71\ shortLabel OMIM AV SNPs\ tableBrowser off omimAv omimAvRepl\ track omimAvSnp\ type bed 4\ url http://www.omim.org/entry/\ visibility hide\ omimGene2 OMIM Genes bed 4 OMIM Genes - Dark Green Are Disease-causing 0 29.73 0 80 0 127 167 127 0 0 0 http://www.omim.org/entry/

Description

\ \
\

NOTE:
\ OMIM is intended for use primarily by physicians and other\ professionals concerned with genetic disorders, by genetics researchers, and\ by advanced students in science and medicine. While the OMIM database is\ open to the public, users seeking information about a personal medical or\ genetic condition are urged to consult with a qualified physician for\ diagnosis and for answers to personal questions. Further, please be\ sure to click through to omim.org for the most recent information, as OMIM \ data are continually updated.

\ \

NOTE ABOUT DOWNLOADS:
\ OMIM is the property \ of Johns Hopkins University and is not available for download or mirroring \ by any third party without their permission. Please see \ OMIM\ for downloads.

\
\ \ \

OMIM is a compendium of human genes and genetic phenotypes. The full-text,\ referenced overviews in OMIM contain information on all known Mendelian\ disorders and over 12,000 genes. OMIM is authored and edited at the\ McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University\ School of Medicine, under the direction of Dr. Ada Hamosh. This database\ was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog\ of Mendelian traits and disorders, entitled Mendelian Inheritance\ in Man (MIM).\

\ \

\ The OMIM data are split into three tracks:\

\ \

OMIM Allelic Variant SNPs\
    Variants in the OMIM database that have associated \ dbSNP identifiers.\ \

OMIM Genes\
    The genomic positions of gene entries in the OMIM \ database. The coloring indicates the associated OMIM phenotype map key.\

\ \

OMIM Phenotypes - Gene Unknown\
    Regions known to be associated with a phenotype, \ but for which no specific gene is known to be causative. This track \ also includes known multi-gene syndromes.\

\ \
\ \ \

\ This track shows the genomic positions of all gene entries in the Online Mendelian\ Inheritance in Man (OMIM) database.\

\ \

Display Conventions and Configuration

\ \

Genomic locations of OMIM gene entries are displayed as solid blocks. The entries are colored\ according to the associated OMIM phenotype map key (if any):\

\

Gene symbol and disease information, when available, are displayed on the details page for an\ item, and links to related RefSeq Genes and UCSC Genes are given.\

\

The descriptions of the OMIM entries are shown on the main browser display when Full display\ mode is chosen. In Pack mode, the descriptions are shown when mousing over each entry. Items\ displayed can be filtered according to phenotype map key on the track controls page. \

\ \

Methods

\

\ The mappings displayed in this track are based on OMIM gene entries, their Entrez Gene IDs, and\ the corresponding RefSeq Gene locations:\

\

\ *The locations in the refGene table are from alignments of RefSeq Genes to the reference\ genome using BLAT.\

\ \

Credits

\

\ Thanks to OMIM and NCBI for the use of their data. This track was\ constructed by Fan Hsu, Robert Kuhn, and Brooke Rhead of the UCSC Genome Bioinformatics Group.

\ \

References

\

Amberger J, Bocchini CA, Scott AF, Hamosh A. \ McKusick's Online Mendelian Inheritance in Man (OMIM®). \ Nucleic Acids Res. 2009 Jan;37(Database issue):D793-6. Epub 2008 Oct 8.\

\

\ Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. \ Online Mendelian Inheritance in Man (OMIM), a knowledgebase of \ human genes and genetic disorders. \ Nucleic Acids Res. 2005 Jan 1;33(Database issue):D514-7.\

\ phenDis 1 color 0, 80, 0\ group phenDis\ hgsid on\ longLabel OMIM Genes - Dark Green Are Disease-causing\ priority 29.73\ shortLabel OMIM Genes\ tableBrowser off omimGeneMap omimPhenotype omimGeneSymbol mim2gene\ track omimGene2\ type bed 4\ url http://www.omim.org/entry/\ visibility hide\ omimLocation OMIM Pheno Loci bed 4 OMIM Phenotypes - Gene Unknown 0 29.75 0 80 0 127 167 127 0 0 0 http://www.omim.org/entry/

Description

\ \
\

NOTE:
\ OMIM is intended for use primarily by physicians and other\ professionals concerned with genetic disorders, by genetics researchers, and\ by advanced students in science and medicine. While the OMIM database is\ open to the public, users seeking information about a personal medical or\ genetic condition are urged to consult with a qualified physician for\ diagnosis and for answers to personal questions. Further, please be\ sure to click through to omim.org for the most recent information, as OMIM \ data are continually updated.

\ \

NOTE ABOUT DOWNLOADS:
\ OMIM is the property \ of Johns Hopkins University and is not available for download or mirroring \ by any third party without their permission. Please see \ OMIM\ for downloads.

\
\ \ \

OMIM is a compendium of human genes and genetic phenotypes. The full-text,\ referenced overviews in OMIM contain information on all known Mendelian\ disorders and over 12,000 genes. OMIM is authored and edited at the\ McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University\ School of Medicine, under the direction of Dr. Ada Hamosh. This database\ was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog\ of Mendelian traits and disorders, entitled Mendelian Inheritance\ in Man (MIM).\

\ \

\ The OMIM data are split into three tracks:\

\ \

OMIM Allelic Variant SNPs\
    Variants in the OMIM database that have associated \ dbSNP identifiers.\ \

OMIM Genes\
    The genomic positions of gene entries in the OMIM \ database. The coloring indicates the associated OMIM phenotype map key.\

\ \

OMIM Phenotypes - Gene Unknown\
    Regions known to be associated with a phenotype, \ but for which no specific gene is known to be causative. This track \ also includes known multi-gene syndromes.\

\ \
\ \ \

\ This track shows the cytogenetic locations of phenotype entries in the Online Mendelian\ Inheritance in Man (OMIM) database for which\ the gene is unknown.\

\ \

Display Conventions and Configuration

\ \

Cytogenetic locations of OMIM entries are displayed as solid\ blocks. The entries are colored according to the OMIM phenotype map key of associated disorders:\ \

\

Gene symbols and disease information, when available, are displayed on the details pages.\

\

The descriptions of OMIM entries are shown on the main browser display when Full display\ mode is chosen. In Pack mode, the descriptions are shown when mousing over each entry. Items\ displayed can be filtered according to phenotype map key on the track controls page.\

\ \

Methods

\

\ This track was constructed as follows: \

\ \

Credits

\

\ Thanks to OMIM and NCBI for the use of their data. This track was constructed by Fan Hsu,\ Robert Kuhn, and Brooke Rhead of the UCSC Genome Bioinformatics Group.

\ \

References

\

Amberger J, Bocchini CA, Scott AF, Hamosh A.\ McKusick's Online Mendelian Inheritance in Man (OMIM®).\ Nucleic Acids Res. 2009 Jan;37(Database issue):D793-6. Epub 2008 Oct 8.\

\

\ Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA.\ Online Mendelian Inheritance in Man (OMIM), a knowledgebase of\ human genes and genetic disorders.\ Nucleic Acids Res. 2005 Jan 1;33(Database issue):D514-7.\

\ phenDis 1 color 0, 80, 0\ group phenDis\ hgsid on\ longLabel OMIM Phenotypes - Gene Unknown\ priority 29.75\ shortLabel OMIM Pheno Loci\ tableBrowser off\ track omimLocation\ type bed 4\ url http://www.omim.org/entry/\ visibility hide\ cosmic COSMIC bed 4 COSMIC: Catalogue Of Somatic Mutations In Cancer 0 29.8 200 0 0 227 127 127 0 0 0 http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=mut_summary phenDis 1 color 200, 0, 0\ group phenDis\ hgsid on\ longLabel COSMIC: Catalogue Of Somatic Mutations In Cancer\ priority 29.80\ shortLabel COSMIC\ track cosmic\ type bed 4\ url http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=mut_summary\ visibility hide\ encodeAffyChIpHl60SitesH3K27me3Hr08 Affy H3K27me3 RA 8h bed 3 . Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 8hrs) Sites 0 30 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 150,75,0\ longLabel Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 8hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 30\ shortLabel Affy H3K27me3 RA 8h\ subGroups factor=H3K27me3 time=8h\ track encodeAffyChIpHl60SitesH3K27me3Hr08\ encodeAffyEc51K562Signal EC51 Sgnl K562 wig 0 62385 Affy Ext Trans Signal (51-base window) (K562) 0 30 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 0 color 0,0,205\ longLabel Affy Ext Trans Signal (51-base window) (K562)\ parent encodeAffyEcSignal\ priority 30\ shortLabel EC51 Sgnl K562\ track encodeAffyEc51K562Signal\ encodeAffyEc51K562Sites EC51 Site K562 bed 3 . 
Affy Ext Trans Sites (51-base window) (K562) 0 30 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 1 color 0,0,205\ longLabel Affy Ext Trans Sites (51-base window) (K562)\ parent encodeAffyEcSites\ priority 30\ shortLabel EC51 Site K562\ track encodeAffyEc51K562Sites\ celeraDupPositive WSSD Duplication bed 4 + Sequence Identified as Duplicate by High-Depth Celera Reads 0 30 0 0 0 127 127 127 0 0 0

Description

\

\ High-depth sequence reads from the Celera project were used to \ detect paralogy in the human genome reference sequence.\ This track shows confirmed segmental duplications, defined as having \ similarity to sequences in the Segmental Duplication Database (SDD) of\ greater than 90% over more than 250 bp of repeatmasked sequence.\ For a description of the whole-genome shotgun sequence detection (WSSD) \ "fuguization" method, see Bailey, J.A. et al. (2001) in \ the References section below.
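The confirmation criterion can be written as a simple predicate. The record layout below is hypothetical (the description does not give one); identity is expressed as a fraction, and both thresholds are applied as strict inequalities to match the wording "greater than 90% over more than 250 bp".

```python
from dataclasses import dataclass


@dataclass
class SddHit:
    """Hypothetical summary of one alignment to a sequence in the SDD."""
    chrom: str
    start: int
    end: int
    identity: float          # fraction of aligned bases that are identical, 0..1
    masked_aligned_bp: int   # aligned length counted on repeatmasked sequence


def is_confirmed_duplication(hit, min_identity=0.90, min_length=250):
    """True if similarity exceeds 90% over more than 250 bp of repeatmasked sequence."""
    return hit.identity > min_identity and hit.masked_aligned_bp > min_length
```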

\ \

Credits

\

\ The data were provided by \ Xinwei She \ and Evan Eichler as part of their\ efforts to map human paralogy at the \ University of Washington.

\ \

References

\

\ Bailey, J.A., et al., \ Recent segmental duplications in the human genome. \ Science 297(5583), 945-7 (2002).

\

\ Bailey, J.A., et al., \ Segmental duplications: organization and impact within the \ current human genome project assembly, Genome Res. 11(6), \ 1005-17 (2001).

\

\ She, X., et al., \ Shotgun sequence assembly and recent segmental duplications \ within the human genome. Nature 431(7011), 927-30 (2004).\

\ map 1 group map\ longLabel Sequence Identified as Duplicate by High-Depth Celera Reads\ priority 30\ shortLabel WSSD Duplication\ track celeraDupPositive\ type bed 4 +\ visibility hide\ encodeYaleAffyNB4TPARNATarsIntergenicProximal Yale Ig Prx NB4 TPA bed 4 . Yale Intergenic Proximal NB4 TPA-Treated TARs 0 30 173 149 91 214 202 173 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 173,149,91\ longLabel Yale Intergenic Proximal NB4 TPA-Treated TARs\ parent encodeNoncodingTransFrags\ priority 30\ shortLabel Yale Ig Prx NB4 TPA\ subGroups region=intergenicProximal celltype=nb4 source=yale\ track encodeYaleAffyNB4TPARNATarsIntergenicProximal\ encodeYaleAffyRNATransMap Yale RNA wig -2730 3394 Yale RNA Transcript Map (Neutrophil, Placenta and NB4 cells) 0 30 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows the transcript map of signal intensity (estimating RNA \ abundance) for the following, hybridized to the Affymetrix ENCODE \ oligonucleotide microarray:\

\

\ The human NB4 cell can be made to differentiate towards either monocytes (by\ treatment with TPA) or neutrophils (by treatment with RA). See Kluger\ et al., 2004 in the References section for more details about the\ differentiation of hematopoietic cells.

\

\ This array has 25-mer oligonucleotide probes tiled \ approximately every 22 bp, covering all the non-repetitive DNA sequence \ of the ENCODE regions. The transcript map is a combined signal for both \ strands of DNA. This is derived from the number of different biological \ samples indicated above, each with at least two technical replicates.

\

\ See the following NCBI GEO accessions for details of experimental protocols:\

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ "wiggle" tracks. The subtracks within this annotation \ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for distinguishing between the different data samples.

\ \

Methods

\

\ The data from technical replicates were median-scaled and quantile-normalized \ to each other. Using a 101 bp sliding window centered on \ each oligonucleotide probe, a signal map (estimating RNA abundance) was\ generated by computing the pseudomedian signal of all PM-MM pairs \ (median of pairwise PM-MM averages) within the window, including \ replicates. Biological replicate signal maps were combined by \ quantile-normalizing them between replicates and computing the median signal \ at each oligonucleotide probe location.
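The signal-map steps can be sketched as below. This is an illustration under assumptions, not the Yale/Affymetrix code: technical replicates are assumed to be already median-scaled and quantile-normalized, probe positions are taken as probe centers, the 101-bp window is read as 50 bp on either side of the center probe, and the pseudomedian is computed as the Hodges-Lehmann estimator (median of all pairwise Walsh averages, including each value paired with itself). Quantile normalization here replaces each value with the mean of the values at the same rank across arrays, ignoring tie handling. All function names are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import mean, median


def pseudomedian(values):
    """Median of all pairwise averages (a + b) / 2 over pairs i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(values, 2))


def window_signal(positions, pm, mm, window_bp=101):
    """Per-probe signal: pseudomedian of PM-MM differences within a centered window.

    pm, mm: lists of replicate lists, each in the same probe order as positions.
    """
    half = window_bp // 2  # 50 bp on either side of the center probe
    signal = []
    for center in positions:
        diffs = [
            p_rep[i] - m_rep[i]
            for p_rep, m_rep in zip(pm, mm)  # pool all replicates
            for i, pos in enumerate(positions)
            if abs(pos - center) <= half
        ]
        signal.append(pseudomedian(diffs))
    return signal


def quantile_normalize(arrays):
    """Give every array the same distribution: the rank-wise mean of the sorted arrays."""
    n = len(arrays[0])
    ranked = [sorted(range(n), key=a.__getitem__) for a in arrays]
    reference = [mean(a[idx[r]] for a, idx in zip(arrays, ranked)) for r in range(n)]
    out = []
    for idx in ranked:
        normalized = [0.0] * n
        for r, i in enumerate(idx):
            normalized[i] = reference[r]
        out.append(normalized)
    return out


def combine_replicates(signal_maps):
    """Quantile-normalize biological replicate maps, then take the per-probe median."""
    return [median(col) for col in zip(*quantile_normalize(signal_maps))]
```

The pseudomedian is less sensitive than a plain mean to a single aberrant PM-MM pair, which is a plausible reason for choosing it over averaging within the window.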

\ \

Verification

\

\ Independent biological replicates (as indicated above) were generated,\ and each was hybridized to at least two different arrays\ (technical replicates). Transcribed regions were then identified using a\ signal threshold of 90th percentile of signal intensities, as well as a maximum\ gap of 50 bp and a minimum run of 50 bp (between oligonucleotide positions).\ Transcribed regions, as determined by individual biological samples, were\ compared to ensure significant overlap.
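One way to quantify agreement between replicates is the fraction of bases in one replicate's transcribed regions that are also covered by another's. The description does not say which overlap measure or significance cutoff was used, so both the metric and the helper names below are assumptions. Regions are taken as half-open (start, end) intervals on a single chromosome.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def overlap_fraction(a, b):
    """Fraction of bases covered by regions in a that are also covered by regions in b."""
    a, b = merge(a), merge(b)
    total = sum(end - start for start, end in a)
    if total == 0:
        return 0.0
    shared = 0
    i = j = 0
    while i < len(a) and j < len(b):  # sweep both sorted lists once
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared / total
```

The measure is asymmetric, so comparing each pair of replicates in both directions gives a fuller picture.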

\ \

Credits

\

\ These data were generated and analyzed by the Yale/Affymetrix \ collaboration among the labs of Michael Snyder, Mark Gerstein and \ Sherman Weissman at Yale University and Tom Gingeras at Affymetrix.

\ \

References

\

\ Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., \ Rinn, J.L., Tongprasit, W., Samanta, M. et al.\ Global identification of human transcribed sequences with \ genome tiling arrays. \ Science 306(5705), 2242-6 (2004).

\

\ Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., \ Long, J., Stern, D., Tammana, H. et al.\ Transcriptional maps of 10 human chromosomes at 5-nucleotide \ resolution. \ Science 308(5725), 1149-54 (2005).

\

\ Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., \ Fodor, S.P. and Gingeras, T.R.\ Large-scale transcriptional activity in chromosomes 21 and \ 22. \ Science 296(5569), 916-9 (2002).

\

\ Kluger, Y., Tuck, D.P., Chang, J.T., Nakayama, Y., Poddar, R., Kohya, N., \ Lian, Z., Ben Nasr, A., Halaban, H.R. et al.\ Lineage specificity of gene expression patterns. \ Proc Natl Acad Sci U S A 101(17), 6508-13 (2004).

\

\ Rinn, J.L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N.M., \ Hartman, S., Harrison, P.M., Nelson, F.K., Miller, P. et al.\ The transcriptional activity of human Chromosome 22. \ Genes Dev 17(4), 529-40 (2003).

\ encodeTxLevels 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ longLabel Yale RNA Transcript Map (Neutrophil, Placenta and NB4 cells)\ maxHeightPixels 128:16:16\ priority 30.0\ shortLabel Yale RNA\ spanList 1\ subGroup1 samples Sample summary=Summary samples=samples1-10\ subGroup2 celltype Cell_Type neutro=Neutrophil plac=Placenta nb4=NB4\ track encodeYaleAffyRNATransMap\ type wig -2730 3394\ viewLimits 0:150\ visibility hide\ windowingFunction mean\ celeraOverlay WSSD Overlay bed 4 + Celera WGS Assembly Overlay on Public Assembly 0 30.1 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows regions detected as overlays of the Celera\ whole-genome shotgun (WGS) sequence assembly on the public human \ assembly.

\ \

Credits

\

\ The data were provided by \ Xinwei She \ and Evan Eichler \ as part of their effort to map \ human paralogy at the \ University of Washington.

\ \

References

\

\ Bailey, J.A., et al., \ Recent segmental duplications in the human genome. \ Science 297(5583), 945-7 (2002).

\

\ Bailey, J.A., et al., \ Segmental duplications: organization and impact within the \ current human genome project assembly, Genome Res. 11(6), \ 1005-17 (2001).

\

\ She, X., et al., \ Shotgun sequence assembly and recent segmental duplications \ within the human genome. Nature 431(7011), 927-30 (2004).\

\ map 1 group map\ longLabel Celera WGS Assembly Overlay on Public Assembly\ priority 30.1\ shortLabel WSSD Overlay\ track celeraOverlay\ type bed 4 +\ visibility hide\ encodeAffyChIpHl60PvalH3K27me3Hr32 Affy H3K27me3 RA 32h wig 0.0 534.54 Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 32hrs) P-Value 0 31 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 150,75,0\ longLabel Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 32hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 31\ shortLabel Affy H3K27me3 RA 32h\ subGroups factor=H3K27me3 time=32h\ track encodeAffyChIpHl60PvalH3K27me3Hr32\ genomicDups Duplications bed 6 + Duplications of >1000 Bases Sequence 0 31 170 0 0 160 150 0 0 0 0 This region was detected as a genomic duplication within the golden path. \ Duplications of 99% or greater similarity, which are likely missed overlaps, \ are shown as red. Duplications of 98% - 99% similarity are shown as yellow. \ Duplications of 90% - 98% similarity are shown as shades of gray. Cut off \ values were at least 1 kb of total sequence aligned (containing at least 500 bp \ non-RepeatMasked sequence) and at least 90% sequence identity. For a \ description of the 'fuguization' detection method see \ Bailey, et al (2001) Genome Res 11:1005-17. \ The data were provided by \ Jeff Bailey \ \ and Evan Eichler.\
\ map 1 altColor 160,150,0\ color 170,0,0\ group map\ longLabel Duplications of >1000 Bases Sequence\ priority 31\ shortLabel Duplications\ track genomicDups\ type bed 6 +\ visibility hide\ encodeAffyEc1TertBJSignal EC1 Sgnl TertBJ wig 0 62385 Affy Ext Trans Signal (1-base window) (Tert-BJ) 0 31 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 0 color 0,0,205\ longLabel Affy Ext Trans Signal (1-base window) (Tert-BJ)\ parent encodeAffyEcSignal\ priority 31\ shortLabel EC1 Sgnl TertBJ\ track encodeAffyEc1TertBJSignal\ encodeAffyEc1TertBJSites EC1 Sites TertBJ bed 3 . Affy Ext Trans Sites (1-base window) (Tert-BJ) 0 31 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 1 color 0,0,205\ longLabel Affy Ext Trans Sites (1-base window) (Tert-BJ)\ parent encodeAffyEcSites\ priority 31\ shortLabel EC1 Sites TertBJ\ track encodeAffyEc1TertBJSites\ encodeYaleAffyNB4UntrRNATarsIntergenicProximal Yale Ig Prx NB4 Un bed 4 . Yale Intergenic Proximal Untreated NB4 TARs 0 31 179 143 103 217 199 179 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 179,143,103\ longLabel Yale Intergenic Proximal Untreated NB4 TARs\ parent encodeNoncodingTransFrags\ priority 31\ shortLabel Yale Ig Prx NB4 Un\ subGroups region=intergenicProximal celltype=nb4 source=yale\ track encodeYaleAffyNB4UntrRNATarsIntergenicProximal\ encodeYaleAffyRNATars Yale TAR bed 3 . Yale RNA Transcriptionally Active Regions (TARs) 0 31 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows the locations of transcriptionally active regions \ (TARs)/transcribed fragments (transfrags) for the following, hybridized to\ the Affymetrix ENCODE oligonucleotide microarray:\

\

\ Human NB4 cells can be induced to differentiate toward either monocytes (by\ treatment with TPA) or neutrophils (by treatment with retinoic acid, RA). See Kluger\ et al. (2004) in the References section for more details on the\ differentiation of hematopoietic cells.

\

\ This array has 25-mer oligonucleotide probes tiled \ approximately every 22 bp, covering all the non-repetitive DNA sequence \ of the ENCODE regions. The transcript map combines the signal from both \ DNA strands and is derived from the biological \ samples indicated above, each with at least two technical replicates.

\

\ See the following NCBI GEO accessions for details of experimental protocols:\

\ \

Display Conventions and Configuration

\

\ TARs are represented by blocks in the graphical display. This composite \ annotation track consists of several subtracks that are listed at the top of\ the track description page. To display only selected subtracks, uncheck the \ boxes next to the tracks you wish to hide.

\

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for distinguishing between the different data samples.

\ \

Methods

\

\ The data from technical replicates were median-scaled and quantile-normalized \ to each other. Using a 101 bp sliding window centered on \ each oligonucleotide probe, a signal map estimating RNA abundance was\ generated by computing the pseudomedian signal of all PM-MM pairs \ (median of pairwise PM-MM averages) within the window, including \ replicates. Biological replicate signal maps were combined by \ quantile-normalizing them between replicates and computing the median signal \ at each oligonucleotide probe location. Independent biological \ replicates (as described above) were generated, and each was hybridized\ to at least two different arrays (technical replicates). Transcribed regions \ (TARs/transfrags) were then identified using a signal threshold at the 90th \ percentile of signal intensities, as well as a maximum gap of 50 bp and \ a minimum run of 50 bp (between oligonucleotide positions).
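The step that combines biological replicates (quantile normalization followed by a per-probe median) can be sketched as follows. This sketch assumes every replicate is an equal-length list of signals aligned on the same probes. It breaks ties by probe order, which is a simplification. `quantile_normalize` and `combine_replicates` are hypothetical names, not part of the original pipeline.

```python
from statistics import mean, median


def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list per replicate).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    orders = [sorted(range(n), key=a.__getitem__) for a in arrays]
    rank_means = [mean(a[order[r]] for a, order in zip(arrays, orders))
                  for r in range(n)]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def combine_replicates(arrays):
    """Median of the quantile-normalized replicates at each probe."""
    return [median(values) for values in zip(*quantile_normalize(arrays))]
```

After normalization, every replicate contains the same set of values and differs only in which probe holds each value. Taking the median at each probe is therefore a rank-robust consensus.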

\ \

Verification

\

\ Transcribed regions (TARs/transfrags), as determined by individual \ biological samples, were compared to ensure significant overlap.

\ \

Credits

\

\ These data were generated and analyzed by the Yale/Affymetrix \ collaboration between the labs of Michael Snyder, Mark Gerstein and \ Sherman Weissman at Yale University and Tom Gingeras at Affymetrix.

\ \

References

\

\ Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., \ Rinn, J.L., Tongprasit, W., Samanta, M. et al.\ Global identification of human transcribed sequences with \ genome tiling arrays. \ Science 306(5705), 2242-6 (2004).

\

\ Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., \ Long, J., Stern, D., Tammana, H. et al.\ Transcriptional maps of 10 human chromosomes at 5-nucleotide \ resolution. \ Science 308(5725), 1149-54 (2005).

\

\ Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., \ Fodor, S.P. and Gingeras, T.R.\ Large-scale transcriptional activity in chromosomes 21 and \ 22. \ Science 296(5569), 916-9 (2002).

\

\ Kluger, Y., Tuck, D.P., Chang, J.T., Nakayama, Y., Poddar, R., Kohya, N., \ Lian, Z., Ben Nasr, A., Halaban, H.R. et al.\ Lineage specificity of gene expression patterns. \ Proc Natl Acad Sci U S A 101(17), 6508-13 (2004).

\

\ Rinn, J.L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N.M., \ Hartman, S., Harrison, P.M., Nelson, F.K., Miller, P. et al.\ The transcriptional activity of human Chromosome 22. \ Genes Dev 17(4), 529-40 (2003).

\ encodeTxLevels 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ longLabel Yale RNA Transcriptionally Active Regions (TARs)\ priority 31.0\ shortLabel Yale TAR\ subGroup1 samples Sample summary=Summary samples=samples1-10\ subGroup2 celltype Cell_Type neutro=Neutrophil plac=Placenta nb4=NB4\ track encodeYaleAffyRNATars\ type bed 3 .\ visibility hide\ encodeAffyChIpHl60SitesH3K27me3Hr32 Affy H3K27me3 RA 32h bed 3 . Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 32hrs) Sites 0 32 150 75 0 202 165 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 150,75,0\ longLabel Affymetrix ChIP/Chip (H3K27me3 retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 32\ shortLabel Affy H3K27me3 RA 32h\ subGroups factor=H3K27me3 time=32h\ track encodeAffyChIpHl60SitesH3K27me3Hr32\ dupes Duplications bed 6 . Duplications of >98% Identity >1kb 1 32 0 0 0 127 127 127 0 0 0 map 1 group map\ longLabel Duplications of >98% Identity >1kb\ priority 32\ shortLabel Duplications\ track dupes\ type bed 6 .\ visibility dense\ encodeAffyEc51TertBJSignal EC51 Sgnl TertBJ wig 0 62385 Affy Ext Trans Signal (51-base window) (Tert-BJ) 0 32 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 0 color 0,0,205\ longLabel Affy Ext Trans Signal (51-base window) (Tert-BJ)\ parent encodeAffyEcSignal\ priority 32\ shortLabel EC51 Sgnl TertBJ\ track encodeAffyEc51TertBJSignal\ encodeAffyEc51TertBJSites EC51 Site TertBJ bed 3 . 
Affy Ext Trans Sites (51-base window) (Tert-BJ) 0 32 0 0 205 127 127 230 0 0 2 chr21,chr22, encodeTxLevels 1 color 0,0,205\ longLabel Affy Ext Trans Sites (51-base window) (Tert-BJ)\ parent encodeAffyEcSites\ priority 32\ shortLabel EC51 Site TertBJ\ track encodeAffyEc51TertBJSites\ encodeYaleAffyNeutRNATarsAllIntergenicProximal Yale Ig Prx Neu bed 4 . Yale Intergenic Proximal Neutrophil TARs 0 32 185 137 115 220 196 185 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 185,137,115\ longLabel Yale Intergenic Proximal Neutrophil TARs\ parent encodeNoncodingTransFrags\ priority 32\ shortLabel Yale Ig Prx Neu\ subGroups region=intergenicProximal celltype=neut source=yale\ track encodeYaleAffyNeutRNATarsAllIntergenicProximal\ Nregion N Regions bed 4 . N Regions 0 32.5 150 100 30 202 177 142 0 0 0

Description

\

\ This track displays runs of 1,000 or more contiguous Ns.\
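The sketch below shows how such regions could be found in a single sequence string. It is not nibCheck, only an illustration. It reports BED-style 0-based, half-open coordinates. It also counts lowercase `n` and fills the name column with the run length, and both of these choices are assumptions. The function `n_regions` is hypothetical.

```python
import re


def n_regions(chrom, seq, min_len=1000):
    """Return BED-style (chrom, start, end, name) tuples for runs of at
    least min_len contiguous Ns (0-based, half-open coordinates).

    Lowercase n is matched too (assumption about soft-masked input); the
    name field is a hypothetical label holding the run length.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(chrom, m.start(), m.end(), "N_%d" % (m.end() - m.start()))
            for m in pattern.finditer(seq)]
```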

Credits

\ It was generated with the nibCheck utility.\ \ map 1 color 150,100,30\ group map\ longLabel N Regions\ priority 32.5\ shortLabel N Regions\ track Nregion\ type bed 4 .\ visibility hide\ encodeAffyChIpHl60PvalH4Kac4Hr00 Affy H4Kac4 RA 0h wig 0.0 534.54 Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 0hrs) P-Value 0 33 125 100 0 190 177 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 125,100,0\ longLabel Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 0hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 33\ shortLabel Affy H4Kac4 RA 0h\ subGroups factor=H4Kac4 time=0h\ track encodeAffyChIpHl60PvalH4Kac4Hr00\ genieKnown Known Genes genePred Known Genes (from Full-Length mRNAs) 3 33 20 20 170 137 137 212 0 0 0 genes 1 color 20,20,170\ group genes\ longLabel Known Genes (from Full-Length mRNAs)\ priority 33\ shortLabel Known Genes\ track genieKnown\ type genePred\ visibility pack\ encodeYaleAffyPlacRNATarsIntergenicProximal Yale Ig Prx Plac bed 4 . Yale Intergenic Proximal Placental TARs 0 33 191 131 127 223 193 191 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 191,131,127\ longLabel Yale Intergenic Proximal Placental TARs\ parent encodeNoncodingTransFrags\ priority 33\ shortLabel Yale Ig Prx Plac\ subGroups region=intergenicProximal celltype=plac source=yale\ track encodeYaleAffyPlacRNATarsIntergenicProximal\ encodeAffyEcSuper Affy EC Affymetrix ENCODE Extension Transcription 0 34 0 0 0 127 127 127 0 0 2 chr21,chr22,

Overview

\ This super-track combines related tracks of the ENCODE Extension data generated \ by Affymetrix. There are two member tracks:\ \ \

Methods

\ The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) \ and all arrays were scaled to a median array intensity of 330. An estimate of \ RNA abundance (signal) was then computed as the median of all pairwise averages \ of PM-MM values, where PM is the perfect-match and MM the mismatch probe \ intensity, using two different approaches: (i) no sliding window and (ii) a \ 51-bp sliding window centered on each probe. Kapranov et al. (2002) and Cawley \ et al. (2004) both describe the experimental methods; the latter\ also describes the analytical methods.\ \
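The scaling step ("scaled to a median array intensity of 330") is sketched below as a single multiplicative factor per array. The text does not say whether scaling is multiplicative, so that is an assumption. The function name `scale_to_median` is hypothetical.

```python
from statistics import median

TARGET_MEDIAN = 330.0  # median array intensity stated in the Methods


def scale_to_median(intensities, target=TARGET_MEDIAN):
    """Multiply every intensity on one array by a single factor so that the
    array's median becomes `target` (global scaling; assumed multiplicative)."""
    factor = target / median(intensities)
    return [x * factor for x in intensities]
```

A single global factor preserves the relative ordering of probes on the array. Only the overall brightness is brought onto a common scale across arrays.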

Verification

\ Single biological replicates were generated and hybridized to duplicate arrays \ (two technical replicates). Transcribed regions were generated from the \ composite signal track by merging genomic positions to which probes are mapped.\ This merging was based on a 5% false positive rate cutoff in negative bacterial\ controls, a maximum gap (MaxGap) of 25 base pairs and a minimum run (MinRun) of \ 25 base pairs (see the Affy TransFrags track for the merged regions). \ \
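One plausible reading of the "5% false positive rate cutoff in negative bacterial controls" is a signal threshold that only 5% of negative-control probes exceed. The sketch below assumes that interpretation and a strict "greater than" comparison. The function names are hypothetical.

```python
import math


def fpr_threshold(control_signal, fpr=0.05):
    """Smallest control value such that at most `fpr` of the
    negative-control probes have signal strictly above it."""
    ordered = sorted(control_signal)
    allowed = math.floor(fpr * len(ordered))  # controls allowed above cutoff
    return ordered[len(ordered) - allowed - 1]


def above_cutoff(signal, cutoff):
    """Positions whose signal exceeds the cutoff (candidates for merging)."""
    return [pos for pos, value in signal if value > cutoff]
```

The positions returned by `above_cutoff` would then be merged into regions using the 25 bp MaxGap and MinRun limits.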

Credits

\ These data were generated and analyzed by the collaboration of the following \ groups: the Tom Gingeras group at Affymetrix, Roderic Guigo group at Centre \ de Regulacio Genomica, Alexandre Reymond group at the University of Lausanne \ and Stylianos Antonarakis group at the University of Geneva. \ \

References

\ Please see the \ Affymetrix Transcriptome site for a project overview and additional \ references to Affymetrix tiling array publications.\

\ Bolstad BM, Irizarry RA, Astrand M, Speed TP. \ A comparison of normalization methods for high density oligonucleotide\ array data based on variance and bias. \ Bioinformatics. 2003 Jan 22;19(2):185-93.\

\ Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA,\ Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al.\ Unbiased mapping of transcription factor \ binding sites along human chromosomes 21 and 22 points to widespread \ regulation of noncoding RNAs. Cell. 2004 Feb 20;116(4):499-509.\

\ Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, \ Gingeras TR. \ Large-scale transcriptional activity in chromosomes 21 and 22. \ Science. 2002 May 3;296(5569):916-9.\ encodeTxLevels 0 chromosomes chr21,chr22\ group encodeTxLevels\ longLabel Affymetrix ENCODE Extension Transcription\ priority 34.0\ shortLabel Affy EC\ superTrack on\ track encodeAffyEcSuper\ encodeAffyChIpHl60SitesH4Kac4Hr00 Affy H4Kac4 RA 0h bed 3 . Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 0hrs) Sites 0 34 125 100 0 190 177 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 125,100,0\ longLabel Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 0hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 34\ shortLabel Affy H4Kac4 RA 0h\ subGroups factor=H4Kac4 time=0h\ track encodeAffyChIpHl60SitesH4Kac4Hr00\ encodeAffyRnaGm06990SitesIntergenicDistal Affy Ig Dst GM06990 bed 4 . Affymetrix Intergenic Distal GM06990 Transfrags 0 34 0 0 255 127 127 255 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 0,0,255\ longLabel Affymetrix Intergenic Distal GM06990 Transfrags\ parent encodeNoncodingTransFrags\ priority 34\ shortLabel Affy Ig Dst GM06990\ subGroups region=intergenicDistal celltype=gm06990 source=affy\ track encodeAffyRnaGm06990SitesIntergenicDistal\ knownGene Known Genes genePred knownGenePep knownGeneMrna Known Genes (March 04) Based on SWISS-PROT, TrEMBL, mRNA, and RefSeq 0 34 12 12 120 133 133 187 0 0 0

Description

\

\ The UCSC Known Genes track shows known protein-coding genes based on \ protein data from SWISS-PROT, TrEMBL, and TrEMBL-NEW and their\ corresponding mRNAs from \ GenBank.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for\ gene prediction\ tracks. Black coloring indicates features that have corresponding entries\ in the Protein Data Bank (PDB). Blue indicates features associated with\ mRNAs from NCBI RefSeq, and dark blue indicates items with associated proteins in\ the SWISS-PROT database. The shade of blue used for RefSeq items\ corresponds to the level of review the RefSeq record has undergone:\ predicted (light), provisional (medium), or reviewed (dark).

\

\ This track contains an optional codon coloring\ feature that allows users to quickly validate and compare gene predictions.\ To display codon colors, select the genomic codons option from the\ Color track by codons pull-down menu. Click\ here for more\ information about this feature.

\ \

Methods

\

\ mRNA sequences were aligned against the human genome using BLAT. When a \ single mRNA aligned in multiple places, only alignments having at least 98% \ base identity with the genomic sequence were kept. This set of mRNA \ alignments was further reduced by keeping only those mRNAs referenced by a \ protein in SWISS-PROT, TrEMBL, or TrEMBL-NEW.
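The two alignment filters described above can be sketched as follows. The `Alignment` record is a hypothetical stand-in for BLAT output. The text does not say what happens to mRNAs that align only once, so this sketch keeps them unchanged, which is one reading of the text. `filter_mrna_alignments` is also a hypothetical name.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    mrna: str        # mRNA accession
    chrom: str
    start: int
    end: int
    identity: float  # fraction of aligned bases that match (0-1)


def filter_mrna_alignments(alignments, protein_referenced, min_identity=0.98):
    """Apply the two filters described above.

    Multi-placed mRNAs keep only alignments with >= min_identity; singly
    placed mRNAs are kept as-is (one reading of the text). Only mRNAs
    referenced by a protein record are retained.
    """
    by_mrna = defaultdict(list)
    for aln in alignments:
        by_mrna[aln.mrna].append(aln)
    kept = []
    for mrna, alns in by_mrna.items():
        if mrna not in protein_referenced:
            continue
        if len(alns) > 1:
            alns = [a for a in alns if a.identity >= min_identity]
        kept.extend(alns)
    return kept
```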

\

\ Among multiple mRNAs referenced by a single protein, the best mRNA was \ selected, based on a quality score derived from its length, the level of the\ match between its translation and the protein sequence, and its release date.\ The resulting mRNA and protein pairs were further filtered by removing \ short invalid entries and consolidating entries with identical CDS regions.\

\

\ Finally, RefSeq entries derived from DNA sequences instead of \ mRNA sequences were added to produce the final data set shown in this track. \ Disease annotations were obtained from SWISS-PROT.

\ \

Credits

\

\ The Known Genes track was produced at UCSC based primarily on cross-references\ between proteins from \ SWISS-PROT \ (including TrEMBL and TrEMBL-NEW) and mRNAs from \ GenBank\ contributed by scientists worldwide. \ NCBI RefSeq \ data were also included in this track.

\ \

Data Use Restrictions

\

\ The UniProt data have the following terms of use, UniProt copyright(c) 2002 - \ 2004 UniProt consortium:

\

\ For non-commercial use, all databases and documents in the UniProt FTP\ directory may be copied and redistributed freely, without advance\ permission, provided that this copyright statement is reproduced with\ each copy.

\

\ For commercial use, all databases and documents in the UniProt FTP\ directory except the files\

\ may be copied and redistributed freely, without advance permission,\ provided that this copyright statement is reproduced with each copy.\ More information for commercial users can be found \ here.\

\ From January 1, 2005, all databases and documents in the UniProt FTP\ directory may be copied and redistributed freely by all entities,\ without advance permission, provided that this copyright statement is\ reproduced with each copy.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J,\ Wheeler DL.\ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32:D23-6.

\

\ Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D.\ The UCSC Known Genes.\ Bioinformatics. 2006 May 1;22(9):1036-46.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,12,120\ defaultLinkedTables kgXref\ directUrl /cgi-bin/hgGene?hgg_gene=%s&hgg_chrom=%s&hgg_start=%d&hgg_end=%d&hgg_type=%s&db=%s\ group genes\ hgGene on\ hgsid on\ idXref kgAlias kgID alias\ intronGap 12\ longLabel Known Genes (March 04) Based on SWISS-PROT, TrEMBL, mRNA, and RefSeq\ priority 34\ shortLabel Known Genes\ track knownGene\ type genePred knownGenePep knownGeneMrna\ visibility hide\ encodeAffyEcSites Affy EC Sites bed 3 . Affymetrix ENCODE Extension Transcription Sites 0 34.1 0 0 0 127 127 127 0 0 2 chr21,chr22,

Description

\ This track shows the locations of sites showing transcription (transfrags) for \ chromosomes 21 and 22 in 5 cell lines and 11 tissues. The 5 cell lines used \ were GM06990, HepG2, K562, HeLaS3 and Tert-BJ; the 11 tissues used were \ cerebellum, brain frontal lobe, hippocampus, hypothalamus, fetal spleen, fetal \ kidney, fetal thymus, ovary, placenta, prostate and testis. Purified cytosolic \ polyA+ RNA from the GM06990, HepG2 and Tert-BJ cell lines, as well as purified \ polyA+ RNA from whole-cell extracts of the remaining cell lines and tissues, \ were hybridized to Affymetrix Chromosome 21_22_v2 oligonucleotide tiling \ arrays, which have 25-mer probes spaced on average every 17 bp (center to center \ of each 25-mer) in the non-repetitive regions of human chromosomes 21 and 22. \ Clustered sites are shown in a separate subtrack for each cell line and tissue. \

\ Data for all biological replicates can be \ downloaded from Affymetrix in wig, BED, and cel formats.\ \

Display Conventions and Configuration

\ The subtracks within this composite annotation track may be configured\ in a variety of ways to highlight different aspects of the displayed\ data. The graphical configuration options for the subtracks are shown\ at the top of the track description page, followed by a list of\ subtracks. To show only selected subtracks, uncheck the boxes next to\ the tracks that you wish to hide.\ \

Methods

\ The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) \ and all arrays were scaled to a median array intensity of 330. An estimate of \ RNA abundance (signal) was then computed as the median of all pairwise averages \ of PM-MM values, where PM is the perfect-match and MM the mismatch probe \ intensity, using two different approaches: (i) no sliding window and (ii) a \ 51-bp sliding window centered on each probe. Kapranov et al. (2002) and Cawley et al. \ (2004) both describe the experimental methods; the latter also describes the \ analytical methods.\

\

Verification

\ Single biological replicates were generated and hybridized to duplicate arrays \ (two technical replicates). Transcribed regions were generated from the \ composite signal track (see the Affy RNA Signal track) by merging genomic positions to \ which probes are mapped. This merging was based on a 5% false positive rate \ cutoff in negative bacterial controls, a maximum gap (MaxGap) of 25 base pairs \ and a minimum run (MinRun) of 25 base pairs. \ \

Credits

\ These data were generated and analyzed by the collaboration of the\ following groups: the Tom Gingeras group at Affymetrix, Roderic Guigo group at Centre de Regulacio Genomica, \ Alexandre Reymond group at the University of Lausanne and Stylianos Antonarakis \ group at the University of Geneva. \ \

References

\ Please see the \ Affymetrix Transcriptome site for a project overview and additional\ references to Affymetrix tiling array publications.\

\ Bolstad BM, Irizarry RA, Astrand M, Speed TP.\ \ A comparison of normalization methods for high density oligonucleotide\ array data based on variance and bias.\ Bioinformatics. 2003 Jan 22;19(2):185-93.\

\ Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA,\ Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al.\ Unbiased mapping of transcription factor \ binding sites along human chromosomes 21 and 22 points to widespread \ regulation of noncoding RNAs. Cell. 2004 Feb 20;116(4):499-509.\

\ Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP,\ Gingeras TR. \ Large-scale transcriptional activity in chromosomes 21 and 22.\ Science. 2002 May 3;296(5569):916-9.\ encodeTxLevels 1 chromosomes chr21,chr22\ compositeTrack on\ group encodeTxLevels\ longLabel Affymetrix ENCODE Extension Transcription Sites\ origAssembly hg17\ priority 34.1\ shortLabel Affy EC Sites\ superTrack encodeAffyEcSuper dense\ track encodeAffyEcSites\ type bed 3 .\ visibility hide\ hg17Kg Known Genes II genePred UCSC Known Genes II (June 05, Based on hg17 UCSC Known Genes) 3 34.1 12 12 120 133 133 187 0 0 0 http://genome.ucsc.edu/cgi-bin/hgGene?db=hg17&hgg_gene=$$&hgg_chrom=none&hgg_type=knownGene

Description

\

\ The Known Genes II track was built from the UCSC Known Genes data set of hg17 (Human May 2004 Assembly).\ Clicking the "Outside Link" entry above opens the corresponding gene details page on hg17. The original "Known Genes" track of hg16 \ (built in March 2004) \ is somewhat outdated but is still available.\

Methods

\

The hg17 UCSC Known Genes track was built by a new process, KG II, \ described below.\

\ UniProt protein sequences (including alternative splicing isoforms) \ and mRNA sequences from RefSeq and GenBank \ were aligned against the base genome using BLAT. \ RefSeq alignments having a base identity level within 0.1% of the best \ and at least 96% base identity with the genomic sequence were kept. \ GenBank mRNA alignments having a base identity level within 0.2% of \ the best and at least 97% base identity with the genomic sequence were kept. \ Protein alignments having a base identity level within 0.2% of the best and \ at least 80% base identity with the genomic sequence were kept.\
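The near-best alignment filter above can be sketched in Python. The cutoff values come from the text. Interpreting "within 0.1% of the best" as 0.1 percentage points below the query's best alignment is an assumption. The `Hit` record and `near_best` function are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    query: str       # RefSeq/GenBank mRNA or UniProt protein accession
    target: str      # genomic placement, e.g. "chr1:1000-2000"
    identity: float  # base identity in percent


# (allowed drop below the query's best hit, minimum identity), in percent.
# Values from the text; reading "within 0.1% of the best" as 0.1
# percentage points is an assumption.
CUTOFFS = {
    "refseq": (0.1, 96.0),
    "genbank": (0.2, 97.0),
    "protein": (0.2, 80.0),
}


def near_best(hits, source):
    """Keep each query's alignments that are close to its best and above the floor."""
    window, floor = CUTOFFS[source]
    by_query = defaultdict(list)
    for hit in hits:
        by_query[hit.query].append(hit)
    kept = []
    for group in by_query.values():
        best = max(h.identity for h in group)
        kept.extend(h for h in group
                    if h.identity >= floor and best - h.identity <= window)
    return kept
```

Because the window is relative to each query's own best alignment, a query with one clearly best placement keeps only that placement. Near-identical paralogous placements survive together.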

Then the genomic mRNA and protein alignments were compared, \ and protein-mRNA pairings were determined from their overlaps. \ mRNA CDS data were obtained from RefSeq and GenBank data \ and supplemented by CDS structures derived from UCSC protein-mRNA BLAT alignments. \ The initial set of UCSC Known Genes candidates consisted of \ all protein-mRNA pairs with valid mRNA CDS structures. \ A gene-check program (similar to the one used for the Consensus CDS (CCDS) project) \ was used to remove questionable candidates, such as those with in-frame stop \ codons or missing start or stop codons.\

From each group of gene candidates that share the same CDS structure, \ the protein-mRNA pair having the best ranking and protein-mRNA alignment score \ is selected as a UCSC Known Gene. \ The ranking of a gene candidate depends on its gene-check quality measures. \ When all else is equal, \ preference is given first to RefSeq mRNAs and then to MGC mRNAs. \ Similarly, preference is given to gene candidates represented by Swiss-Prot \ proteins. \ The protein-mRNA alignment score is calculated from a protein-to-mRNA \ alignment using TBLASTN, plus weighted sub-scores according \ to the date and length of the mRNA. \
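The selection step can be sketched as a per-CDS-group maximum over a ranking key. The actual KG II weighting is not given here. This sketch therefore assumes a lexicographic order: gene-check result first, then mRNA source (RefSeq, then MGC, then other GenBank), then Swiss-Prot membership, and finally the alignment score. `Candidate`, `SOURCE_RANK` and `pick_known_genes` are hypothetical names.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    protein: str
    mrna: str
    cds: tuple          # CDS exon structure, e.g. ((start, end), ...)
    passes_check: bool  # stand-in for the gene-check quality measures
    mrna_source: str    # "refseq", "mgc" or "genbank"
    swissprot: bool     # protein is a Swiss-Prot (reviewed) entry
    aln_score: float    # protein-mRNA alignment score


# Hypothetical ordering: RefSeq before MGC before other GenBank mRNAs.
SOURCE_RANK = {"refseq": 2, "mgc": 1, "genbank": 0}


def pick_known_genes(candidates):
    """Return one candidate per distinct CDS structure.

    Candidates are compared lexicographically on (gene-check result,
    mRNA source rank, Swiss-Prot flag, alignment score); this precedence
    is an assumption, not the documented KG II weighting.
    """
    best = {}
    for cand in candidates:
        key = (cand.passes_check, SOURCE_RANK[cand.mrna_source],
               cand.swissprot, cand.aln_score)
        if cand.cds not in best or key > best[cand.cds][0]:
            best[cand.cds] = (key, cand)
    return [cand for _, cand in best.values()]
```

Grouping on the exact CDS structure means two transcripts that differ only in their UTRs are collapsed into a single Known Gene.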

Credits

\

\ The UCSC Known Genes track was produced using protein data from \ UniProt and mRNA \ data from NCBI \ RefSeq\ and GenBank.

\ \

Data Use Restrictions

\

\ The UniProt data have the following terms of use, UniProt copyright(c) 2002 - \ 2004 UniProt consortium:

\

\ For non-commercial use, all databases and documents in the UniProt FTP\ directory may be copied and redistributed freely, without advance\ permission, provided that this copyright statement is reproduced with\ each copy.

\

\ For commercial use, all databases and documents in the UniProt FTP\ directory except the files\

\ may be copied and redistributed freely, without advance permission,\ provided that this copyright statement is reproduced with each copy.\ More information for commercial users can be found \ here.\

\ From January 1, 2005, all databases and documents in the UniProt FTP\ directory may be copied and redistributed freely by all entities,\ without advance permission, provided that this copyright statement is\ reproduced with each copy.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, \ Wheeler DL. \ GenBank: update. Nucleic Acids Res. \ 2004 Jan 1;32(Database issue):D23-6.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ \ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,12,120\ group genes\ longLabel UCSC Known Genes II (June 05, Based on hg17 UCSC Known Genes)\ priority 34.1\ shortLabel Known Genes II\ track hg17Kg\ type genePred\ url http://genome.ucsc.edu/cgi-bin/hgGene?db=hg17&hgg_gene=$$&hgg_chrom=none&hgg_type=knownGene\ visibility pack\ encodeAffyEcSignal Affy EC Signal wig 0 62385 Affymetrix ENCODE Extension Transcription Signal 0 34.2 0 0 0 127 127 127 0 0 2 chr21,chr22,

Description

\ This track shows an estimate of RNA abundance (transcription) for chromosomes 21 \ and 22 in 5 cell lines and 11 tissues. The 5 cell lines used were GM06990, \ HepG2, K562, HeLaS3 and Tert-BJ; the 11 tissues used were cerebellum, brain \ frontal lobe, hippocampus, hypothalamus, fetal spleen, fetal kidney, fetal thymus, \ ovary, placenta, prostate and testis. Purified cytosolic polyA+ RNA from the GM06990, \ HepG2 and Tert-BJ cell lines, as well as purified polyA+ RNA from whole-cell \ extracts of the remaining cell lines and tissues, were hybridized to Affymetrix \ Chromosome 21_22_v2 oligonucleotide tiling arrays, which have 25-mer probes \ spaced on average every 17 bp (center to center of each 25-mer) in the non-repetitive \ regions of human chromosomes 21 and 22. Composite signals are shown in a separate \ subtrack for each cell line and tissue. \

\ Data for all biological replicates can be \ downloaded from Affymetrix in wig, BED, and cel formats.\ \

Display Conventions and Configuration

\ The subtracks within this composite annotation track may be configured\ in a variety of ways to highlight different aspects of the displayed\ data. The graphical configuration options for the subtracks are shown\ at the top of the track description page, followed by a list of\ subtracks. To show only selected subtracks, uncheck the boxes next to\ the tracks that you wish to hide. For more information about the\ graphical configuration options, click the Graph configuration help\ link.\ \

Methods

\ The data from replicate arrays were quantile-normalized (Bolstad et al., \ 2003) and all arrays were scaled to a median array intensity of 330. An estimate of \ RNA abundance (signal) was then computed as the median of all pairwise averages \ of PM-MM values, where PM is the perfect-match and MM the mismatch probe \ intensity, using two different approaches: (i) no sliding window and (ii) a \ 51-bp sliding window centered on each probe. Kapranov et al. (2002) and Cawley \ et al. (2004) both describe the experimental methods; the \ latter also describes the analytical methods.\ \

Verification

\ Single biological replicates were generated and hybridized to duplicate arrays \ (two technical replicates). Transcribed regions were generated from the \ composite signal track by merging the genomic positions to which probes map. \ This merging was based on a 5% false-positive-rate cutoff in negative bacterial \ controls, a maximum gap (MaxGap) of 25 base pairs and a minimum run (MinRun) of \ 25 base pairs (see the Affy TransFrags track for the merged regions). \ \
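The merging step can be sketched in Python as below. The sketch assumes that probes are treated as points at their genomic coordinates, that MaxGap is the distance between consecutive above-cutoff probes, that MinRun is the span from the first to the last probe of a run, and that the cutoff is chosen so that at most 5% of negative-control signal values exceed it. None of these details is confirmed by the text above, and the function names are hypothetical.

```python
def threshold_from_controls(control_signals, fpr=0.05):
    """A signal cutoff that at most a fraction fpr of negative-control values exceed."""
    ordered = sorted(control_signals)
    allowed = int(fpr * len(ordered))  # controls allowed above the cutoff
    return ordered[max(len(ordered) - allowed - 1, 0)]


def merge_transfrags(positions, signals, threshold, max_gap=25, min_run=25):
    """Merge above-cutoff probes into (start, end) runs.

    positions must be sorted. Consecutive positive probes no more than
    max_gap bp apart join the same run; runs spanning fewer than min_run bp
    are dropped.
    """
    runs = []
    current = None  # [first_position, last_position] of the open run
    for pos, sig in zip(positions, signals):
        if sig <= threshold:
            continue  # below-cutoff probes neither extend nor break a run
        if current is not None and pos - current[1] <= max_gap:
            current[1] = pos
        else:
            if current is not None:
                runs.append(tuple(current))
            current = [pos, pos]
    if current is not None:
        runs.append(tuple(current))
    return [run for run in runs if run[1] - run[0] >= min_run]
```

For example, with a cutoff of 2, probes at 0, 12, 24 and 36 with signals 5, 1, 5 and 5 form a single region (0, 36): the below-cutoff probe at 12 leaves a 24-bp gap, which is within MaxGap.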

Credits

\ These data were generated and analyzed by a collaboration of the following \ groups: the Tom Gingeras group at Affymetrix, the Roderic Guigó group at the Centre \ de Regulació Genòmica, the Alexandre Reymond group at the University of Lausanne \ and the Stylianos Antonarakis group at the University of Geneva. \ \

References

\ Please see the Affymetrix \ Transcriptome site for a project overview and additional references to \ Affymetrix tiling array publications.\

\ Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A\ comparison of normalization methods for high density oligonucleotide\ array data based on variance and bias. Bioinformatics 19(2),\ 185-193 (2003).\

\ Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A.,\ Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams,\ A. J., et al. Unbiased mapping of transcription factor binding sites along human \ chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell \ 116(4), 499-509 (2004).\

\ Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg,\ R. L., Fodor, S. P., and Gingeras, T. R. Large-scale transcriptional activity in chromosomes 21 and \ 22. Science 296(5569), 916-919 (2002). \ encodeTxLevels 0 autoScale Off\ chromosomes chr21,chr22\ compositeTrack on\ group encodeTxLevels\ longLabel Affymetrix ENCODE Extension Transcription Signal\ maxHeightPixels 100:30:10\ origAssembly hg17\ priority 34.2\ shortLabel Affy EC Signal\ superTrack encodeAffyEcSuper dense\ track encodeAffyEcSignal\ type wig 0 62385\ viewLimits 0:5000\ visibility hide\ knownAlt Alt Events bed 6 . Alternative Splicing, Alternative Promoter and Similar Events in UCSC Genes 0 34.2 90 0 150 172 127 202 0 0 0

Description

\

This track shows various types of alternative splicing and other\ events that result in more than a single transcript from the same\ gene. The label by an item describes the type of event. The events are:

\ \ \

Credits

\

This track is based on an analysis by the txgAnalyse program of splicing graphs\ produced by the txGraph program. Both of these programs were written by Jim\ Kent at UCSC.

\ genes 1 color 90,0,150\ group genes\ longLabel Alternative Splicing, Alternative Promoter and Similar Events in UCSC Genes\ noScoreFilter .\ priority 34.2\ shortLabel Alt Events\ track knownAlt\ type bed 6 .\ visibility hide\ ccdsGene CCDS genePred Consensus CDS 0 34.5 12 120 12 133 187 133 0 0 0

Description

\

\ This track shows human genome high-confidence gene annotations from the\ Consensus \ Coding Sequence (CCDS) project. This project is a collaborative effort \ to identify a core set of \ human protein-coding regions that are consistently annotated and of high \ quality. The long-term goal is to support convergence towards a standard set \ of gene annotations on the human genome.\

\

Collaborators include:\

\ \

Methods

\

\ CDS annotations of the human genome were obtained from two sources:\ NCBI \ RefSeq and a union of the gene annotations from \ Ensembl and \ Vega, collectively known \ as Hinxton.

\

\ Genes with identical CDS genomic coordinates in both sets become CCDS \ candidates. The genes undergo a quality evaluation, which must be approved by \ all collaborators. The following criteria are currently used to assess each\ gene: \

\

\ A unique CCDS ID is assigned to the CCDS, which links together all gene \ annotations with the same CDS. CCDS gene annotations are under continuous\ review, with periodic updates to this track.\
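The candidate-selection and ID-grouping steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the CCDS pipeline: annotations are hypothetical dictionaries with chrom, strand and cds_exons fields; "identical CDS genomic coordinates" is taken to mean the same chromosome, strand and ordered list of CDS exon intervals; and the CCDS1, CCDS2, ... labels are placeholders, not real CCDS accessions. The quality evaluation by the collaborators is not modeled.

```python
from collections import defaultdict


def cds_key(annotation):
    """Identical CDS: same chromosome, strand and CDS exon coordinates."""
    return (annotation["chrom"], annotation["strand"], tuple(annotation["cds_exons"]))


def group_ccds_candidates(refseq, hinxton):
    """Return {placeholder_id: (refseq_names, hinxton_names)} for CDSs present in both sets."""
    groups = defaultdict(lambda: ([], []))
    for ann in refseq:
        groups[cds_key(ann)][0].append(ann["name"])
    for ann in hinxton:
        groups[cds_key(ann)][1].append(ann["name"])
    # Only CDSs annotated identically by both sources become candidates.
    shared = sorted(key for key, (rs, hx) in groups.items() if rs and hx)
    # One placeholder ID links all annotations that share the same CDS.
    return {f"CCDS{i}": groups[key] for i, key in enumerate(shared, start=1)}
```

Keying on the full tuple of exon intervals means a single base difference in any CDS exon keeps two annotations apart, which matches the requirement that coordinates be identical.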

\ \

Credits

\

\ This track was produced at UCSC from data downloaded from the\ CCDS project \ web site.\

\ \

References

\

\ Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, \ Farrell CM, Loveland JE, Ruef BJ et al. \ The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. \ Genome Res. 2009 Jun 4. [Epub ahead of print]\

\ Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T, et al.\ The Ensembl genome database project. \ Nucl. Acids Res. 2002 Jan 1;30(1):38-41.

\

\ Pruitt KD, Tatusova T, Maglott DR.\ NCBI Reference Sequence (RefSeq): a curated non-redundant \ sequence database of genomes, transcripts and proteins. \ Nucl. Acids Res. 2005 Jan 1;33(Database Issue):D501-D504. \

\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,120,12\ group genes\ longLabel Consensus CDS\ priority 34.5\ shortLabel CCDS\ track ccdsGene\ type genePred\ visibility hide\ interPro InterPro bed 4 InterPro Domains 0 34.6 12 12 120 133 133 187 0 0 0

Description

\

\ Description of InterPro goes here.\ \

Methods

\

\ Methods goes here.\

Credits

\

\ Credits goes here.\ \ genes 1 color 12,12,120\ group genes\ longLabel InterPro Domains\ priority 34.6\ shortLabel InterPro\ track interPro\ type bed 4\ visibility hide\ encodeAffyChIpHl60PvalH4Kac4Hr02 Affy H4Kac4 RA 2h wig 0.0 534.54 Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 2hrs) P-Value 0 35 125 100 0 190 177 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 125,100,0\ longLabel Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 2hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 35\ shortLabel Affy H4Kac4 RA 2h\ subGroups factor=H4Kac4 time=2h\ track encodeAffyChIpHl60PvalH4Kac4Hr02\ encodeAffyRnaHeLaSitesIntergenicDistal Affy Ig Dst HeLa bed 4 . Affymetrix Intergenic Distal HeLa Transfrags 0 35 5 0 250 130 127 252 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 5,0,250\ longLabel Affymetrix Intergenic Distal HeLa Transfrags\ parent encodeNoncodingTransFrags\ priority 35\ shortLabel Affy Ig Dst HeLa\ subGroups region=intergenicDistal celltype=hela source=affy\ track encodeAffyRnaHeLaSitesIntergenicDistal\ refGene RefSeq Genes genePred refPep refMrna RefSeq Genes 1 35 12 12 120 133 133 187 0 0 0

Description

\

\ The RefSeq Genes track shows known human protein-coding and \ non-protein-coding genes taken from the NCBI RNA reference sequences \ collection (RefSeq). The data underlying this track are updated daily.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ \ gene prediction tracks.\ The color shading indicates the level of review the RefSeq record has \ undergone: predicted (light), provisional (medium), reviewed (dark).

\

\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page. \ This page is accessed via the small button to the left of the track's \ graphical display or through the link on the track's control menu. \

\ \

Methods

\

\ RefSeq RNAs were aligned against the human genome using blat; those\ in which less than 15% of the RNA aligned were discarded. When a single RNA \ aligned in multiple places, the alignment having the highest base identity \ was identified. Only alignments having a base identity level within 0.1% of \ the best and at least 96% base identity with the genomic sequence were kept.\
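The coverage and near-best identity filtering described above can be sketched in Python. This sketch is not the UCSC pipeline code. It assumes that "within 0.1% of the best" means an absolute difference of 0.001 in identity fraction, and that the 15% cutoff applies to the fraction of the RNA covered by the alignment. The Alignment fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    query: str        # RNA accession
    target: str       # genomic location label, e.g. "chr1:1000"
    identity: float   # fraction of aligned bases that match (0-1)
    coverage: float   # fraction of the RNA that aligned (0-1)


def near_best_filter(alignments, min_coverage=0.15, min_identity=0.96, near_best=0.001):
    """Per RNA, keep alignments near its best identity and above min_identity."""
    by_query = {}
    for aln in alignments:
        if aln.coverage >= min_coverage:  # drop alignments covering too little RNA
            by_query.setdefault(aln.query, []).append(aln)
    kept = []
    for alns in by_query.values():
        best = max(a.identity for a in alns)
        kept.extend(
            a for a in alns
            if a.identity >= best - near_best and a.identity >= min_identity
        )
    return kept
```

The same function covers the other alignment filters in this section by changing the parameters stated in their Methods: near_best=0.005 and min_identity=0.25 for Non-Human RefSeq Genes; near_best=0.01 and min_identity=0.95 for MGC Genes; near_best=0.005 and min_identity=0.96 for ORFeome Clones. The 15% coverage cutoff is stated only for the two RefSeq tracks.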

\ \ \

Credits

\

\ This track was produced at UCSC from RNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project.

\ \

References

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ \

Pruitt KD, Tatusova T, Maglott DR. \ NCBI Reference Sequence (RefSeq): a curated non-redundant \ sequence database of genomes, transcripts and proteins. Nucleic Acids \ Res. 2005 Jan 1;33(Database issue):D501-4.\

\ genes 1 baseColorUseCds given\ color 12,12,120\ group genes\ idXref refLink mrnaAcc name\ longLabel RefSeq Genes\ priority 35\ shortLabel RefSeq Genes\ track refGene\ type genePred refPep refMrna\ visibility dense\ xenoRefGene Other RefSeq genePred xenoRefPep xenoRefMrna Non-Human RefSeq Genes 0 35.1 12 12 120 133 133 187 0 0 0

Description

\

\ This track shows known protein-coding and non-protein-coding genes \ for organisms other than human, taken from the NCBI RNA reference \ sequences collection (RefSeq). The data underlying this track are \ updated daily.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks.\ The color shading indicates the level of review the RefSeq record has \ undergone: predicted (light), provisional (medium), reviewed (dark).

\

\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page. \

\ \

Methods

\

\ The RNAs were aligned against the human genome using blat; those\ in which less than 15% of the RNA aligned were discarded. When a single RNA aligned \ in multiple places, the alignment having the highest base identity was \ identified. Only alignments having a base identity level within 0.5% of \ the best and at least 25% base identity with the genomic sequence were kept.\

\ \

Credits

\

\ This track was produced at UCSC from RNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project.

\ \

References

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ genes 1 color 12,12,120\ group genes\ longLabel Non-$Organism RefSeq Genes\ priority 35.1\ shortLabel Other RefSeq\ track xenoRefGene\ type genePred xenoRefPep xenoRefMrna\ visibility hide\ rgdGene RGD Genes genePred Rat Genome Database Curated Genes 1 35.5 12 12 120 133 133 187 0 0 0 http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=

Description

\

\ This track shows RefSeq genes curated by the Rat Genome Database (RGD).\ Coding exons are represented by \ blocks connected by horizontal lines representing introns. The 5' and 3' \ untranslated regions (UTRs) are displayed as thinner blocks on the leading \ and trailing ends of the aligning regions. In full display mode, arrowheads \ on the connecting intron lines indicate the direction of transcription.

\ \

Methods

\

\ The annotation data file, \ RGD_curated_genes.gff, was downloaded from the RGD website\ and processed to create this track.

\ \

Credits

\

\ Thanks to the RGD for \ providing this annotation. RGD is funded by grant HL64541, entitled "Rat \ Genome Database", awarded to Dr. Howard J. Jacob, Medical College of \ Wisconsin, from the National Heart, Lung, and Blood Institute \ (NHLBI) of the National \ Institutes of Health (NIH).\

\ genes 1 color 12,12,120\ group genes\ longLabel Rat Genome Database Curated Genes\ priority 35.5\ shortLabel RGD Genes\ track rgdGene\ type genePred\ url http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=\ visibility dense\ encodeAffyChIpHl60SitesH4Kac4Hr02 Affy H4Kac4 RA 2h bed 3 . Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 2hrs) Sites 0 36 125 100 0 190 177 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 125,100,0\ longLabel Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 2hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 36\ shortLabel Affy H4Kac4 RA 2h\ subGroups factor=H4Kac4 time=2h\ track encodeAffyChIpHl60SitesH4Kac4Hr02\ encodeAffyRnaHl60SitesHr00IntergenicDistal Affy Ig Dst HL60 0h bed 4 . Affymetrix Intergenic Distal HL60 Transfrags 0 36 30 0 225 142 127 240 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 30,0,225\ longLabel Affymetrix Intergenic Distal HL60 Transfrags\ parent encodeNoncodingTransFrags\ priority 36\ shortLabel Affy Ig Dst HL60 0h\ subGroups region=intergenicDistal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr00IntergenicDistal\ mgcFullMrna MGC Genes psl Mammalian Gene Collection Full ORF mRNAs 0 36 0 100 0 127 177 127 0 0 0

Description

\

\ This track shows alignments of human mRNAs from the\ Mammalian Gene Collection \ (MGC) having full-length open reading frames (ORFs) to the genome.\ The goal of the Mammalian Gene Collection is to provide researchers with\ unrestricted access to sequence-validated full-length protein-coding cDNA\ clones for human, mouse, and rat genes.

\ \

Display Conventions and Configuration

\

\ The track follows the display conventions for \ gene prediction \ tracks.

\

\ An optional codon coloring feature is available for quick\ validation and comparison of gene predictions.\ To display codon colors, select the genomic codons option from the\ Color track by codons pull-down menu. For more information\ about this feature, go to the \ \ Coloring Gene Predictions and Annotations by Codon page.

\ \

Methods

\

\ GenBank human MGC mRNAs identified as having full-length ORFs \ were aligned against the genome using blat. When a single mRNA \ aligned in multiple places, the alignment having the highest base identity was\ found. Only alignments having a base identity level within 1% of\ the best and at least 95% base identity with the genomic sequence \ were kept.

\ \

Credits

\

\ The human MGC full-length mRNA track was produced at UCSC from \ mRNA sequence data submitted to \ GenBank by the Mammalian Gene Collection project.

\ \

References

\

\ Mammalian Gene Collection project references.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ genes 1 baseColorDefault diffCodons\ baseColorUseCds genbank\ baseColorUseSequence genbank\ color 0,100,0\ group genes\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Mammalian Gene Collection Full ORF mRNAs\ priority 36\ shortLabel MGC Genes\ showCdsAllScales .\ showCdsMaxZoom 10000.0\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ track mgcFullMrna\ type psl\ visibility hide\ orfeomeMrna ORFeome Clones psl ORFeome Collaboration Gene Clones 0 36.1 34 139 34 144 197 144 0 0 0

Description

\

\ This track shows alignments of human clones from the\ ORFeome Collaboration. The project goal is to be an\ "unrestricted source of fully sequence-validated full-ORF human cDNA \ clones in a format allowing easy transfer of the ORF sequences into \ virtually any type of expression vector. A major goal is to provide \ at least one fully sequenced full-ORF clone for each human gene."\ This track is updated automatically as new clones become available.\

\ \

Display Conventions and Configuration

\

\ The track follows the display conventions for \ gene prediction \ tracks.

\ \

Methods

\

\ ORFeome human clones were obtained from GenBank and aligned against the\ genome using the blat program. When a single clone aligned in multiple \ places,\ the alignment having the highest base identity was found. Only alignments\ having a base identity level within 0.5% of the best and at least 96% base\ identity with the genomic sequence were kept.\

\ \

Credits and references

\

\ Visit the ORFeome Collaboration \ members page for a list of credits and references.\

\ genes 1 baseColorDefault diffCodons\ baseColorUseCds genbank\ baseColorUseSequence genbank\ color 34,139,34\ group genes\ indelDoubleInsert on\ indelQueryInsert on\ longLabel ORFeome Collaboration Gene Clones\ priority 36.1\ shortLabel ORFeome Clones\ showCdsAllScales .\ showCdsMaxZoom 10000.0\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ track orfeomeMrna\ type psl\ visibility hide\ encodeAffyChIpHl60PvalH4Kac4Hr08 Affy H4Kac4 RA 8h wig 0.0 534.54 Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 8hrs) P-Value 0 37 125 100 0 190 177 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 125,100,0\ longLabel Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 8hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 37\ shortLabel Affy H4Kac4 RA 8h\ subGroups factor=H4Kac4 time=8h\ track encodeAffyChIpHl60PvalH4Kac4Hr08\ encodeAffyRnaHl60SitesHr02IntergenicDistal Affy Ig Dst HL60 2h bed 4 . Affymetrix Intergenic Distal HL60 Retinoic 2hr Transfrags 0 37 55 0 200 155 127 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 55,0,200\ longLabel Affymetrix Intergenic Distal HL60 Retinoic 2hr Transfrags\ parent encodeNoncodingTransFrags\ priority 37\ shortLabel Affy Ig Dst HL60 2h\ subGroups region=intergenicDistal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr02IntergenicDistal\ protBlat Protein BLAT psl protein Protein Blatted Against Genome 0 37 0 100 0 255 240 200 0 0 0 genes 1 altColor 255,240,200\ color 0,100,0\ group genes\ longLabel Protein Blatted Against Genome\ priority 37\ shortLabel Protein BLAT\ track protBlat\ type psl protein\ visibility hide\ transMap TransMap TransMap Alignments 0 37.001 0 0 0 127 127 127 0 0 0

Description

\

\ These tracks contain cDNA and gene alignments produced by\ the TransMap cross-species alignment algorithm\ from other vertebrate species in the UCSC Genome Browser.\ For closer evolutionary distances, the alignments are created using\ syntenically filtered BLASTZ alignment chains, resulting in a prediction of the\ orthologous genes in human.\

\ \ TransMap maps genes and related annotations in one species to another\ using synteny-filtered pairwise genome alignments (chains and nets) to\ determine the most likely orthologs. For example, for the mRNA TransMap track\ on the human assembly, more than 400,000 mRNAs from 25 vertebrate species were\ aligned at high stringency to the native assembly using BLAT. The alignments\ were then mapped to the human assembly using the chain and net alignments\ produced using blastz, which has higher sensitivity than BLAT for diverged\ organisms.\

\ Compared to translated BLAT, TransMap finds fewer paralogs and aligns more UTR\ bases. For closely related low-coverage assemblies, a reciprocal-best\ relationship is used in the chains and nets to improve the synteny prediction.\

\ \ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks.

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare cDNAs against the genomic sequence. For more \ information about this option, click \ here.\ Several types of alignment gap may also be colored; \ for more information, click \ here.\ \ \

Methods

\

\

    \
  1. Source transcript alignments were obtained from vertebrate organisms\ in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank \ mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes,\ were used as available.\
  2. For all vertebrate assemblies that had BLASTZ alignment chains and\ nets to the human (hg16) genome, a subset of the alignment chains were\ selected as follows:\ \
  3. The pslMap program was used to do a base-level projection of\ the source transcript alignments via the selected chains\ to the human genome, resulting in pairwise alignments of the source transcripts to\ the genome.\
  4. The resulting alignments were filtered with pslCDnaFilter\ using a global near-best criterion of 0.5% in finished genomes\ (human and mouse) and 1.0% in other genomes. Alignments\ in which less than 20% of the transcript mapped were discarded.\
\
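Steps 3 and 4 can be pictured with a simplified Python sketch. It projects source exon coordinates through the ungapped blocks of one chain, then applies the 20% mapped-fraction cutoff. This is not pslMap or pslCDnaFilter. It assumes a single chain on the same strand in both genomes, with blocks given as hypothetical (source_start, dest_start, size) tuples in half-open coordinates, and it ignores the near-best identity filter.

```python
def project_interval(start, end, blocks):
    """Map the source interval [start, end) through ungapped chain blocks.

    Bases that fall in chain gaps are dropped, so one exon may map to
    several destination pieces (or to none).
    """
    pieces = []
    for src_start, dst_start, size in blocks:
        lo = max(start, src_start)
        hi = min(end, src_start + size)
        if lo < hi:  # the exon overlaps this aligned block
            shift = dst_start - src_start
            pieces.append((lo + shift, hi + shift))
    return pieces


def project_transcript(exons, blocks, min_fraction=0.20):
    """Project all exons; return None if too little of the transcript mapped."""
    mapped = [piece for s, e in exons for piece in project_interval(s, e, blocks)]
    source_bases = sum(e - s for s, e in exons)
    mapped_bases = sum(e - s for s, e in mapped)
    if mapped_bases < min_fraction * source_bases:
        return None
    return mapped
```

In this sketch, an exon spanning 1050-1200 through blocks (1000, 5000, 100) and (1150, 5120, 200) maps to (5050, 5100) and (5120, 5170). The 50 source bases between 1100 and 1150 lie in a chain gap and are not projected.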

\ \

\ To ensure unique identifiers for each alignment, cDNA and gene accessions were\ made unique by appending a suffix for each location in the source genome and\ again for each mapped location in the destination genome. The format is:\

\
   accession.version-srcUniq.destUniq\
\ \ where srcUniq is a number added to make each source alignment unique, and\ destUniq is a number added to give each resulting TransMap alignment a unique\ identifier.\

\

\ For example, in the cow genome, there are two alignments of mRNA BC149621.1.\ These are assigned the identifiers BC149621.1-1 and BC149621.1-2.\ When these are mapped to the human genome, BC149621.1-1 maps to a single\ location and is given the identifier BC149621.1-1.1. However, BC149621.1-2\ maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note\ that multiple TransMap mappings are usually the result of tandem duplications, where both\ chains are identified as syntenic.\
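The naming scheme can be illustrated with a small Python sketch that reproduces the BC149621.1 example. The helper names are hypothetical. Numbering source alignments in the order they are encountered is an assumption, because the text does not say how the numbers are ordered. The parser also assumes that the versioned accession is everything before the last hyphen.

```python
from collections import defaultdict


def source_ids(accessions):
    """Append -srcUniq (1, 2, ...) to each source alignment of an accession.version."""
    counts = defaultdict(int)
    ids = []
    for acc in accessions:
        counts[acc] += 1
        ids.append(f"{acc}-{counts[acc]}")
    return ids


def dest_ids(source_id, n_locations):
    """Append .destUniq for each location the source alignment maps to."""
    return [f"{source_id}.{i}" for i in range(1, n_locations + 1)]


def parse_transmap_id(transmap_id):
    """Split accession.version-srcUniq.destUniq into its three parts."""
    accession, uniq = transmap_id.rsplit("-", 1)
    src_uniq, dest_uniq = uniq.split(".")
    return accession, int(src_uniq), int(dest_uniq)
```

For the cow example, `source_ids(["BC149621.1", "BC149621.1"])` gives BC149621.1-1 and BC149621.1-2. Then `dest_ids("BC149621.1-2", 2)` gives BC149621.1-2.1 and BC149621.1-2.2.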

\ \ \

Credits

\

\ This track was produced by Mark Diekhans at UCSC from cDNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. \ Comparative genomics search for losses of long-established genes \ on the human lineage. \ PLoS Comput Biol. 2007 Dec;3(12):e247.

\

\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ Using native and syntenically mapped cDNA alignments to improve \ de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.

\

\ Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, \ Ewing B, Oommen S, Lau C et al.\ Targeted discovery of novel human exons by comparative \ genomics.\ Genome Res. 2007 Dec;17(12):1763-73.

\ \ genes 0 group genes\ longLabel TransMap Alignments\ priority 37.001\ shortLabel TransMap\ superTrack on\ track transMap\ transMapAlnUcscGenes TransMap UCSC psl TransMap UCSC Gene Mappings 3 37.002 0 100 0 127 177 127 0 0 0

Description

\

\ This track contains UCSC Gene alignments produced by\ the TransMap cross-species alignment algorithm\ from other vertebrate species in the UCSC Genome Browser.\ For closer evolutionary distances, the alignments are created using\ syntenically filtered BLASTZ alignment chains, resulting in a prediction of the\ orthologous genes in human.\

\ \ \ \ TransMap maps genes and related annotations in one species to another\ using synteny-filtered pairwise genome alignments (chains and nets) to\ determine the most likely orthologs. For example, for the mRNA TransMap track\ on the human assembly, more than 400,000 mRNAs from 23 vertebrate species were\ aligned at high stringency to the native assembly using BLAT. The alignments\ were then mapped to the human assembly using the chain and net alignments\ produced using blastz, which has higher sensitivity than BLAT for diverged\ organisms.\

\ Compared to translated BLAT, TransMap finds fewer paralogs and aligns more UTR\ bases. For closely related low-coverage assemblies, a reciprocal-best\ relationship is used in the chains and nets to improve the synteny prediction.\

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks.

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare cDNAs against the genomic sequence. For more \ information about this option, click \ here.\ Several types of alignment gap may also be colored; \ for more information, click \ here.\ \ \

Methods

\

\

    \
  1. Source transcript alignments were obtained from vertebrate organisms\ in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank \ mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes,\ were used as available.\
  2. For all vertebrate assemblies that had BLASTZ alignment chains and\ nets to the human (hg16) genome, a subset of the alignment chains were\ selected as follows:\ \
  3. The pslMap program was used to do a base-level projection of\ the source transcript alignments via the selected chains\ to the human genome, resulting in pairwise alignments of the source transcripts to\ the genome.\
  4. The resulting alignments were filtered with pslCDnaFilter\ using a global near-best criterion of 0.5% in finished genomes\ (human and mouse) and 1.0% in other genomes. Alignments\ in which less than 20% of the transcript mapped were discarded.\
\

\ \

\ To ensure unique identifiers for each alignment, cDNA and gene accessions were\ made unique by appending a suffix for each location in the source genome and\ again for each mapped location in the destination genome. The format is:\

\
   accession.version-srcUniq.destUniq\
\ \ where srcUniq is a number added to make each source alignment unique, and\ destUniq is a number added to give each resulting TransMap alignment a unique\ identifier.\

\

\ For example, in the cow genome, there are two alignments of mRNA BC149621.1.\ These are assigned the identifiers BC149621.1-1 and BC149621.1-2.\ When these are mapped to the human genome, BC149621.1-1 maps to a single\ location and is given the identifier BC149621.1-1.1. However, BC149621.1-2\ maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note\ that multiple TransMap mappings are usually the result of tandem duplications, where both\ chains are identified as syntenic.\

\ \ \

Credits

\

\ This track was produced by Mark Diekhans at UCSC from cDNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. \ Comparative genomics search for losses of long-established genes \ on the human lineage. \ PLoS Comput Biol. 2007 Dec;3(12):e247.

\

\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ Using native and syntenically mapped cDNA alignments to improve \ de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.

\

\ Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, \ Ewing B, Oommen S, Lau C et al.\ Targeted discovery of novel human exons by comparative \ genomics.\ Genome Res. 2007 Dec;17(12):1763-73.

\ \ genes 1 baseColorDefault diffCodons\ baseColorUseCds table hgFixed.transMapGeneUcscGenes\ baseColorUseSequence extFile hgFixed.transMapSeqUcscGenes hgFixed.transMapExtFileUcscGenes\ color 0,100,0\ group genes\ indelDoubleInsert on\ indelQueryInsert on\ longLabel TransMap UCSC Gene Mappings\ priority 37.002\ shortLabel TransMap UCSC\ showCdsAllScales .\ showCdsMaxZoom 10000.0\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ superTrack transMap pack\ track transMapAlnUcscGenes\ transMapGene hgFixed.transMapGeneUcscGenes\ transMapInfo transMapInfoUcscGenes\ transMapSrc hgFixed.transMapSrcUcscGenes\ transMapTypeDesc UCSC Gene\ type psl\ visibility pack\ transMapAlnRefSeq TransMap RefGene psl TransMap RefSeq Gene Mappings 3 37.003 0 100 0 127 177 127 0 0 0

Description

\

\ This track contains RefSeq Gene alignments produced by\ the TransMap cross-species alignment algorithm\ from other vertebrate species in the UCSC Genome Browser.\ For closer evolutionary distances, the alignments are created using\ syntenically filtered BLASTZ alignment chains, resulting in a prediction of the\ orthologous genes in human.\

\ \ \ \ TransMap maps genes and related annotations in one species to another\ using synteny-filtered pairwise genome alignments (chains and nets) to\ determine the most likely orthologs. For example, for the mRNA TransMap track\ on the human assembly, more than 400,000 mRNAs from 23 vertebrate species were\ aligned at high stringency to the native assembly using BLAT. The alignments\ were then mapped to the human assembly using the chain and net alignments\ produced using blastz, which has higher sensitivity than BLAT for diverged\ organisms.\

\ Compared to translated BLAT, TransMap finds fewer paralogs and aligns more UTR\ bases. For closely related low-coverage assemblies, a reciprocal-best\ relationship is used in the chains and nets to improve the synteny prediction.\

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks.

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare cDNAs against the genomic sequence. For more \ information about this option, click \ here.\ Several types of alignment gap may also be colored; \ for more information, click \ here.\ \ \

Methods

\

\

    \
  1. Source transcript alignments were obtained from vertebrate organisms\ in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank \ mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes,\ were used as available.\
  2. For all vertebrate assemblies that had BLASTZ alignment chains and\ nets to the human (hg16) genome, a subset of the alignment chains were\ selected as follows:\ \
  3. The pslMap program was used to do a base-level projection of\ the source transcript alignments via the selected chains\ to the human genome, resulting in pairwise alignments of the source transcripts to\ the genome.\
  4. The resulting alignments were filtered with pslCDnaFilter\ using a global near-best criterion of 0.5% in finished genomes\ (human and mouse) and 1.0% in other genomes. Alignments\ in which less than 20% of the transcript mapped were discarded.\
\

\ \

\ To ensure unique identifiers for each alignment, cDNA and gene accessions were\ made unique by appending a suffix for each location in the source genome and\ again for each mapped location in the destination genome. The format is:\

\
   accession.version-srcUniq.destUniq\
\ \ where srcUniq is a number added to make each source alignment unique, and\ destUniq is a number added to give each resulting TransMap alignment a unique\ identifier.\

\

\ For example, in the cow genome, there are two alignments of mRNA BC149621.1.\ These are assigned the identifiers BC149621.1-1 and BC149621.1-2.\ When these are mapped to the human genome, BC149621.1-1 maps to a single\ location and is given the identifier BC149621.1-1.1. However, BC149621.1-2\ maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note\ that multiple TransMap mappings are usually the result of tandem duplications, where both\ chains are identified as syntenic.\

\ \ \

Credits

\

\ This track was produced by Mark Diekhans at UCSC from cDNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. \ Comparative genomics search for losses of long-established genes \ on the human lineage. \ PLoS Comput Biol. 2007 Dec;3(12):e247.

\

\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ Using native and syntenically mapped cDNA alignments to improve \ de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.

\

\ Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, \ Ewing B, Oommen S, Lau C et al.\ Targeted discovery of novel human exons by comparative \ genomics.\ Genome Res. 2007 Dec;17(12):1763-73.

\ \ genes 1 baseColorDefault diffCodons\ baseColorUseCds table hgFixed.transMapGeneRefSeq\ baseColorUseSequence extFile hgFixed.transMapSeqRefSeq hgFixed.transMapExtFileRefSeq\ color 0,100,0\ group genes\ indelDoubleInsert on\ indelQueryInsert on\ longLabel TransMap RefSeq Gene Mappings\ priority 37.003\ shortLabel TransMap RefGene\ showCdsAllScales .\ showCdsMaxZoom 10000.0\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ superTrack transMap pack\ track transMapAlnRefSeq\ transMapGene hgFixed.transMapGeneRefSeq\ transMapInfo transMapInfoRefSeq\ transMapSrc hgFixed.transMapSrcRefSeq\ type psl\ visibility pack\ transMapAlnMRna TransMap mRNA psl TransMap GenBank mRNA Mappings 3 37.004 0 100 0 127 177 127 0 0 0

Description

\

\ This track contains GenBank mRNA alignments produced by\ the TransMap cross-species alignment algorithm\ from other vertebrate species in the UCSC Genome Browser.\ For closer evolutionary distances, the alignments are created using\ syntenically filtered BLASTZ alignment chains, resulting in a prediction of the\ orthologous genes in human.\

\ \ \ \ TransMap maps genes and related annotations in one species to another\ using synteny-filtered pairwise genome alignments (chains and nets) to\ determine the most likely orthologs. For example, for the mRNA TransMap track\ on the human assembly, more than 400,000 mRNAs from 23 vertebrate species were\ aligned at high stringency to the native assembly using BLAT. The alignments\ were then mapped to the human assembly using the chain and net alignments\ produced by BLASTZ, which has higher sensitivity than BLAT for diverged\ organisms.\

\ Compared to translated BLAT, TransMap finds fewer paralogs and aligns more UTR\ bases. For closely related low-coverage assemblies, a reciprocal-best\ relationship is used in the chains and nets to improve the synteny prediction.\

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks.

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare cDNAs against the genomic sequence. For more \ information about this option, click \ here.\ Several types of alignment gap may also be colored; \ for more information, click \ here.\ \ \

Methods

\

\

    \
  1. Source transcript alignments were obtained from vertebrate organisms\ in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank \ mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes,\ were used as available.\
  2. For all vertebrate assemblies that had BLASTZ alignment chains and\ nets to the human (hg16) genome, a subset of the alignment chains was\ selected as follows:\ \
  3. The pslMap program was used to do a base-level projection of\ the source transcript alignments via the selected chains\ to the human genome, resulting in pairwise alignments of the source transcripts to\ the genome.\
  4. The resulting alignments were filtered with pslCDnaFilter\ using a global near-best criterion of 0.5% in finished genomes\ (human and mouse) and 1.0% in other genomes. Alignments\ in which less than 20% of the transcript mapped were discarded.\
\
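The filtering in step 4 can be pictured as follows. This is a simplified sketch, not pslCDnaFilter's implementation: the `score` field stands in for whatever alignment score the real tool computes, and the `Aln` record and `near_best_filter` function are hypothetical. In this sketch, low-coverage alignments are removed first, then each transcript keeps only the alignments whose score lies within the near-best fraction of its best score.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Aln:
    query: str      # mapped transcript ID
    score: float    # alignment score (simplified stand-in, not pslCDnaFilter's formula)
    aligned: int    # transcript bases that mapped
    q_size: int     # total transcript length


def near_best_filter(alns: List[Aln], near_best: float, min_cover: float = 0.20) -> List[Aln]:
    """Drop alignments covering < min_cover of the transcript, then keep, per
    transcript, those scoring within `near_best` (a fraction) of the best."""
    by_query: Dict[str, List[Aln]] = defaultdict(list)
    for a in alns:
        if a.aligned / a.q_size >= min_cover:
            by_query[a.query].append(a)
    kept: List[Aln] = []
    for group in by_query.values():
        best = max(a.score for a in group)
        kept.extend(a for a in group if a.score >= best * (1.0 - near_best))
    return kept
```

Following the text, `near_best` would be 0.005 for finished genomes (human and mouse) and 0.01 for the others.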

\ \

\ To ensure unique identifiers for each alignment, cDNA and gene accessions were\ made unique by appending a suffix for each location in the source genome and\ again for each mapped location in the destination genome. The format is:\

\
   accession.version-srcUniq.destUniq\
\ \ Here, srcUniq is a number that makes each source alignment unique, and\ destUniq is a number that gives each resulting TransMap alignment a unique\ identifier.\

\

\ For example, in the cow genome, there are two alignments of mRNA BC149621.1.\ These are assigned the identifiers BC149621.1-1 and BC149621.1-2.\ When these are mapped to the human genome, BC149621.1-1 maps to a single\ location and is given the identifier BC149621.1-1.1. However, BC149621.1-2\ maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note\ that multiple TransMap mappings are usually the result of tandem duplications, where both\ chains are identified as syntenic.\
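The numbering scheme in the example can be reproduced with a short sketch. It assumes that both suffixes count from 1, as in the BC149621.1 example, and that the caller already knows how many destination locations each source alignment mapped to; `assign_transmap_ids` is a hypothetical helper, not part of the TransMap pipeline.

```python
from typing import Dict, List


def assign_transmap_ids(acc_version: str, dest_counts: List[int]) -> Dict[str, List[str]]:
    """Build source and mapped identifiers for one accession.version.

    dest_counts[i] is the number of destination locations that the
    (i+1)-th source alignment mapped to (hypothetical input layout).
    Returns {source_id: [mapped_id, ...]}.
    """
    ids: Dict[str, List[str]] = {}
    for src_uniq, n_dest in enumerate(dest_counts, start=1):
        src_id = f"{acc_version}-{src_uniq}"  # unique per source-genome alignment
        ids[src_id] = [f"{src_id}.{dest_uniq}" for dest_uniq in range(1, n_dest + 1)]
    return ids
```

Calling `assign_transmap_ids("BC149621.1", [1, 2])` gives BC149621.1-1.1 for the first cow alignment and BC149621.1-2.1 and BC149621.1-2.2 for the second, matching the example.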

\ \ \

Credits

\

\ This track was produced by Mark Diekhans at UCSC from cDNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. \ Comparative genomics search for losses of long-established genes \ on the human lineage. \ PLoS Comput Biol. 2007 Dec;3(12):e247.

\

\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ Using native and syntenically mapped cDNA alignments to improve \ de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.

\

\ Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, \ Ewing B, Oommen S, Lau C et al.\ Targeted discovery of novel human exons by comparative \ genomics.\ Genome Res. 2007 Dec;17(12):1763-73.

\ \ genes 1 baseColorDefault diffCodons\ baseColorUseCds table hgFixed.transMapGeneMRna\ baseColorUseSequence extFile hgFixed.transMapSeqMRna hgFixed.transMapExtFileMRna\ color 0,100,0\ group genes\ indelDoubleInsert on\ indelQueryInsert on\ longLabel TransMap GenBank mRNA Mappings\ priority 37.004\ shortLabel TransMap mRNA\ showCdsAllScales .\ showCdsMaxZoom 10000.0\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ superTrack transMap pack\ track transMapAlnMRna\ transMapGene hgFixed.transMapGeneMRna\ transMapInfo transMapInfoMRna\ transMapSrc hgFixed.transMapSrcMRna\ type psl\ visibility pack\ transMapAlnSplicedEst TransMap ESTs psl TransMap Spliced EST Mappings 0 37.005 0 100 0 127 177 127 0 0 0

Description

\

\ This track contains GenBank spliced EST alignments produced by\ the TransMap cross-species alignment algorithm\ from other vertebrate species in the UCSC Genome Browser.\ For closer evolutionary distances, the alignments are created using\ syntenically filtered BLASTZ alignment chains, resulting in a prediction of the\ orthologous genes in human.\

\ \ \ \ TransMap maps genes and related annotations in one species to another\ using synteny-filtered pairwise genome alignments (chains and nets) to\ determine the most likely orthologs. For example, for the mRNA TransMap track\ on the human assembly, more than 400,000 mRNAs from 23 vertebrate species were\ aligned at high stringency to the native assembly using BLAT. The alignments\ were then mapped to the human assembly using the chain and net alignments\ produced by BLASTZ, which has higher sensitivity than BLAT for diverged\ organisms.\

\ Compared to translated BLAT, TransMap finds fewer paralogs and aligns more UTR\ bases. For closely related low-coverage assemblies, a reciprocal-best\ relationship is used in the chains and nets to improve the synteny prediction.\

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks.

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare cDNAs against the genomic sequence. For more \ information about this option, click \ here.\ Several types of alignment gap may also be colored; \ for more information, click \ here.\ \ \

Methods

\

\

    \
  1. Source transcript alignments were obtained from vertebrate organisms\ in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank \ mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes,\ were used as available.\
  2. For all vertebrate assemblies that had BLASTZ alignment chains and\ nets to the human (hg16) genome, a subset of the alignment chains was\ selected as follows:\ \
  3. The pslMap program was used to do a base-level projection of\ the source transcript alignments via the selected chains\ to the human genome, resulting in pairwise alignments of the source transcripts to\ the genome.\
  4. The resulting alignments were filtered with pslCDnaFilter\ using a global near-best criterion of 0.5% in finished genomes\ (human and mouse) and 1.0% in other genomes. Alignments\ in which less than 20% of the transcript mapped were discarded.\
\
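The base-level projection in step 3 can be illustrated with a much simpler model than pslMap uses. This sketch assumes a chain is just a list of gapless aligned blocks on the + strand of both genomes, given as (source_start, dest_start, size) in 0-based half-open coordinates; it ignores strand handling and PSL bookkeeping. `project_interval` is a hypothetical name.

```python
from typing import List, Tuple

# A gapless aligned block of a chain: (source_start, dest_start, size), 0-based, half-open.
Block = Tuple[int, int, int]


def project_interval(start: int, end: int, chain: List[Block]) -> List[Tuple[int, int]]:
    """Project source interval [start, end) into destination coordinates through
    the aligned blocks of one chain (both on + strand). Bases that fall in
    chain gaps are dropped, so the result may be split into several pieces."""
    pieces: List[Tuple[int, int]] = []
    for src_start, dest_start, size in chain:
        lo = max(start, src_start)
        hi = min(end, src_start + size)
        if lo < hi:  # the interval overlaps this block
            offset = dest_start - src_start
            pieces.append((lo + offset, hi + offset))
    return pieces
```

For a chain `[(100, 5000, 50), (200, 5080, 100)]`, the source exon [120, 250) projects to two destination pieces, [5020, 5050) and [5080, 5130); the 50 source bases that fall in the chain gap are not projected.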

\ \

\ To ensure unique identifiers for each alignment, cDNA and gene accessions were\ made unique by appending a suffix for each location in the source genome and\ again for each mapped location in the destination genome. The format is:\

\
   accession.version-srcUniq.destUniq\
\ \ Where srcUniq is a number added to make each source alignment unique, and\ destUniq is added to give the subsequent TransMap alignments unique\ identifiers.\

\

\ For example, in the cow genome, there are two alignments of mRNA BC149621.1.\ These are assigned the identifiers BC149621.1-1 and BC149621.1-2.\ When these are mapped to the human genome, BC149621.1-1 maps to a single\ location and is given the identifier BC149621.1-1.1. However, BC149621.1-2\ maps to two locations, resulting in BC149621.1-2.1 and BC149621.1-2.2. Note\ that multiple TransMap mappings are usually the result of tandem duplications, where both\ chains are identified as syntenic.\
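To find the source alignments that mapped to more than one location (often tandem duplications, as noted above), mapped identifiers can be grouped by their accession.version-srcUniq prefix. This is a minimal sketch assuming every input is a mapped identifier that includes a destUniq suffix; `multi_mapped_sources` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def multi_mapped_sources(mapped_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group mapped IDs 'acc.ver-srcUniq.destUniq' by their source alignment
    'acc.ver-srcUniq' and keep only sources with more than one destination."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ident in mapped_ids:
        source_id, _dest_uniq = ident.rsplit(".", 1)
        groups[source_id].append(ident)
    # Sort numerically on destUniq so that ".10" follows ".2".
    return {
        src: sorted(ids, key=lambda i: int(i.rsplit(".", 1)[1]))
        for src, ids in groups.items()
        if len(ids) > 1
    }
```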

\ \ \

Credits

\

\ This track was produced by Mark Diekhans at UCSC from cDNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. \ Comparative genomics search for losses of long-established genes \ on the human lineage. \ PLoS Comput Biol. 2007 Dec;3(12):e247.

\

\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ Using native and syntenically mapped cDNA alignments to improve \ de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.

\

\ Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, \ Ewing B, Oommen S, Lau C et al.\ Targeted discovery of novel human exons by comparative \ genomics.\ Genome Res. 2007 Dec;17(12):1763-73.

\ \ genes 1 baseColorDefault none\ baseColorUseSequence extFile hgFixed.transMapSeqSplicedEst hgFixed.transMapExtFileSplicedEst\ color 0,100,0\ group genes\ indelDoubleInsert on\ indelQueryInsert on\ longLabel TransMap Spliced EST Mappings\ priority 37.005\ shortLabel TransMap ESTs\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ superTrack transMap hide\ track transMapAlnSplicedEst\ transMapInfo transMapInfoSplicedEst\ transMapSrc hgFixed.transMapSrcSplicedEst\ type psl\ visibility hide\ vegaGene Vega Genes genePred vegaPep Vega Annotations 0 37.1 0 100 180 127 177 217 0 0 14 chr1,chr16,chr18,chr19,chr6,chr7,chr9,chr10,chr13,chr14,chr20,chr22,chrX,chrY, http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$

Description and Methods

\

\ This track shows gene annotations from the Vertebrate Genome Annotation (Vega)\ database.

\

\ The following information is excerpted from the\ Vertebrate Genome Annotation\ home page:

\

\ "The Vega database\ is designed to be a central repository for high-quality, frequently updated\ manual annotation of different vertebrate finished genome sequence.\ Vega attempts to present consistent high-quality curation of the published\ chromosome sequences. Finished genomic sequence is analysed on a\ clone-by-clone basis using\ a combination of similarity searches against DNA and protein databases\ as well as a series of ab initio gene predictions (GENSCAN, Fgenes).\ The annotation is based on supporting evidence only."

\

\ "In addition, comparative analysis using vertebrate datasets such as\ the Riken mouse cDNAs and Genoscope Tetraodon nigroviridis Ecores\ (Evolutionary Conserved Regions) are used for novel gene discovery."

\

\ NOTE: Vega annotations do not appear on every chromosome in this assembly.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for\ gene prediction\ tracks using the following color scheme to indicate the status of the gene\ annotation:\

\

\ The details pages show only the Vega gene type and not the transcript type.\ A single gene can have more than one transcript which can belong to\ different classes, so the gene as a whole is classified according to the\ transcript with the "highest" level of classification. Transcript\ type (and other details) may be found by clicking on the transcript\ identifier which forms the outside link to the Vega transcript details page.\ Further information on the gene and transcript classification may be found\ here.\
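The "highest level" rule amounts to taking the maximum over a ranked list of transcript classes. This sketch assumes a hypothetical ranking; the actual class names and their order are defined by Vega, and `CLASS_RANK` and `gene_class` are illustrative names only.

```python
from typing import Iterable

# Hypothetical ranking, highest first; the real Vega classes and their
# order are defined by Vega, not by this sketch.
CLASS_RANK = ["Known", "Novel_CDS", "Novel_Transcript", "Putative", "Pseudogene"]


def gene_class(transcript_classes: Iterable[str]) -> str:
    """Classify a gene by the highest-ranked class among its transcripts.

    Raises ValueError for an empty input or a class missing from CLASS_RANK.
    """
    return min(transcript_classes, key=CLASS_RANK.index)
```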

\ \

Credits

\

\ Thanks to Steve Searle at the\ Sanger Institute \ for providing the GTF and FASTA files for the Vega annotations. Vega gene annotations are \ generated by manual annotation from the following groups:\

\ Chromosome 6:\ \ The HAVANA group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Mungall AJ et al.,\ The DNA sequence and analysis of human \ \ chromosome 6. \ Nature. 2003 Oct 23;425:805-11.

\

\ Chromosome 7:\ \ Hillier et al., \ \ The Genome Center at Washington University
\ \ Relevant publication: Hillier LW et al., \ The DNA sequence of human \ \ chromosome 7. \ Nature. 2003 Jul 10;424:157-64.

\

\ Chromosome 9:\ \ The HAVANA group, \ \ Wellcome Trust Sanger Institute
\ Relevant publication: Humphray SJ et al., \ The DNA sequence and analysis of human chromosome 9. \ Nature. 2004 May 27;429:369-74.

\

\ Chromosome 10:\ \ The HAVANA group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Deloukas P et al., \ The DNA sequence and comparative analysis of human chromosome 10. \ Nature. 2004 May 27;429:375-81.

\

\ Chromosome 13:\ \ The HAVANA group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Dunham A et al., \ The DNA sequence and analysis of human chromosome 13. \ Nature. 2004 Apr 1;428:522-8.

\

\ Chromosome 14: \ \ \ \ Genoscope
\ \ Relevant publication: Heilig R et al., \ The DNA sequence and analysis of \ \ human chromosome 14. \ Nature. 2003 Feb 6;421:601-7.

\

\ Chromosome 20: \ \ The HAVANA Group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Deloukas P et al., \ The DNA sequence and \ \ comparative analysis of human chromosome 20. \ Nature. 2001 Dec 20;414:865-71.

\

\ Chromosome 22: Chromosome 22 Group,\ \ \ \ Wellcome Trust Sanger Institute
\ \ Relevant publications:
\ \ — Collins JE et al., \ Reevaluating Human Gene Annotation: \ \ A Second-Generation Analysis of Chromosome 22. \ Genome Research. 2003 Jan;13(1):27-36.
\ \ — Dawson E et al., \ A \ \ first-generation linkage disequilibrium map of human chromosome 22. \ Nature. 2002 Aug 1;418:544-8.
\ \ — Dunham I, et al., \ The DNA sequence of human chromosome 22. \ Nature. 1999 Dec 2;402:489-95.

\

\ Chromosome X: \ \ The HAVANA Group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Ross MT et al., \ The DNA sequence and \ \ comparative analysis of human chromosome X. \ Nature. 2005 Mar 17;434:325-37.

\ genes 1 chromosomes chr1,chr16,chr18,chr19,chr6,chr7,chr9,chr10,chr13,chr14,chr20,chr22,chrX,chrY\ color 0,100,180\ group genes\ longLabel Vega Annotations\ priority 37.1\ shortLabel Vega Genes\ track vegaGene\ type genePred vegaPep\ url http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$\ visibility hide\ vegaGene2 Vega Genes2 genePred vegaPep Vega Annotations from Sanger, Genoscope 0 37.1 0 100 180 127 177 217 0 0 0 http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$ genes 1 color 0,100,180\ group genes\ longLabel Vega Annotations from Sanger, Genoscope\ priority 37.1\ shortLabel Vega Genes2\ track vegaGene2\ type genePred vegaPep\ url http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$\ visibility hide\ vegaPseudoGene Vega Pseudogenes genePred Vega Annotated Pseudogenes and Immunoglobulin Segments 0 37.11 30 130 210 142 192 232 0 0 13 chr1,chr16,chr18,chr6,chr7,chr9,chr10,chr13,chr14,chr20,chr22,chrX,chrY, http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$

Description and Methods

\

\ This track shows pseudogene annotations from the Vertebrate Genome Annotation \ (Vega) database.

\

\ The following information is excerpted from the\ Vertebrate Genome Annotation\ home page:

\

\ "The Vega database\ is designed to be a central repository for high-quality, frequently updated\ manual annotation of different vertebrate finished genome sequence.\ Vega attempts to present consistent high-quality curation of the published\ chromosome sequences. Finished genomic sequence is analysed on a\ clone-by-clone basis using\ a combination of similarity searches against DNA and protein databases\ as well as a series of ab initio gene predictions (GENSCAN, Fgenes).\ The annotation is based on supporting evidence only."

\

\ "In addition, comparative analysis using vertebrate datasets such as\ the Riken mouse cDNAs and Genoscope Tetraodon nigroviridis Ecores\ (Evolutionary Conserved Regions) are used for novel gene discovery."

\

\ NOTE: Vega annotations do not appear on every chromosome in this assembly.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for\ gene prediction\ tracks using the following color scheme to indicate the status of the gene\ annotation:\

\

\ The details pages show only the Vega gene type and not the transcript type.\ A single gene can have more than one transcript which can belong to\ different classes, so the gene as a whole is classified according to the\ transcript with the "highest" level of classification. Transcript\ type (and other details) may be found by clicking on the transcript\ identifier which forms the outside link to the Vega transcript details page.\ Further information on the gene and transcript classification may be found\ here.\

\ \

Credits

\

\ Thanks to Steve Searle at the\ Sanger Institute \ for providing the GTF and FASTA files for the Vega annotations. Vega gene annotations are \ generated by manual annotation from the following groups:\

\ Chromosome 6:\ \ The HAVANA group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Mungall AJ et al.,\ The DNA sequence and analysis of human \ \ chromosome 6. \ Nature. 2003 Oct 23;425:805-11.

\

\ Chromosome 7:\ \ Hillier et al., \ \ The Genome Institute at Washington University
\ \ Relevant publication: Hillier LW et al., \ The DNA sequence of human \ \ chromosome 7. \ Nature. 2003 Jul 10;424:157-64.

\

\ Chromosome 9:\ \ The HAVANA group, \ \ Wellcome Trust Sanger Institute
\ Relevant publication: Humphray SJ et al., \ The DNA sequence and analysis of human chromosome 9. \ Nature. 2004 May 27;429:369-74.

\

\ Chromosome 10:\ \ The HAVANA group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Deloukas P et al., \ The DNA sequence and comparative analysis of human chromosome 10. \ Nature. 2004 May 27;429:375-81.

\

\ Chromosome 13:\ \ The HAVANA group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Dunham A et al., \ The DNA sequence and analysis of human chromosome 13. \ Nature. 2004 Apr 1;428:522-8.

\

\ Chromosome 14: \ \ \ \ Genoscope
\ \ Relevant publication: Heilig R et al., \ The DNA sequence and analysis of \ \ human chromosome 14. \ Nature. 2003 Feb 6;421:601-7.

\

\ Chromosome 20: \ \ The HAVANA Group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Deloukas P et al., \ The DNA sequence and \ \ comparative analysis of human chromosome 20. \ Nature. 2001 Dec 20;414:865-71.

\

\ Chromosome 22: Chromosome 22 Group,\ \ \ \ Wellcome Trust Sanger Institute
\ \ Relevant publications:
\ \ — Collins JE et al., \ Reevaluating Human Gene Annotation: \ \ A Second-Generation Analysis of Chromosome 22. \ Genome Research. 2003 Jan;13(1):27-36.
\ \ — Dawson E et al., \ A \ \ first-generation linkage disequilibrium map of human chromosome 22. \ Nature. 2002 Aug 1;418:544-8.
\ \ — Dunham I, et al., \ The DNA sequence of human chromosome 22. \ Nature. 1999 Dec 2;402:489-95.

\

\ Chromosome X: \ \ The HAVANA Group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Ross MT et al., \ The DNA sequence and \ \ comparative analysis of human chromosome X. \ Nature. 2005 Mar 17;434:325-37.

\ genes 1 chromosomes chr1,chr16,chr18,chr6,chr7,chr9,chr10,chr13,chr14,chr20,chr22,chrX,chrY,\ color 30,130,210\ group genes\ longLabel Vega Annotated Pseudogenes and Immunoglobulin Segments\ priority 37.11\ shortLabel Vega Pseudogenes\ track vegaPseudoGene\ type genePred\ url http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$\ visibility hide\ sanger20 Sanger 20 genePred Sanger Institute Chromosome 20 Genes 3 37.2 0 100 180 127 177 217 0 0 1 chr20,

Description

\

\ This track shows sequence annotation curated at \ The Wellcome Trust Sanger \ Institute.

\

\ Over 10% of the human genome, including two complete chromosomes \ (20\ and 22),\ has been annotated by the sequence annotation team in collaboration with \ the individual \ chromosome \ project teams.

\

\ NOTE: Sanger20 annotations appear only on chromosome 20.

\ \

Methods

\

\ Finished genomic sequence is analyzed on a clone-by-clone basis using a\ combination of similarity searches against DNA and protein databases as\ well as a series of ab initio gene predictions (Genscan, Fgenesh). \ Gene structures are annotated on the basis of human interpretation of the\ combined supportive evidence generated during sequence analysis. In\ parallel, experimental methods are applied to extend incomplete\ gene structures and discover new genes. The latter is initiated by\ comparative analysis of the finished sequence with vertebrate datasets\ such as the Riken mouse cDNAs, mouse whole-genome shotgun data and\ Genoscope Tetraodon Ecores.

\ \

Credits

\

\ Thanks to the Sanger Institute for providing this data set. Email inquiries\ may be sent to humquery@sanger.ac.uk.

\ \ genes 1 chromosomes chr20,\ color 0,100,180\ group genes\ longLabel Sanger Institute Chromosome 20 Genes\ priority 37.2\ shortLabel Sanger 20\ track sanger20\ type genePred\ visibility pack\ sanger22 Sanger 22 genePred Sanger Institute Chromosome 22 Genes 3 37.3 0 100 180 127 177 217 0 0 1 chr22,

Description

\

\ This track contains curated annotations of chromosome 22 produced by the\ Chromosome \ 22 Group at the Sanger Institute. They are described in the \ paper Collins, J.E., et al. \ Reevaluating human gene annotation: a second generation \ analysis of human chromosome 22. Genome Res. 13(1), 27-36\ (2003).

\

\ Over 10% of the human genome, including two complete\ chromosomes (20\ and 22),\ has been annotated by the Sanger Institute Sequence\ Annotation Team in collaboration with the individual \ chromosome \ project teams. \

\ NOTE: Sanger22 annotations appear only on chromosome 22 in the Genome Browser.\

\ \

Methods

\

\ Finished genomic sequence is analyzed on a clone-by-clone basis using a\ combination of similarity searches against DNA and protein databases as\ well as a series of ab initio gene predictions (Genscan, Fgenesh). \ Gene structures are annotated on the basis of human interpretation of the\ combined supportive evidence generated during sequence analysis. In\ parallel, experimental methods are applied to extend incomplete\ gene structures and discover new genes. The latter is initiated by\ comparative analysis of the finished sequence with vertebrate datasets\ such as the Riken mouse cDNAs, mouse whole-genome shotgun data and\ Genoscope Tetraodon Ecores.

\ \

Credits

\

\ These annotations were obtained from the\ Internet at http://www.sanger.ac.uk/HGP/Chr22. \ Thanks to the Sanger Institute for providing this data set. Email inquiries\ may be sent to humquery@sanger.ac.uk.\ \ genes 1 chromosomes chr22,\ color 0,100,180\ group genes\ longLabel Sanger Institute Chromosome 22 Genes\ priority 37.3\ shortLabel Sanger 22\ track sanger22\ type genePred\ visibility pack\ sanger22pseudo Sanger 22 Pseudo genePred Sanger Center Chromosome 22 Pseudogenes 0 37.4 30 130 210 142 192 232 0 0 1 chr22,

Description

\

\ This track contains curated annotations of chromosome 22 produced by the\ Chromosome \ 22 Group at the Sanger Institute. They are described in the \ paper Collins, J.E. et al.\ Reevaluating human gene annotation: a second generation \ analysis of human chromosome 22. Genome Res. 13(1),\ 27-36 (2003).

\

\ Over 10% of the human genome, including two complete\ chromosomes (20 and 22), has been annotated by the Sanger Institute Sequence\ Annotation Team in collaboration with the individual \ chromosome \ project teams.

\

\ NOTE: Sanger22 annotations appear only on chromosome 22 in the Genome Browser.\

\ \

Methods

\

\ Finished genomic sequence is analyzed on a clone-by-clone basis using a\ combination of similarity searches against DNA and protein databases as\ well as a series of ab initio gene predictions (Genscan, Fgenesh). \ Gene structures are annotated on the basis of human interpretation of the\ combined supportive evidence generated during sequence analysis. In\ parallel, experimental methods are applied to extend incomplete\ gene structures and discover new genes. The latter is initiated by\ comparative analysis of the finished sequence with vertebrate datasets\ such as the Riken mouse cDNAs, mouse whole-genome shotgun data and\ Genoscope Tetraodon Ecores.

\ \

Credits

\

\ These annotations were obtained from the\ Internet at http://www.sanger.ac.uk/HGP/Chr22. \ Thanks to the Sanger Institute for providing this data set. Email inquiries\ may be sent to humquery@sanger.ac.uk.

\ \ genes 1 chromosomes chr22,\ color 30,130,210\ group genes\ longLabel Sanger Center Chromosome 22 Pseudogenes\ priority 37.4\ shortLabel Sanger 22 Pseudo\ track sanger22pseudo\ type genePred\ visibility hide\ encodeAffyChIpHl60SitesH4Kac4Hr08 Affy H4Kac4 RA 8h bed 3 . Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 8hrs) Sites 0 38 125 100 0 190 177 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 125,100,0\ longLabel Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 8hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 38\ shortLabel Affy H4Kac4 RA 8h\ subGroups factor=H4Kac4 time=8h\ track encodeAffyChIpHl60SitesH4Kac4Hr08\ encodeAffyRnaHl60SitesHr08IntergenicDistal Affy Ig Dst HL60 8h bed 4 . Affymetrix Intergenic Distal HL60 Retinoic 8hr Transfrags 0 38 80 0 175 167 127 215 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 80,0,175\ longLabel Affymetrix Intergenic Distal HL60 Retinoic 8hr Transfrags\ parent encodeNoncodingTransFrags\ priority 38\ shortLabel Affy Ig Dst HL60 8h\ subGroups region=intergenicDistal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr08IntergenicDistal\ reconTransMap Recon TransMap Reconstruction TransMap Alignments 0 38.001 0 0 0 127 127 127 0 0 0

Description

\

\ TransMap cross-species alignment tracks for debugging the ancestral\ reconstruction.\

\ genes 0 group genes\ longLabel Reconstruction TransMap Alignments\ priority 38.001\ shortLabel Recon TransMap\ superTrack on\ track reconTransMap\ reconTransMapAlnRefSeq Recon TransMap RefGene psl Reconstruction TransMap RefSeq Gene Mappings 3 38.003 0 100 0 127 177 127 0 0 0

Description

\

\ TransMap cross-species alignment tracks of RefSeq mRNA alignments for\ debugging the ancestral reconstruction.\

\ genes 1 baseColorDefault diffCodons\ baseColorUseCds table hgFixed.reconTransMapGeneRefSeq\ baseColorUseSequence extFile hgFixed.reconTransMapSeqRefSeq hgFixed.reconTransMapExtFileRefSeq\ color 0,100,0\ group genes\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Reconstruction TransMap RefSeq Gene Mappings\ priority 38.003\ shortLabel Recon TransMap RefGene\ showCdsAllScales .\ showCdsMaxZoom 10000.0\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ superTrack reconTransMap pack\ track reconTransMapAlnRefSeq\ transMapGene hgFixed.reconTransMapGeneRefSeq\ transMapInfo reconTransMapInfoRefSeq\ transMapSrc hgFixed.reconTransMapSrcRefSeq\ type psl\ visibility pack\ encodeAffyChIpHl60PvalH4Kac4Hr32 Affy H4Kac4 RA 32h wig 0.0 534.54 Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 32hrs) P-Value 0 39 125 100 0 190 177 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 125,100,0\ longLabel Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 32hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 39\ shortLabel Affy H4Kac4 RA 32h\ subGroups factor=H4Kac4 time=32h\ track encodeAffyChIpHl60PvalH4Kac4Hr32\ encodeAffyRnaHl60SitesHr32IntergenicDistal Affy Ig Dst HL60 32h bed 4 . Affymetrix Intergenic Distal HL60 Retinoic 32hr Transfrags 0 39 105 0 150 180 127 202 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 105,0,150\ longLabel Affymetrix Intergenic Distal HL60 Retinoic 32hr Transfrags\ parent encodeNoncodingTransFrags\ priority 39\ shortLabel Affy Ig Dst HL60 32h\ subGroups region=intergenicDistal celltype=hl60 source=affy\ track encodeAffyRnaHl60SitesHr32IntergenicDistal\ genieAlt AltGenie genePred genieAltPep Genie Gene Predictions from Affymetrix 1 39 125 0 150 190 127 202 0 0 0

Description

\

Genie predictions are based on \ Affymetrix's \ Genie gene-finding software. Genie is a generalized HMM \ which accepts constraints based on mRNA and EST data.

\ genes 1 color 125,0,150\ group genes\ longLabel Genie Gene Predictions from Affymetrix\ priority 39\ shortLabel AltGenie\ track genieAlt\ type genePred genieAltPep\ visibility dense\ encodeAffyChIpHl60SitesH4Kac4Hr32 Affy H4Kac4 RA 32h bed 3 . Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 32hrs) Sites 0 40 125 100 0 190 177 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 125,100,0\ longLabel Affymetrix ChIP/Chip (H4Kac4 retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 40\ shortLabel Affy H4Kac4 RA 32h\ subGroups factor=H4Kac4 time=32h\ track encodeAffyChIpHl60SitesH4Kac4Hr32\ ensGene Ensembl Genes genePred ensPep Ensembl Genes 0 40 150 0 0 202 127 127 0 0 0

Description

\

\ These gene predictions were generated by Ensembl.

\ \

Methods

\

\ For a description of the methods used in Ensembl gene prediction, refer to \ Hubbard, T. et al. (2002) in the References section below.

\ \

Credits

\

\ Thanks to Ensembl for providing this annotation.\ A description of the process \ by which it was produced can be found on the Ensembl site.\

\ \

References

\

\ Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J,\ Curwen V, Down T, et al. \ The Ensembl genome database project.\ Nucleic Acids Res. 2002 Jan 1;30(1):38-41.

\ \ genes 1 color 150,0,0\ group genes\ longLabel Ensembl Genes\ priority 40\ shortLabel Ensembl Genes\ track ensGene\ type genePred ensPep\ visibility hide\ encodeYaleAffyNB4RARNATarsIntergenicDistal Yale Ig Dst NB4 RA bed 4 . Yale Intergenic Distal NB4 Retinoic TARs 0 40 130 0 125 192 127 190 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 130,0,125\ longLabel Yale Intergenic Distal NB4 Retinoic TARs\ parent encodeNoncodingTransFrags\ priority 40\ shortLabel Yale Ig Dst NB4 RA\ subGroups region=intergenicDistal celltype=nb4 source=yale\ track encodeYaleAffyNB4RARNATarsIntergenicDistal\ ensEstGene Ensembl EST Genes genePred ensEstPep Ensembl EST Gene Predictions 0 40.5 150 0 0 202 127 127 0 0 0 http://www.ensembl.org/perl/geneview?db=estgene&transcript=$$

Description

\

\ Gene predictions from Ensembl based on ESTs.

\ \

Methods

\

\ ESTs were mapped onto the genome using a combination of Exonerate, BLAST \ and Est_Genome, with a threshold defined as an overall percentage identity \ of 90% and at least one exon having a percentage identity of 97% or higher. \ The results were processed by merging the redundant ESTs and setting \ splice sites to the most common ends, resulting in alternatively spliced \ forms. This evidence was processed by Genomewise, which finds the longest \ ORF and assigns 5' and 3' UTRs.
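The two identity thresholds combine as a simple conjunction. This sketch assumes that "an overall percentage identity of 90%" means 90% or higher and that per-exon identities are already computed; `passes_est_thresholds` is a hypothetical helper, not Ensembl code.

```python
from typing import Sequence


def passes_est_thresholds(overall_identity: float, exon_identities: Sequence[float],
                          min_overall: float = 90.0, min_best_exon: float = 97.0) -> bool:
    """True if an EST alignment has overall percent identity >= min_overall
    and at least one exon with percent identity >= min_best_exon."""
    return overall_identity >= min_overall and any(
        e >= min_best_exon for e in exon_identities
    )
```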

\ \

Track Configuration

\

\ This track has an optional codon coloring feature that allows users to \ quickly validate and compare gene predictions. To display codon colors, \ select the genomic codons option from the Color track by \ codons pull-down menu at the top of the track description page.\ This page is accessed via the small button to the left of the track's\ graphical display or through the link on the track's control menu. Click \ here for more information about this feature.\

\

\ After you have made your configuration selections, click the \ Submit button to return to the tracks display page.

\ \

Credits

\

\ Thanks to Ensembl \ for providing this annotation.

\ \ genes 1 color 150,0,0\ group genes\ longLabel Ensembl EST Gene Predictions\ priority 40.5\ shortLabel Ensembl EST Genes\ track ensEstGene\ type genePred ensEstPep\ url http://www.ensembl.org/perl/geneview?db=estgene&transcript=$$\ visibility hide\ acembly Acembly Genes genePred acemblyPep acemblyMrna AceView Gene Models With Alt-Splicing 1 41 155 0 125 205 127 190 0 0 0 http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=hg16&l=$$

Description

\

\ This track shows AceView gene models constructed from\ mRNA, EST and genomic evidence by Danielle and Jean Thierry-Mieg\ and Vahan Simonyan using the \ Acembly program.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks. Gene models that fall into the "main" prediction class\ are displayed in purple; "putative" \ genes are displayed in pink.

\

\ The track description page offers the following filter and configuration\ options:\

\

\ \

Methods

\

\ AceView attempts to find the best alignment of each mRNA/EST against the genome and clusters the alignments into the smallest possible number of alternatively spliced transcripts. The reconstructed transcripts are then clustered into genes by simple transitive contact. To see the evidence that supports each transcript, click the "Outside Link" on an individual transcript's details page to access the NCBI AceView web site.
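Clustering "by simple transitive contact" behaves like single-linkage grouping: two transcripts end up in the same gene if they touch directly or through a chain of other transcripts. The sketch below uses a union-find structure. It assumes that "contact" means an overlapping exon on the same strand. That definition, the input format and the function names are illustrative, not AceView's actual code.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # closed interval [start, end]


def exons_overlap(a: Exon, b: Exon) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def cluster_into_genes(transcripts: Dict[str, Tuple[str, List[Exon]]]) -> List[List[str]]:
    """Group transcripts that touch directly or through a chain of others."""
    parent = {name: name for name in transcripts}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    names = list(transcripts)
    for i, a in enumerate(names):
        strand_a, exons_a = transcripts[a]
        for b in names[i + 1:]:
            strand_b, exons_b = transcripts[b]
            # Assumed definition of "contact": same strand and an overlapping exon.
            if strand_a == strand_b and any(
                exons_overlap(x, y) for x in exons_a for y in exons_b
            ):
                parent[find(a)] = find(b)

    genes: Dict[str, List[str]] = {}
    for name in names:
        genes.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in genes.values())
```

With transcripts t1 (100-200), t2 (150-300) and t3 (290-400) on the plus strand, all three form one gene even though t1 and t3 do not overlap each other.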

\

\ Each AceView transcript model has a gene cluster designation\ (alternate name) that is categorized into a prediction class\ of either main or \ putative.

\

\ Prediction Class: main \
Class of genes that includes the protein-coding genes (defined here by CDS > 100 amino acids) and all genes with at least one well-defined standard intron, i.e., an intron with a GT-AG or GC-AG boundary, supported by at least one clone matching exactly, with no ambiguous bases, and with the 8 bases on either side of the intron identical to the genome. Genes with a CDS shorter than 100 amino acids are included in this class if they meet one of the following conditions: they have an NCBI RefSeq sequence (NM_#) or an OMIM identifier, or they encode a protein with BlastP homology (E-value < 1e-3) to a cDNA-supported nematode AceView protein.

\

\ Prediction Class: putative\
Class of genes that have no standard intron and do not encode a CDS of more than 100 amino acids, yet may be useful enough that they should not be disregarded completely. Putative genes may be of two types: those supported by more than six cDNA clones, and those that encode a putative protein with an interesting annotation. Examples of such annotations include a PFAM motif, a BlastP hit (E-value < 1e-3) to a species other than the gene's own, a transmembrane domain or other rare and meaningful domain identified by Psort2, or a highly probable localization in a cell compartment (excluding the cytoplasm and nucleus).
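The two class definitions can be read as a decision rule. The sketch below encodes them over a hypothetical `GeneEvidence` summary. The field names are illustrative, the text does not say how a CDS of exactly 100 amino acids is handled (it is treated as short here), and returning `None` for genes that meet neither definition is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneEvidence:
    """Hypothetical summary of the evidence for one AceView gene."""
    cds_aa: int                                  # longest CDS, in amino acids
    has_standard_intron: bool = False            # GT-AG/GC-AG intron meeting the support rules
    has_refseq_nm: bool = False
    has_omim: bool = False
    nematode_blastp_evalue: Optional[float] = None  # best hit to a cDNA-supported nematode protein
    n_clones: int = 0                            # supporting cDNA clones
    interesting_annotation: bool = False         # e.g. PFAM motif, transmembrane domain


def prediction_class(g: GeneEvidence) -> Optional[str]:
    if g.cds_aa > 100 or g.has_standard_intron:
        return "main"
    # Short-CDS genes can still be "main" with RefSeq, OMIM or nematode homology support.
    if g.has_refseq_nm or g.has_omim or (
        g.nematode_blastp_evalue is not None and g.nematode_blastp_evalue < 1e-3
    ):
        return "main"
    if g.n_clones > 6 or g.interesting_annotation:
        return "putative"
    return None  # assumption: genes meeting neither definition are not retained
```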

\ \

Credits

\

\ Thanks to Danielle and Jean \ Thierry-Mieg at NIH for providing this track.

\ \

References

\

\ Thierry-Mieg D, Thierry-Mieg J. \ AceView: a comprehensive cDNA-supported gene and transcripts \ annotation.\ Genome Biol. 2006;7 Suppl 1:S12.1-14.

\ genes 1 color 155,0,125\ group genes\ longLabel AceView Gene Models With Alt-Splicing\ priority 41\ shortLabel Acembly Genes\ track acembly\ type genePred acemblyPep acemblyMrna\ url http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=hg16&l=$$\ urlLabel AceView Gene Summary:\ visibility dense\ encodeAffyEncode25bpProbes Affy 25bp Probes bed 4 . Affymetrix 25 bp Probe Locations 0 41 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the locations of the 25 bp probes on the Affymetrix ENCODE tiling array, which are spaced on average every 22 bp. The chip, which was produced using Affymetrix GeneChip technology, has 737,680 probes representing all the non-repetitive DNA sequence of the ENCODE regions. It was designed for high-throughput experiments that explore the human transcriptome at high resolution. The probes cover all transcribed regions, including mRNAs as well as non-coding RNAs that function both structurally and in regulating gene expression. Because disruption of these RNAs, or changes in the levels of transcription or translation, may play a role in disease pathogenesis, this array is a valuable tool for discovering and elucidating disease processes.
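The probe length, average spacing and probe count can be related with simple arithmetic. The sketch assumes perfectly uniform 22 bp spacing, although the text gives this only as an average. The resulting tiled-sequence size is therefore a rough estimate, not a figure reported by Affymetrix.

```python
PROBE_LENGTH_BP = 25   # each probe is a 25-mer
MEAN_SPACING_BP = 22   # average start-to-start distance between adjacent probes
N_PROBES = 737_680

# Adjacent 25-mers placed 22 bp apart share 25 - 22 = 3 bp on average.
mean_overlap_bp = PROBE_LENGTH_BP - MEAN_SPACING_BP

# Treating each probe as covering one 22 bp step gives a rough size for the tiled sequence.
approx_tiled_bp = N_PROBES * MEAN_SPACING_BP

print(f"mean overlap: {mean_overlap_bp} bp")
print(f"approximate tiled sequence: {approx_tiled_bp / 1e6:.1f} Mb")
```

Under these assumptions the array tiles roughly 16 Mb of non-repetitive sequence, with neighboring probes overlapping by about 3 bp.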

\ \

Display Conventions and Configuration

\

\ Probe locations are indicated by solid blocks in the graphical display.

\ \

Methods

\

\ Probe positions were provided by Affymetrix, and the sequence was verified upon mapping to the genome. The array can be used to study transcribed regions (see the Affy RNA Signal and Affy Transfrags tracks), transcription factor binding sites (the Affy pVal and Affy Sites tracks), sites of chromatin modification, sites of DNA methylation, and chromosomal origins of replication.

\ \

Credits

\

\ This chip was generated and analyzed by the Gingeras/Struhl\ collaboration with the Tom Gingeras group at \ Affymetrix and the \ Kevin Struhl group at Harvard Medical School.

\ \

References

\

\ Please see the \ Affymetrix Transcriptome site for a project overview and\ additional references to Affymetrix tiling array publications.

\

\ Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. \ A comparison of normalization methods for high density \ oligonucleotide array data based on variance and bias. \ Bioinformatics 19(2), 185-193 (2003).

\

\ Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger,\ E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J.,\ Williams, A. J., et al. \ Unbiased mapping of transcription factor binding sites along \ human chromosomes 21 and 22 points to widespread regulation of noncoding \ RNAs. \ Cell 116(4), 499-509 (2004).

\

\ Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg,\ R. L., Fodor, S. P., and Gingeras, T. R. \ Large-scale transcriptional activity in chromosomes 21 and \ 22. \ Science 296(5569), 916-919 (2002).

\ encodeTxLevels 1 dataVersion ENCODE June 2005 Freeze\ group encodeTxLevels\ longLabel Affymetrix 25 bp Probe Locations\ origAssembly hg16\ priority 41\ shortLabel Affy 25bp Probes\ track encodeAffyEncode25bpProbes\ type bed 4 .\ visibility hide\ encodeAffyChIpHl60PvalP300Hr00 Affy P300 RA 0h wig 0.0 534.54 Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 0hrs) P-Value 0 41 100 125 0 177 190 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 100,125,0\ longLabel Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 0hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 41\ shortLabel Affy P300 RA 0h\ subGroups factor=P300 time=0h\ track encodeAffyChIpHl60PvalP300Hr00\ encodeYaleAffyNB4TPARNATarsIntergenicDistal Yale Ig Dst NB4 TPA bed 4 . Yale Intergenic Distal TPA-Treated NB4 TARs 0 41 155 0 100 205 127 177 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 155,0,100\ longLabel Yale Intergenic Distal TPA-Treated NB4 TARs\ parent encodeNoncodingTransFrags\ priority 41\ shortLabel Yale Ig Dst NB4 TPA\ subGroups region=intergenicDistal celltype=nb4 source=yale\ track encodeYaleAffyNB4TPARNATarsIntergenicDistal\ sibGene SIB Genes genePred Swiss Institute of Bioinformatics Gene Predictions from mRNA and ESTs 0 41.4 195 90 0 225 172 127 0 0 0 http://www.isrec.isb-sib.ch/cgi-bin/tromer/tromer_quick_search.pl?query_str=$$

Description

\

\ The SIB Genes track shows gene predictions based on data from RefSeq and EMBL/GenBank. This is a transcript-based set of predictions: every gene is supported by at least one GenBank full-length RNA sequence, one RefSeq RNA, or one spliced EST. The track includes both protein-coding and non-coding transcripts. Coding regions are predicted using ESTScan.

\ \

Display Conventions and Configuration

\

\ In general, this track follows the display conventions for gene prediction tracks. Exons of putative non-coding genes and untranslated regions are drawn as relatively thin blocks, while those of coding open reading frames are drawn thicker.

\

\ This track contains an optional codon coloring\ feature that allows users to quickly validate and compare gene predictions.\ To display codon colors, select the genomic codons option from the\ Color track by codons pull-down menu. Click\ here for more\ information about this feature.

\

Further information on the predicted transcripts can be found on the\ Transcriptome Web\ interface.

\ \ \

Methods

\

\ The SIB Genes are built using a multi-step pipeline: \

    \
  1. RefSeq and GenBank RNAs and ESTs are aligned to the genome with\ SIBsim4, keeping \ only the best alignments for each RNA.\
  2. Alignments are broken up at non-intronic gaps, with small isolated \ fragments thrown out.\
  3. A splicing graph is created for each set of overlapping alignments. This\ graph has an edge for each exon or intron, and a vertex for each splice site,\ start, and end. Each RNA that contributes to an edge is kept as evidence for\ that edge.\
  4. The graph is traversed to generate all unique transcripts. The traversal is \ guided by the initial RNAs to avoid a combinatorial explosion in alternative \ splicing.\
  5. Protein predictions are generated.\
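Steps 3 and 4 above can be sketched as follows. Vertices are typed splice sites, transcript starts and transcript ends. Exon and intron edges record the RNAs that support them. The traversal emits unique start-to-end paths. "Guided by the initial RNAs" is modeled here as an assumption: a step from one edge to the next is allowed only if some input RNA contains that consecutive pair. The real SIB rule may differ, and steps 1, 2 and 5 are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Vertex = Tuple[str, int]      # (kind, position); kind is "start", "acc", "don" or "end"
Edge = Tuple[Vertex, Vertex]  # exon and intron edges alternate along a transcript


def build_splicing_graph(rnas: Dict[str, List[Tuple[int, int]]]):
    """rnas maps an RNA name to its sorted exons [(start, end), ...] (hypothetical input)."""
    evidence: Dict[Edge, Set[str]] = defaultdict(set)  # RNAs supporting each edge
    transitions: Set[Tuple[Edge, Edge]] = set()        # consecutive edge pairs seen in an RNA
    for name, exons in rnas.items():
        path: List[Edge] = []
        for i, (start, end) in enumerate(exons):
            left: Vertex = ("start" if i == 0 else "acc", start)
            right: Vertex = ("end" if i == len(exons) - 1 else "don", end)
            if path:
                path.append((path[-1][1], left))  # intron edge: donor -> acceptor
            path.append((left, right))            # exon edge
        for edge in path:
            evidence[edge].add(name)
        transitions.update(zip(path, path[1:]))
    return evidence, transitions


def enumerate_transcripts(evidence, transitions) -> Set[Tuple[Tuple[int, int], ...]]:
    """Walk from start to end vertices, following only transitions seen in some RNA."""
    out: Dict[Vertex, List[Edge]] = defaultdict(list)
    for edge in evidence:
        out[edge[0]].append(edge)
    found: Set[Tuple[Tuple[int, int], ...]] = set()

    def walk(path: List[Edge]) -> None:
        last = path[-1]
        if last[1][0] == "end":
            # Exon edges sit at even positions; report the transcript as an exon list.
            found.add(tuple((u[1], v[1]) for u, v in path[0::2]))
            return
        for nxt in out[last[1]]:
            if (last, nxt) in transitions:
                walk(path + [nxt])

    for edge in evidence:
        if edge[0][0] == "start":
            walk([edge])
    return found
```

Two RNAs that share a middle exon but differ at both ends yield four transcripts: the two inputs plus the two recombinations that pass through the shared exon.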
\ \

Credits

\

\ The SIB Genes track was produced on the Vital-IT high-performance computing platform using a computational pipeline developed by Christian Iseli with help from colleagues at the Ludwig Institute for Cancer Research and the Swiss Institute of Bioinformatics. It is based on data from NCBI RefSeq and GenBank/EMBL. Our thanks to the people running these databases and to the scientists worldwide who have contributed to them.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J,\ Wheeler DL. \ GenBank: update. \ Nucleic Acids Res. 2004 Jan 1;32:D23-6.

\ genes 1 color 195,90,0\ group genes\ longLabel Swiss Institute of Bioinformatics Gene Predictions from mRNA and ESTs\ priority 41.4\ shortLabel SIB Genes\ track sibGene\ type genePred\ url http://www.isrec.isb-sib.ch/cgi-bin/tromer/tromer_quick_search.pl?query_str=$$\ urlLabel SIB link:\ visibility hide\ ECgene ECgene Genes genePred ECgenePep ECgene Gene Predictions with Alt-Splicing 0 41.5 155 0 125 205 127 190 0 0 0

Description

\

\ ECgene (gene prediction by EST clustering) predicts genes by combining \ genome-based EST clustering and transcript \ assembly methods. The EST clustering is based on genomic alignment of mRNA \ and ESTs similar to that of NCBI's UniGene for the human genome. The \ transcript assembly procedure yields gene models for each cluster that \ include alternative splicing variants. This algorithm was developed by Prof. \ Sanghyuk Lee's Lab of Bioinformatics at Ewha Womans University in Seoul, \ Korea.

\

\ For more detailed information, see the \ ECgene website.\

\ \

Display Conventions

\

\ This track follows the display conventions for \ gene prediction \ tracks.

\ \

Methods

\ The following is a brief summary of the ECgene algorithm: \
    \
  1. \ Genomic alignment of mRNA and ESTs: Input sequences are aligned against the \ genome using the Blat program developed by Jim Kent. Blat alignments are corrected for \ valid splice sites, and the SIM4 program is used for suspicious alignments if necessary.\
  2. \ Sequences that share more than one splice site are clustered together. This produces the primary clusters, which exclude unspliced sequences (singletons).\
  3. \ The genomic alignment of exons in each spliced sequence is represented as a directed \ acyclic graph (DAG), and all possible gene models are derived by the depth-first-search \ (DFS) method.\
  4. \ Sequences compatible with each gene model are grouped together as sub-clusters, and gene models without sufficient evidence are discarded at this stage. PolyA tails are detected sensitively by analyzing the genomic alignments of mRNA and EST sequences, and this information is used specifically to determine gene boundaries.\
  5. \ Finally, unspliced sequences are added so as not to change the splice sites of the \ existing gene model.\
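Steps 2 and 3 above can be sketched as follows. Spliced sequences are linked when they share splice sites, and the exon graph of each cluster is walked depth-first to list every gene model. The sketch takes "more than one splice site" literally (`min_shared=2`, adjustable). Using exons as the DAG nodes is also an assumption, since ECgene's own graph representation may differ. Steps 1, 4 and 5 are not modeled, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]


def splice_sites(exons: List[Exon]) -> Set[Tuple[str, int]]:
    # Internal exon boundaries: donors (ends of non-last exons), acceptors (starts of non-first exons).
    sites = set()
    for i, (start, end) in enumerate(exons):
        if i > 0:
            sites.add(("acc", start))
        if i < len(exons) - 1:
            sites.add(("don", end))
    return sites


def primary_clusters(seqs: Dict[str, List[Exon]], min_shared: int = 2) -> List[List[str]]:
    """Single-linkage clustering of spliced sequences sharing >= min_shared splice sites."""
    spliced = [n for n, exons in seqs.items() if len(exons) > 1]  # singletons left out
    parent = {n: n for n in spliced}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(spliced, 2):
        if len(splice_sites(seqs[a]) & splice_sites(seqs[b])) >= min_shared:
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for n in spliced:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def gene_models(seqs: Dict[str, List[Exon]], members: List[str]) -> List[Tuple[Exon, ...]]:
    """Exon DAG for one cluster; every first-exon-to-last-exon path found by DFS."""
    succ: Dict[Exon, Set[Exon]] = {}
    firsts: Set[Exon] = set()
    lasts: Set[Exon] = set()
    for n in members:
        exons = seqs[n]
        firsts.add(exons[0])
        lasts.add(exons[-1])
        for a, b in zip(exons, exons[1:]):
            succ.setdefault(a, set()).add(b)
    models: List[Tuple[Exon, ...]] = []

    def dfs(path: List[Exon]) -> None:
        node = path[-1]
        if node in lasts:
            models.append(tuple(path))
        for nxt in sorted(succ.get(node, ())):
            dfs(path + [nxt])

    for first in sorted(firsts):
        dfs([first])
    return models
```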
\ \

Credits

\

\ The predictions for this track were produced by Namshin Kim and Sanghyuk Lee at Ewha Womans University, Seoul, Korea.\ genes 1 color 155,0,125\ group genes\ longLabel ECgene Gene Predictions with Alt-Splicing\ priority 41.5\ shortLabel ECgene Genes\ track ECgene\ type genePred ECgenePep\ visibility hide\ encodeAffyChIpHl60SitesP300Hr00 Affy P300 RA 0h bed 3 . Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 0hrs) Sites 0 42 100 125 0 177 190 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 100,125,0\ longLabel Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 0hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 42\ shortLabel Affy P300 RA 0h\ subGroups factor=P300 time=0h\ track encodeAffyChIpHl60SitesP300Hr00\ ensEst Ensembl ESTs genePred ensEstPep Human ESTs From Ensembl 0 42 175 20 125 215 137 190 0 0 0

Description

\

\ Gene predictions from Ensembl based on expressed sequence tags (ESTs).

\ \

Methods

\

\ For a description of the methods used, refer to \ Hubbard, T. et al. (2002) in the References section below.

\ \

Track Configuration

\

\ This track has an optional codon coloring feature that allows users to \ quickly validate and compare gene predictions. To display codon colors, \ select the genomic codons option from the Color track by \ codons pull-down menu at the top of the track description page.\ This page is accessed via the small button to the left of the track's\ graphical display or through the link on the track's control menu. Click \ here for more information about this feature.

\

\ After you have made your configuration selections, click the \ Submit button to return to the tracks display page.

\ \

Credits

\

\ Thanks to Ensembl \ for providing this annotation.

\ \

References

\

\ Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J,\ Curwen V, Down T, et al. \ The Ensembl genome database project.\ Nucleic Acids Res. 2002 Jan 1;30(1):38-41.

\ genes 1 color 175,20,125\ group genes\ longLabel $Organism ESTs From Ensembl\ priority 42\ shortLabel Ensembl ESTs\ track ensEst\ type genePred ensEstPep\ visibility hide\ encodeYaleAffyNB4UntrRNATarsIntergenicDistal Yale Ig Dst NB4 Un bed 4 . Yale Intergenic Distal Untreated NB4 TARs 0 42 180 0 75 217 127 165 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 180,0,75\ longLabel Yale Intergenic Distal Untreated NB4 TARs\ parent encodeNoncodingTransFrags\ priority 42\ shortLabel Yale Ig Dst NB4 Un\ subGroups region=intergenicDistal celltype=nb4 source=yale\ track encodeYaleAffyNB4UntrRNATarsIntergenicDistal\ encodeAffyChipSuper Affy ChIP Affymetrix ChIP-chip 0 43 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Overview

\

\ This super-track combines related tracks of ChIP-chip data generated by the Affymetrix/Harvard ENCODE collaboration. ChIP-chip, also known as genome-wide location analysis, is a technique for isolating and identifying the DNA sequences bound by specific proteins in cells.

\

\ These tracks contain ChIP-chip data for multiple transcription factors, RNA polymerase II and histones in multiple cell lines, including HL-60 (leukemia) and ME-180 (cervical carcinoma), at different time points after drug treatment. Binding was assayed on Affymetrix ENCODE tiling arrays. Data are displayed as signals, median P-values, "strict" P-values and sites.

\ \

Credits

\

\ These data were generated and analyzed by a collaboration between the Tom Gingeras group at Affymetrix and the Kevin Struhl lab at Harvard Medical School.

\ \

References

\

\ Please see the Affymetrix Transcriptome site for a project overview and\ additional references to Affymetrix tiling array publications.

\

\ Bolstad BM, Irizarry RA, Astrand M, and Speed TP.\ A comparison\ of normalization methods for high density oligonucleotide array data based\ on variance and bias.\ Bioinformatics. 2003 Jan 22;19(2):185-93.

\

\ Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA,\ Kampa D, Piccolboni A, Sementchenko V, Cheng J,\ Williams AJ et al.\ Unbiased mapping of\ transcription factor binding sites along human chromosomes 21 and 22 points\ to widespread regulation of noncoding RNAs.\ Cell. 2004 Feb 20;116(4):499-509.\ \ encodeChip 0 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ group encodeChip\ longLabel Affymetrix ChIP-chip\ priority 43.0\ shortLabel Affy ChIP\ superTrack on\ track encodeAffyChipSuper\ encodeAffyChIpHl60PvalP300Hr02 Affy P300 RA 2h wig 0.0 534.54 Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 2hrs) P-Value 0 43 100 125 0 177 190 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 100,125,0\ longLabel Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 2hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 43\ shortLabel Affy P300 RA 2h\ subGroups factor=P300 time=2h\ track encodeAffyChIpHl60PvalP300Hr02\ encodeAffyChIpHl60Pval Affy pVal wig 0.0 534.54 Affymetrix ChIP/Chip (retinoic acid-treated HL-60 cells) P-Values 0 43 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows regions that co-precipitate with antibodies against each of ten factors in all ENCODE regions in retinoic acid-stimulated HL-60 cells harvested after 0, 2, 8, and 32 hours. Median P-values are shown in separate subtracks for each of the ten antibodies:

\ Retinoic acid-stimulated HL-60 cells were harvested and\ whole cell extracts (control) were made. An antibody was used to\ immunoprecipitate bound chromatin fragments (treatment). DNA was\ purified from these samples and hybridized to Affymetrix ENCODE\ oligonucleotide tiling arrays, which have 25-mer probes tiled\ every 22 bp on average in the non-repetitive ENCODE regions.

\

\ Only median P-values are displayed; data for all biological replicates \ can be downloaded from Affymetrix in \ wiggle,\ cel, and\ soft formats.

\ \

Display Conventions and Configuration

\

\ The subtracks within this composite annotation track\ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options for the subtracks \ are shown at the top of the track description page, followed by a list of \ subtracks. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for finding the same antibody in different timepoint tracks.

\ \

Methods

\

\ The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is perfect match and MM is mismatch) was computed for all probe pairs of each biological treatment replicate and of all control replicates. For each treatment replicate, the significance of the enrichment of treatment signal over control signal in the window was estimated by a P-value from the Wilcoxon rank-sum test, comparing that replicate's signal estimates with all control signal estimates in the window. The median of the log-transformed P-value (-10 log[10] P) across the processed replicates is displayed.
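The window statistic described above can be sketched in a few lines. The signal estimator and the 1001 bp window follow the text. The normalization step is omitted. The rank-sum test is implemented as a one-sided normal approximation without tie correction, which is an assumption, since the text does not say which variant was used. All function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

ProbePair = Tuple[float, float]  # (PM, MM) intensities, assumed already normalized


def signal(pm: float, mm: float) -> float:
    # S = ln[max(PM - MM, 1)]; the floor of 1 keeps S defined and non-negative.
    return math.log(max(pm - mm, 1.0))


def average_ranks(values: Sequence[float]) -> List[float]:
    # 1-based ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_pvalue(treat: Sequence[float], ctrl: Sequence[float]) -> float:
    """One-sided Wilcoxon rank-sum P-value (treatment > control), normal approximation."""
    n1, n2 = len(treat), len(ctrl)
    ranks = average_ranks(list(treat) + list(ctrl))
    w = sum(ranks[:n1])  # rank sum of the treatment values
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (w - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def window_score(positions: Sequence[int],
                 treat_reps: List[List[ProbePair]],
                 ctrl_reps: List[List[ProbePair]],
                 center: int, half_width: int = 500) -> float:
    """Median over treatment replicates of -10 log10 P in a 1001 bp window."""
    idx = [i for i, x in enumerate(positions) if abs(x - center) <= half_width]
    # All control replicates are pooled, as described above.
    ctrl = [signal(*rep[i]) for rep in ctrl_reps for i in idx]
    scores = []
    for rep in treat_reps:
        treat = [signal(*rep[i]) for i in idx]
        p = max(rank_sum_pvalue(treat, ctrl), 1e-300)  # guard against log10(0)
        scores.append(-10 * math.log10(p))
    return median(scores)
```

When treatment and control signals are identical, every value is tied, the test gives P = 0.5, and the window scores about 3. Uniformly higher treatment signal drives the score well above the cutoff of 20 used for site calling.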

\

\ Several independent biological replicates (four each for Brg1, CEBPe, CTCF, PU1, and SIRT1; five each for H3K27me3, H4Kac4, P300, Pol2 and RARA) were generated and hybridized to duplicate arrays (two technical replicates). Reproducible enriched regions were derived from the signal by first applying, to each biological replicate, a cutoff of 20 on the log-transformed P-values, with a maxGap of 500 bp and a minRun of 0 bp. Because each region or site may comprise more than one probe, the median of the log-transformed P-values was computed per site for each replicate. These seed sites were then ranked individually within each replicate. If a site was absent from a replicate, it was assigned the maximum (worst) rank of the distribution.
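For reference, the cutoff of 20 on the transformed scale corresponds to an uncorrected P-value of 0.01. This follows directly from the -10 log10 P transform used above:

```latex
-10\log_{10} P = 20
\;\Longleftrightarrow\; \log_{10} P = -2
\;\Longleftrightarrow\; P = 10^{-2} = 0.01
```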

\

\ The following three values were computed for each\ site by combining data from all biological replicates: \

\

\ The final sites were those for which all three of the above metrics were relatively low, where "low" corresponds to the top 25 percent of the distribution.

\ \

Verification

\

\ Using the P-values from the biological replicates, all pairwise\ rank correlation coefficients were computed among biological\ replicates. Data sets showing both consistent pairwise correlation\ coefficients and at least weak positive correlation across all pairs\ were considered reproducible.

\ \

Credits

\

\ These data were generated and analyzed by the Gingeras/Struhl\ collaboration with the Tom Gingeras group at \ Affymetrix and \ Kevin Struhl's group at Harvard Medical School.

\ \

References

\

\ Please see the \ Affymetrix Transcriptome site for a project overview and\ additional references to Affymetrix tiling array publications.

\

\ Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. \ A comparison \ of normalization methods for high density oligonucleotide array data based \ on variance and bias. \ Bioinformatics 19(2), 185-193 (2003).

\

\ Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger,\ E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J.,\ Williams, A. J., et al. \ Unbiased mapping of \ transcription factor binding sites along human chromosomes 21 and 22 points \ to widespread regulation of noncoding RNAs. \ Cell 116(4), 499-509 (2004).

\ encodeChip 0 autoScale off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel Affymetrix ChIP/Chip (retinoic acid-treated HL-60 cells) P-Values\ maxHeightPixels 128:16:16\ origAssembly hg16\ priority 43.0\ shortLabel Affy pVal\ spanList 1\ subGroup1 time Timepoint 0h=0hrs 2h=2hrs 8h=8hrs 32h=32hrs\ subGroup2 factor Factor Brg1=Brg1 CEBPe=CEBPe CTCF=CTCF H3K27me3=H3K27me3 H4Kac4=H4Kac4 P300=P300 PU1=PU1 RARA=RARA Pol2=Pol2 SIRT1=SIRT1 TFIIB=TFIIB\ track encodeAffyChIpHl60Pval\ type wig 0.0 534.54\ viewLimits 0:100\ visibility hide\ ncbiGenes NCBI Gene Models genePred ncbiPep Human Gene Models from NCBI 0 43 0 0 0 127 127 127 0 0 0

Description & Credits

\ \ Gene predictions from NCBI. See the human build release notes for a description of the build. \ genes 1 group genes\ longLabel $Organism Gene Models from NCBI\ priority 43\ shortLabel NCBI Gene Models\ track ncbiGenes\ type genePred ncbiPep\ visibility hide\ encodeYaleAffyNeutRNATarsAllIntergenicDistal Yale Ig Dst Neu bed 4 . Yale Intergenic Distal Neutrophil TARs 0 43 205 0 50 230 127 152 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 205,0,50\ longLabel Yale Intergenic Distal Neutrophil TARs\ parent encodeNoncodingTransFrags\ priority 43\ shortLabel Yale Ig Dst Neu\ subGroups region=intergenicDistal celltype=neut source=yale\ track encodeYaleAffyNeutRNATarsAllIntergenicDistal\ encodeAffyChIpHl60Sites Affy Sites bed 3 . Affymetrix ChIP/Chip (retinoic acid-treated HL-60 cells) Sites 0 43.1 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows regions that co-precipitate with antibodies against each of ten factors in all ENCODE regions in retinoic acid-stimulated HL-60 cells harvested after 0, 2, 8, and 32 hours. Clustered sites are shown in separate subtracks for each of the ten antibodies:

\ Retinoic acid-stimulated HL-60 cells were harvested and\ whole cell extracts (control) were made. An antibody was used to\ immunoprecipitate bound chromatin fragments (treatment). DNA was\ purified from these samples and hybridized to Affymetrix ENCODE\ oligonucleotide tiling arrays, which have 25-mer probes tiled\ every 22 bp on average in the non-repetitive ENCODE regions.

\ \

Display Conventions and Configuration

\

\ The subtracks within this composite annotation track\ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options for the subtracks \ are shown at the top of the track description page, followed by a list of \ subtracks. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for finding the same antibody in different timepoint tracks.

\ \

Methods

\

\ The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is perfect match and MM is mismatch) was computed for all probe pairs of each biological treatment replicate and of all control replicates. For each treatment replicate, the significance of the enrichment of treatment signal over control signal in the window was estimated by a P-value from the Wilcoxon rank-sum test, comparing that replicate's signal estimates with all control signal estimates in the window. The median of the log-transformed P-value (-10 log10 P) across the processed replicates is displayed.

\

\ Several independent biological replicates (four each for Brg1, CEBPe, CTCF, PU1, and SIRT1; five each for H3K27me3, H4Kac4, P300, Pol2 and RARA) were generated and hybridized to duplicate arrays (two technical replicates). Reproducible enriched regions were derived from the signal by first applying, to each biological replicate, a cutoff of 20 on the log-transformed P-values, with a maxGap of 500 bp and a minRun of 0 bp. Because each region or site may comprise more than one probe, the median of the log-transformed P-values was computed per site for each replicate. These seed sites were then ranked individually within each replicate. If a site was absent from a replicate, it was assigned the maximum (worst) rank of the distribution.
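The per-replicate site calling can be sketched as follows. Above-cutoff probes are merged when they lie within `max_gap` bp of each other. Runs shorter than `min_run` bp are dropped. Each site is summarized by its median score and then ranked. The sketch makes several assumptions: the cutoff is inclusive, gaps are measured between probe positions, and overlap decides whether a site is "present" in another replicate. The function names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

Site = Tuple[int, int, float]  # (first probe position, last probe position, median score)


def call_sites(positions: Sequence[int], scores: Sequence[float],
               cutoff: float = 20.0, max_gap: int = 500, min_run: int = 0) -> List[Site]:
    """Merge above-cutoff probes into sites for one replicate (positions sorted)."""
    runs: List[List[Tuple[int, float]]] = []
    current: List[Tuple[int, float]] = []
    for pos, score in zip(positions, scores):
        if score < cutoff:
            continue  # below-cutoff probes can still be bridged by max_gap
        if current and pos - current[-1][0] > max_gap:
            runs.append(current)
            current = []
        current.append((pos, score))
    if current:
        runs.append(current)
    sites = []
    for run in runs:
        start, end = run[0][0], run[-1][0]
        if end - start >= min_run:
            sites.append((start, end, median([s for _, s in run])))
    return sites


def rank_sites(sites: List[Site]) -> Dict[Tuple[int, int], int]:
    # Rank 1 is the strongest site (highest median -10 log10 P) in this replicate.
    ordered = sorted(sites, key=lambda site: -site[2])
    return {(start, end): rank for rank, (start, end, _) in enumerate(ordered, start=1)}


def rank_in_replicate(region: Tuple[int, int], ranked: Dict[Tuple[int, int], int]) -> int:
    """Best rank of an overlapping site, or the worst rank if the site is absent."""
    lo, hi = region
    hits = [rank for (start, end), rank in ranked.items() if start <= hi and lo <= end]
    return min(hits) if hits else max(ranked.values(), default=0)
```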

\

\ The following three values were computed for each\ site by combining data from all biological replicates: \

\

\ The final sites were those for which all three of the above metrics were relatively low, where "low" corresponds to the top 25 percent of the distribution.

\ \

Verification

\

\ Using the P-values from the biological replicates, all pairwise\ rank correlation coefficients were computed among biological\ replicates. Data sets showing both consistent pairwise correlation\ coefficients and at least weak positive correlation across all pairs\ were considered reproducible.

\ \

Credits

\

\ These data were generated and analyzed by the Gingeras/Struhl\ collaboration with the Tom Gingeras group at \ Affymetrix and \ Kevin Struhl's group at Harvard Medical School.

\ \

References

\

\ Please see the \ Affymetrix Transcriptome site for a project overview and\ additional references to Affymetrix tiling array publications.

\

\ Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. \ A comparison \ of normalization methods for high density oligonucleotide array data based \ on variance and bias. \ Bioinformatics 19(2), 185-193 (2003).

\

\ Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger,\ E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J.,\ Williams, A. J., et al. \ Unbiased mapping of \ transcription factor binding sites along human chromosomes 21 and 22 points \ to widespread regulation of noncoding RNAs. \ Cell 116(4), 499-509 (2004).

\ encodeChip 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel Affymetrix ChIP/Chip (retinoic acid-treated HL-60 cells) Sites\ origAssembly hg16\ priority 43.1\ shortLabel Affy Sites\ subGroup1 time Timepoint 0h=0hrs 2h=2hrs 8h=8hrs 32h=32hrs\ subGroup2 factor Factor Brg1=Brg1 CEBPe=CEBPe CTCF=CTCF H3K27me3=H3K27me3 H4Kac4=H4Kac4 P300=P300 PU1=PU1 RARA=RARA Pol2=Pol2 SIRT1=SIRT1 TFIIB=TFIIB\ track encodeAffyChIpHl60Sites\ type bed 3 .\ visibility hide\ encodeAffyChIpHl60PvalStrict Affy Strict pVal wig 0 696.62 Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict P-Value 1 43.7 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows regions that co-precipitate with antibodies against each of four factors in all ENCODE regions in retinoic acid-stimulated HL-60 (leukemia) cells harvested after 0, 2, 8, and 32 hours, and against a fifth factor tested in ME-180 cervical carcinoma cells. The median of the transformed P-value (-10 log[10] P) across processed replicate data is displayed in separate subtracks for each antibody:

\ Retinoic acid-stimulated HL-60 cells and ME-180 cells (actinomycin-D treated\ or untreated) were harvested and\ whole cell extracts (control) were made. An antibody was used to\ immunoprecipitate bound chromatin fragments (treatment). DNA was\ purified from these samples and hybridized to Affymetrix ENCODE\ oligonucleotide tiling arrays, which have 25-mer probes tiled\ every 22 bp on average in the non-repetitive ENCODE regions.

\

\ Only the median of the transformed P-value (-10 log[10] P) is displayed; \ data for all biological replicates \ can be downloaded from Affymetrix in \ wiggle,\ cel, and\ soft formats.

\ \

Display Conventions and Configuration

\

\ The subtracks within this composite annotation track\ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options for the subtracks \ are shown at the top of the track description page, followed by a list of \ subtracks. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for finding the same antibody in different timepoint tracks.

\ \

Methods

\

\ The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is perfect match and MM is mismatch) was computed for all probe pairs of each biological treatment replicate and of all control replicates. For each treatment replicate, the significance of the enrichment of treatment signal over control signal in the window was estimated by a P-value from the Wilcoxon rank-sum test, comparing that replicate's signal estimates with all control signal estimates in the window. The median of the transformed P-value (-10 log[10] P) across the processed replicates is displayed.

\ \

Verification

\

\ Using the P-values from the biological replicates, all pairwise\ rank correlation coefficients were computed among biological\ replicates. Data sets showing both consistent pairwise correlation\ coefficients and at least weak positive correlation across all pairs\ were considered reproducible.
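A sketch of this reproducibility check, computing each Spearman rank correlation as the Pearson correlation of mid-ranks. The text gives no numeric thresholds for "consistent" or "weak positive" correlation, so `min_rho` and `max_spread` below are hypothetical placeholders, as are the example P-values.

```python
import math
from itertools import combinations


def mid_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the mid-ranks."""
    rx, ry = mid_ranks(x), mid_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def looks_reproducible(replicates, min_rho=0.1, max_spread=0.3):
    """All pairwise rank correlations must be at least min_rho (weakly
    positive) and lie within max_spread of each other (consistent).
    Both thresholds are hypothetical."""
    rhos = [spearman(a, b) for a, b in combinations(replicates, 2)]
    ok = min(rhos) >= min_rho and max(rhos) - min(rhos) <= max_spread
    return ok, rhos


# Hypothetical per-probe P-values from three biological replicates.
ok, rhos = looks_reproducible([[0.01, 0.2, 0.5, 0.9],
                               [0.02, 0.1, 0.6, 0.8],
                               [0.2, 0.03, 0.4, 0.7]])
print(ok, rhos)  # True [1.0, 0.8, 0.8]
```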

\ \

Credits

\

\ These data were generated and analyzed by the Gingeras/Struhl\ collaboration with the Tom Gingeras group at \ Affymetrix and \ Kevin Struhl's group at Harvard Medical School.

\ \

References

\

\ Please see the \ Affymetrix Transcriptome site for a project overview and\ additional references to Affymetrix tiling array publications.

\

\ Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. \ A comparison \ of normalization methods for high density oligonucleotide array data based \ on variance and bias. \ Bioinformatics 19(2), 185-193 (2003).

\

\ Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger,\ E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J.,\ Williams, A. J., et al. \ Unbiased mapping of \ transcription factor binding sites along human chromosomes 21 and 22 points \ to widespread regulation of noncoding RNAs. \ Cell 116(4), 499-509 (2004).

\

\ Yang A, Zhu Z, Kapranov P, McKeon F, Church GM, Gingeras TR, Struhl K.\ \ Relationships between p63 binding, DNA sequence, transcription\ activity, and biological function in human cells. \ Mol. Cell. 24(4), 593-602 (2006).

\ \ encodeChip 0 autoScale off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE Oct 2005 Freeze\ group encodeChip\ longLabel Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict P-Value\ maxHeightPixels 128:16:16\ origAssembly hg17\ priority 43.7\ shortLabel Affy Strict pVal\ spanList 1\ subGroup1 factor Factor H3K9K14ac2=H3K9K14ac2 H4Kac4=H4Kac4 Pol2=Pol2 actd=p63_ActD mactd=p63_mActD\ subGroup2 time Timepoint 0h=0hrs 2h=2hrs 8h=8hrs 32h=32hrs\ superTrack encodeAffyChipSuper dense\ track encodeAffyChIpHl60PvalStrict\ type wig 0 696.62\ viewLimits 0:250\ encodeAffyChIpHl60SignalStrict Affy Strict Sig wig -2.78 3.97 Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict Signal 0 43.8 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows regions that co-precipitate with antibodies against each of four factors across all ENCODE regions, in retinoic acid-stimulated HL-60 (leukemia) cells harvested after 0, 2, 8, and 32 hours, and against a fifth factor tested in ME-180 cervical carcinoma cells. The median of the signal estimate across processed replicate data is displayed as a separate subtrack for each antibody:\

\ Retinoic acid-stimulated HL-60 cells and ME-180 cells (actinomycin-D treated\ or untreated) were harvested and\ whole cell extracts (control) were made. An antibody was used to\ immunoprecipitate bound chromatin fragments (treatment). DNA was\ purified from these samples and hybridized to Affymetrix ENCODE\ oligonucleotide tiling arrays, which have 25-mer probes tiled\ every 22 bp on average in the non-repetitive ENCODE regions.

\

\ Only the median of the signal estimate across processed replicate data\ is displayed; data for all biological replicates \ can be downloaded from Affymetrix in \ wiggle,\ cel, and\ soft formats.

\ \

Display Conventions and Configuration

\

\ The subtracks within this composite annotation track\ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options for the subtracks \ are shown at the top of the track description page, followed by a list of \ subtracks. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for finding the same antibody in different timepoint tracks.

\ \

Methods

\

\ The data from replicate arrays were quantile-normalized (Bolstad et al., 2003), and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is the perfect-match and MM the mismatch probe intensity) was computed for the treatment probe pairs of each biological replicate and for the control probe pairs of all replicates. The significance of the enrichment of each replicate's treatment signal over the control signal in each window was estimated by a P-value from the Wilcoxon rank-sum test, comparing that biological replicate's treatment signal estimates with all control signal estimates in the window. The median of the signal estimate across processed replicate data is displayed.
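The normalization step can be sketched in a few lines. This follows the general idea of quantile normalization in Bolstad et al. (2003) but breaks ties by position rather than averaging them; the helper names and example intensities are hypothetical.

```python
from statistics import mean, median


def quantile_normalize(arrays: list[list[float]]) -> list[list[float]]:
    """Give every array the same distribution: the k-th smallest value of each
    array is replaced by the mean of the k-th smallest values across arrays.
    Ties are broken by position rather than averaged (a simplification)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        out = [0.0] * n
        for rank, i in enumerate(sorted(range(n), key=lambda j: a[j])):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def scale_to_median(arrays, target=22.0):
    """Scale each array so that its median intensity equals target."""
    return [[v * target / median(a) for v in a] for a in arrays]


# Two hypothetical arrays with three probes each.
print(scale_to_median(quantile_normalize([[5, 2, 3], [4, 1, 6]])))
```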

\ \

Verification

\

\ Using the P-values from the biological replicates, all pairwise\ rank correlation coefficients were computed among biological\ replicates. Data sets showing both consistent pairwise correlation\ coefficients and at least weak positive correlation across all pairs\ were considered reproducible.

\ \

Credits

\

\ These data were generated and analyzed by the Gingeras/Struhl\ collaboration with the Tom Gingeras group at \ Affymetrix and \ Kevin Struhl's group at Harvard Medical School.

\ \

References

\

\ Please see the \ Affymetrix Transcriptome site for a project overview and\ additional references to Affymetrix tiling array publications.

\

\ Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. \ A comparison \ of normalization methods for high density oligonucleotide array data based \ on variance and bias. \ Bioinformatics 19(2), 185-193 (2003).

\

\ Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger,\ E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J.,\ Williams, A. J., et al. \ Unbiased mapping of \ transcription factor binding sites along human chromosomes 21 and 22 points \ to widespread regulation of noncoding RNAs. \ Cell 116(4), 499-509 (2004).

\

\ Yang A, Zhu Z, Kapranov P, McKeon F, Church GM, Gingeras TR, Struhl K.\ \ Relationships between p63 binding, DNA sequence, transcription\ activity, and biological function in human cells. \ Mol. Cell. 24(4), 593-602 (2006).

\ \ encodeChip 0 autoScale off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE Oct 2005 Freeze\ group encodeChip\ longLabel Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict Signal\ maxHeightPixels 128:16:16\ origAssembly hg17\ priority 43.8\ shortLabel Affy Strict Sig\ spanList 1\ subGroup1 factor Factor H3K9K14ac2=H3K9K14ac2 H4Kac4=H4Kac4 Pol2=Pol2 actd=p63_ActD mactd=p63_mActD\ subGroup2 time Timepoint 0h=0hrs 2h=2hrs 8h=8hrs 32h=32hrs\ superTrack encodeAffyChipSuper dense\ track encodeAffyChIpHl60SignalStrict\ type wig -2.78 3.97\ viewLimits 0:2.0\ visibility hide\ encodeAffyChIpHl60SitesStrict Affy Strict Sites bed 3 . Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict Sites 1 43.9 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows regions that co-precipitate with antibodies against each of four factors across all ENCODE regions, in retinoic acid-stimulated HL-60 (leukemia) cells harvested after 0, 2, 8, and 32 hours, and against a fifth factor tested in ME-180 cervical carcinoma cells. Clustered sites are shown in a separate subtrack for each antibody:\

\ Retinoic acid-stimulated HL-60 cells and ME-180 cells (actinomycin-D treated\ or untreated) were harvested and\ whole cell extracts (control) were made. An antibody was used to\ immunoprecipitate bound chromatin fragments (treatment). DNA was\ purified from these samples and hybridized to Affymetrix ENCODE\ oligonucleotide tiling arrays, which have 25-mer probes tiled\ every 22 bp on average in the non-repetitive ENCODE regions.

\

\ Data for all biological replicates \ can be downloaded from Affymetrix in \ wiggle,\ cel, and\ soft formats.

\

\

Display Conventions and Configuration

\

\ The subtracks within this composite annotation track\ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options for the subtracks \ are shown at the top of the track description page, followed by a list of \ subtracks. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\

\ Color differences among the subtracks are arbitrary. They provide a\ visual cue for finding the same antibody in different timepoint tracks.

\ \

Methods

\

\ Three independent biological replicates were generated, and each was hybridized to duplicate arrays (two technical replicates). Reproducible enriched regions were derived from the signal by applying to each biological replicate a cutoff of 0.693 (ln 2) on the signal estimate, together with a maxgap of 500 bp and a minrun of 0 bp. Because each region or site can comprise more than one probe, a median of the log-transformed P-values of its probes was computed per site for each replicate. These seed sites were then ranked individually within each replicate; a site absent from a replicate was assigned the maximum (worst) rank of the distribution. \
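The region-calling and ranking steps can be sketched as below. Several details are assumptions: positive probes no more than `maxgap` bp apart are merged and regions shorter than `minrun` bp are dropped (a common reading of maxgap/minrun, not confirmed here), `len(all_site_ids)` is used as the "worst" rank, and sites are assumed to have already been matched across replicates under shared IDs. All function names are hypothetical.

```python
import math
from statistics import median


def call_regions(positions, signal, cutoff=math.log(2), maxgap=500, minrun=0):
    """Merge probes whose signal estimate passes the cutoff into regions.
    Positive probes at most maxgap bp apart are joined; regions shorter than
    minrun bp are discarded. Positions must be sorted."""
    regions = []
    start = end = None
    for pos, s in zip(positions, signal):
        if s < cutoff:
            continue
        if start is not None and pos - end <= maxgap:
            end = pos  # extend the current region
            continue
        if start is not None and end - start >= minrun:
            regions.append((start, end))
        start = end = pos  # open a new region
    if start is not None and end - start >= minrun:
        regions.append((start, end))
    return regions


def site_score(site, positions, neg_log_p):
    """Median of the log-transformed P-values of the probes inside a site."""
    start, end = site
    return median(q for p, q in zip(positions, neg_log_p) if start <= p <= end)


def rank_sites(scores_by_replicate, all_site_ids):
    """Rank sites within each replicate (1 = strongest). A site missing from a
    replicate receives the worst possible rank, len(all_site_ids)."""
    worst = len(all_site_ids)
    ranked = []
    for scores in scores_by_replicate:
        order = sorted(scores, key=scores.get, reverse=True)
        rank = {sid: i + 1 for i, sid in enumerate(order)}
        ranked.append({sid: rank.get(sid, worst) for sid in all_site_ids})
    return ranked
```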

\ The following three values were computed for each\ site by combining data from all biological replicates: \

\

\ A final filter based on the signal estimate was applied: only sites with a median signal estimate of at least 0.693/(total number of individual replicates) were considered. This ensured that a site not detected consistently in all replicates, but detected at a significant signal level in a subset of them, would have its detection level weighted accordingly in the final selection of sites. The final sites were those for which all three of the above metrics were relatively low, where "low" corresponds to the top 25th percentile of the distribution.\ \
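The final filter and percentile selection might look like this. Because the three combined metrics are not listed here, they are passed in as an opaque tuple per site (lower is better). Computing the 25th percentile over the sites that pass the signal filter, and with `statistics.quantiles(..., method="inclusive")`, are both assumptions; the function names are hypothetical.

```python
import math
from statistics import median, quantiles


def passes_signal_filter(signals, n_replicates):
    """Median signal estimate must reach ln(2) / n_replicates (0.693 / n)."""
    return median(signals) >= math.log(2) / n_replicates


def select_sites(metrics, signals, n_replicates):
    """metrics: site_id -> tuple of combined per-site values (lower is better).
    signals: site_id -> per-replicate signal estimates (0.0 where absent).
    Keep sites that pass the signal filter and whose every metric lies at or
    below the 25th percentile of that metric among the filtered sites."""
    kept = [s for s in metrics if passes_signal_filter(signals[s], n_replicates)]
    if len(kept) < 2:
        return kept  # too few sites to estimate a percentile
    n_metrics = len(metrics[kept[0]])
    cutoffs = [
        quantiles([metrics[s][k] for s in kept], n=4, method="inclusive")[0]
        for k in range(n_metrics)
    ]
    return [s for s in kept if all(metrics[s][k] <= cutoffs[k] for k in range(n_metrics))]
```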

Verification

\

\ Using the P-values from the biological replicates, all pairwise\ rank correlation coefficients were computed among biological\ replicates. Data sets showing both consistent pairwise correlation\ coefficients and at least weak positive correlation across all pairs\ were considered reproducible.

\ \

Credits

\

\ These data were generated and analyzed by the Gingeras/Struhl\ collaboration with the Tom Gingeras group at \ Affymetrix and \ Kevin Struhl's group at Harvard Medical School.

\ \

References

\

\ Please see the \ Affymetrix Transcriptome site for a project overview and\ additional references to Affymetrix tiling array publications.

\

\ Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. \ A comparison \ of normalization methods for high density oligonucleotide array data based \ on variance and bias. \ Bioinformatics 19(2), 185-193 (2003).

\

\ Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger,\ E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J.,\ Williams, A. J., et al. \ Unbiased mapping of \ transcription factor binding sites along human chromosomes 21 and 22 points \ to widespread regulation of noncoding RNAs. \ Cell 116(4), 499-509 (2004).

\

\ Yang A, Zhu Z, Kapranov P, McKeon F, Church GM, Gingeras TR, Struhl K.\ \ Relationships between p63 binding, DNA sequence, transcription\ activity, and biological function in human cells. \ Mol. Cell. 24(4), 593-602 (2006).

\ \ encodeChip 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE Oct 2005 Freeze\ group encodeChip\ longLabel Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict Sites\ origAssembly hg17\ priority 43.9\ shortLabel Affy Strict Sites\ subGroup1 factor Factor H3K9K14ac2=H3K9K14ac2 H4Kac4=H4Kac4 Pol2=Pol2 actd=p63_ActD mactd=p63_mActD\ subGroup2 time Timepoint 0h=0hrs 2h=2hrs 8h=8hrs 32h=32hrs\ superTrack encodeAffyChipSuper dense\ track encodeAffyChIpHl60SitesStrict\ type bed 3 .\ encodeAffyChIpHl60SitesP300Hr02 Affy P300 RA 2h bed 3 . Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 2hrs) Sites 0 44 100 125 0 177 190 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 100,125,0\ longLabel Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 2hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 44\ shortLabel Affy P300 RA 2h\ subGroups factor=P300 time=2h\ track encodeAffyChIpHl60SitesP300Hr02\ npredGene NCBI Prediction genePred npredPep NCBI Gene Predictions 0 44 170 100 0 212 177 127 0 0 0 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=$$

Description

\

This track shows predictions from NCBI Genome\ Assembly/Annotation Projects.\ \

Methods

\ Methods details goes here.\

Credits

\ Thanks to NCBI.\ \ genes 1 color 170,100,0\ group genes\ longLabel NCBI Gene Predictions\ priority 44\ shortLabel NCBI Prediction\ track npredGene\ type genePred npredPep\ url http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=$$\ visibility hide\ encodeYaleAffyPlacRNATarsIntergenicDistal Yale Ig Dst Plac bed 4 . Yale Intergenic Distal Placental TARs 0 44 230 0 25 242 127 140 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 color 230,0,25\ longLabel Yale Intergenic Distal Placental TARs\ parent encodeNoncodingTransFrags\ priority 44\ shortLabel Yale Ig Dst Plac\ subGroups region=intergenicDistal celltype=plac source=yale\ track encodeYaleAffyPlacRNATarsIntergenicDistal\ encodeAffyChIpHl60PvalP300Hr08 Affy P300 RA 8h wig 0.0 534.54 Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 8hrs) P-Value 0 45 100 125 0 177 190 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 100,125,0\ longLabel Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 8hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 45\ shortLabel Affy P300 RA 8h\ subGroups factor=P300 time=8h\ track encodeAffyChIpHl60PvalP300Hr08\ twinscan Twinscan genePred twinscanPep Twinscan Gene Predictions Using Mouse/Human Homology 0 45 0 100 100 127 177 177 0 0 0

Description

\

\ The Twinscan program predicts genes in a manner similar to Genscan, except that Twinscan takes advantage of genome comparisons to improve gene prediction accuracy. In the version of Twinscan used to generate this track, intronless copies of known genes are masked out before gene prediction, reducing the number of processed pseudogenes in gene models. More information and a web server can be found at http://mblab.wustl.edu/.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for\ gene prediction\ tracks.

\

\ The track description page offers the following filter and configuration\ options:\

\ \

Methods

\

\ The Twinscan algorithm is described in Korf, I. et al. 2001 in the\ References section below.

\ \

Credits

\

\ Thanks to Michael Brent's Computational Genomics Group at Washington University in St. Louis for providing these data.

\ \

References

\

\ Korf I, Flicek P, Duan D, Brent MR. Integrating genomic homology into gene structure prediction. Bioinformatics. 2001 Jun 1;17(90001):S140-8.

\ genes 1 color 0,100,100\ group genes\ longLabel Twinscan Gene Predictions Using Mouse/Human Homology\ priority 45\ shortLabel Twinscan\ track twinscan\ type genePred twinscanPep\ visibility hide\ ucscFromMouse UCSC Mm3 genePred UCSC Gene Predictions from Known Mouse Genes Mapped to Human 0 45 0 100 100 0 50 50 0 0 0 genes 1 altColor 0,50,50\ color 0,100,100\ group genes\ longLabel UCSC Gene Predictions from Known Mouse Genes Mapped to Human\ priority 45\ shortLabel UCSC Mm3\ track ucscFromMouse\ type genePred\ visibility hide\ nscanGene N-SCAN genePred nscanPep N-SCAN Gene Predictions 0 45.1 34 139 34 144 197 144 0 0 0

Description

\

\ This track shows gene predictions using the N-SCAN gene structure prediction\ software provided by the Computational Genomics Lab at Washington University \ in St. Louis, MO, USA.\

\ \

Methods

\

\ N-SCAN combines biological-signal modeling in the target genome sequence along\ with information from a multiple-genome alignment to generate de novo gene\ predictions. It extends the TWINSCAN target-informant genome pair to allow for\ an arbitrary number of informant sequences as well as richer models of\ sequence evolution. N-SCAN models the phylogenetic relationships between the\ aligned genome sequences, context-dependent substitution rates, insertions,\ and deletions.\

\


\ \

Credits

\

\ Thanks to Michael Brent's Computational Genomics Group at Washington University in St. Louis for providing these data.\

\

\ Special thanks for this implementation of N-SCAN to Aaron Tenney in\ the Brent lab, and Robert Zimmermann, currently at Max F. Perutz\ Laboratories in Vienna, Austria.\

\ \

References

\

\ Gross SS, Brent MR.\ Using\ multiple alignments to improve gene prediction. In\ Proc. 9th Int'l Conf. on Research in Computational Molecular Biology\ (RECOMB '05):374-388 and J Comput Biol. 2006 Mar;13(2):379-93.\

\

\ Korf I, Flicek P, Duan D, Brent MR.\ Integrating genomic homology into gene structure prediction.\ Bioinformatics. 2001 Jun 1;17(90001):S140-8.

\

\ van Baren MJ, Brent MR.\ Iterative gene prediction and pseudogene removal improves\ genome annotation.\ Genome Res. 2006 May;16(5):678-85.

\

\ Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM,\ Rusch DB, Town CD et al.\ \ Improving the Arabidopsis genome annotation using maximal transcript \ alignment assemblies.\ Nucleic Acids Res 2003 Oct 1;31(19):5654-66.

\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 34,139,34\ group genes\ informant BUG: INFORMANT DESCRIPTION NOT SET IN TRACK DB FILE\ longLabel N-SCAN Gene Predictions\ priority 45.1\ shortLabel N-SCAN\ track nscanGene\ type genePred nscanPep\ visibility hide\ contrastGene CONTRAST coloredExon CONTRAST Gene Predictions 0 45.2 34 34 139 144 144 197 0 0 0

Description

\

\ This track shows protein-coding gene predictions generated by \ CONTRAST. \ Each predicted exon is colored according to confidence level: green (high\ confidence), orange (medium confidence), or red (low confidence).\

\ \

Methods

\

\ CONTRAST predicts protein-coding genes from a multiple genomic\ alignment using a combination of discriminative machine learning techniques. A two-stage approach is used, in which output from local classifiers is combined with a global model of gene structure. CONTRAST is trained using a novel procedure designed to\ maximize expected coding region boundary detection accuracy.\ \

\ Please see the \ CONTRAST web site for details on how these predictions were generated and an estimate of accuracy. \ \

Credits

\

\ Thanks to Samuel Gross of the Batzoglou lab at Stanford University for providing these predictions.\ \

References

\

\ Gross SS, Do CB, Sirota M, Batzoglou S.\ CONTRAST: A Discriminative, Phylogeny-free Approach to Multiple Informant De Novo Gene Prediction.\ Genome Biology. 2007 December;8(12):R269.\

\ genes 0 color 34,34,139\ group genes\ longLabel CONTRAST Gene Predictions\ priority 45.2\ shortLabel CONTRAST\ track contrastGene\ type coloredExon\ visibility hide\ slamMouse Slam Mouse genePred Slam Gene Predictions Using Human/Mouse Homology 0 45.5 100 50 0 175 150 128 0 0 0

Description and Credits

\

\ Slam predicts coding exons and conserved noncoding regions in a pair of \ homologous DNA sequences, incorporating both statistical sequence properties \ and degree of conservation in making the predictions. The model is symmetric \ and the same gene structure (with possibly different exon lengths) is \ predicted in both sequences.

\

\ The symmetry of the model gives it a higher degree of accuracy in regions where the true underlying gene structures contain the same number of coding exons. In cases where this is not true, or when one of the sequences is of lower quality and contains in-frame stop codons, the resulting predictions tend to be less accurate.

\

\ More information and a web server can be found on the \ Slam website.

\ \

References

\

\ Alexandersson, M., Cawley, S., and Pachter, L. SLAM - Cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Res. 13(3), 496-502 (2003).

\

\ Cawley, S., Pachter, L., and Alexandersson, M. \ SLAM web server for comparative gene finding and alignment.\ Nucleic Acids Res. 31(13), 3507-3509 (2003).

\

\ Pachter, L., Alexandersson, M., and Cawley, S. \ Applications of generalized pair hidden Markov models to \ alignment and gene finding problems. \ J Comput Biol. 9(2), 389-99 (2002).

\

\ Pachter, L., Alexandersson, M., and Cawley, S. Applications of generalized \ pair hidden Markov models to alignment and gene finding problems. \ Proceedings of the Fifth Annual International Conference on Computational \ Molecular Biology (RECOMB 2001) (2001).

\ \ genes 1 altColor 175,150,128\ color 100,50,0\ group genes\ longLabel Slam Gene Predictions Using Human/Mouse Homology\ priority 45.5\ shortLabel Slam Mouse\ track slamMouse\ type genePred\ visibility hide\ slamRat Slam Rat genePred Slam Gene Predictions Using Human/Rat Homology 0 45.6 100 50 0 175 150 128 0 0 0

Description and Credits

\

\ Slam predicts coding exons and conserved noncoding regions in a pair of \ homologous DNA sequences, incorporating both statistical sequence properties \ and degree of conservation in making the predictions. The model is symmetric \ and the same gene structure (with possibly different exon lengths) is \ predicted in both sequences.

\

\ The symmetry of the model gives it a higher degree of accuracy in regions where the true underlying gene structures contain the same number of coding exons. In cases where this is not true, or when one of the sequences is of lower quality and contains in-frame stop codons, the resulting predictions tend to be less accurate.

\

\ More information and a web server can be found on the \ Slam website.

\ \

References

\

\ Alexandersson, M., Cawley, S., and Pachter, L. SLAM - Cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Res. 13(3), 496-502 (2003).

\

\ Cawley, S., Pachter, L., and Alexandersson, M. \ SLAM web server for comparative gene finding and alignment.\ Nucleic Acids Res. 31(13), 3507-3509 (2003).

\

\ Pachter, L., Alexandersson, M., and Cawley, S. \ Applications of generalized pair hidden Markov models to \ alignment and gene finding problems. \ J Comput Biol. 9(2), 389-99 (2002).

\

\ Pachter, L., Alexandersson, M., and Cawley, S. Applications of generalized \ pair hidden Markov models to alignment and gene finding problems. \ Proceedings of the Fifth Annual International Conference on Computational \ Molecular Biology (RECOMB 2001) (2001).

\ \ genes 1 altColor 175,150,128\ color 100,50,0\ group genes\ longLabel Slam Gene Predictions Using Human/Rat Homology\ priority 45.6\ shortLabel Slam Rat\ track slamRat\ type genePred\ visibility hide\ encodeAffyChIpHl60SitesP300Hr08 Affy P300 RA 8h bed 3 . Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 8hrs) Sites 0 46 100 125 0 177 190 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 100,125,0\ longLabel Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 8hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 46\ shortLabel Affy P300 RA 8h\ subGroups factor=P300 time=8h\ track encodeAffyChIpHl60SitesP300Hr08\ genomeScan NCBI GenomeScan genePred genomeScanPep Human GenomeScan Models from NCBI 0 46 0 0 0 127 127 127 0 0 0

Description & Credits

\ \ Pure GenomeScan gene predictions from NCBI. See the human build release notes for a description of the build. \ genes 1 group genes\ longLabel $Organism GenomeScan Models from NCBI\ priority 46\ shortLabel NCBI GenomeScan\ track genomeScan\ type genePred genomeScanPep\ visibility hide\ encodeAffyChIpHl60PvalP300Hr32 Affy P300 RA 32h wig 0.0 534.54 Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 32hrs) P-Value 0 47 100 125 0 177 190 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 100,125,0\ longLabel Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 32hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 47\ shortLabel Affy P300 RA 32h\ subGroups factor=P300 time=32h\ track encodeAffyChIpHl60PvalP300Hr32\ encodeGisChipPet GIS p53 5FU HCT116 bed 12 GIS ChIP-PET: p53 Ab on 5FU treated HCT116 cells 0 47 158 35 135 206 145 195 1 0 24 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX,chrY,

Description

\

\ This track shows genome-wide p53 binding sites as determined by chromatin immunoprecipitation (ChIP) and paired-end di-tag (PET) sequencing. The p53 protein is a transcription factor that is involved in the control of cell growth and is often expressed at high levels in cancer cells. See the Methods section below for more information about ChIP and PET.

\

\ The PET sequences in this track are derived from 65,572 individual p53 ChIP fragments of 5-fluorouracil (5FU)-stimulated HCT116 cells. More datasets will be submitted in the future, including STAT1, TAF250, and E2F1.

\ \

Display Conventions and Configuration

\

\ In the graphical display, PET sequences are shown as two blocks,\ representing the ends of the pair, connected by a thin arrowed\ line. Overlapping PET clusters (PET fragments that overlap one\ another) originating from the ChIP enrichment process define the\ genomic loci that are potential transcription factor binding sites (TFBSs). \ PET singletons, from non-specific ChIP fragments that did not cluster, are \ not shown.
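A minimal sketch of grouping mapped PET fragments into overlapping clusters and dropping singletons. It assumes half-open (chrom, start, end) coordinates and chained overlap, meaning a fragment joins a cluster if it overlaps the cluster's current extent; the function name is hypothetical and GIS's own clustering rules may differ.

```python
def cluster_pets(fragments):
    """Group ChIP fragments (chrom, start, end; half-open) into clusters of
    overlapping fragments. Overlap is chained: a fragment joins a cluster if it
    overlaps the cluster's current extent. Singletons are dropped."""
    clusters = []
    for chrom, start, end in sorted(fragments):
        last = clusters[-1] if clusters else None
        if last is not None and last["chrom"] == chrom and start < last["end"]:
            last["members"].append((chrom, start, end))
            last["end"] = max(last["end"], end)
        else:
            clusters.append({"chrom": chrom, "end": end, "members": [(chrom, start, end)]})
    return [c["members"] for c in clusters if len(c["members"]) > 1]


# Hypothetical fragments: a PET-2 cluster on chr1, a chr1 singleton, a PET-3 cluster on chr2.
frags = [("chr1", 100, 300), ("chr1", 250, 400), ("chr1", 1000, 1200),
         ("chr2", 100, 200), ("chr2", 150, 260), ("chr2", 255, 300)]
print([len(c) for c in cluster_pets(frags)])  # [2, 3]
```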

\

\ In full and packed display modes, the arrowheads on the horizontal line represent the orientation of the PET sequence, and an ID of the format XXXXX-M is shown to the left of each PET, where XXXXX is the unique ID of the PET and M is the number of PET sequences at this location. The track coloring reflects the value of M: light gray indicates one or two sequences (score = 333), dark gray three sequences (score = 800), and black four or more PET sequences (score = 1000) at the location.
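The mapping from M to BED score and item label described above can be written as follows; with `useScore 1` set in this track's settings, the browser shades items by that score. The function names and example IDs are hypothetical, and IDs are assumed to be plain integers (whether they are padded to five digits is not stated).

```python
def pet_score(m: int) -> int:
    """BED score for a location supported by m PET sequences."""
    if m >= 4:
        return 1000  # black
    if m == 3:
        return 800   # dark gray
    return 333       # light gray: one or two sequences


def pet_label(pet_id: int, m: int) -> str:
    """Item name in the XXXXX-M form: PET ID, a hyphen, then M."""
    return f"{pet_id}-{m}"


for pet_id, m in [(10452, 2), (10453, 3), (10454, 6)]:
    print(pet_label(pet_id, m), pet_score(m))
# 10452-2 333
# 10453-3 800
# 10454-6 1000
```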

\ \

Methods

\

\ HCT116 cells were treated with 5FU for six hours. The cross-linked\ chromatin was sheared and precipitated with a high affinity\ antibody. The DNA fragments were end-polished and cloned into the plasmid\ vector, pGIS3. pGIS3 contains two MmeI recognition sites that\ flank the cloning site, which were used to produce a 36 bp\ PET from the original ChIP DNA fragments (18 bp from each of the 5' and 3' \ ends). Multiple 36 bp PETs were concatenated and cloned into pZero-1 for \ sequencing, where each sequence read can generate 10-15 PETs. The PET \ sequences were extracted from raw sequence reads and mapped to the genome, \ defining the boundaries of each ChIP DNA fragment. The following specific\ mapping criteria were used:\

\

\ Due to the known possibility of MmeI slippage (+/- 1 bp) that\ leads to ambiguities at the PET signature boundaries, a minimal 17 bp\ match was set for each 18 bp signature. The total count of PET sequences\ mapped to the same locus but with slight nucleotide differences may reflect\ the expression level of the transcripts. Only PETs with specific mapping \ (one location) to the genome were considered. PETs that mapped to multiple \ locations may represent low complexity or repetitive sequences, and therefore \ were not included for further analysis.
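The mapping criteria can be sketched with a brute-force scan. Treating the 17 bp minimum as "at least 17 of the 18 positions agree," scanning only the forward strand, and giving both signatures in genome orientation are simplifying assumptions; a real pipeline would use an indexed aligner. The function names and the toy sequences are hypothetical.

```python
def tag_hits(genome: str, tag: str, min_match: int = 17) -> list[int]:
    """Start positions at which at least min_match of the tag's bases agree
    with the genome (forward strand only, for simplicity)."""
    hits = []
    for i in range(len(genome) - len(tag) + 1):
        window = genome[i:i + len(tag)]
        if sum(a == b for a, b in zip(tag, window)) >= min_match:
            hits.append(i)
    return hits


def map_pet(genome: str, tag5: str, tag3: str, min_match: int = 17):
    """Return the (start, end) span of the ChIP fragment defined by a PET, or
    None if either 18 bp signature does not map to exactly one location."""
    hits5 = tag_hits(genome, tag5, min_match)
    hits3 = tag_hits(genome, tag3, min_match)
    if len(hits5) != 1 or len(hits3) != 1:
        return None  # unmapped, or multi-mapped (possible repeat): discarded
    start, end = hits5[0], hits3[0] + len(tag3)
    return (start, end) if start < end else None


# Toy genome: two distinct 18 bp signatures separated by filler.
tag5, tag3 = "ACGTTGCAACGTGGATCC", "TTGACCAGTCAGGCATGA"
genome = "C" * 10 + tag5 + "C" * 40 + tag3 + "C" * 10
print(map_pet(genome, tag5, tag3))  # (10, 86)
```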

\ \

Verification

\

\ Statistical and experimental verification exercises have shown that\ the overlapping PET clusters result from ChIP enrichment events.\

\ Monte Carlo simulation using the p53 ChIP-PET data estimated that about 27% of PET-2 clusters (PET clusters with two overlapping members), 3% of the PET clusters with 3 overlapping members (PET-3 clusters), and less than 0.0001% of PET clusters with more than 3 overlapping members were due to random chance. This suggests that the PET clusters most likely represent real ChIP enrichment events, and that a higher number of overlapping fragments correlates with a higher probability of a real ChIP enrichment event. Furthermore, based on goodness-of-fit analysis for assessing the reliability of PET clusters, it was estimated that less than 36% of the PET-2 clusters and over 99% of the PET-3+ clusters (clusters with three or more overlapping members) are true ChIP enrichment sites. Thus, the verification rate is nearly 100% for PET-3+ ChIP clusters, and the PET-2 clusters contain significant noise.
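The simulation design is not described here. As a hedged sketch, one way to estimate how many PET-k clusters arise by chance is to place the same number of fragments at random, cluster them, and compare the expected counts with the observed ones. Uniform placement on a single sequence and a fixed fragment length are assumptions, and `observed` is a hypothetical mapping from cluster size to observed cluster count.

```python
import random
from collections import Counter


def random_cluster_sizes(n_fragments, frag_len, genome_len, rng):
    """Place fragments uniformly at random and count clusters by size
    (number of chained, overlapping fragments)."""
    starts = sorted(rng.randrange(genome_len - frag_len) for _ in range(n_fragments))
    sizes = Counter()
    size, cluster_end = 0, -1
    for s in starts:
        if s < cluster_end:  # overlaps the current cluster
            size += 1
            cluster_end = max(cluster_end, s + frag_len)
        else:
            if size:
                sizes[size] += 1
            size, cluster_end = 1, s + frag_len
    if size:
        sizes[size] += 1
    return sizes


def chance_fraction(observed, n_fragments, frag_len, genome_len, trials=100, seed=0):
    """Expected number of random clusters of each size, averaged over trials,
    divided by the observed number of clusters of that size."""
    rng = random.Random(seed)
    total = Counter()
    for _ in range(trials):
        total.update(random_cluster_sizes(n_fragments, frag_len, genome_len, rng))
    return {k: total[k] / trials / observed[k] for k in observed if observed[k]}
```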

\

\ In addition to these statistical analyses, 40 genomic locations\ identified by PET-3+ clusters were randomly selected and analyzed by\ quantitative real-time PCR. The relative enrichment of candidate\ regions compared to control GST ChIP DNA was determined and all 40\ regions (100%) were confirmed to have significant enrichment of p53\ ChIP clusters.

\ \

Credits

\

\ The p53 ChIP-PET library and sequence data were produced at the \ Genome Institute of Singapore. The data were mapped\ and analyzed by scientists from the Genome Institute of Singapore, the\ Bioinformatics Institute, Singapore, and Boston University.

\ \

References

\

\ Ng, P. et al. Gene identification signature (GIS) analysis for\ transcriptome characterization and genome annotation. Nature\ Methods 2, 105-111 (2005).

\ \ encodeChip 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX,chrY\ color 158,35,135\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel GIS ChIP-PET: p53 Ab on 5FU treated HCT116 cells\ origAssembly hg16\ priority 47.0\ shortLabel GIS p53 5FU HCT116\ track encodeGisChipPet\ type bed 12\ useScore 1\ visibility hide\ sgpGene SGP Genes genePred sgpPep SGP Gene Predictions Using Mouse/Human Homology 0 47 0 90 100 127 172 177 0 0 0

Description

\

\ This track shows gene predictions from the SGP program, developed at the Genome Bioinformatics Laboratory (GBL), which is part of the Grup de Recerca en Informàtica Biomèdica (GRIB) at Institut Municipal d'Investigació Mèdica (IMIM) / Centre de Regulació Genòmica (CRG) in Barcelona. To predict genes in a genomic query, SGP combines geneid predictions with tblastx comparisons of the genomic query against other genomic sequences.\

\

Credits

\

\ Thanks to GBL for providing these gene predictions.\

\ \ \ \ genes 1 color 0,90,100\ group genes\ longLabel SGP Gene Predictions Using Mouse/Human Homology\ priority 47\ shortLabel SGP Genes\ track sgpGene\ type genePred sgpPep\ visibility hide\ encodeGisChipPetStat1 GIS STAT1 HeLa bed 12 GIS ChIP-PET: STAT1 Ab on (+/-)gIF HeLa cells 0 47.1 125 140 35 67 79 35 1 0 24 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX,chrY,

Description

\

\ This track shows STAT1 binding sites as determined by chromatin \ immunoprecipitation (ChIP) and paired-end di-tag (PET) sequencing. \

\ The PET sequences in this track are derived from\ 327,838 STAT1 ChIP fragments of interferon gamma-stimulated HeLa cells and\ 263,901 STAT1 ChIP fragments of non-stimulated HeLa cells. \ Of these individual ChIP fragments, 3,180 of the PETs from the stimulated cells and\ 4,007 PETs from unstimulated cells were mapped to the ENCODE regions. \ The data from the unstimulated cells were used as the negative control.

\

\ Only PETs mapped to the ENCODE regions are shown in this track. \ \

Display Conventions and Configuration

\

\ In the graphical display, PET sequences are shown as two blocks,\ representing the ends of the pair, connected by a thin, arrowed\ line. Overlapping PET clusters (PET fragments that overlap one\ another) originating from the ChIP enrichment process define the\ genomic loci that are potential transcription factor binding sites (TFBSs).\ PET singletons, from non-specific ChIP fragments that did not cluster, are\ not shown.
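The cluster definition above can be sketched as follows. This is a minimal illustration, not the GIS pipeline: it assumes each PET is given as a half-open (start, end) interval on one chromosome, groups PETs whose spans overlap (directly or through a chain of overlaps), and reports the maximal number of PETs covering any single position in each group, i.e. the n of a PET-n cluster. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on a single chromosome


def max_depth(intervals: List[Interval]) -> int:
    """Largest number of intervals covering any single position."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # At equal coordinates, -1 sorts before +1, so intervals that only
    # touch (one ends where the next starts) are not counted as overlapping.
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


def cluster_pets(pets: List[Interval]) -> List[Tuple[int, int, int]]:
    """Return (start, end, n) for each PET-n cluster with n >= 2.

    Groups containing a single PET (singletons) are dropped, as in the track.
    """
    groups: List[List[Interval]] = []
    group: List[Interval] = []
    group_end = 0
    for start, end in sorted(pets):
        if group and start >= group_end:
            # No overlap with the current group: close it and start a new one.
            groups.append(group)
            group = []
        group_end = end if not group else max(group_end, end)
        group.append((start, end))
    if group:
        groups.append(group)

    clusters = []
    for g in groups:
        n = max_depth(g)
        if n >= 2:
            # g is sorted by start, so g[0][0] is the leftmost coordinate.
            clusters.append((g[0][0], max(e for _, e in g), n))
    return clusters
```

With made-up coordinates, `cluster_pets([(100, 200), (150, 250), (180, 300), (1000, 1100)])` yields a single PET-3 cluster spanning 100-300; the isolated PET at 1000-1100 is a singleton and is not reported.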

\

\ In full and packed display modes, the arrowheads on the horizontal line\ represent the orientation of the PET sequence, and an ID of the format\ XXXXX-M is shown to the left of each PET,\ where X is the unique ID for each PET\ and M is the number of PET sequences at this location.\ The track coloring reflects the value of M:\ light gray indicates one or two sequences (score = 333), dark gray is used for\ three sequences (score = 800) and black indicates four or more PET sequences\ (score = 1000) at the location.
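The relationship between M, the score and the shade can be written as a small lookup. A sketch only, assuming M has already been determined for each location; the function names and the label `12345-3` used below are made up for illustration.

```python
from typing import Tuple


def pet_score(m: int) -> int:
    """Track score for a location with M PET sequences."""
    if m < 1:
        raise ValueError("M must be at least 1")
    if m <= 2:
        return 333   # light gray: one or two sequences
    if m == 3:
        return 800   # dark gray: three sequences
    return 1000      # black: four or more sequences


def parse_pet_label(label: str) -> Tuple[str, int]:
    """Split an 'XXXXX-M' label into its PET ID and M."""
    pet_id, m = label.rsplit("-", 1)
    return pet_id, int(m)
```

For example, `parse_pet_label("12345-3")` gives `("12345", 3)`, and `pet_score(3)` gives 800 (dark gray).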

\ \

Methods

\

\ The STAT1 chromatin immuno-precipitated DNA fragments from stimulated and \ non-stimulated control cells were end-polished and cloned into the plasmid \ vector, pGIS3. pGIS3 contains two MmeI recognition sites that flank \ the cloning site, which were used to produce a 36 bp PET from the original \ ChIP DNA fragments (18 bp from each of the 5' and 3' ends). Multiple 36 bp \ PETs were concatenated and cloned into pZero-1 for sequencing, where each \ sequence read can generate 10-15 PETs. The PET sequences were extracted from \ raw sequence reads and mapped to the genome, defining the boundaries of each \ ChIP DNA fragment. The following specific mapping criteria were used: \

\

\ Due to the known possibility of MmeI slippage (+/- 1 bp) that leads to \ ambiguities at the PET signature boundaries, a minimal 17 bp match was set \ for each 18 bp signature. Only PETs with specific mapping (one location) to \ the genome were considered. PETs that mapped to multiple locations may \ represent low complexity or repetitive sequences, and therefore were not \ included for further analysis.
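The two mapping criteria can be expressed as a filter over candidate alignments. This sketch assumes an upstream aligner has already reported, for every candidate location, how many of the 18 bases of each signature match the genome; the `PetAlignment` record and `unique_mapping` function are hypothetical, not part of the GIS software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PetAlignment:
    """One candidate genomic location for a 36 bp PET (hypothetical record)."""
    chrom: str
    start: int
    end: int
    five_prime_matches: int   # matching bases in the 18 bp 5' signature
    three_prime_matches: int  # matching bases in the 18 bp 3' signature


def unique_mapping(candidates: List[PetAlignment],
                   min_match: int = 17) -> Optional[PetAlignment]:
    """Keep a PET only if exactly one location passes the match threshold."""
    # A 17 bp minimum per 18 bp signature tolerates +/- 1 bp MmeI slippage.
    passing = [c for c in candidates
               if c.five_prime_matches >= min_match
               and c.three_prime_matches >= min_match]
    # Zero passing locations: unmapped; two or more: likely repetitive, discarded.
    return passing[0] if len(passing) == 1 else None
```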

\ \

Verification

\

\ Statistical and experimental verification exercises have shown that the \ overlapping PET clusters result from ChIP enrichment events.

\

\ Monte Carlo simulation using the STAT1 ChIP-PET data from the interferon gamma-stimulated dataset estimated that random chance accounted for about 58% of PET-3 clusters (the PET-n designation gives the maximal number of PETs within the overlap region of a cluster), 21% of the PET clusters with 4 overlapping members (PET-4 clusters), and less than 0.5% of PET clusters with more than 5 overlapping members. This suggests that the PET-5+ clusters represent real ChIP enrichment events and that a higher number of overlapping fragments correlates with a higher probability of a real ChIP enrichment event. Furthermore, a goodness-of-fit analysis assessing the reliability of PET clusters estimated that less than 30% of the PET-4 clusters and over 90% of the PET-5+ clusters (clusters with five or more overlapping members) are true ChIP enrichment sites.

\

\ In addition to these statistical analyses, 9 out of 14 genomic locations (64%) \ identified by PET-5+ clusters in the ENCODE regions were supported by ChIP-chip \ data from Yale using the same ChIP DNA as hybridization material.

\ \

Credits

\

\ The ChIP fragment prep was provided by Ghia Euskirchen from Michael Snyder's \ lab at Yale. The ChIP-PET library and sequence data were produced at the\ Genome Institute of Singapore. The data were mapped\ and analyzed by scientists from the Genome Institute of Singapore and\ the Bioinformatics \ Institute, Singapore.

\ \

References

\

\ Ng, P. et al. \ Gene identification signature (GIS) analysis for\ transcriptome characterization and genome annotation. Nature\ Methods 2, 105-111 (2005).

\ encodeChip 1 altColor 67,79,35\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX,chrY\ color 125,140,35\ compositeTrack on\ dataVersion ENCODE Oct 2005 Freeze\ group encodeChip\ longLabel GIS ChIP-PET: STAT1 Ab on (+/-)gIF HeLa cells\ priority 47.1\ shortLabel GIS STAT1 HeLa\ track encodeGisChipPetStat1\ type bed 12\ useScore 1\ visibility hide\ encodeAffyChIpHl60SitesP300Hr32 Affy P300 RA 32h bed 3 . Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 32hrs) Sites 0 48 100 125 0 177 190 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 100,125,0\ longLabel Affymetrix ChIP/Chip (P300 retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 48\ shortLabel Affy P300 RA 32h\ subGroups factor=P300 time=32h\ track encodeAffyChIpHl60SitesP300Hr32\ softberryGene Fgenesh++ Genes genePred softberryPep Fgenesh++ Gene Predictions 0 48 0 100 0 127 177 127 0 0 0

Description

\

\ Fgenesh++ predictions are based on Softberry's gene-finding software.

\ \

Methods

\

\ Fgenesh++ uses both hidden Markov models (HMMs) and protein similarity to \ find genes in a completely automated manner. For more information, see \ Solovyev, V.V. (2001) in the References section below.

\ \

Credits

\

\ The Fgenesh++ gene predictions were produced by \ Softberry Inc. \ Commercial use of these predictions is restricted to viewing in \ this browser. Please contact Softberry Inc. to make arrangements for further \ commercial access.

\ \

References

\

\ Solovyev, V.V. \ "Statistical approaches in Eukaryotic gene prediction" in the \ Handbook of Statistical Genetics (ed. Balding, D. et al.), \ 83-127. John Wiley & Sons, Ltd. (2001).

\ genes 1 color 0,100,0\ group genes\ longLabel Fgenesh++ Gene Predictions\ priority 48\ shortLabel Fgenesh++ Genes\ track softberryGene\ type genePred softberryPep\ visibility hide\ encodeLIChIP LI ChIP Various bedGraph 4 Ludwig Institute/UCSD ChIP-chip: Pol2 8WG16, TAF1, H3ac, H3K4me2, H3K27me3 antibodies 0 48 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ ENCODE region-wide location analyses were conducted for binding of the initiation-complex form of RNA polymerase II (Pol2), TATA-associated factor (TAF1), acetylated histone H3 (H3ac), lysine-4-dimethylated H3 (H3K4me2), suppressor of zeste 12 protein homolog (SUZ12), and lysine-27-trimethylated H3 (H3K27me3). The analyses used chromatin extracted from IMR90 (lung fibroblast), HCT116 (colon epithelial carcinoma), HeLa (cervix epithelial adenocarcinoma), and THP1 (blood monocyte leukemia) cells. The initiation-complex form of Pol2 is associated with the transcription start site, as is TAF1. Both H3ac and H3K4me2 are associated with transcriptionally active "open" chromatin.

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite tracks. Data for each antibody/cell line pair are displayed in a separate subtrack. See the top of the track description page for a complete list of the subtracks available for this annotation. The subtracks may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by the list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link.

\ \

Methods

\

\ Chromatin from each of the four cell lines was separately cross-linked, \ precipitated with antibody to one of the six proteins, sheared, amplified and\ hybridized to a PCR DNA tiling array produced at the Ren Lab at UC San Diego. \ The array was composed of 24,537 non-repetitive sequences within the 44 \ ENCODE regions.

\

\ For each marker, there were three biological replicates. Each experiment was \ normalized using the median values. The P-value and R-value were \ calculated using the modified single array error model \ (Li, Z. et al., 2003).\ The P-value and R-value were then derived from the weighted average \ results of the replicates.

\

\ The displayed values were scaled to 0 - 16, corresponding to negative log \ base 10 of the P-value.
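The transformation amounts to taking the negative base-10 logarithm of each P-value. Clamping values above 16 to the top of the range is an assumption based on the stated 0 - 16 scale, not a documented step; `display_value` is a hypothetical name.

```python
import math


def display_value(p_value: float, lo: float = 0.0, hi: float = 16.0) -> float:
    """Return -log10(P), clamped to the track's 0-16 display range."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P-value must be in (0, 1]")
    return min(hi, max(lo, -math.log10(p_value)))
```

A P-value of 0.001 is drawn at height 3, and anything smaller than 1e-16 saturates at 16.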

\ \

Verification

\

\ Each of the experiments has three biological replicates. The \ array platform, the \ raw and normalized data for each experiment, and the \ image files have all been deposited at the NCBI \ GEO Microarray \ Database.

\ \

Credits

\

\ The data for this track were generated at the \ Ren Lab, Ludwig \ Institute for Cancer Research at UC San Diego.

\ \

References

\

\ Kim, T., Barrera, L.O., Qu, C., van Calcar, S., Trinklein, N., Cooper, S., Luna, R., Glass, C.K., Rosenfeld, M.G., Myers, R., Ren, B. Direct isolation and identification of promoters in the human genome. Genome Research 15, 830-839 (2005).

\

\ Li, Z., Van Calcar, S., Qu, C., Cavenee, W.K., Zhang, M.Q., and Ren, B. A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells. Proc. Natl. Acad. Sci. 100(14), 8164-8169 (2003).\

\ Ren, B., Robert, F., Wyrick, J. J., Aparicio, O., Jennings, E. G., Simon, I., Zeitlinger, J., Schreiber, J., Hannett, N., Kanin, E., Volkert, T. L., Wilson, C., Bell, S. P. and Young, R. A. Genome-wide location and function of DNA-associated proteins. Science 290(5500), 2306-2309 (2000).

\ \ encodeChip 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel Ludwig Institute/UCSD ChIP-chip: Pol2 8WG16, TAF1, H3ac, H3K4me2, H3K27me3 antibodies\ maxHeightPixels 128:16:16\ maxLimit 16\ minLimit 0\ origAssembly hg16\ priority 48.0\ shortLabel LI ChIP Various\ superTrack encodeUcsdChipSuper dense\ track encodeLIChIP\ type bedGraph 4\ viewLimits 0:14\ visibility hide\ encodeUcsdChipSuper LI/UCSD ChIP Ludwig Institute/UC San Diego ChIP-chip 0 48 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Overview

\

\ This super-track combines related tracks of ChIP-chip data generated by the Ludwig Institute/UCSD ENCODE group. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells, including histones. Histone methylation and acetylation serve as a stable genomic imprint that regulates gene expression and other epigenetic phenomena. These modified histones are found in transcriptionally active domains called euchromatin.

\

\ These tracks contain ChIP-chip data for components of the transcription initiation complex (such as Pol2 and TAF1) and for histones H3 and H4 in multiple cell lines, including HeLa (cervical carcinoma), IMR90 (human fibroblast) and HCT116 (colon epithelial carcinoma), with some experiments including interferon-gamma induction.\

\ \

Credits

\

\ The data for this track were generated at the\ Ren Lab, Ludwig\ Institute for Cancer Research at UC San Diego.

\ \

References

\

\ Kim TH, Barrera LO, Qu C, Van Calcar S, Trinklein ND, Cooper SJ, Luna RM,\ Glass CK, Rosenfeld MG, Myers RM, Ren B.\ Direct isolation and identification of promoters in the human\ genome.\ Genome Res. 2005 Jun;15(6):830-9.

\

\ Li Z, Van Calcar S, Qu C, Cavenee WK, Zhang MQ, Ren B.\ A global transcriptional regulatory role for c-Myc in Burkitt's\ lymphoma cells.\ Proc Natl Acad Sci U S A. 2003 Jul 8;100(14):8164-9.

\

\ Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J,\ Schreiber J, Hannett N, Kanin E et al.\ Genome-wide location and function\ of DNA-associated proteins.\ Science. 2000 Dec 22;290(5500):2306-9.

\

\ Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD,\ Ren B.\ A high-resolution map of active promoters in the human genome.\ Nature. 2005 Aug 11;436(7052):876-80.

\ encodeChip 0 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ group encodeChip\ longLabel Ludwig Institute/UC San Diego ChIP-chip\ priority 48\ shortLabel LI/UCSD ChIP\ superTrack on\ track encodeUcsdChipSuper\ encodeAffyChIpHl60PvalPu1Hr00 Affy PU1 RA 0h wig 0.0 534.54 Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 0hrs) P-Value 0 49 75 150 0 165 202 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 75,150,0\ longLabel Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 0hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 49\ shortLabel Affy PU1 RA 0h\ subGroups factor=PU1 time=0h\ track encodeAffyChIpHl60PvalPu1Hr00\ geneid Geneid Genes genePred geneidPep Geneid Gene Predictions 0 49 0 90 100 127 172 177 0 0 0

Description

\

\ This track shows gene predictions from the geneid program developed at the Genome Bioinformatics Laboratory (GBL), which is part of the Grup de Recerca en Informàtica Biomèdica (GRIB) at the Institut Municipal d'Investigació Mèdica (IMIM) / Centre de Regulació Genòmica (CRG) in Barcelona.\

\

Methods

\

\ Geneid is a program, designed with a hierarchical structure, that predicts genes in anonymous genomic sequences. In the first step, splice sites and start and stop codons are predicted and scored along the sequence using Position Weight Arrays (PWAs). Next, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites plus the log-likelihood ratio of a Markov model for coding DNA. Finally, the gene structure is assembled from the set of predicted exons by maximizing the sum of the scores of the assembled exons.\
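The exon scoring and assembly steps can be sketched as follows. This is a simplified illustration, not geneid itself: site scores and the coding log-likelihood ratio are assumed to be precomputed, and assembly is reduced to choosing non-overlapping exons with maximal total score (weighted interval scheduling by dynamic programming). Real geneid also enforces reading-frame compatibility, exon types (first, internal, terminal) and intron length constraints, all omitted here. `Exon` and `assemble` are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class Exon:
    start: int           # half-open [start, end)
    end: int
    site_score: float    # summed scores of the defining sites (assumed precomputed)
    coding_llr: float    # coding Markov model log-likelihood ratio (assumed precomputed)

    @property
    def score(self) -> float:
        # Exon score = sum of defining-site scores + coding log-likelihood ratio.
        return self.site_score + self.coding_llr


def assemble(exons: List[Exon]) -> List[Exon]:
    """Choose non-overlapping exons whose total score is maximal."""
    exons = sorted(exons, key=lambda e: e.end)
    ends = [e.end for e in exons]
    n = len(exons)
    best = [0.0] * (n + 1)      # best[i]: best total using only exons[:i]
    take = [False] * (n + 1)    # whether exons[i - 1] is used in best[i]
    for i in range(1, n + 1):
        ex = exons[i - 1]
        # Number of earlier exons that end at or before this one starts.
        j = bisect_right(ends, ex.start, 0, i - 1)
        if best[j] + ex.score > best[i - 1]:
            best[i], take[i] = best[j] + ex.score, True
        else:
            best[i] = best[i - 1]
    # Trace back through the table to recover the chosen exons.
    chosen = []
    i = n
    while i > 0:
        if take[i]:
            ex = exons[i - 1]
            chosen.append(ex)
            i = bisect_right(ends, ex.start, 0, i - 1)
        else:
            i -= 1
    return chosen[::-1]
```

Exons with a negative score are never selected, since adding them can only lower the total.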

\

Credits

\

\ Thanks to GBL for providing these data.\

\ genes 1 color 0,90,100\ group genes\ longLabel Geneid Gene Predictions\ priority 49\ shortLabel Geneid Genes\ track geneid\ type genePred geneidPep\ visibility hide\ encodeLIChIPgIF LI gIF ChIP bedGraph 4 Ludwig Institute/UCSD ChIP-chip - Gamma Interferon Experiments 0 49 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ ENCODE region-wide location analysis was conducted with ChIP-chip, using antibodies against H3K4me2, H3K4me3, H3ac, H4ac, STAT1, RNA polymerase II and TAF1, and chromatin extracted from HeLa cells induced for 30 min with interferon-gamma as well as from uninduced cells. The H3K4me2, H3K4me3 and H3ac forms of histone H3 and the H4ac form of histone H4 are associated with up-regulation of gene expression. STAT1 (signal transducer and activator of transcription 1) binds to DNA and activates transcription in response to various cytokines, including interferon-gamma.\

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ "wiggle" tracks. The subtracks within this annotation \ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\ \

Methods

\

\ Chromatin from both induced and uninduced cells was separately cross-linked,\ precipitated with the antibodies, sheared, amplified and hybridized\ to a PCR DNA tiling array produced at the Ren Lab at UC San Diego. The array\ was composed of 24,537 non-repetitive sequences within the 44 ENCODE regions.\

\

\ Each state had three or more biological replicates. Each \ experiment was loess-normalized using R. The P-value and R-value were \ calculated using the modified single array error model (Li, Z. et \ al., 2003). The P-value and R-value were then derived from the weighted \ average results of the replicates.
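The replicate-combination step can be illustrated with an inverse-variance weighted mean. The exact weighting used by the modified single array error model is not described here, so the weights (1/SE²), the input log ratios, and the normal approximation for the P-value are all assumptions of this sketch; `combine_replicates` is a hypothetical name.

```python
import math
from typing import List, Tuple


def combine_replicates(measurements: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (log ratio, standard error) pairs from replicate arrays.

    Returns the inverse-variance weighted mean log ratio and a two-sided
    P-value for that mean under a normal approximation.
    """
    if not measurements:
        raise ValueError("need at least one replicate")
    weights = [1.0 / (se * se) for _, se in measurements]
    total = sum(weights)
    mean = sum(w * x for w, (x, _) in zip(weights, measurements)) / total
    se_mean = math.sqrt(1.0 / total)
    z = mean / se_mean
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return mean, p_value
```

Replicates with smaller standard errors pull the combined ratio toward their values, which is the usual motivation for weighting rather than a plain average.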

\

\ The displayed values were scaled to 0 - 16, corresponding to negative log base\ 10 of the P-value.

\ \

Verification

\

\ Each of the two experiments has three biological replicates. The \ array platform, the \ raw and normalized data for each experiment, and the\ image files have all been deposited at the NCBI \ GEO Microarray \ Database (pending approval).

\ \

Credits

\

\ The data for this track were generated at the \ Ren Lab, Ludwig \ Institute for Cancer Research at UC San Diego.

\ \

References

\

\ Kim, T., Barrera, L.O., Qu, C., van Calcar, S., Trinklein, N., Cooper, S., Luna, R., Glass, C.K., Rosenfeld, M.G., Myers, R., Ren, B. Direct isolation and identification of promoters in the human genome. Genome Research 15, 830-839 (2005).

\

\ Li, Z., Van Calcar, S., Qu, C., Cavenee, W.K., Zhang, M.Q., and Ren, B. A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells. Proc. Natl. Acad. Sci. 100(14), 8164-8169 (2003).

\

\ Ren, B., Robert, F., Wyrick, J. J., Aparicio, O., Jennings, E. G., Simon, I., Zeitlinger, J., Schreiber, J., Hannett, N., Kanin, E., Volkert, T. L., Wilson, C., Bell, S. P. and Young, R. A. Genome-wide location and function of DNA-associated proteins. Science 290(5500), 2306-2309 (2000).

\ encodeChip 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel Ludwig Institute/UCSD ChIP-chip - Gamma Interferon Experiments\ maxHeightPixels 128:16:16\ maxLimit 16\ minLimit 0\ origAssembly hg16\ priority 49.0\ shortLabel LI gIF ChIP\ superTrack encodeUcsdChipSuper dense\ track encodeLIChIPgIF\ type bedGraph 4\ viewLimits 0:12\ visibility hide\ encodeGisRnaPet GIS-PET RNA bed 12 Gene Identification Signature Paired-End Tags of PolyA+ RNA 0 49.2 0 0 0 127 127 127 0 0 23 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows the starts and ends of mRNA transcripts \ determined by paired-end ditag (PET) sequencing. PETs are composed of 18 \ bases from either end of a cDNA; 36 bp PETs from many clones were \ concatenated together and cloned into pZero-1 for efficient sequencing. See \ the Methods and References sections below for more details on PET sequencing.\

\

\ The PET sequences in this track represent full-length transcripts derived from two cell lines, MCF7 and HCT116, and mapped to the whole genome.\

\ In total, 584,624 PETs were generated for MCF7 and 280,340 PETs were\ generated for HCT116. More than 80% of the PETs in each group were \ mapped to the genome.\ The 474,278 MCF7 PETs and 223,261 HCT116 PETs that mapped with single and\ multiple (up to ten) matches in the genome are shown in the two \ subtracks.

\

\ In the graphical display, \ the ends are represented by blocks connected by a horizontal line. In full\ and packed display modes, the arrowheads on the horizontal line represent the \ direction of transcription, and an ID of the format XXXXX-N-M is \ shown to the left of each PET, where X is the unique ID for each\ PET, N indicates the number of mapping locations in the genome \ (1 for a single mapping location, 2 for two mapping locations, and so forth),\ and M is the number of PET sequences at this location. The total \ count of PET sequences mapped to the same locus but with slight nucleotide\ differences may reflect the expression level of the transcripts. PETs that \ mapped to multiple locations may represent low complexity or repetitive \ sequences.

\

\ The graphical display also uses color coding to reflect the uniqueness\ and expression level of each PET:

\
Color          Mapping    PETs observed at location
dark blue      unique     2 or more
light blue     unique     1
medium brown   multiple   2 or more
light brown    multiple   1
\ \
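The label format and the color table can be combined into one lookup. A minimal sketch, assuming labels always have exactly the XXXXX-N-M form; the function name and the example labels are hypothetical.

```python
def classify_rna_pet(label: str) -> dict:
    """Parse an 'XXXXX-N-M' label and return its fields plus display color."""
    pet_id, n_str, m_str = label.rsplit("-", 2)
    n, m = int(n_str), int(m_str)   # N: mapping locations, M: PETs at this location
    unique = n == 1
    if unique:
        color = "dark blue" if m >= 2 else "light blue"
    else:
        color = "medium brown" if m >= 2 else "light brown"
    return {"id": pet_id, "locations": n, "count": m,
            "unique": unique, "color": color}
```

For instance, a label `12345-1-3` (a uniquely mapped PET seen three times) would be drawn dark blue.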

Methods

\

\ PolyA+ RNA was isolated from the cells. A full-length cDNA library was\ constructed and converted into a PET library for Gene\ Identification Signature analysis (Ng et al., 2005). Generation of \ PET sequences involved cloning of cDNA sequences into the plasmid vector, \ pGIS3. pGIS3 contains two MmeI recognition sites that\ flank the cloning site, which were used to produce a 36 bp PET. Each 36 bp PET \ sequence contains 18 bp from each of the 5' and 3' ends of the original \ full-length cDNA clone. The 18 bp 3' signature contains 16 bp 3'-specific \ nucleotides and an AA residual of the polyA tail to indicate the sequence \ orientation. PET sequences were mapped to the genome using the following \ specific criteria:\

\

\ Most of the PET sequences (more than 90%) were mapped to specific locations\ (single mapping loci). PETs mapping to 2 - 10 locations are\ also included and may represent duplicated genes or pseudogenes in\ the genome.

\ \

Verification

\

\ To assess overall PET quality and mapping specificity, the ten most abundant PET clusters that mapped to well-characterized known genes were examined. Over 99% of the PETs represented full-length transcripts, and the majority fell within ten bp of the known 5' and 3' boundaries of these transcripts. The PET mapping was further verified by confirming the existence of physical cDNA clones represented by the ditags: PCR primers designed from the PET sequences were used to amplify the corresponding cDNA inserts from the parental GIS flcDNA library for sequencing analysis. In a set of 86 arbitrarily selected PETs representing a wide range of annotation categories, including known genes (38 PETs), predicted genes (2 PETs) and novel transcripts (46 PETs), 84 (97.7%) confirmed the existence of bona fide transcripts.

\ \

Credits

\

\ The GIS-PET libraries and sequence data for transcriptome analysis were \ produced at the \ Genome Institute of Singapore. The data were\ mapped and analyzed by scientists from the Genome Institute of\ Singapore and the \ Bioinformatics Institute of \ Singapore.

\ \

References

\

\ Ng, P. et al.\ Gene identification signature (GIS) analysis for transcriptome \ characterization and genome annotation.\ Nat. Methods 2(2), 105-11 (2005).

\ encodeGenes 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeGenes\ itemRgb on\ longLabel Gene Identification Signature Paired-End Tags of PolyA+ RNA\ priority 49.2\ shortLabel GIS-PET RNA\ track encodeGisRnaPet\ type bed 12\ visibility hide\ encodeAffyChIpHl60SitesPu1Hr00 Affy PU1 RA 0h bed 3 . Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 0hrs) Sites 0 50 75 150 0 165 202 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 75,150,0\ longLabel Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 0hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 50\ shortLabel Affy PU1 RA 0h\ subGroups factor=PU1 time=0h\ track encodeAffyChIpHl60SitesPu1Hr00\ genscan Genscan Genes genePred genscanPep Genscan Gene Predictions 0 50 170 100 0 212 177 127 0 0 0

Description

\

\ This track shows predictions from the \ Genscan program \ written by Chris Burge.\ The predictions are based on transcriptional, \ translational and donor/acceptor splicing signals as well as the length \ and compositional distributions of exons, introns and intergenic regions.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks. \

\ The track description page offers the following filter and configuration\ options:\

\ \

Methods

\

\ For a description of the Genscan program and the model that underlies it, \ refer to Burge and Karlin (1997) in the References section below. \ The splice site models used are described in more detail in Burge (1998)\ below.

\ \

Credits

\ Thanks to Chris Burge for providing the Genscan program.\ \

References

\

\ Burge C. \ Modeling Dependencies in Pre-mRNA Splicing Signals. \ In: Salzberg S, Searls D, Kasif S, editors. \ Computational Methods in Molecular Biology. \ Amsterdam: Elsevier Science; 1998. p. 127-163.

\

\ Burge C, Karlin S. \ Prediction of complete gene structures in human genomic DNA.\ J. Mol. Biol. 1997 Apr 25;268(1):78-94.

\ genes 1 color 170,100,0\ group genes\ longLabel Genscan Gene Predictions\ priority 50\ shortLabel Genscan Genes\ track genscan\ type genePred genscanPep\ visibility hide\ encodeUcsdNgGif LI Ng gIF ChIP bedGraph 4 Ludwig Institute/UCSD ChIP/Chip NimbleGen - Gamma Interferon Experiments 0 50 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays results of the following ChIP-chip (NimbleGen) \ gamma interferon experiments on HeLa cells:\

\

\ ENCODE region-wide location analysis of trimethylated K4 histone H3 (H3K4me3,\ or triMeH3K4) and RNA polymerase II was conducted with ChIP-chip\ using chromatin extracted from HeLa cells induced for 30 minutes with\ gamma interferon as well as uninduced cells.

\ \

Methods

\

\ Chromatin from both induced and uninduced HeLa cells was separately cross-linked, precipitated with different antibodies, sheared, amplified and hybridized to an oligonucleotide tiling array produced by NimbleGen Systems. The array includes non-repetitive sequences within the 44 ENCODE regions tiled from NCBI Build 35 (UCSC hg17) with 50-mer probes at 38 bp intervals. Resulting genomic coordinates were translated to NCBI Build 34 (UCSC hg16).
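The probe layout can be illustrated by stepping 50-mer probes along a region at 38 bp spacing, so consecutive probes overlap by 12 bp. This sketch tiles a single contiguous interval and ignores the repeat masking that removed repetitive sequence from the real design; `tile_probes` is a hypothetical name.

```python
from typing import List, Tuple


def tile_probes(region_start: int, region_end: int,
                probe_len: int = 50, step: int = 38) -> List[Tuple[int, int]]:
    """Return half-open (start, end) probe coordinates tiling a region."""
    probes = []
    pos = region_start
    # Only place probes that fit entirely inside the region.
    while pos + probe_len <= region_end:
        probes.append((pos, pos + probe_len))
        pos += step
    return probes
```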

\

\ Intensity values for biological replicate arrays were combined after\ quantile normalization using \ R. \ The averages of the quantile\ normalized intensity values for each probe were then median-scaled and\ loess-normalized using R to obtain the adjusted log R values.
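Quantile normalization and median centering can be sketched with NumPy. This assumes a probes x arrays intensity matrix, breaks ties by sort order rather than averaging them, and uses log2 ratios shifted so their median is zero as a stand-in for the "median-scaled" step; the loess step is omitted. Function names are hypothetical.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Give every column (array) of a probes x arrays matrix the same distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value within its column
    reference = np.sort(x, axis=0).mean(axis=1)         # mean of each quantile across arrays
    return reference[ranks]


def median_centered_log_ratio(chip: np.ndarray, control: np.ndarray) -> np.ndarray:
    """log2(ChIP / control) per probe, shifted so the median is zero."""
    ratio = np.log2(chip / control)
    return ratio - np.median(ratio)
```

After normalization the replicate columns can be averaged per probe, for example `quantile_normalize(replicates).mean(axis=1)`, before the ratio step.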

\ \

Verification

\

\ Three biological replicates were used to generate the track for each\ factor at each time point with the exception of RNA Pol2 uninduced,\ for which only two biological replicates were used.

\ \

Credits

\

\ The data for this track were generated at the \ Ren Lab, \ Ludwig Institute for Cancer Research at UC San Diego.

\ \ encodeChip 0 autoScale off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel Ludwig Institute/UCSD ChIP/Chip NimbleGen - Gamma Interferon Experiments\ maxHeightPixels 128:16:16\ maxLimit 5\ minLimit -5\ origAssembly hg17\ priority 50.0\ shortLabel LI Ng gIF ChIP\ track encodeUcsdNgGif\ type bedGraph 4\ viewLimits -1:3\ windowingFunction mean\ caseControl Case Control chromGraph Case Control Consortium 0 50.2 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays the trend p-values (-log10) of the seven \ diseases reported by The Wellcome Trust Case Control Consortium (see \ References section below). The diseases studied are:\

\ \

Methods

\ \

\ Reported p-values were taken for each of the Illumina550 probes on the genome.\ For visualization purposes, these p-values were logged and negated.

\

\ Most of the data for chromosome X are omitted from this track because of differences in statistical power and in the control sets used. The trend p-value captures the majority of the associated signals; therefore, it was used to create this track instead of the genotypic p-value.\

\

\ For the complete dataset, visit \ The Wellcome\ Trust Case Control Consortium Web Site.\

\ \

References

\

\ When making use of these data, please cite:

\

\ The Wellcome Trust Case Control Consortium.\ Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.\ Nature. 2007 Jun 7;447(7145):661-78.

\ phenDis 0 compositeTrack on\ group phenDis\ linesAt 5,10\ longLabel Case Control Consortium\ minMax 0,15\ priority 50.2\ shortLabel Case Control\ track caseControl\ type chromGraph\ visibility hide\ nimhBipolar NIMH Bipolar chromGraph NIMH Bipolar Disease 0 50.3 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays the p-values (-log10) of the bipolar disorder pooled data as reported by the NIMH Genetics Initiative Bipolar Disorder Consortium (see References below). The Consortium performed a genome-wide association study on two populations:\

\ \

Methods

\

\ Reported p-values were taken for each of the Illumina550 probes on the genome. \ For visualization purposes, these p-values were logged and negated. \ These p-values have not been Bonferroni adjusted.

\

\ All individuals in the US sample had an affected sibling. In the German sample, 96% of individuals reported that both parents and grandparents were born in Germany. The German sample was used to replicate the US sample. The following conditions were applied to the US sample:\

\ \ The results left 88 SNPs in 80 genes.

\

\ When individually genotyped, 76% of these SNPs remained significant in the US sample, and 36% of those remained significant in the German sample. Combined p-values were also calculated for the SNPs; the most significant finding was rs1012053 (p-value = 1.5e-8) in DGKH (diacylglycerol kinase eta), which remains significant after Bonferroni adjustment. The DGKH protein is a key component of the lithium-sensitive phosphatidylinositol pathway. Several other risk SNPs of small effect were also identified.\
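The Bonferroni statement can be checked with simple arithmetic. This sketch assumes about 550,000 tests (one per SNP on the Illumina550 array) and alpha = 0.05; the number of tests the authors actually corrected for is not stated here. Under that assumption the per-test threshold is 0.05 / 550,000, about 9.1e-8, and p = 1.5e-8 falls below it.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold controlling the family-wise error rate."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def survives_bonferroni(p_value: float, n_tests: int, alpha: float = 0.05) -> bool:
    """True if the unadjusted p-value is at or below the Bonferroni threshold."""
    return p_value <= bonferroni_threshold(n_tests, alpha)


# Assumption: roughly one test per probe on the Illumina550 array.
N_TESTS = 550_000
```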

\

\ For the complete dataset, visit the National Institute of Mental Health (NIMH) MAP Genetics Web Site.\

\ \

References

\

\ Baum AE, Akula N, Cabanero M, Cardona I, Corona W, Klemens B, Schulze TG, \ Cichon S, Rietschel M, Nothen MM et al.\ \ A genome-wide association study implicates diacylglycerol kinase eta \ (DGKH) and several other genes in the etiology of bipolar disorder.\ Molecular Psychiatry. 2007 May 8;:1-11.

\

\ In addition, those using data from the NIMH sample should cite \ the NIMH Genetics Initiative Bipolar Disorder Consortium by \ use of the language specified \ in the Distribution Agreement from the \ NIMH Center \ for Collaborative Genetic Studies.\

\ \ phenDis 0 compositeTrack on\ group phenDis\ linesAt 1,3\ longLabel NIMH Bipolar Disease\ minMax 0,5\ priority 50.3\ shortLabel NIMH Bipolar\ track nimhBipolar\ type chromGraph\ visibility hide\ exoniphy Exoniphy genePred Exoniphy Human/Mouse/Rat/Dog 0 50.9 173 17 162 214 136 208 0 0 0

Description

\

\ The exoniphy program identifies evolutionarily conserved protein-coding exons in\ multiple, aligned sequences using a phylogenetic hidden Markov model\ (phylo-HMM), a kind of statistical model that simultaneously describes exon\ structure and exon evolution. This track shows exoniphy predictions\ for the human Jul. 2003 (hg16), mouse Feb. 2003 (mm3), and rat Jun. 2003\ (rn3) genomes, as aligned by the multiz program.\

\

Methods

\

\ Exoniphy is described in Siepel A & Haussler D (2004), "Computational\ identification of evolutionarily conserved exons," RECOMB '04.\ Multiz is described in Blanchette M et al. (2004), "Aligning\ multiple genomic sequences with the threaded blockset aligner,"\ Genome Res. 14:708-715.\ genes 1 color 173,17,162\ group genes\ longLabel Exoniphy Human/Mouse/Rat/Dog\ priority 50.9\ shortLabel Exoniphy\ track exoniphy\ type genePred\ visibility hide\ exoniphyGene Exoniphy Genes genePred Predicted Genes and Gene Fragments from Exoniphy Exons (Human/Mouse/Rat) 0 50.9 173 17 162 214 136 208 0 0 0 genes 1 color 173,17,162\ group genes\ longLabel Predicted Genes and Gene Fragments from Exoniphy Exons (Human/Mouse/Rat)\ priority 50.9\ shortLabel Exoniphy Genes\ track exoniphyGene\ type genePred\ visibility hide\ encodeAffyChIpHl60PvalPu1Hr02 Affy PU1 RA 2h wig 0.0 534.54 Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 2hrs) P-Value 0 51 75 150 0 165 202 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 75,150,0\ longLabel Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 2hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 51\ shortLabel Affy PU1 RA 2h\ subGroups factor=PU1 time=2h\ track encodeAffyChIpHl60PvalPu1Hr02\ genscanExtra Genscan Extra bed 6 . Genscan Extra (Suboptimal) Exon Predictions 0 51 180 90 0 217 172 127 0 0 1 chr22, genes 1 chromosomes chr22,\ color 180,90,0\ group genes\ longLabel Genscan Extra (Suboptimal) Exon Predictions\ priority 51\ shortLabel Genscan Extra\ track genscanExtra\ type bed 6 .\ visibility hide\ gwasCatalog GWAS Catalog bed 4 + NHGRI Catalog of Published Genome-Wide Association Studies 0 51 0 90 0 127 172 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\

\ This track displays single nucleotide polymorphisms (SNPs) identified by published \ Genome-Wide Association Studies (GWAS), collected in the \ Catalog of Published \ Genome-Wide Association Studies (www.genome.gov/gwastudies) at the \ National Human Genome Research \ Institute (NHGRI).\ Some abbreviations\ are used above.\

\

\ From http://www.genome.gov/gwastudies:\

\ The genome-wide association study (GWAS) publications listed here\ include only those attempting to assay at least 100,000 single\ nucleotide polymorphisms (SNPs) in the initial stage. Publications are\ organized from most to least recent date of publication, indexing from\ online publication if available. Studies focusing only on candidate\ genes are excluded from this catalog. Studies are identified through\ weekly PubMed literature searches, daily NIH-distributed compilations\ of news and media reports and occasional comparisons with an existing\ database of GWAS literature \ (HuGE Navigator).\
\

\ \

Methods

\

\ From http://www.genome.gov/gwastudies:\

\ SNP-trait associations listed here are limited to those with p-values\ < 1.0 x 10^-5 (see full methods for additional details). Multipliers of\ powers of 10 in p-values are rounded to the nearest single digit; odds\ ratios and allele frequencies are rounded to two decimals. Standard\ errors are converted to 95 percent confidence intervals where\ applicable. Allele frequencies, p-values and odds ratios derived from\ the largest sample size, typically a combined analysis (initial plus\ replication studies), are recorded below if reported; otherwise,\ statistics from the initial study sample are recorded. For\ quantitative traits, information on % variance explained, SD\ increment, or unit difference is reported where available. Odds ratios (OR)\ < 1 in the original paper are converted to OR > 1 for the alternate\ allele. Where results from multiple genetic models are available, we\ prioritized effect sizes (ORs or beta-coefficients) as follows: 1)\ genotypic model, per-allele estimate; 2) genotypic model, heterozygote\ estimate; 3) allelic model, allelic estimate.\
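The numeric conventions above can be sketched in Python. These are the p-value cut-off, one-digit rounding of the power-of-10 multiplier, two-decimal odds ratios, and re-expressing OR < 1 for the alternate allele. This is an illustrative sketch, not NHGRI's curation code, and the function names are hypothetical. The confidence-interval helper is an assumption: it takes the reported standard error to be on the ln(OR) scale and uses the usual normal (Wald) approximation. The catalog text does not specify either.

```python
import math

P_THRESHOLD = 1.0e-5  # catalog cut-off: p < 1.0 x 10^-5


def passes_threshold(p_value: float) -> bool:
    """True if an association meets the catalog's p-value cut-off."""
    return p_value < P_THRESHOLD


def round_p_value(p_value: float) -> str:
    """Round the multiplier of the power of 10 to one digit, e.g. 3.46e-8 -> '3E-8'.

    Python's round() breaks exact .5 ties toward the even digit.
    """
    exponent = math.floor(math.log10(p_value))
    multiplier = round(p_value / 10 ** exponent)
    if multiplier == 10:  # e.g. 9.7e-8 rounds up to 1E-7
        multiplier, exponent = 1, exponent + 1
    return f"{multiplier}E{exponent}"


def orient_odds_ratio(odds_ratio: float, allele: str, alternate_allele: str):
    """Report an OR < 1 as OR > 1 for the alternate allele, rounded to two decimals."""
    if odds_ratio < 1:
        return round(1 / odds_ratio, 2), alternate_allele
    return round(odds_ratio, 2), allele


def odds_ratio_ci95(odds_ratio: float, se_log_or: float):
    """ASSUMPTION: Wald 95% CI computed from the standard error of ln(OR)."""
    half_width = 1.96 * se_log_or
    low = math.exp(math.log(odds_ratio) - half_width)
    high = math.exp(math.log(odds_ratio) + half_width)
    return round(low, 2), round(high, 2)
```

For example, `orient_odds_ratio(0.8, "A", "G")` reports an OR of 1.25 for allele G.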

\ Gene regions corresponding to SNPs were identified from the UCSC\ Genome Browser. Gene names and risk alleles are those reported by the\ authors in the original paper. Only one SNP within a gene or region of\ high linkage disequilibrium is recorded unless there was evidence of\ independent association.\

\ Occasionally the term "pending" is used to denote one or more studies\ that we identified as an eligible GWAS, but for which SNP information\ has not yet been extracted; studies of CNVs are also noted as pending.\
\

\

Reference

\

\ Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA.\ Potential etiologic and functional implications of genome-wide association \ loci for human diseases and traits. PNAS 2009 Jun 9;106(23):9362-7.\ phenDis 1 color 0,90,0\ group phenDis\ longLabel NHGRI Catalog of Published Genome-Wide Association Studies\ priority 51\ shortLabel GWAS Catalog\ snpTable snp130\ snpVersion 130\ track gwasCatalog\ type bed 4 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility hide\ augustus Augustus genePred Augustus Gene Predictions 0 51.7 180 0 0 217 127 127 0 0 0

Description

\

\ This track shows predictions of AUGUSTUS.\ AUGUSTUS is available through the GOBICS web\ server.

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite tracks.\ To display only selected subtracks, uncheck the boxes next to the tracks \ you wish to hide. This track also follows the display conventions for \ gene prediction \ tracks.

\

\ This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions.\ To display codon colors, select the genomic codons option from the\ Color track by codons pull-down menu. Click the\ Help on codon coloring \ link for more information about this feature.

\ \

Methods

\

\ AUGUSTUS uses a generalized hidden Markov model (GHMM) that models coding and \ non-coding sequence, splice sites, the branch point region, translation start \ and end, and lengths of exons and introns. This version has been trained on a \ set of 1284 human genes.\ \

Augustus Gene Predictions Using Hints

\

\ This subtrack was made using hints from several other tracks:\

\

\ \

Augustus De Novo Gene Predictions

\

This subtrack was made using only the target genome sequence and \ evolutionary conservation. The conservation information was extracted \ from the Exoniphy track and the PhastCons Conserved Elements track. \ In addition, hints about retroposed genes were used; these hints are based only \ on previous de novo predictions of AUGUSTUS. No transcribed \ sequences were used for this track.

\ \

Credits

\

\ The Augustus subtracks were created by Mario Stanke. The TransMap track \ was created by Mark Diekhans, the Retroposed Genes tracks by Robert \ Baertsch, and the Exoniphy and PhastCons Conserved Elements tracks by \ Adam Siepel's group.

\ \

References

\

\ Stanke M. \ Gene prediction with a hidden Markov model.\ Ph.D. thesis. Universität Göttingen, Germany. 2004.

\ \

\ Stanke M, Steinkamp R, Waack S, Morgenstern B. \ AUGUSTUS: a web server for gene finding in eukaryotes. \ Nucleic Acids Res. 2004 Jul 1;32(Web Server Issue):W309-12.

\ \

\ Stanke M, Tzvetkova A, Morgenstern B. \ \ AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved \ gene prediction in the human genome. \ Genome Biology. 2006;7(Suppl 1):S11.

\ \

\ Stanke M, Waack S. \ Gene prediction with a hidden Markov model and a new intron\ submodel.\ Bioinformatics. 2003 Sep;19(Suppl. 2):ii215-25.

\ genes 1 color 180,0,0\ group genes\ longLabel Augustus Gene Predictions\ priority 51.7\ shortLabel Augustus\ track augustus\ type genePred\ visibility hide\ encodeAffyChIpHl60SitesPu1Hr02 Affy PU1 RA 2h bed 3 . Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 2hrs) Sites 0 52 75 150 0 165 202 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 75,150,0\ longLabel Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 2hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 52\ shortLabel Affy PU1 RA 2h\ subGroups factor=PU1 time=2h\ track encodeAffyChIpHl60SitesPu1Hr02\ encodeRna Known+Pred RNA bed 6 + Known and Predicted RNA Transcription in the ENCODE Regions 0 52 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows the locations of known and predicted non-protein-coding RNA \ genes and pseudogenes that fall within the ENCODE regions. It contains all \ information in Sean Eddy's RNA Genes track for these regions, combined with \ computational predictions generated by Jakob Skou Pedersen's EvoFold algorithm. \ In addition to the fields contained in the RNA Genes track, this track also \ includes ENCODE-related fields describing overlap with transcribed regions and \ repeats.

\

\ Feature types in this annotation include:\

\ \

Display Conventions and Configuration

\

\ The locations of the RNA genes and pseudogenes are represented by blocks in the \ graphical display, color-coded as follows: \

\

\ The display may be filtered to show only those items \ with unnormalized scores that meet or exceed a certain threshold. To set a \ threshold, type the minimum score into the text box at the top of the \ description page.

\ \

Methods

\

\ The RNA Genes track was supplemented with EvoFold predictions and filtered to \ include only those items that lie within the ENCODE regions. \ Regions that are at least 10 percent Repeatmasked are flagged because no \ transcriptional data are available for them. A region is considered transcribed \ if at least 10 percent of it overlaps with any Affymetrix transcribed fragment \ (transfrag), derived from six microarray experiments, or Yale \ transcriptionally-active region (TAR), derived from 15 microarray experiments. \ In these cases, each array from which the overlapped transfrags and TARs were \ derived is listed.
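A minimal sketch of the 10 percent rules follows. It assumes half-open intervals on a single chromosome. It also assumes coverage is measured against the union of all transfrags and TARs, because the text does not say whether a single feature must supply the full 10 percent. The function names are hypothetical, and the listing of source arrays is not modeled.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one chromosome


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping intervals so shared bases are counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(region: Interval, features: Iterable[Interval]) -> float:
    """Fraction of `region` covered by the union of `features`."""
    r_start, r_end = region
    covered = sum(max(0, min(end, r_end) - max(start, r_start)) for start, end in merge(features))
    return covered / (r_end - r_start)


def flag_region(region: Interval, repeats: Iterable[Interval],
                transcribed_features: Iterable[Interval], cutoff: float = 0.10) -> Tuple[bool, bool]:
    """Return (repeat_flagged, transcribed) using the 10 percent cut-offs."""
    repeat_flagged = covered_fraction(region, repeats) >= cutoff
    transcribed = covered_fraction(region, transcribed_features) >= cutoff
    return repeat_flagged, transcribed
```

Merging first keeps two overlapping transfrags from being counted twice toward the 10 percent.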

\

\ EvoFold is a comparative method that exploits the evolutionary signal\ of genomic multiple-sequence alignments for identifying conserved\ functional RNA structures. The method makes use of phylogenetic\ stochastic context-free grammars (phylo-SCFGs), which are combined\ probabilistic models of RNA secondary structure and primary sequence\ evolution. The predictions consist of both a specific RNA secondary\ structure and an overall score. The overall score is essentially a\ log-odds score between a phylo-SCFG modeling the constrained evolution of\ stem-pairing regions and one which only models unpaired regions.

\

\ Two sets of EvoFold predictions are included in this track. The first,\ labeled EvoFold, contains predictions based on the conserved elements of an \ 8-way vertebrate alignment of the human, chimpanzee, mouse, rat, dog, chicken, \ zebrafish, and Fugu assemblies. The second set of predictions, TBA23_EvoFold, \ was based on the conserved elements of the 23-way TBA alignments present in the \ ENCODE regions. When a pair of these predictions overlap, only the EvoFold \ prediction is shown.

\ \

Credits

\

\ These data were kindly provided by Sean Eddy at Washington University,\ Jakob Skou Pedersen at UC Santa Cruz, and the ENCODE Consortium.

\

\ This annotation track was generated by Matt Weirauch.

\ \

References

\

\ Knudsen, B. and J.J. Hein. \ RNA secondary structure prediction using stochastic context-free \ grammars and evolutionary history.\ Bioinformatics 15(6), 446-54 (1999).

\

\ Pedersen, J.S., Bejerano, G. and Haussler, D. Identification and\ classification of conserved RNA secondary structures in the human\ genome. (In preparation).\ encodeGenes 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ dataVersion ENCODE June 2005 Freeze\ group encodeGenes\ longLabel Known and Predicted RNA Transcription in the ENCODE Regions\ origAssembly hg16\ priority 52\ shortLabel Known+Pred RNA\ track encodeRna\ type bed 6 +\ visibility hide\ rnaGene RNA Genes bed 6 + Non-coding RNA Genes (dark) and Pseudogenes (light) 0 52 170 80 0 230 180 130 0 0 0

Description

\

\ This track shows the location of non-protein coding RNA genes and\ pseudogenes. \

\ Feature types include:\

\

\ \

Methods

\ \

\ Eddy-tRNAscanSE (tRNA genes, Sean Eddy):
\ tRNAscan-SE 1.23 with default parameters.\ Score field contains tRNAscan-SE bit score; >20 is good, >50 is great.

\

\ Eddy-BLAST-tRNAlib (tRNA pseudogenes, Sean Eddy):
\ Wublast 2.0, with options "-kap wordmask=seg B=50000 \ W=8 cpus=1".\ Score field contains % identity in blast-aligned region.\ Used each of 602 tRNAs and pseudogenes predicted by tRNAscan-SE\ in the human oo27 assembly as queries. Kept all nonoverlapping\ regions that hit one or more of these with P <= 0.001.

\

\ Eddy-BLAST-snornalib (known snoRNAs and snoRNA pseudogenes, Steve Johnson):
\ Wublastn 2.0, with options "-V=25 -hspmax=5000 -kap wordmask=seg \ B=5000 W=8 cpus=1".\ Score field contains blast score.\ Used each of 104 unique snoRNAs in snorna.lib as a query.\ Any hit >=95% full length and >=90% identity is annotated as a\ "true gene".\ Any other hit with P <= 0.001 is annotated as a "related \ sequence" and interpreted as a putative pseudogene.

\

\ Eddy-BLAST-otherrnalib \ (non-tRNA, non-snoRNA noncoding RNAs with GenBank entries\ for the human gene.):
\ Wublastn 2.0 [15 Apr 2002]\ with options: "-kap -cpus=1 -wordmask=seg -W=8 -E=0.01 -hspmax=0\ -B=50000 -Z=3000000000". Exceptions to this are:\

\

\ The score field contains the blastn score.\ 41 unique miRNAs and 29 other ncRNAs were used as queries.\ Any hit >=95% full length and >=95% identity is annotated as a\ "true gene".\ Any other hit with P <= 0.001 and >= 65% identity is annotated\ as a "related sequence". There is an exception to this:\ all miRNAs consist of 16-26 bp sequences in GenBank\ and are annotated only if they are 100% full length and have\ 100% identity. The set of miRNAs used consists of Let-7 from\ Pasquinelli et al. (2000) and 40 miRNAs from Mourelatos et al. (2002),\ as mentioned in the references section below.\ \
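The acceptance rules for the snoRNA library and for the other-ncRNA library can be written as two small classifiers. The other-ncRNA classifier includes the stricter miRNA exception. This sketch assumes "full length" means the fraction of the query covered by the hit. The `BlastHit` record and the function names are hypothetical and do not correspond to WU-BLAST output fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlastHit:
    query_fraction: float  # fraction of the query length covered by the hit (0-1)
    identity: float        # percent identity in the aligned region (0-100)
    p_value: float


def classify_snorna_hit(hit: BlastHit) -> Optional[str]:
    """snoRNA library: >=95% full length and >=90% identity is a true gene."""
    if hit.query_fraction >= 0.95 and hit.identity >= 90:
        return "true gene"
    if hit.p_value <= 0.001:
        return "related sequence"  # interpreted as a putative pseudogene
    return None


def classify_other_rna_hit(hit: BlastHit, is_mirna: bool) -> Optional[str]:
    """Other ncRNA library, including the stricter rule for miRNAs."""
    if is_mirna:
        # miRNAs are annotated only as exact, full-length matches
        if hit.query_fraction == 1.0 and hit.identity == 100:
            return "true gene"
        return None
    if hit.query_fraction >= 0.95 and hit.identity >= 95:
        return "true gene"
    if hit.p_value <= 0.001 and hit.identity >= 65:
        return "related sequence"
    return None
```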

Credits

\

\ These data were kindly provided by Sean Eddy at Washington University.

\ \

References

\

\ Pasquinelli AE, Reinhart BJ, Slack F, Martindale MQ, Kuroda MI, Maller B,\ Hayward DC, Ball EE, Degnan B, Müller P, et al.\ \ Conservation of the sequence and temporal expression of let-7 \ heterochronic regulatory RNA. Nature.\ 2000 Nov 2;408(6808):86-9.

\

\ Mourelatos Z, Dostie J, Paushkin S, Sharma A, Charroux B, Abel L,\ Rappsilber J, Mann M, Dreyfuss G.\ \ miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs.\ Genes Dev. 2002 Mar 15;16(6):720-8.

\ \ genes 1 altColor 230,180,130\ color 170,80,0\ group genes\ longLabel Non-coding RNA Genes (dark) and Pseudogenes (light)\ priority 52\ shortLabel RNA Genes\ track rnaGene\ type bed 6 +\ visibility hide\ encodeAffyChIpHl60PvalPu1Hr08 Affy PU1 RA 8h wig 0.0 534.54 Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 8hrs) P-Value 0 53 75 150 0 165 202 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 75,150,0\ longLabel Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 8hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 53\ shortLabel Affy PU1 RA 8h\ subGroups factor=PU1 time=8h\ track encodeAffyChIpHl60PvalPu1Hr08\ superfamily Superfamily bed 4 + Superfamily/SCOP: Proteins Having Homologs with Known Structure/Function 0 53 150 0 0 202 127 127 0 0 0 http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/cgi-bin/gene.cgi?genome=

Description

\

\ The \ Superfamily \ track shows proteins having homologs with known structures or functions.

\

\ Each entry on the track shows the coding region of a gene (based on Ensembl gene predictions).\ In full display mode, the label for an entry consists of the names of \ all known protein domains encoded by this gene. This \ usually contains structural and/or functional descriptions that provide valuable \ information to help users get a quick grasp of the biological significance of the \ gene.

\ \

Methods

\

\ Data are downloaded from the Superfamily server.\ Using the cross-reference between Superfamily entries and Ensembl gene prediction \ entries and their alignment to the appropriate genome, the associated data are \ processed to generate a simple BED format track.

\

Credits

\

\ Superfamily was developed by\ Julian\ Gough at the MRC Laboratory\ of Molecular Biology, Cambridge.

\

\ Gough, J., Karplus, K., Hughey, R. and\ Chothia, C. (2001). "Assignment of Homology to Genome Sequences using a\ Library of Hidden Markov Models that Represent all Proteins of Known Structure". \ J. Mol. Biol., 313(4), 903-919.

\ \ genes 1 color 150,0,0\ group genes\ longLabel Superfamily/SCOP: Proteins Having Homologs with Known Structure/Function\ priority 53\ shortLabel Superfamily\ track superfamily\ type bed 4 +\ url http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/cgi-bin/gene.cgi?genome=\ visibility hide\ pseudoYale Yale Pseudo genePred Yale Pseudogenes. 0 53 100 50 0 255 240 200 1 0 0 http://www.pseudogene.org/cgi-bin/display-by-acc.cgi?id=$$

Description

\

\ This track shows identified pseudogenes as recorded in the Yale\ Pseudogene Database. For information on how these pseudogenes were\ identified and access to the database, see http://www.pseudogene.org. \ genes 1 altColor 255,240,200\ autoTranslate 0\ color 100,50,0\ group genes\ longLabel Yale Pseudogenes.\ priority 53\ shortLabel Yale Pseudo\ spectrum on\ track pseudoYale\ type genePred\ url http://www.pseudogene.org/cgi-bin/display-by-acc.cgi?id=$$\ visibility hide\ pseudoYale60 Yale Pseudo60 genePred Yale Pseudogenes based on Ensembl Release 60 0 53 0 0 0 127 127 127 1 0 0 http://tables.pseudogene.org/index.cgi?table=Human60&value=$$ genes 1 autoTranslate 0\ dataVersion December 2010\ gClass_Ambiguous 100,91,191\ gClass_Duplicated 100,50,0\ gClass_Processed 180,0,0\ geneClasses Processed Duplicated Ambiguous\ group genes\ itemClassTbl pseudoYale60Class\ longLabel Yale Pseudogenes based on Ensembl Release 60\ priority 53\ shortLabel Yale Pseudo60\ spectrum on\ track pseudoYale60\ type genePred\ url http://tables.pseudogene.org/index.cgi?table=Human60&value=$$\ urlLabel Yale pseudogene.org link:\ visibility hide\ mrna Human mRNAs psl . Human mRNAs from GenBank 1 54 0 0 0 127 127 127 1 0 0

Description

\

\ The mRNA track shows alignments between human mRNAs\ in GenBank and the genome.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA \ display. For example, to apply the filter to all mRNAs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Multiple terms may be entered at once, \ separated by a space. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter \ criteria will be highlighted. If "or" is selected, mRNAs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display mRNAs that match the filter criteria. \ If "include" is selected, the browser will display only those \ mRNAs that match the filter criteria.\
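The combination logic in steps 1-3 can be sketched as follows. This is not the browser's implementation. The field names, case-insensitive matching, and the use of shell-style wildcards through Python's `fnmatch` are assumptions made for illustration.

```python
from fnmatch import fnmatch
from typing import Dict, List


def field_matches(value: str, terms: str) -> bool:
    """True if `value` matches any space-separated term; '*' and '?' act as wildcards."""
    return any(fnmatch(value.lower(), term.lower()) for term in terms.split())


def record_matches(mrna: Dict[str, str], filters: Dict[str, str], logic: str) -> bool:
    """Combine per-box results with "and" (every box must match) or "or" (any box)."""
    results = [field_matches(mrna.get(box, ""), terms) for box, terms in filters.items() if terms]
    return all(results) if logic == "and" else any(results)


def apply_filter(mrnas: List[Dict[str, str]], filters: Dict[str, str],
                 logic: str = "and", mode: str = "include") -> List[Dict[str, str]]:
    """Keep only matching mRNAs in "include" mode; drop them in "exclude" mode."""
    want_match = mode == "include"
    return [m for m in mrnas if record_matches(m, filters, logic) == want_match]
```

For example, a filter of `{"tissue": "liv*"}` in include mode keeps only mRNAs whose tissue starts with "liv".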

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare mRNAs against the genomic sequence. For more \ information about this option, go to the\ \ Codon and Base Coloring for Alignment Tracks page.\ Several types of alignment gap may also be colored; \ for more information, go to the\ \ Alignment Insertion/Deletion Display Options page.\

\ \

Methods

\

\ GenBank human mRNAs were aligned against the genome using the \ blat program. When a single mRNA aligned in multiple places, \ the alignment having the highest base identity was found. \ Only alignments having a base identity level within 0.5% of\ the best and at least 96% base identity with the genomic sequence were kept.\
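The near-best rule can be illustrated with a short filter. This sketch assumes "within 0.5%" means 0.5 percentage points of base identity below each mRNA's best alignment. The production pipeline may score alignments differently, for example by also weighing coverage. The tuple layout and function name are hypothetical.

```python
from typing import Dict, List, Tuple

Alignment = Tuple[str, str, float]  # (mRNA accession, genomic locus, percent base identity)


def near_best_filter(alignments: List[Alignment], near_best: float = 0.5,
                     min_identity: float = 96.0) -> List[Alignment]:
    """Keep alignments within `near_best` points of the mRNA's best identity and >= `min_identity`."""
    best: Dict[str, float] = {}
    for accession, _locus, identity in alignments:
        best[accession] = max(identity, best.get(accession, 0.0))
    return [
        aln for aln in alignments
        if aln[2] >= min_identity and aln[2] >= best[aln[0]] - near_best
    ]
```

An mRNA with placements at 99.0% and 98.6% identity keeps both. A third placement at 97.0% is dropped because it is more than 0.5 points below the best, even though it clears the 96% floor.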

\ \

Credits

\

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J,\ Wheeler DL.\ GenBank: update. Nucleic Acids Res.\ 2004 Jan 1;32(Database issue):D23-6.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ rna 1 baseColorDefault diffCodons\ baseColorUseCds genbank\ baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelPolyA on\ indelQueryInsert on\ longLabel $Organism mRNAs from GenBank\ priority 54\ shortLabel $Organism mRNAs\ showDiffBasesAllScales .\ spectrum on\ table all_mrna\ track mrna\ type psl .\ visibility dense\ encodeAffyChIpHl60SitesPu1Hr08 Affy PU1 RA 8h bed 3 . Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 8hrs) Sites 0 54 75 150 0 165 202 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 75,150,0\ longLabel Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 8hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 54\ shortLabel Affy PU1 RA 8h\ subGroups factor=PU1 time=8h\ track encodeAffyChIpHl60SitesPu1Hr08\ encodeSangerChipH3H4 Sanger ChIP bedGraph 4 Sanger ChIP/Chip (histones H3,H4 antibodies in GM06990, K562 cells) 0 54 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ ENCODE region-wide location analysis of H3 and H4 histones\ was conducted by ChIP-chip, using chromatin extracted from GM06990\ (lymphoblastoid) and K562 (myeloid leukemia-derived) cells.\ Histone methylation and acetylation serve as a stable genomic imprint\ that regulates gene expression and other epigenetic \ phenomena. These modified histones are found in transcriptionally active domains \ called euchromatin.

\

\
Track | Cell Line/Type | Antibody | Epitope | Data FTP access | Histone | Array ID
1: SI H3K4m1 GM06990 | GM06990 | ab8895 | H3K4me1 | H3K4me1_GM06990_1 | H3 monomethyl lysine 4 | ENCODE3.1.1
2: SI H3K4m2 GM06990 | GM06990 | ab7766 | H3K4me2 | H3K4me2_GM06990_1 | H3 dimethyl lysine 4 | ENCODE3.1.1
3: SI H3K4m3 GM06990 | GM06990 | ab8580 | H3K4me3 | H3K4me3_GM06990_2 | H3 trimethyl lysine 4 | ENCODE3.1.1
4: SI H3ac GM06990 | GM06990 | 06-599 | H3ac | H3ac_GM06990_1 | H3 acetylated lysines 9 and 14 | ENCODE3.1.1
5: SI H4ac GM06990 | GM06990 | 06-866 | H4ac | H4ac_GM06990_1 | H4 acetylated lysines 5, 8, 12, 16 | ENCODE3.1.1
6: SI H3K4me2 K562 | K562 | ab7766 | H3K4me2 | H3K4me2_K562_1 | H3 K4 dimethylated | ENCODE3.1.1
7: SI H3K4me3 K562 | K562 | ab8580 | H3K4me3 | H3K4me3_K562_1 | H3 trimethyl lysine 4 | ENCODE3.1.1
8: SI H3ac K562 | K562 | 06-599 | H3ac | H3ac_K562_1 | H3 acetylated | ENCODE3.1.1
9: SI H4ac K562 | K562 | 06-866 | H4ac | H4ac_K562_1 | H4 acetylated | ENCODE3.1.1
\
\

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ "wiggle" tracks. The subtracks within this annotation \ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\ \

Methods

\

\ Chromatin from the cell line was cross-linked with 1% formaldehyde,\ precipitated with antibody binding to the histone, and sheared and hybridized to\ a DNA array. DNA was not amplified prior to hybridization.

\

\ The raw and transformed data files reflect fold enrichment over background,\ averaged over six replicates.

\ \

Verification

\

\ There are six replicates: two technical replicates (immunoprecipitations)\ for each of the three biological replicates (cell cultures).

\

\ Raw and transformed (averaged) data can be downloaded from the Wellcome\ Trust Sanger Institute FTP site as indicated in the table above.

\ \

Credits

\

\ The data for this track were generated by the \ ENCODE investigators at the \ Wellcome Trust Sanger \ Institute, Hinxton, UK.

\ encodeChip 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel Sanger ChIP/Chip (histones H3,H4 antibodies in GM06990, K562 cells)\ maxHeightPixels 128:40:16\ maxLimit 75\ minLimit -46.1\ priority 54.0\ shortLabel Sanger ChIP\ track encodeSangerChipH3H4\ type bedGraph 4\ viewLimits 0:10\ visibility hide\ luNega UCSC Pseudo bed 12 UCSC Pseudogenes 0 54 0 0 0 127 127 127 0 0 0 genes 1 group genes\ longLabel UCSC Pseudogenes\ priority 54\ shortLabel UCSC Pseudo\ track luNega\ type bed 12\ visibility hide\ acescan ACEScan genePred ACEScan alternative conserved Human-Mouse exon predictions 0 55 125 38 205 190 146 230 0 0 0

Description

\

\ \ This track shows Alternative Conserved Exons (conserved between human \ and mouse) as predicted by ACEScan. These are exons that are \ present in some transcripts but skipped by alternative splicing in \ other transcripts, in both human and mouse. Alternative use of skipped \ exons has important consequences for gene expression and in disease.\ \

Methods

\

\ \ Putative alternative conserved exons on mRNAs were identified using a \ machine-learning algorithm, Regularized Least-Squares Classification. \ Characteristics of known exons that have been skipped in both human \ and mouse mRNAs were determined by considering factors such as\ exon and intron length, splice-site strength, sequence conservation, \ and region-specific oligonucleotide composition.

\ \

A training set was made by comparing known exons that are skipped \ in some transcripts to exons never skipped.\ These characteristics were then applied to the whole genome to predict\ skipped exons in other transcripts. This track displays exons with \ positive ACEScan scores.

\ \

For further details of the method used to generate this annotation, \ please refer to Yeo et al. (2005). \ \

Credits

\

\ \ Thanks to Gene Yeo at the Crick-Jacobs Center, Salk Institute and \ Christopher Burge, MIT, for providing this annotation. For additional \ information on ACEScan predictions please contact \ geneyeo@salk.edu\ \ or \ cburge@mit.edu.\ \ \

References

\

\ \ Yeo GW, Van Nostrand E, Holste D, Poggio T, Burge CB (2005), \ \ Identification and analysis of alternative splicing events conserved \ in human and mouse. \ Proc Natl Acad Sci U S A. 2005 \ Feb 22;102(8):2850-5.\ \

\ genes 1 color 125,38,205\ group genes\ longLabel ACEScan alternative conserved Human-Mouse exon predictions\ priority 55\ shortLabel ACEScan\ track acescan\ type genePred\ visibility hide\ encodeAffyChIpHl60PvalPu1Hr32 Affy PU1 RA 32h wig 0.0 534.54 Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 32hrs) P-Value 0 55 75 150 0 165 202 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 75,150,0\ longLabel Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 32hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 55\ shortLabel Affy PU1 RA 32h\ subGroups factor=PU1 time=32h\ track encodeAffyChIpHl60PvalPu1Hr32\ encodeStanfordChipSuper Stanf ChIP Stanford ChIP-chip 0 55 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Overview

\

\ This super-track combines related tracks of ChIP-chip data \ generated by the Stanford ENCODE group.\ ChIP-chip, also known as genome-wide location analysis, is a technique for\ isolation and identification of DNA sequences bound by specific proteins in\ cells. \

\ These tracks contain data for the Sp1 and Sp3 \ transcription factors in multiple cell lines,\ including HCT116 (colon epithelial carcinoma), \ Jurkat (T-cell lymphoblast), and K562 (myeloid leukemia).\ \

Credits

\

\ The Sp1 and Sp3 data were generated in the\ Richard M. Myers lab at Stanford University (now at HudsonAlpha Institute for Biotechnology).

\ \

References

\

\ Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P,\ Brockman W, Kim TK, Koche RP et al.\ Genome-wide maps of chromatin state in pluripotent and \ lineage-committed cells.\ Nature. 2007 Aug 2;448(7153):553-60.

\

\ Trinklein ND, Murray JI, Hartman SJ, Botstein D, Myers RM.\ The role of heat shock transcription factor 1 in the\ genome-wide regulation of the mammalian heat shock response.\ Mol. Biol. Cell. 2004 Mar;15(3):1254-61.

\ encodeChip 0 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ group encodeChip\ longLabel Stanford ChIP-chip\ priority 55.0\ shortLabel Stanf ChIP\ superTrack on\ track encodeStanfordChipSuper\ encodeStanfordChip Stanf ChIP bedGraph 4 Stanford ChIP-chip (HCT116, Jurkat, K562 cells; Sp1, Sp3 ChIP) 0 55 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays regions bound by Sp1 and Sp3, in the following \ three cell lines, \ assayed by ChIP and microarray hybridization:\

\

\

Cell Line | Classification | Isolated From
HCT 116 | colorectal carcinoma | colon
Jurkat, Clone E6-1 | acute T cell leukemia | T lymphocyte
K-562 | chronic myelogenous leukemia (CML) | bone marrow
\
\

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ tracks. The subtracks within this annotation \ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\ \

Methods

\

\ Chromatin IP was performed as described in Trinklein et\ al. (2004). Amplified and labeled ChIP DNA was hybridized to\ oligo tiling arrays produced by NimbleGen, along with a total genomic\ reference sample. The data for each array were median-subtracted (log2\ ratios) and normalized (divided by the standard deviation). \ The value given for each probe is \ the transformed mean ratio of ChIP DNA:Total DNA.
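The per-array transform described above can be sketched as follows, assuming one ChIP and one total-DNA intensity per probe. The text does not say whether the sample or population standard deviation was used, so the sample value here is an assumption. Averaging across replicates is not shown, and the function name is hypothetical.

```python
import math
from statistics import median, stdev
from typing import List


def normalize_array(chip: List[float], total: List[float]) -> List[float]:
    """Per-array transform: log2(ChIP/Total), minus the array median, divided by the SD."""
    log_ratios = [math.log2(c / t) for c, t in zip(chip, total)]
    center = median(log_ratios)
    centered = [x - center for x in log_ratios]
    scale = stdev(centered)  # ASSUMPTION: sample standard deviation
    return [x / scale for x in centered]
```

After this step each array has a median of 0 and a standard deviation of 1, which makes probe values comparable across arrays.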

\ \

Verification

\

\ Three biological replicates and two technical replicates were\ performed. The Myers lab is currently testing the specificity and\ sensitivity using real-time PCR.

\ \

Credits

\

\ These data were generated in the Richard M. \ Myers lab at Stanford University (now at\ HudsonAlpha Institute for Biotechnology).

\ \

References

\

\ Trinklein, N.D., Chen, W.C., Kingston, R.E. and Myers, R.M. \ The role of heat shock transcription factor 1 in the \ genome-wide regulation of the mammalian heat shock response.\ Mol. Biol. Cell 15(3), 1254-61 (2004).

\ encodeChip 0 altColor 150,0,25\ autoScale off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 120,0,20\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel Stanford ChIP-chip (HCT116, Jurkat, K562 cells; Sp1, Sp3 ChIP)\ maxHeightPixels 128:16:16\ maxLimit 114\ minLimit 0\ origAssembly hg16\ priority 55.0\ shortLabel Stanf ChIP\ superTrack encodeStanfordChipSuper dense\ track encodeStanfordChip\ type bedGraph 4\ viewLimits 0:10\ visibility hide\ tightMrna Tight mRNAS psl . Tightly Filtered Human mRNAs from GenBank 0 55 0 0 0 127 127 127 1 0 0 rna 1 baseColorDefault diffCodons\ baseColorUseCds genbank\ baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelPolyA on\ indelQueryInsert on\ longLabel Tightly Filtered $Organism mRNAs from GenBank\ priority 55\ shortLabel Tight mRNAS\ showDiffBasesAllScales .\ spectrum on\ track tightMrna\ type psl .\ visibility hide\ encodeStanfordChipSmoothed Stanf ChIP Score bedGraph 4 Stanford ChIP-chip Smoothed Score 0 55.1 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays smoothed (sliding-window mean) scores for \ regions bound by Sp1 and Sp3 in the following \ three cell lines, assayed by ChIP and microarray hybridization:

Cell Line            Classification                       Isolated From
HCT 116              colorectal carcinoma                 colon
Jurkat, Clone E6-1   acute T cell leukemia                T lymphocyte
K-562                chronic myelogenous leukemia (CML)   bone marrow

Display Conventions and Configuration

This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link.

Methods

Chromatin IP was performed as described in Trinklein et al. (2004). Amplified and labeled ChIP DNA was hybridized, along with a total genomic reference sample, to oligonucleotide tiling arrays produced by NimbleGen. The data for each array were median-subtracted (log2 ratios) and normalized (divided by the standard deviation).
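The per-array transformation described above can be sketched in Python. This is a minimal illustration, not the Myers lab's pipeline: the input layout (paired lists of ChIP and reference intensities in the same probe order) and the use of the sample standard deviation are assumptions, since the text does not specify either.

```python
import math
from statistics import median, stdev

def normalize_array(chip, reference):
    """Normalize one array: log2(ChIP/reference), median-subtracted, divided by SD.

    chip, reference: per-probe intensities in the same probe order
    (a hypothetical input layout). The sample standard deviation is an
    assumption; the description does not say which estimator was used.
    """
    log_ratios = [math.log2(c / r) for c, r in zip(chip, reference)]
    center = median(log_ratios)                  # median subtraction
    centered = [x - center for x in log_ratios]
    spread = stdev(centered)                     # sample SD (assumed)
    return [x / spread for x in centered]        # unit-variance scores
```

After this step each array has median 0 and standard deviation 1, so arrays hybridized with different overall efficiency can be compared directly.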

The transformed mean ratios of ChIP DNA to total DNA for all probes were then smoothed by calculating a sliding-window mean. Windows of six neighboring probes (advancing two probes at a time) were used; within each window, the highest and lowest values were dropped, and the remaining four values were averaged. To increase the contrast between high and low values for visual display, the average was converted to a score by the formula:

score = 8^(average) * 10

These scores are intended for visualization only; for all analyses, use the raw ratios, which are available in the Stanf ChIP track.
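The trimmed sliding-window mean and the display score can be sketched as follows. This is a minimal sketch, not the lab's code; it assumes the ratios are already sorted by probe position and that a trailing window with fewer than six probes is skipped (edge handling is not described above).

```python
from statistics import mean

def smoothed_scores(ratios, window=6, step=2):
    """Trimmed sliding-window mean of normalized ratios, plus display score.

    ratios: normalized ratios ordered by genomic position (assumed).
    Returns (start_index, average, score) for each full window; incomplete
    windows at the end are skipped (an assumption).
    """
    results = []
    for start in range(0, len(ratios) - window + 1, step):
        values = sorted(ratios[start:start + window])
        trimmed = values[1:-1]        # drop the single lowest and highest value
        average = mean(trimmed)       # mean of the remaining four values
        score = 8 ** average * 10     # contrast-boosting transform for display
        results.append((start, average, score))
    return results
```

Dropping the extremes makes each window robust to a single outlier probe, while the exponential score stretches the high end of the range for display.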

Verification

Three biological replicates and two technical replicates were performed. The Myers lab is currently testing the specificity and sensitivity using real-time PCR.

Credits

These data were generated in the Richard M. Myers lab at Stanford University (now at HudsonAlpha Institute for Biotechnology).

References

Trinklein, N.D., Chen, W.C., Kingston, R.E. and Myers, R.M. The role of heat shock transcription factor 1 in the genome-wide regulation of the mammalian heat shock response. Mol. Biol. Cell 15(3), 1254-61 (2004).

\ encodeChip 0 altColor 150,0,25\ autoScale off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 120,0,20\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel Stanford ChIP-chip Smoothed Score\ maxHeightPixels 128:16:16\ maxLimit 721474\ minLimit 0\ origAssembly hg16\ priority 55.1\ shortLabel Stanf ChIP Score\ superTrack encodeStanfordChipSuper dense\ track encodeStanfordChipSmoothed\ type bedGraph 4\ viewLimits 0:1000\ visibility hide\ encodeAffyChIpHl60SitesPu1Hr32 Affy PU1 RA 32h bed 3 . Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 32hrs) Sites 0 56 75 150 0 165 202 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 75,150,0\ longLabel Affymetrix ChIP/Chip (PU1 retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 56\ shortLabel Affy PU1 RA 32h\ subGroups factor=PU1 time=32h\ track encodeAffyChIpHl60SitesPu1Hr32\ intronEst Spliced ESTs psl est Human ESTs That Have Been Spliced 1 56 0 0 0 127 127 127 1 0 0

Description

This track shows alignments between human expressed sequence tags (ESTs) in GenBank and the genome that show signs of splicing. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes.

To be considered spliced, an EST must show evidence of at least one canonical intron; that is, the genomic sequence between EST alignment blocks must be at least 32 bases long and have GT/AG ends. Requiring splicing drastically reduces the contamination carried over from the EST databases, at the expense of eliminating many genuine 3' ESTs. For a display of all ESTs (including unspliced), see the human EST track.
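The splicing criterion can be sketched as a check on the gaps between alignment blocks. This is an illustrative sketch, not UCSC's code: it assumes 0-based, half-open genomic coordinates, and it also accepts the reverse-complement motif CT...AC because an EST can align to either strand. That second motif is an assumption, not something stated above.

```python
MIN_INTRON = 32  # minimum gap length from the description above

def is_canonical_intron(genome, gap_start, gap_end):
    """True if genome[gap_start:gap_end] is long enough and has GT/AG ends.

    Coordinates are 0-based, half-open (assumed). The reverse-complement
    motif CT...AC is also accepted (an assumption about strand handling).
    """
    intron = genome[gap_start:gap_end].upper()
    if len(intron) < MIN_INTRON:
        return False
    forward = intron[:2] == "GT" and intron[-2:] == "AG"
    reverse = intron[:2] == "CT" and intron[-2:] == "AC"
    return forward or reverse

def is_spliced(genome, blocks):
    """blocks: (start, end) genomic alignment blocks sorted by start.

    An EST counts as spliced if any gap between consecutive blocks
    is a canonical intron.
    """
    return any(is_canonical_intron(genome, prev_end, next_start)
               for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]))
```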

Display Conventions and Configuration

This track follows the display conventions for PSL alignment tracks. In dense display mode, darker shading indicates a larger number of aligned ESTs.

The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated.

The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task.

To use the filter:

  1. Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter.
  2. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted.
  3. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria.
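The and/or combination logic and the include/exclude modes in the steps above can be modeled in a few lines of Python. This is a hypothetical model for illustration only, not the browser's implementation: the EST records (dicts of metadata fields), the case-insensitive matching, and the function names are all assumptions.

```python
from fnmatch import fnmatchcase

def matches_field(value, terms):
    """True if value matches any space-separated term (wildcards allowed).

    Matching is case-insensitive here, which is an assumption.
    """
    return any(fnmatchcase(value.lower(), t.lower()) for t in terms.split())

def filter_ests(ests, criteria, logic="and", mode="include"):
    """Apply text-box filters to hypothetical EST metadata records.

    ests: list of dicts such as {"acc": "...", "tissue": "liver"}.
    criteria: field name -> contents of that field's text box.
    logic: "and" needs every criterion to match; "or" needs any one.
    mode: "include" keeps only matches; "exclude" drops them.
    """
    combine = all if logic == "and" else any

    def hit(est):
        return combine(matches_field(est.get(field, ""), terms)
                       for field, terms in criteria.items())

    if mode == "include":
        return [e for e in ests if hit(e)]
    return [e for e in ests if not hit(e)]
```

For example, `filter_ests(ests, {"tissue": "liv*"})` keeps only records whose tissue matches the wildcard pattern.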

This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page.

Methods

To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most, but not all, ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase does not always reach the 5' end of the mRNA, which may also be degraded.

In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects succeeded, but as a side effect also deposited sequences from unprocessed mRNA, and perhaps even genomic sequences, into the EST databases. Even outside the random-primed projects, there is some degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism.

To generate this track, human ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence are displayed in this track.
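The near-best filtering rule can be sketched as a post-processing step over blat hits. This is a minimal sketch: the (est_id, locus, percent_identity) tuple layout is hypothetical, and "within 0.5% of the best" is interpreted as 0.5 percentage points of identity, which is an assumption.

```python
def near_best_alignments(alignments, min_identity=96.0, near_best=0.5):
    """Keep each EST's alignments that are near its best and above a floor.

    alignments: list of (est_id, locus, percent_identity) tuples
    (a hypothetical layout). near_best is treated as percentage points
    of identity (an assumption).
    """
    best = {}
    for est_id, _, ident in alignments:
        best[est_id] = max(ident, best.get(est_id, float("-inf")))
    return [a for a in alignments
            if a[2] >= min_identity and a[2] >= best[a[0]] - near_best]
```

An EST that aligns equally well to several paralogous loci keeps all of those placements, while weaker secondary hits are discarded.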

Credits

This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide.

References

Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.

Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64.

\ rna 1 baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ intronGap 30\ longLabel $Organism ESTs That Have Been Spliced\ maxItems 300\ priority 56\ shortLabel Spliced ESTs\ showDiffBasesAllScales .\ spectrum on\ track intronEst\ type psl est\ visibility dense\ encodeStanfordChipJohnson Stanf ChIP2 bedGraph 4 Stanford ChIP-chip Johnson (GMO6990, HeLa, HepG2, Jurkat, K562 cells; GABP, SRF, TAF, NRST/REST ChIP) 0 56 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 altColor 150,0,25\ autoScale off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 120,0,20\ compositeTrack on\ dataVersion Mar 2007\ group encodeChip\ longLabel Stanford ChIP-chip Johnson (GMO6990, HeLa, HepG2, Jurkat, K562 cells; GABP, SRF, TAF, NRST/REST ChIP)\ maxHeightPixels 128:16:16\ maxLimit 1000\ minLimit 500\ origAssembly hg17\ priority 56.0\ shortLabel Stanf ChIP2\ track encodeStanfordChipJohnson\ type bedGraph 4\ viewLimits 0:10\ visibility hide\ encodeUCDavisE2F1Median UCD Ng E2F1 bedGraph 4 UC Davis ChIP/Chip NimbleGen - E2F1 ab, HeLa Cells 0 56 32 128 180 180 128 32 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

ChIP analysis was performed using an antibody to E2F1 and HeLa cell chromatin. E2F1 is a transcription factor important in controlling cell division. Three independently crosslinked preparations of HeLa cells were used to provide three independent biological replicates. ChIP assays were performed using the protocol available from the Farnham Laboratory, with minor modifications that can be provided upon request. Array hybridizations were performed using standard NimbleGen Systems Inc. conditions.

Display Conventions and Configuration

This track may be configured in a variety of ways to highlight different aspects of the displayed data. For more information about the graphical configuration options, click the Graph configuration help link.

Methods

Ratio intensity values (E2F1 vs. total) for each of three biological replicates were calculated and converted to log2. Each set of ratio values was then independently scaled by its Tukey biweight mean. The three replicates were then combined by taking the median scaled log2 ratio for each oligo.
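The scaling and combining steps can be sketched in Python. Several details here are assumptions rather than facts from the description: the one-step Tukey biweight variant (tuning constant c = 5 and epsilon = 0.0001, as in the Affymetrix MAS5 algorithm), the interpretation of "scaled by" as subtracting the biweight mean in log2 space (equivalent to dividing the raw ratios), and the input layout.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean (MAS5-style constants, assumed here)."""
    m = median(values)
    mad = median(abs(v - m) for v in values)   # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)  # outliers get 0
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def combine_replicates(replicate_ratios):
    """Median of biweight-centered log2 ratios across replicates.

    replicate_ratios: one list of raw E2F1/total ratios per replicate,
    all in the same oligo order (assumed).
    """
    scaled = []
    for ratios in replicate_ratios:
        logs = [math.log2(r) for r in ratios]
        center = tukey_biweight(logs)
        # "Scaled by" interpreted as subtraction in log space (an assumption).
        scaled.append([x - center for x in logs])
    return [median(vals) for vals in zip(*scaled)]
```

The biweight mean down-weights extreme oligos, so a few strongly bound sites do not shift the baseline of a replicate; the per-oligo median then resists a single discordant replicate.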

Verification

Primers were chosen to correspond to 13 individual peaks. PCR was performed for each of the 13 primer sets using amplicons derived from each of the three biological samples (39 reactions). The reactions confirmed that all 13 chosen peaks were bound by E2F1 in all three biological samples.

Credits

These data were contributed by Mike Singer, Kyle Munn, Nan Jiang, Todd Richmond and Roland Green of NimbleGen Systems, Inc., and Matt Oberley, David Inman, Mark Bieda, Shally Xu and Peggy Farnham of the Farnham Lab.

\ encodeChip 0 altColor 180,128,32\ autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 32,128,180\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel UC Davis ChIP/Chip NimbleGen - E2F1 ab, HeLa Cells\ maxHeightPixels 128:16:16\ maxLimit 4.59\ minLimit -2.41\ origAssembly hg16\ priority 56.0\ shortLabel UCD Ng E2F1\ track encodeUCDavisE2F1Median\ type bedGraph 4\ viewLimits 0:2.2\ visibility hide\ est Human ESTs psl est Human ESTs Including Unspliced 0 57 0 0 0 127 127 127 1 0 0

Description

This track shows alignments between human expressed sequence tags (ESTs) in GenBank and the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes.

NOTE: As of April 2007, we no longer include GenBank sequences that contain the following URL as part of the record:

http://fulllength.invitrogen.com

Some of these entries are the result of alignment to pseudogenes, followed by "correction" of the EST to match the genomic sequence. The resulting record therefore does not reflect the sequence of the actual EST, and its alignment makes the locus appear to be transcribed. Invitrogen no longer sells the clones.

Display Conventions and Configuration

This track follows the display conventions for PSL alignment tracks. In dense display mode, more darkly shaded items indicate better-quality matches.

The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated.

The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task.

To use the filter:

  1. Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter.
  2. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted.
  3. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria.

This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, see the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, see the Alignment Insertion/Deletion Display Options page.

Methods

To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most, but not all, ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase does not always reach the 5' end of the mRNA, which may also be degraded.

In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects succeeded, but as a side effect also deposited sequences from unprocessed mRNA, and perhaps even genomic sequences, into the EST databases. Even outside the random-primed projects, there is some degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism.

To generate this track, human ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept.

Credits

This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide.

References

Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.

Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64.

\ rna 1 baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ intronGap 30\ longLabel $Organism ESTs Including Unspliced\ maxItems 300\ priority 57\ shortLabel $Organism ESTs\ spectrum on\ table all_est\ track est\ type psl est\ visibility hide\ encodeAffyChIpHl60PvalRnapHr00 Affy Pol2 RA 0h wig 0.0 534.54 Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 0hrs) P-Value 0 57 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 0hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 57\ shortLabel Affy Pol2 RA 0h\ subGroups factor=Pol2 time=0h\ track encodeAffyChIpHl60PvalRnapHr00\ rgdEst RGD EST psl est RGD EST 0 57.5 12 12 120 133 133 187 1 0 0 http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=

Description

This track shows expressed sequence tags (ESTs) downloaded from the Rat Genome Database (RGD). An EST is a partial sequence of a randomly chosen cDNA, obtained from a single DNA sequencing reaction. ESTs are used to identify transcribed regions in genomic sequence and to characterize patterns of gene expression in the tissue from which the cDNA was derived.

Methods

The data used to create this annotation were obtained from the file RGD_EST.gff, downloaded from the RGD website.

Credits

Thanks to the RGD for providing this annotation. RGD is funded by grant HL64541, entitled "Rat Genome Database", awarded to Dr. Howard J. Jacob, Medical College of Wisconsin, by the National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health (NIH).

\ \ rna 1 color 12,12,120\ group rna\ longLabel RGD EST\ priority 57.5\ shortLabel RGD EST\ spectrum on\ track rgdEst\ type psl est\ url http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=\ visibility hide\ encodeAffyChIpHl60SitesRnapHr00 Affy Pol2 RA 0h bed 3 . Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 0hrs) Sites 0 58 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 50,175,0\ longLabel Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 0hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 58\ shortLabel Affy Pol2 RA 0h\ subGroups factor=Pol2 time=0h\ track encodeAffyChIpHl60SitesRnapHr00\ tightEst Tight ESTs psl est Tightly Filtered Human ESTs Including Unspliced 0 58 0 0 0 127 127 127 1 0 0 rna 1 baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Tightly Filtered $Organism ESTs Including Unspliced\ priority 58\ shortLabel Tight ESTs\ spectrum on\ track tightEst\ type psl est\ visibility hide\ encodeAffyChIpHl60PvalRnapHr02 Affy Pol2 RA 2h wig 0.0 534.54 Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 2hrs) P-Value 0 59 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 2hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 59\ shortLabel Affy Pol2 RA 2h\ subGroups factor=Pol2 time=2h\ track encodeAffyChIpHl60PvalRnapHr02\ encodeAffyChIpHl60SitesRnapHr02 Affy Pol2 RA 2h bed 3 . 
Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 2hrs) Sites 0 60 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 50,175,0\ longLabel Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 2hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 60\ shortLabel Affy Pol2 RA 2h\ subGroups factor=Pol2 time=2h\ track encodeAffyChIpHl60SitesRnapHr02\ encodeBu_ORChID1 BU ORChID wig -0.56 1.58 Boston University ORChID (OH Radical Cleavage Intensity Database) 0 60 44 44 200 200 44 44 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

This track displays the predicted hydroxyl radical cleavage intensity on naked DNA for each nucleotide in the ENCODE regions. Because hydroxyl radical cleavage intensity is proportional to the solvent-accessible surface area of the deoxyribose hydrogen atoms (Balasubramanian et al., 1998), this track represents a structural profile of the DNA in the ENCODE regions.

Please visit the ORChID website maintained by the Tullius group for access to experimental hydroxyl radical cleavage data and to a server that can be used to predict the cleavage pattern for any input sequence.

Display Conventions and Configuration

This track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page. For more information, click the Graph configuration help link.

Methods

Hydroxyl radical cleavage intensity predictions were performed using an in-house sliding trimer window (STW) algorithm. This algorithm draws data from the ·OH Radical Cleavage Intensity Database (ORChID), which contains more than 150 experimentally determined cleavage patterns. The predictions are fairly accurate, with a Pearson correlation coefficient of approximately 0.85 between predicted and experimentally determined cleavage intensities. For more details on the hydroxyl radical cleavage method, see the References section below.
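The general idea of a sliding trimer window can be illustrated with a simple lookup-table predictor. This is a hypothetical sketch, not the Tullius group's STW algorithm, whose details are not given here: it assumes training data of (sequence, per-base intensity) pairs, builds a table of the mean intensity observed at the central base of each trimer, and predicts each interior base from the trimer centered on it.

```python
from collections import defaultdict

def train_trimer_table(training):
    """Mean measured intensity at the central base of each trimer.

    training: list of (sequence, intensities) pairs with one intensity
    per nucleotide (a hypothetical input format).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for seq, intensities in training:
        for i in range(1, len(seq) - 1):
            tri = seq[i - 1:i + 2]
            sums[tri] += intensities[i]
            counts[tri] += 1
    return {tri: sums[tri] / counts[tri] for tri in sums}

def predict_profile(seq, table, default=0.0):
    """Slide a 3-nt window along seq and look up each centered trimer.

    End bases and trimers absent from the table get `default`.
    """
    profile = [default] * len(seq)
    for i in range(1, len(seq) - 1):
        profile[i] = table.get(seq[i - 1:i + 2], default)
    return profile
```

A trimer window captures the influence of each base's immediate neighbors on local backbone geometry while keeping the table small (64 entries).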

Verification

The STW algorithm was cross-validated by removing each test sequence from the training set and then predicting its cleavage pattern. The mean correlation coefficient between predicted and experimental cleavage patterns in this study was 0.85.
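The leave-one-out procedure and its summary statistic can be sketched generically. This is a minimal sketch: the `train` and `predict` callables and the (sequence, measured_profile) data layout are hypothetical placeholders for whatever model is being validated. Constant profiles (zero variance) are not handled.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def leave_one_out(dataset, train, predict):
    """Mean Pearson r over held-out sequences.

    dataset: list of (sequence, measured_profile) pairs.
    train(pairs) -> model; predict(sequence, model) -> predicted profile.
    Each pair is held out in turn and the model is trained on the rest.
    """
    scores = []
    for i, (seq, measured) in enumerate(dataset):
        model = train(dataset[:i] + dataset[i + 1:])
        scores.append(pearson(predict(seq, model), measured))
    return mean(scores)
```

Holding out each sequence ensures its own measurements never inform its prediction, so the mean r estimates accuracy on unseen sequences.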

Credits

These data were generated through the combined effort of Bo Pang at MIT and Jason Greenbaum, Steve Parker, and Tom Tullius of Boston University.

References

Balasubramanian, B., Pogozelski, W.K., and Tullius, T.D. DNA strand breaking by the hydroxyl radical is governed by the accessible surface areas of the hydrogen atoms of the DNA backbone. Proc. Natl. Acad. Sci. USA 95(17), 9738-9743 (1998).

Price, M.A., and Tullius, T.D. Using the Hydroxyl Radical to Probe DNA Structure. Meth. Enzymol. 212, 194-219 (1992).

Tullius, T.D. Probing DNA Structure with Hydroxyl Radicals. In Current Protocols in Nucleic Acid Chemistry (eds. Beaucage, S.L., Bergstrom, D.E., Glick, G.D. and Jones, R.A.) (Wiley, 2001), pp. 6.7.1-6.7.8.

\ encodeChrom 0 altColor 200,44,44\ autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 44,44,200\ dataVersion ENCODE June 2005 Freeze\ group encodeChrom\ longLabel Boston University ORChID (OH Radical Cleavage Intensity Database)\ maxHeightPixels 128:36:16\ origAssembly hg16\ priority 60.0\ shortLabel BU ORChID\ spanList 1\ track encodeBu_ORChID1\ type wig -0.56 1.58\ viewLimits 0.22:0.5\ visibility hide\ windowingFunction Mean\ encodeUtexChipSuper UT-Austin ChIP University of Texas, Austin ChIP-chip and STAGE 0 60 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Overview

This super-track combines related tracks of ChIP data generated by the Iyer laboratory at The University of Texas at Austin. Two technologies are presented in this super-track: ChIP-chip and ChIP-STAGE. ChIP-chip, also known as genome-wide location analysis, is a technique for isolating and identifying DNA sequences bound by specific proteins in cells. Rather than detecting bound fragments by microarray, ChIP-STAGE uses Sequence Tag Analysis of Genomic Enrichment (STAGE): tags from the enriched DNA are cloned, sequenced, and mapped to the human genome.

These tracks contain ChIP data for several transcription factors, including c-Myc, E2F4 and STAT1, in cell lines including 2091 (foreskin fibroblast) and HeLa (cervical carcinoma).

Credits

ChIP-chip data were contributed by Jonghwan Kim, Akshay Bhinge, and Vishy Iyer from the Iyer lab at The University of Texas at Austin, in collaboration with Mike Singer, Nan Jiang, and Roland Green of NimbleGen Systems, Inc.

ChIP-STAGE data were contributed by Jonghwan Kim, Akshay Bhinge, and Vishy Iyer from the Iyer lab, and by Ghia Euskirchen and Michael Snyder of the Snyder lab at Yale University.

References

Bhinge AA, Kim J, Euskirchen G, Snyder M, Iyer VR. Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res. 2007 Jun;17(6):910-6.

Kim J, Bhinge A, Morgan XC, Iyer VR. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nat Methods. 2005 Jan;2(1):47-53.

\ \ encodeChip 0 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ group encodeChip\ longLabel University of Texas, Austin ChIP-chip and STAGE\ origAssembly hg17\ priority 60\ shortLabel UT-Austin ChIP\ superTrack on\ track encodeUtexChipSuper\ visibility hide\ encodeUtexChip UT-Austin ChIP bedGraph 4 University of Texas, Austin ChIP-chip 0 60 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

ChIP-chip analysis of c-Myc and E2F4 was performed using 2091 foreskin fibroblasts and HeLa cells. ChIP was carried out on normally growing HeLa cells and on 2091 fibroblasts that were either quiescent (0.1% FBS) or serum-stimulated (10% FBS, 4 hrs). Microarray hybridizations were performed using NimbleGen ENCODE arrays and protocols.

Display Conventions and Configuration

This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link.

Methods

Chromatin from each cell line under a given condition was cross-linked with 1% formaldehyde, sheared, precipitated with antibody, and reverse cross-linked to obtain enriched DNA fragments. ChIP material was amplified and hybridized to a NimbleGen ENCODE region array. The raw and processed files reflect fold enrichment over the mock ChIP sample, which was used as the reference in the hybridization.

Verification

Each of the four experiments has three independent biological replicates. Data from all three replicates were averaged to generate a single data file. Peaks were identified with the NimbleGen hit-identification method at a false positive rate of at most 0.05.
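The fold-enrichment and replicate-averaging steps can be sketched as follows. This is a minimal sketch under stated assumptions: the per-replicate lists of ChIP and mock intensities are a hypothetical layout, and averaging in log2 space is an assumption, since the text says only that the replicates were averaged. The NimbleGen peak-calling step is not modeled.

```python
import math
from statistics import mean

def average_log2_enrichment(chip_reps, mock_reps):
    """Per-probe mean of log2(ChIP / mock) across biological replicates.

    chip_reps, mock_reps: one list of probe intensities per replicate,
    probes in the same order (hypothetical layout). Averaging in log2
    space is an assumption.
    """
    per_rep = [[math.log2(c / m) for c, m in zip(chip, mock)]
               for chip, mock in zip(chip_reps, mock_reps)]
    return [mean(vals) for vals in zip(*per_rep)]
```

Averaging log ratios treats two-fold enrichment and two-fold depletion symmetrically, which is one common reason to average in log space.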

Credits

These data were contributed by Jonghwan Kim, Akshay Bhinge, and Vishy Iyer from the Iyer lab at the University of Texas at Austin, in collaboration with Mike Singer, Nan Jiang, and Roland Green of NimbleGen Systems, Inc.

Reference

Kim, J., Bhinge, A., Morgan, X.C. and Iyer, V.R. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nature Methods 2, 47-53 (2005).

\ encodeChip 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE Oct 2005 Freeze\ group encodeChip\ longLabel University of Texas, Austin ChIP-chip\ maxHeightPixels 128:16:16\ maxLimit 4.35\ minLimit -3.23\ origAssembly hg17\ priority 60\ shortLabel UT-Austin ChIP\ subGroup1 dataType Data_Type raw=Raw peaks=Peaks\ superTrack encodeUtexChipSuper dense\ track encodeUtexChip\ type bedGraph 4\ viewLimits 0:2\ visibility hide\ windowingFunction mean\ encodeAffyChIpHl60PvalRnapHr08 Affy Pol2 RA 8h wig 0.0 534.54 Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 8hrs) P-Value 0 61 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 8hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 61\ shortLabel Affy Pol2 RA 8h\ subGroups factor=Pol2 time=8h\ track encodeAffyChIpHl60PvalRnapHr08\ encodeNhgriDnaseHs NHGRI DNaseI-HS bed 5 . NHGRI DNaseI-Hypersensitive Sites 0 61 0 0 0 127 127 127 1 0 19 chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX,

Description

This track displays DNaseI-hypersensitive sites in CD4+ T cells before and after activation by anti-CD3 and anti-CD28 antibodies. DNaseI-hypersensitive sites are associated with gene regulatory regions, particularly for upregulated genes. CD4+ T cells, also known as helper or inducer T cells, are involved in generating an immune response. CD4+ T cells are also one of the primary targets of HIV.

Display Conventions and Configuration

The top subtrack of this annotation corresponds to unactivated T cells; the bottom subtrack shows activated T cells. Within the subtracks, the gray and black blocks (which appear as vertical lines when the display is zoomed out) represent probable hypersensitive sites. The darker the block, the more likely the site is to be hypersensitive.

To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. The display may also be filtered to show only those items with unnormalized scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page.

\ \

Methods

\

\ Primary human CD4+ T cells were activated by incubation with anti-CD3 and anti-CD28 antibodies for 24 hours. DNaseI-hypersensitive sites were cloned from the cells before and after activation and sequenced using massively parallel signature sequencing (Brenner et al., 2000; Crawford et al., 2006). Only clusters of multiple DNaseI library sequences that map within 500 bases of each other are displayed. Each cluster has a unique identifier, visible when the track is displayed in full or pack mode. The last digit of each identifier represents the number of sequences that map within that cluster. The number of sequences is also reflected in the score: a cluster of two sequences scores 500, a cluster of three scores 750, and a cluster of four or more scores 1000.
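The cluster-size-to-score rule above amounts to 250 points per sequence, capped at four sequences. A minimal Python sketch; the function name `cluster_score` is hypothetical and not part of any browser code:

```python
def cluster_score(num_sequences):
    """Map the number of DNaseI library sequences in a cluster to its track score.

    Hypothetical helper: 2 -> 500, 3 -> 750, 4 or more -> 1000
    (250 per sequence, capped at four). Single sequences are not displayed.
    """
    if num_sequences < 2:
        raise ValueError("only clusters of two or more sequences are displayed")
    return 250 * min(num_sequences, 4)


if __name__ == "__main__":
    for n in (2, 3, 4, 7):
        print(n, cluster_score(n))
```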

\ \

Verification

\

\ A real-time PCR assay was used to verify DNaseI-hypersensitive sites. Approximately 50% of clusters of two sequences are valid; these clusters are shown in light gray. About 80% of clusters of three sequences are valid, indicated by dark gray. All clusters of four or more sequences are valid; these are shown in black.
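Combining the shades with the approximate validation rates above gives a rough way to estimate how many displayed clusters are likely to be real hypersensitive sites. This is an illustrative sketch only: the lookup table and function names are hypothetical, and it assumes the quoted rates apply uniformly to every cluster of a given size.

```python
# Hypothetical lookup built from the approximate verification rates quoted above.
# Key 4 stands for "four or more sequences".
SHADE_AND_VALID_RATE = {
    2: ("light gray", 0.50),
    3: ("dark gray", 0.80),
    4: ("black", 1.00),
}


def shade_and_rate(num_sequences):
    """Return (display shade, approximate fraction valid) for a cluster size."""
    if num_sequences < 2:
        raise ValueError("only clusters of two or more sequences are displayed")
    return SHADE_AND_VALID_RATE[min(num_sequences, 4)]


def expected_valid_sites(cluster_sizes):
    """Expected number of true hypersensitive sites among the given clusters."""
    return sum(shade_and_rate(n)[1] for n in cluster_sizes)


if __name__ == "__main__":
    print(shade_and_rate(3))
    print(expected_valid_sites([2, 2, 3, 5]))
```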

\

\ This data set includes confirmed elements for 35 of the 44\ ENCODE regions.\ It is estimated that these data identify\ 10-20% of all hypersensitive sites within CD4+ T cells. Further\ sequencing will be required to identify additional sites.

\ \

Credits

\

\ These data were produced at the \ Collins Lab \ at NHGRI. Thanks to Gregory E. Crawford and Francis S. Collins for supplying \ the information for this track.

\ \

References

\

\ Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 2000 Jun;18(6):630-4.\

\ Crawford GE, Holt IE, Mullikin JC, Tai D, Blakesley R, Bouffard G, Young A, \ Masiello C, Green ED, Wolfsberg TG et al.\ Identifying gene regulatory elements by genome-wide recovery of \ DNase hypersensitive sites.\ Proc. Natl. Acad. Sci. USA. 2004 Jan 27;101(4):992-7.

\

\ Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, \ Bernat JA, Ginsburg D et al.\ Genome-wide mapping of DNase hypersensitive \ sites using massively parallel signature sequencing (MPSS).\ Genome Res. 2006 Jan;16(1):123-31.\ (See also NHGRI's\ data site for the project.)

\

\ McArthur M, Gerum S, Stamatoyannopoulos G.\ Quantification of DNaseI-sensitivity by real-time PCR: \ quantitative analysis of DNaseI-hypersensitivity of the mouse beta-globin \ LCR.\ J. Mol. Biol. 2001 Oct 12;313(1):27-34.

\ encodeChrom 1 chromosomes chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChrom\ longLabel NHGRI DNaseI-Hypersensitive Sites\ origAssembly hg16\ priority 61.0\ shortLabel NHGRI DNaseI-HS\ track encodeNhgriDnaseHs\ type bed 5 .\ useScore 1\ visibility hide\ encodeAffyChIpHl60SitesRnapHr08 Affy Pol2 RA 8h bed 3 . Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 8hrs) Sites 0 62 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 50,175,0\ longLabel Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 8hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 62\ shortLabel Affy Pol2 RA 8h\ subGroups factor=Pol2 time=8h\ track encodeAffyChIpHl60SitesRnapHr08\ encodeStanfordMeth Stanf Meth bedGraph 4 Stanford Methylation Digest: Be2C, CRL1690, HCT116, HT1080, HepG2, JEG3, Snu182, U87 0 62 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays experimentally determined regions of unmethylated\ CpGs in the ENCODE regions. These experiments were performed in eight\ cell lines, each of which is displayed as a separate subtrack:\

\

\

Cell Line   Classification             Isolated From
BE(2)-C     neuroblastoma              brain (metastatic, from bone marrow)
CRL-1690™   hybridoma                  B lymphocyte
HCT 116     colorectal carcinoma       colon
HT-1080     fibrosarcoma               connective tissue
HepG2       hepatocellular carcinoma   liver
JEG-3       choriocarcinoma            placenta
SNU-182     hepatocellular carcinoma   liver
U-87 MG     glioblastoma-astrocytoma   brain
\
\

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ tracks. The subtracks within this annotation \ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\ \

Methods

\

\ High-molecular-weight genomic DNA was prepared from each cell line. The genomic DNA was digested with a cocktail of six methylation-sensitive restriction enzymes (AciI, HhaI, BstUI, HpaII, HgaI, and HpyCH4IV) and size-selected to deplete the genome of unmethylated regions. Digested and undigested (control) DNA were amplified, labeled, and hybridized to oligonucleotide tiling arrays produced by NimbleGen. For each array, the log2 ratios were median-subtracted and then normalized by dividing by the standard deviation. The value given for each array probe is the transformed mean ratio of undigested:digested genomic DNA.
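The per-array transformation described above can be sketched as follows. This is an assumption-laden illustration, not the Myers lab pipeline: `normalize_array` is a hypothetical name, intensities are assumed to be positive, and the sample standard deviation (`statistics.stdev`) is assumed because the source does not say which estimator was used.

```python
import math
import statistics


def normalize_array(undigested, digested):
    """Per-probe log2(undigested/digested), median-subtracted, divided by the SD.

    Hypothetical helper; `undigested` and `digested` are positive intensities
    for the same probes, in the same order.
    """
    log_ratios = [math.log2(u / d) for u, d in zip(undigested, digested)]
    median = statistics.median(log_ratios)
    centered = [r - median for r in log_ratios]   # median subtraction
    sd = statistics.stdev(centered)               # sample standard deviation (assumed)
    return [c / sd for c in centered]


if __name__ == "__main__":
    print(normalize_array([2, 4, 8, 16], [1, 1, 1, 1]))
```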

\

\ Higher scores in this track indicate regions that are more strongly\ methylated, due to the greater difference between the undigested and\ digested hybridization signals.\

\ \

Verification

\

\ Three biological replicates and two technical replicates were performed for each of the eight cell lines. The Myers lab is currently testing the specificity and sensitivity using real-time PCR.\

\ \

Credits

\

\ These data were generated in the Richard M. Myers lab at Stanford University (now at \ HudsonAlpha Institute for Biotechnology). Please contact \ David Johnson\ \ for further information regarding the methods and the data for this track.

\ encodeChrom 0 altColor 150,0,25\ autoScale off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 120,0,20\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChrom\ longLabel Stanford Methylation Digest: Be2C, CRL1690, HCT116, HT1080, HepG2, JEG3, Snu182, U87\ maxHeightPixels 128:16:16\ maxLimit 114\ minLimit 0\ origAssembly hg16\ priority 62.0\ shortLabel Stanf Meth\ track encodeStanfordMeth\ type bedGraph 4\ viewLimits 0:10\ visibility hide\ encodeStanfordMethSmoothed Stanf Meth Score bedGraph 4 Stanford Methylation Digest Smoothed Score 0 62.1 120 0 20 150 0 25 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays smoothed (sliding-window mean) scores for \ experimentally determined regions of unmethylated\ CpGs in the ENCODE regions. These experiments were performed in eight\ cell lines, each of which is displayed as a separate subtrack:\

\

\

Cell Line   Classification             Isolated From
BE(2)-C     neuroblastoma              brain (metastatic, from bone marrow)
CRL-1690™   hybridoma                  B lymphocyte
HCT 116     colorectal carcinoma       colon
HT-1080     fibrosarcoma               connective tissue
HepG2       hepatocellular carcinoma   liver
JEG-3       choriocarcinoma            placenta
SNU-182     hepatocellular carcinoma   liver
U-87 MG     glioblastoma-astrocytoma   brain
\
\

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ tracks. The subtracks within this annotation \ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\ \

Methods

\

\ High-molecular-weight genomic DNA was prepared from each cell line. The genomic DNA was digested with a cocktail of six methylation-sensitive restriction enzymes (AciI, HhaI, BstUI, HpaII, HgaI, and HpyCH4IV) and size-selected to deplete the genome of unmethylated regions. Digested and undigested (control) DNA were amplified, labeled, and hybridized to oligonucleotide tiling arrays produced by NimbleGen. For each array, the log2 ratios were median-subtracted and then normalized by dividing by the standard deviation.

\

\ The transformed mean ratios of undigested:digested genomic DNA for all probes were then smoothed with a sliding-window mean. Windows of six neighboring probes, advancing two probes at a time, were used; within each window, the highest and lowest values were dropped and the remaining four values were averaged. To increase the contrast between high and low values for visual display, the average was converted to a score by the formula:

\ score = 8^(average) * 10
\ These scores are for visualization purposes; for all analyses, \ the raw ratios, which are available in the Stanf Meth track, should be used. \
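A sketch of the smoothing and score conversion described above, assuming a plain list of per-probe transformed ratios in genomic order. The function name is hypothetical, and skipping incomplete windows at the end of the list is an assumption; the source does not say how edges were handled.

```python
def smoothed_scores(ratios, window=6, step=2, base=8.0, scale=10.0):
    """Trimmed sliding-window mean of per-probe ratios, converted to display scores.

    Windows of `window` consecutive probes advance `step` probes at a time.
    In each window the single highest and lowest values are dropped, the rest
    are averaged, and the average becomes scale * base ** average
    (score = 8^(average) * 10 with the defaults). Incomplete trailing windows
    are skipped (an assumption).
    """
    scores = []
    for start in range(0, len(ratios) - window + 1, step):
        trimmed = sorted(ratios[start:start + window])[1:-1]  # drop min and max
        average = sum(trimmed) / len(trimmed)
        scores.append(scale * base ** average)
    return scores


if __name__ == "__main__":
    print(smoothed_scores([0.2, -0.1, 0.4, 1.5, 0.3, 0.0, 0.6, 0.1]))
```

Because the score is exponential in the average, a one-unit increase in the trimmed mean multiplies the score by eight, which is why the raw ratios in the Stanf Meth track remain the appropriate input for analysis.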

\

\ Higher scores in this track indicate regions that are more strongly\ methylated, due to the greater difference between the undigested and\ digested hybridization signals.\

\ \

Verification

\

\ Three biological replicates and two technical replicates were performed for each of the eight cell lines. The Myers lab is currently testing the specificity and sensitivity using real-time PCR.

\ \

Credits

\

\ These data were generated in the Richard M. Myers lab at Stanford University (now at \ \ HudsonAlpha Institute for Biotechnology). \ Please contact David Johnson\ \ for further information regarding the methods and the data for this track.

\ \ encodeChrom 0 altColor 150,0,25\ autoScale off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 120,0,20\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChrom\ longLabel Stanford Methylation Digest Smoothed Score\ maxHeightPixels 128:16:16\ maxLimit 364088\ minLimit 0\ origAssembly hg16\ priority 62.1\ shortLabel Stanf Meth Score\ track encodeStanfordMethSmoothed\ type bedGraph 4\ viewLimits 0:1000\ visibility hide\ encodeAffyChIpHl60PvalRnapHr32 Affy Pol2 RA 32h wig 0.0 534.54 Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 32hrs) P-Value 0 63 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 50,175,0\ longLabel Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 32hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 63\ shortLabel Affy Pol2 RA 32h\ subGroups factor=Pol2 time=32h\ track encodeAffyChIpHl60PvalRnapHr32\ evofold EvoFold bed 6 + EvoFold Predictions of RNA Secondary Structure 0 63 20 90 0 137 172 127 0 0 0

Description

\

\ This track shows RNA secondary structure predictions made with the\ EvoFold program, a comparative method that exploits the evolutionary signal\ of genomic multiple-sequence alignments for identifying conserved\ functional RNA structures.

\ \

Display Conventions and Configuration

\

\ Track elements are labeled using the convention ID_strand_score.\ When zoomed out beyond the base level, secondary structure prediction regions\ are indicated by blocks, with the stem-pairing regions shown in a darker shade \ than unpaired regions. Arrows indicate the predicted strand.\ When zoomed in to the base level, the specific secondary structure predictions \ are shown in parenthesis format. The confidence score for each position is\ indicated in grayscale, with darker shades corresponding to higher scores.\
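Parenthesis format pairs each opening bracket with the nearest unmatched closing bracket, so the base pairs can be recovered with a stack. A minimal sketch, assuming `(` and `)` mark paired positions and any other character (such as `.`) marks an unpaired base; the function name is hypothetical.

```python
def base_pairs(structure):
    """Return sorted (i, j) index pairs for matching parentheses in a structure string."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        # any other character, such as '.', is treated as unpaired
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(base_pairs("((((...))))..((..))"))
```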

\ The details page for each track element shows the predicted secondary structure \ (labeled SS anno), together with details of the multiple species \ alignments at that location. Substitutions relative to the human sequence are \ color-coded according to their compatibility with the predicted secondary \ structure (see the color legend on the details page). Each prediction is \ assigned an overall score and a sequence of position-specific scores. The \ overall score measures evidence for any functional RNA structures in the given \ region, while the position-specific scores (0 - 9) measure the confidence of \ the base-specific annotations. Base-pairing positions are annotated \ with the same pair symbol. The offsets are provided to ease\ visual navigation of the alignment in terms of the human sequence. The offset\ is calculated (in units of ten) from the start position of the element on \ the positive strand or from the end position when on the negative strand.
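The strand-dependent offset rule can be illustrated with a small helper. This sketch assumes BED-style half-open coordinates (`chrom_start` 0-based, `chrom_end` exclusive) and a 0-based offset; both assumptions and the function names are illustrative rather than taken from the EvoFold code.

```python
def offset_to_genomic(chrom_start, chrom_end, strand, offset):
    """0-based genomic coordinate of the base `offset` positions into an element.

    Counted from the start on the + strand and from the end on the - strand.
    """
    if not 0 <= offset < chrom_end - chrom_start:
        raise ValueError("offset lies outside the element")
    if strand == "+":
        return chrom_start + offset
    if strand == "-":
        return chrom_end - 1 - offset
    raise ValueError("strand must be '+' or '-'")


def tick_offsets(length, step=10):
    """Offsets labeled in units of ten along an element of the given length."""
    return list(range(0, length, step))


if __name__ == "__main__":
    for k in tick_offsets(50):
        print(k, offset_to_genomic(100, 150, "-", k))
```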

\

\ The graphical display may be filtered to show only those track elements with scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page.

\ \

Methods

\

\ EvoFold makes use of phylogenetic stochastic context-free grammars (phylo-SCFGs), which are combined probabilistic models of RNA secondary structure and primary sequence evolution. The predictions consist of both a specific RNA secondary structure and an overall score. The overall score is essentially a log-odds score between a phylo-SCFG modeling the constrained evolution of stem-pairing regions and one that models only unpaired regions.

\

\ The predictions for this track were based on the conserved elements of\ an 8-way vertebrate alignment of the human, chimpanzee, mouse, rat,\ dog, chicken, zebrafish, and Fugu assemblies. NOTE: These predictions\ were originally computed on the hg17 (May 2004) human assembly, from\ which the hg16 (July 2003), hg18 (May 2006), and hg19 (Feb 2009) predictions\ were lifted. As a result, the multiple alignments shown on the track\ details pages may differ from the 8-way alignments used for their\ prediction. Additionally, some weak predictions have been eliminated\ from the set displayed on hg18 and hg19. The hg17 prediction set corresponds\ exactly to the set analyzed in the EvoFold paper referenced below.\

\ \

Credits

\

\ The EvoFold program and browser track were developed by Jakob Skou Pedersen of the UCSC Genome Bioinformatics Group, now at Aarhus University, Denmark.

\

The RNA secondary structure is rendered using the VARNA Java applet.\ \

References

\ \

EvoFold

\

\ Pedersen JS, Bejerano G, Siepel A, Rosenbloom K,\ Lindblad-Toh K, Lander ES, Kent J, Miller W,\ Haussler D. Identification and classification of conserved RNA\ secondary structures in the human genome. PLoS Comput\ Biol. 2006 Apr;2(4):e33.

\ \

Phylo-SCFGs

\

\ Knudsen B, Hein J. \ RNA secondary structure prediction using stochastic context-free \ grammars and evolutionary history.\ Bioinformatics. 1999 Jun;15(6):446-54.

\

\ Pedersen JS, Meyer IM, Forsberg R, Simmonds P, Hein J. \ A comparative method for finding and folding RNA\ secondary structures within protein-coding regions. \ Nucleic Acids Res. 2004 Sep 24;32(16):4925-36.

\ \

PhastCons

\

\ Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom\ K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM,\ Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D. \ Evolutionarily conserved elements in vertebrate, insect, worm, \ and yeast genomes.\ Genome Res. 2005 Aug;15(8):1034-50.

\ genes 1 color 20,90,0\ group genes\ longLabel EvoFold Predictions of RNA Secondary Structure\ mafTrack mzPt1Mm3Rn3Gg2_pHMM\ origAssembly hg17\ priority 63\ shortLabel EvoFold\ track evofold\ type bed 6 +\ visibility hide\ miRNA miRNA bed 8 . MicroRNAs from miRBase 0 63 255 64 64 255 159 159 1 0 0 http://microrna.sanger.ac.uk/cgi-bin/sequences/mirna_entry.pl?id=$$

Description

\

\ The miRNA track shows microRNAs from the\ \ miRBase at The \ Wellcome Trust Sanger Institute.

\ \

Display Conventions and Configuration

\

\ Mature miRNAs (miRs) are represented by thick blocks. The predicted stem-loop portions of the primary transcripts are indicated by thinner blocks. miRNAs in the sense orientation are shown in black; those in the reverse orientation are colored gray. When a single precursor produces two mature miRs from its 5' and 3' parts, it is displayed twice with the two different positions of the mature miR.

\

\ To display only those items that exceed a specific unnormalized score, enter\ a minimum score between 0 and 1000 in the text box at the top of the track \ description page.\

\ \

Methods

\

\ Mature and precursor miRNAs from the miRNA Registry were aligned against the genome using blat. The extents of the precursor sequences were not generally known and were predicted based on base-paired hairpin structure. miRBase is described in Griffiths-Jones, S. et al. (2006). The miRNA Registry is described in Griffiths-Jones, S. (2004) and Weber, M.J. (2005) in the References section below.

\ \

Credits

\

\ \ This track was created by Michel Weber of \ Laboratoire de Biologie Moléculaire Eucaryote,\ CNRS Université Paul Sabatier\ (Toulouse, France), Yves Quentin of Laboratoire de Microbiologie et Génétique\ Moléculaires (Toulouse, France) and Sam Griffiths-Jones of\ \ The Wellcome Trust Sanger Institute\ (Cambridge, UK).\

\

References

\

\ When making use of these data, please cite:

\

\ Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ.\ miRBase: microRNA sequences, targets and gene nomenclature.\ Nucleic Acids Res. 2006 Jan 1;34(Database issue):D140-4.

\

\ Griffiths-Jones S. \ The microRNA Registry.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D109-11.

\

\ Weber MJ. New human and mouse microRNA genes found by homology search. FEBS J. 2005 Jan;272(1):59-73.

\

\ You may also want to cite The Wellcome Trust Sanger Institute \ miRNA Registry.

\

\ The following publication provides guidelines on miRNA annotation:\
Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, Chen X,\ Dreyfuss G, Eddy SR, Griffiths-Jones S, Marshall M et al.\ A uniform system for microRNA annotation. \ RNA. 2003 Mar;9(3):277-9.

\

\ For more information on blat, see \
Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ \ genes 1 color 255,64,64\ group genes\ longLabel MicroRNAs from miRBase\ priority 63\ shortLabel miRNA\ track miRNA\ type bed 8 .\ url http://microrna.sanger.ac.uk/cgi-bin/sequences/mirna_entry.pl?id=$$\ urlLabel miRBase:\ useScore 1\ visibility hide\ xenoMrna Other mRNAs psl xeno Non-Human mRNAs from GenBank 0 63 0 0 0 127 127 127 1 0 0

Description

\

\ This track displays translated blat alignments of vertebrate and\ invertebrate mRNA in \ GenBank from organisms other than human.\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) for this track is in two parts. The\ first + indicates the orientation of the query sequence whose\ translated protein produced the match (here always 5' to 3', hence +).\ The second + or - indicates the orientation of the matching \ translated genomic sequence. Because the two orientations of a DNA \ sequence give different predicted protein sequences, there are four \ combinations. ++ is not the same as --, nor is +- the same as -+.
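In the PSL format, the strand column holds one character for untranslated alignments and two characters for translated ones (query strand, then genomic strand). The sketch below splits that field and shows why the orientation matters: the same six bases read on the two strands give entirely different codons. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_translated_strand(strand):
    """Split a two-character strand field such as '+-' into (query, genomic) strands."""
    if len(strand) != 2 or any(c not in "+-" for c in strand):
        raise ValueError("expected two characters, each '+' or '-'")
    return strand[0], strand[1]


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq):
    """Split a sequence into complete codons in reading frame 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


if __name__ == "__main__":
    genomic = "ATGGCC"
    print(parse_translated_strand("+-"))         # ('+', '-')
    print(codons(genomic))                      # ['ATG', 'GCC']
    print(codons(reverse_complement(genomic)))  # ['GGC', 'CAT']
```

Here the + strand codons ATG GCC encode Met-Ala, while the - strand codons GGC CAT encode Gly-His.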

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA \ display. For example, to apply the filter to all mRNAs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Multiple terms may be entered at once, \ separated by a space. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter \ criteria will be highlighted. If "or" is selected, mRNAs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display mRNAs that match the filter criteria. \ If "include" is selected, the browser will display only those \ mRNAs that match the filter criteria.\
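The include/exclude logic of the filter steps above can be sketched in Python. This is a hypothetical model, not the browser's implementation: records are assumed to be dictionaries of field values, multiple terms in one box are treated as alternatives, `*` and `?` wildcards follow `fnmatch` rules, and matching is assumed to be case-insensitive. Color highlighting is omitted.

```python
from fnmatch import fnmatchcase


def field_matches(value, box_text):
    """True if `value` matches any space-separated term (wildcards allowed) in a filter box."""
    return any(fnmatchcase(value.lower(), term.lower()) for term in box_text.split())


def record_matches(record, criteria, logic="and"):
    """Combine per-box results: "and" requires every filled box to match, "or" any one."""
    results = [field_matches(record.get(field, ""), text)
               for field, text in criteria.items() if text.split()]  # skip empty boxes
    if not results:
        return True  # no filter terms entered: every item passes
    return all(results) if logic == "and" else any(results)


def apply_filter(records, criteria, logic="and", mode="include"):
    """Return the records still displayed after an include or exclude filter."""
    if mode == "include":
        return [r for r in records if record_matches(r, criteria, logic)]
    if mode == "exclude":
        return [r for r in records if not record_matches(r, criteria, logic)]
    raise ValueError("mode must be 'include' or 'exclude'")


if __name__ == "__main__":
    mrnas = [
        {"acc": "A", "organism": "Mus musculus", "tissue": "liver"},
        {"acc": "B", "organism": "Rattus norvegicus", "tissue": "brain"},
    ]
    print(apply_filter(mrnas, {"organism": "mus*", "tissue": "brain*"}, logic="or"))
```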

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare mRNAs against the genomic sequence. For more \ information about this option, go to the \ \ Codon and Base Coloring for Alignment Tracks page.\ Several types of alignment gap may also be colored; \ for more information, go to the \ \ Alignment Insertion/Deletion Display Options page.\

\ \

Methods

\

\ The mRNAs were aligned against the human genome using translated blat. \ When a single mRNA aligned in multiple places, the alignment having the \ highest base identity was found. Only those alignments having a base \ identity level within 1% of the best and at least 25% base identity with the \ genomic sequence were kept.
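The near-best filtering step can be sketched as follows. The function name and the `(query, identity)` tuple layout are hypothetical, and "within 1% of the best" is interpreted here as within one percentage point of identity (0.01 on a 0 to 1 scale), which is an assumption.

```python
from collections import defaultdict


def near_best(alignments, within=0.01, min_identity=0.25):
    """Keep alignments close to each query's best base identity.

    `alignments` is an iterable of (query_name, identity) pairs, identity on a 0-1 scale.
    An alignment is kept if its identity is at least `min_identity` and no more than
    `within` below the best identity found for the same query.
    """
    by_query = defaultdict(list)
    for aln in alignments:
        by_query[aln[0]].append(aln)
    kept = []
    for alns in by_query.values():
        best = max(identity for _, identity in alns)
        for name, identity in alns:
            if identity >= min_identity and identity >= best - within:
                kept.append((name, identity))
    return kept


if __name__ == "__main__":
    print(near_best([("mrnaA", 0.90), ("mrnaA", 0.895), ("mrnaA", 0.85), ("mrnaB", 0.20)]))
```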

\ \

Credits

\

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, \ Wheeler DL. \ GenBank: update. Nucleic Acids Res.\ 2004 Jan 1;32(Database issue):D23-6.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ rna 1 baseColorUseCds genbank\ baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Non-$Organism mRNAs from GenBank\ priority 63\ shortLabel Other mRNAs\ showDiffBasesAllScales .\ spectrum on\ track xenoMrna\ type psl xeno\ visibility hide\ encodeUncFaire UNC FAIRE bedGraph 4 UNC FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements) 0 63 20 150 20 50 100 50 0 0 21 chr1,chr4,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chr10,chrX,

Description

\

\ Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) is a procedure \ used to isolate chromatin that is resistant to the formation of protein-DNA \ cross-links. These tracks display FAIRE data from 2091 fibroblast cells \ hybridized to high-resolution NimbleGen arrays that tile the ENCODE regions. The four \ datasets, in practical terms, can be thought of as independent replicates. \ However, because they were part of a series of experiments aimed at optimizing \ cross-linking conditions in human cells, the data represent different \ cross-linking times (1, 2, 4, and 7 minutes). Although the individual \ replicates are not displayed, the replicate data and also the signal averages \ and the peaks for the averages can be \ downloaded.

\ \

Display Conventions and Configuration

\

\ The FAIRE data are represented by three subtracks. One subtrack shows the average normalized log2 ratios for the tiled probes; the other two subtracks display peaks. The peaks in one set were determined using PeakFinder software supplied by NimbleGen. A false positive rate (FPR) was estimated for this peak set using a permutation-based method; all peaks had an FPR < 0.01. The peaks in the other set (Apr. 2006 update) were identified by ChIPOTle, a peak-finding algorithm that uses a sliding window to identify statistically significant signals that comprise a peak. A null distribution was estimated by reflecting the negative data, which are presumed to be noise, about zero and fitting a Gaussian distribution to the result. Windows were considered significant with a p-value < 1e-25 after Benjamini-Hochberg correction for multiple tests.
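The null model and multiple-testing step described above can be sketched with the standard library. This is a simplified illustration, not ChIPOTle itself: it fits a zero-mean Gaussian to the negative values reflected about zero, computes one-sided upper-tail p-values, and applies the Benjamini-Hochberg adjustment. ChIPOTle applies the test to sliding-window statistics, which are not modeled here, and all function names are hypothetical.

```python
import math

SIGNIFICANCE = 1e-25  # adjusted p-value cutoff quoted above


def reflected_null_sigma(values):
    """SD of a zero-mean Gaussian fitted to negative values reflected about zero.

    Reflecting each negative value x adds -x, so the pooled data have mean 0 and
    the maximum-likelihood variance is the mean of x**2 over the negative values.
    """
    negatives = [v for v in values if v < 0]
    if not negatives:
        raise ValueError("no negative values to build the null from")
    return math.sqrt(sum(v * v for v in negatives) / len(negatives))


def upper_tail_p(value, sigma):
    """One-sided p-value P(X >= value) for X ~ Normal(0, sigma**2)."""
    return 0.5 * math.erfc(value / (sigma * math.sqrt(2.0)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    signal = [-0.4, -0.1, 0.2, -0.3, 2.5, 0.1, 3.1]
    sigma = reflected_null_sigma(signal)
    adjusted = benjamini_hochberg([upper_tail_p(v, sigma) for v in signal])
    print(sigma, [v for v, q in zip(signal, adjusted) if q < SIGNIFICANCE])
```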

\

\ This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Note that the graphical configuration options are available only for the Signal subtrack; the Peaks subtracks are fixed.

\ \

Methods

\

\ To perform FAIRE, proteins were cross-linked to DNA using 1% formaldehyde solution, the complex was sheared by sonication, and a phenol/chloroform extraction was performed to remove DNA fragments cross-linked to protein. The DNA recovered in the aqueous phase was fluorescently labeled and hybridized to a microarray along with fluorescently labeled genomic DNA as a control. Ratios were scaled by subtracting the Tukey biweight mean of the log-ratio values from each log-ratio value, as recommended by NimbleGen. Results in yeast were consistent with enrichment for nucleosome-depleted regions of the genome. Therefore, the method may have utility as a positive selection for genomic regions with properties normally detected by assays such as DNase hypersensitivity.
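Tukey's biweight is a robust estimate of the center that down-weights outliers. The sketch below uses the one-step form with tuning constants c = 5 and epsilon = 0.0001 (the defaults described for Affymetrix MAS5 processing); whether NimbleGen used the same constants is an assumption, and the function names are hypothetical.

```python
import statistics


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight estimate of the center of `values`."""
    values = list(values)
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - center) / (c * mad + epsilon)
        # Bisquare weight: close to 1 near the median, 0 once |u| reaches 1.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def center_log_ratios(log_ratios):
    """Subtract the biweight center from every log-ratio value."""
    shift = tukey_biweight(log_ratios)
    return [r - shift for r in log_ratios]


if __name__ == "__main__":
    print(center_log_ratios([0.1, -0.2, 0.05, 3.0, -0.1]))
```

Unlike the arithmetic mean, the biweight center is barely moved by a few strongly enriched probes, so the scaling reflects the bulk of the array rather than its peaks.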

\ \

Verification

\

\ The data were verified by PCR using primers designed to FAIRE-enriched promoters and to downstream coding regions.

\ \

Credits

\

\ Cell culture, fixing, and DNA amplification were performed by Jonghwan Kim in the Vishy Iyer lab at the University of Texas, Austin. FAIRE was done by Paul Giresi in the Jason Lieb lab at the University of North Carolina at Chapel Hill. Paul Giresi of NimbleGen did the sample labeling and hybridization with the help of Mike Singer and Roland Green. Nan Jiang at NimbleGen supplied the software used for the permutation analysis.

\ \

References

\

\ Buck MJ, Nobel AB, Lieb JD. ChIPOTle: a user-friendly tool for the analysis of ChIP-chip data. Genome Biol. 2005;6(11):R97.

\

\ Nagy PL, Cleary ML, Brown PO, Lieb JD. Genomewide demarcation of RNA polymerase II transcription units revealed by physical fractionation of chromatin. Proc Natl Acad Sci U S A. 2003;100(11):6364-9.

\ encodeChrom 0 altColor 50,100,50\ autoScale off\ chromosomes chr1,chr4,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chr10,chrX\ color 20,150,20\ compositeTrack on\ dataVersion ENCODE Oct 2005 Freeze\ group encodeChrom\ longLabel UNC FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements)\ maxHeightPixels 128:24:16\ maxLimit 3.63\ minLimit -2.61\ origAssembly hg17\ priority 63.0\ shortLabel UNC FAIRE\ spanList 38\ track encodeUncFaire\ type bedGraph 4\ viewLimits -0.6:0.7\ windowingFunction mean\ encode_tba23EvoFold TBA23 EvoFold bed 6 + EvoFold Predictions of RNA Secondary Structure Using TBA23 0 63.1 20 90 0 137 172 127 0 0 0

Description

\

\ This track shows RNA secondary structure predictions made with the\ EvoFold program, a comparative method that exploits the evolutionary signal\ of genomic multiple-sequence alignments for identifying conserved\ functional RNA structures.

\ \

Display Conventions and Configuration

\

\ Track elements are labeled using the convention ID_strand_score.\ At the zoomed-out level, secondary structure prediction regions are\ indicated by blocks, with the stem-pairing regions shown in a darker shade \ than unpaired regions. Arrows indicate the predicted strand.\ When zoomed in to the base level, the specific secondary structure predictions \ are shown in parenthesis format. The confidence score for each position is\ indicated in grayscale, with darker shades corresponding to higher scores.\

\ The details page for each track element shows the predicted secondary structure \ (labeled SS anno), together with details of the multiple species \ alignments at that location. Substitutions relative to the human sequence are \ color-coded according to their compatibility with the predicted secondary \ structure (see the color legend on the details page). Each prediction is \ assigned an overall score and a sequence of position-specific scores. The \ overall score measures evidence for any functional RNA structures in the given \ region, while the position-specific scores (0 - 9) measure the confidence of \ the base-specific annotations. Base-pairing positions are annotated \ with the same pair symbol. The offsets are provided to ease\ visual navigation of the alignment in terms of the human sequence. The offset\ is calculated (in units of ten) from the start position of the element on \ the positive strand or from the end position when on the negative strand.

\

\ The graphical display may be filtered to show only those track elements with unnormalized scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page.

\ \

Methods

\

\ EvoFold makes use of phylogenetic stochastic context-free grammars (phylo-SCFGs), which are combined probabilistic models of RNA secondary structure and primary sequence evolution. The predictions consist of both a specific RNA secondary structure and an overall score. The overall score is essentially a log-odds score between a phylo-SCFG modeling the constrained evolution of stem-pairing regions and one that models only unpaired regions.

\

\ The predictions for this track were based on the conserved elements of\ the 23-way threaded blockset aligner (TBA) alignments present in the ENCODE\ regions (see the TBA Alignment track for more information).

\ \

Credits

\

\ The EvoFold program and browser track were developed by \ Jakob Skou Pedersen of the UCSC\ Genome Bioinformatics Group.

\

\ The 23-way TBA multiple alignments were created by Elliott Margulies\ of the Green\ Lab at NHGRI.

\ \

References

\

\ Knudsen B, Hein J. \ RNA secondary structure prediction using stochastic context-free \ grammars and evolutionary history.\ Bioinformatics. 1999 Jun;15(6):446-54.

\

\ Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, \ Kent J, Miller W, Haussler D. \ Identification and classification of conserved RNA secondary \ structures in the human genome. \ PLoS Comput Biol. 2006 Apr;2(4):e33.

\

\ Pedersen JS, Meyer IM, Forsberg R, Simmonds P, Hein J. \ A comparative method for finding and folding RNA\ secondary structures within protein-coding regions. \ Nucl Acids Res. 2004 Sep 24;32(16):4925-36.

\

\ Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K,\ Clawson H, Spieth J, Hillier LW, Richards S, et al.\ Evolutionarily conserved elements in vertebrate, insect, worm,\ and yeast genomes.\ Genome Res. 2005 Aug;15(8):1034-50.\ \ encodeGenes 1 color 20,90,0\ dataVersion ENCODE June 2005 Freeze\ group encodeGenes\ longLabel EvoFold Predictions of RNA Secondary Structure Using TBA23\ mafTrack encodeTbaAlign\ priority 63.1\ shortLabel TBA23 EvoFold\ track encode_tba23EvoFold\ type bed 6 +\ visibility hide\ wgRnaOld sno/miRNA Old bed 8 + C/D and H/ACA Box snoRNAs, scaRNAs, and microRNAs from snoRNABase and miRBase (Old Track) 0 63.5 200 80 0 227 167 127 0 0 0 http://www-snorna.biotoul.fr/plus.php?id=$$

Description

\

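\ This track displays positions of four different types of RNA in the human genome: precursor microRNAs (pre-miRNAs), C/D box snoRNAs, H/ACA box snoRNAs, and small Cajal body-specific RNAs (scaRNAs). \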

\

\ C/D box and H/ACA box snoRNAs are guides for the 2'-O-ribose methylation and the pseudouridylation, respectively, of rRNAs and snRNAs, although many of them have no documented target RNA. The scaRNAs guide modifications of the spliceosomal snRNAs transcribed by RNA polymerase II, and often contain both C/D and H/ACA domains.

\ \

Display Conventions and Configuration

\

\ This track follows the general display conventions for \ gene prediction \ tracks.

\

\ The miRNA precursor forms (pre-miRNA) are represented by red blocks.

\

\ C/D box snoRNAs, H/ACA box snoRNAs and scaRNAs are represented by blue, green and \ magenta blocks, respectively. At a zoomed-in resolution, arrows superimposed \ on the blocks indicate the sense orientation of the snoRNAs.

\ \

Methods

\

\ Mature and precursor miRNAs from the miRNA Registry were aligned against the \ genome using blat.\ The extents of the precursor sequences were not generally known and were\ predicted based on base-paired hairpin structure. The miRNA Registry is\ described in Griffiths-Jones, S. (2004) and Weber, M.J. (2005) in the \ References section below.

\

\ The snoRNAs and scaRNAs from the snoRNABase were aligned against the \ human genome using blat. \

\ \

Credits

\

\ The miRNA annotation was contributed by Michel Weber of \ Laboratoire de Biologie \ Moléculaire Eucaryote, CNRS Université Paul Sabatier (UMR5099, Toulouse, \ France) and Sam Griffiths-Jones of The Wellcome Trust Sanger Institute (Cambridge, UK).

\

\ The snoRNA annotations were contributed by Michel Weber and \ Laurent Lestrade of the \ Institut d'Exploration \ Fonctionnelle des Génomes (IFR109, Toulouse, France).

\

\ Fan Hsu from the UCSC Genome Bioinformatics \ Group created the combined annotation track.

\ \

References

\

\ When making use of these data, please cite: \

\ Griffiths-Jones S. The microRNA Registry. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D109-11.

\

\ Weber MJ. New human and mouse microRNA genes found by homology search. FEBS J. 2005 Jan;272(1):59-73.

\

\ You may also want to cite the Wellcome Trust Sanger Institute miRNA Registry and the Laboratoire de Biologie Moléculaire Eucaryote snoRNA database.

\

\ The following publication provides guidelines on miRNA annotation:\ Ambros V. et al., \ A uniform system for microRNA annotation. \ RNA. 2003;9(3):277-9.

\

\ For more information on blat, see \ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002;12(4):656-664.

\ genes 1 color 200,80,0\ group genes\ longLabel C/D and H/ACA Box snoRNAs, scaRNAs, and microRNAs from snoRNABase and miRBase (Old Track)\ priority 63.5\ shortLabel sno/miRNA Old\ track wgRnaOld\ type bed 8 +\ url http://www-snorna.biotoul.fr/plus.php?id=$$\ url2 http://www.mirbase.org/cgi-bin/query.pl?terms=$$\ url2Label miRBase:\ urlLabel Laboratoire de Biologie Moléculaire Eucaryote:\ visibility hide\ wgRna sno/miRNA bed 8 + C/D and H/ACA Box snoRNAs, scaRNAs, and microRNAs from snoRNABase and miRBase 0 63.6 200 80 0 227 167 127 0 0 0 http://www-snorna.biotoul.fr/plus.php?id=$$

Description

\

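\ This track displays positions of four different types of RNA in the human genome: precursor microRNAs (pre-miRNAs), C/D box snoRNAs, H/ACA box snoRNAs, and small Cajal body-specific RNAs (scaRNAs). \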

\

\ C/D box and H/ACA box snoRNAs are guides for the 2'-O-ribose methylation and the pseudouridylation, respectively, of rRNAs and snRNAs, although many of them have no documented target RNA. The scaRNAs guide modifications of the spliceosomal snRNAs transcribed by RNA polymerase II, and often contain both C/D and H/ACA domains.

\ \

Display Conventions and Configuration

\

\ This track follows the general display conventions for \ gene prediction \ tracks.

\

\ The miRNA precursor forms (pre-miRNA) are represented by red blocks.

\

\ C/D box snoRNAs, H/ACA box snoRNAs and scaRNAs are represented by blue, \ green and magenta blocks, respectively. At a zoomed-in resolution, arrows \ superimposed on the blocks indicate the sense orientation of the snoRNAs.

\ \

Methods

\

\ Precursor miRNA genomic locations from\ \ miRBase\ were calculated using wublastn for sequence alignment with the requirement of\ 100% identity. \ The extents of the precursor sequences were not generally known and were\ predicted based on base-paired hairpin structure. miRBase is\ described in Griffiths-Jones, S. (2004) and Weber, M.J. (2005) in the \ References section below.

\

\ The snoRNAs and scaRNAs from the snoRNABase were aligned against the \ human genome using blat. \

\ \

Credits

\ Genome coordinates for this track were obtained from the miRBase sequences FTP site and from the snoRNABase coordinates download page.\

\ \

References

\

\ When making use of these data, please cite the following articles in addition to the primary sources of the miRNA sequences:

\

\ Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ.\ miRBase: tools for microRNA genomics.\ Nucleic Acids Res. 2008 Jan 1;36(Database issue):D154-8.

\

\ Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ.\ miRBase: microRNA sequences, targets and gene nomenclature.\ Nucleic Acids Res. 2006 Jan 1;34(Database issue):D140-4.

\

\ Griffiths-Jones S.\ The microRNA Registry.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D109-11.

\

\ Weber MJ. New human and mouse microRNA genes found by homology search. FEBS J. 2005 Jan;272(1):59-73.\

\ You may also want to cite the Wellcome Trust Sanger Institute miRBase and the Laboratoire de Biologie Moléculaire Eucaryote snoRNABase.

\

\ The following publication provides guidelines on miRNA annotation:\ Ambros V. et al., \ A uniform system for microRNA annotation. \ RNA. 2003;9(3):277-9.

\

\ genes 1 color 200,80,0\ dataVersion miRBase Release 13.0 (March 2009) and snoRNABase Version 3\ group genes\ longLabel C/D and H/ACA Box snoRNAs, scaRNAs, and microRNAs from snoRNABase and miRBase\ noScoreFilter .\ priority 63.6\ shortLabel sno/miRNA\ track wgRna\ type bed 8 +\ url http://www-snorna.biotoul.fr/plus.php?id=$$\ url2 http://www.mirbase.org/cgi-bin/query.pl?terms=$$\ url2Label miRBase:\ urlLabel Laboratoire de Biologie Moleculaire Eucaryote:\ visibility hide\ encodeAffyChIpHl60SitesRnapHr32 Affy Pol2 RA 32h bed 3 . Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 32hrs) Sites 0 64 50 175 0 152 215 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 50,175,0\ longLabel Affymetrix ChIP/Chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 64\ shortLabel Affy Pol2 RA 32h\ subGroups factor=Pol2 time=32h\ track encodeAffyChIpHl60SitesRnapHr32\ xenoBestMrna Other Best mRNAs psl xeno Non-Human mRNAs from GenBank Best in Genome Alignments 0 64 0 0 0 127 127 127 1 0 0

Description

\

\ This track displays translated blat alignments of vertebrate and\ invertebrate mRNA in \ GenBank from organisms other than human. \ Better alignments are indicated by darker coloration in the display.

\ \

Methods

\

\ The mRNAs were aligned against the human genome using translated blat. \ When a single mRNA aligned in multiple places, the alignment having the \ highest base identity was found. Only those alignments having a base \ identity level within 1% of the best and at least 25% base identity with the\ genomic sequence were kept.
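The near-best filter can be sketched as below. It reads "within 1% of the best" as within one percentage point of the best base identity for that mRNA (a relative 1% would be an equally plausible reading). The `Alignment` record and `best_in_genome` function are hypothetical and are not part of BLAT or the UCSC tool set.

```python
from dataclasses import dataclass


@dataclass
class Alignment:          # hypothetical record, one per genomic hit
    query: str            # mRNA accession
    chrom: str
    start: int
    identity: float       # base identity, in percent (0-100)


def best_in_genome(alignments, near_best=1.0, min_identity=25.0):
    """Keep hits within `near_best` percentage points of each mRNA's best
    identity that also reach `min_identity` percent identity."""
    best = {}
    for aln in alignments:
        best[aln.query] = max(best.get(aln.query, 0.0), aln.identity)
    return [
        aln for aln in alignments
        if aln.identity >= best[aln.query] - near_best
        and aln.identity >= min_identity
    ]


hits = [
    Alignment("mrnaA", "chr1", 100, 92.0),
    Alignment("mrnaA", "chr5", 500, 91.5),   # within 1 point of the best: kept
    Alignment("mrnaA", "chr7", 900, 80.0),   # too far below the best: dropped
    Alignment("mrnaB", "chr2", 300, 20.0),   # best hit, but under 25%: dropped
]
kept = best_in_genome(hits)
print([(a.query, a.chrom) for a in kept])   # [('mrnaA', 'chr1'), ('mrnaA', 'chr5')]
```

Both cutoffs apply together, so an mRNA whose best hit is below 25% identity contributes no alignments at all.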

\ \

Using the Filter

\

\ This track has a filter that can be used to change the display mode, \ change the color, and include/exclude a subset of items within the track.\ This may be helpful when many items are shown in the track display, \ especially when only some are relevant to the current task. \ The filter is located at the top of the track description page, which is \ accessed via the small button to the left of the track's graphical \ display or through the link on the track's control menu. \ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA \ display. For example, to apply the filter to all mRNAs expressed in the \ liver, type "liver" in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Multiple terms may be entered at once, \ separated by a space. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter \ criteria will be highlighted. If "or" is selected, mRNAs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display mRNAs that match the filter criteria. \ If "include" is selected, the browser will display only those \ mRNAs that match the filter criteria.\

\

\ When you have finished configuring the filter, click the Submit \ button.
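The matching logic described in these steps can be modeled roughly as follows. This is a hypothetical model, not the Genome Browser's code. It assumes shell-style wildcards, case-insensitive whole-value matching, and that an item matches a box if it matches any of the space-separated terms typed there. The highlight-color option is not modeled.

```python
from fnmatch import fnmatchcase


def term_match(value, terms):
    """True if `value` matches any of the terms (shell-style wildcards)."""
    value = value.lower()
    return any(fnmatchcase(value, t.lower()) for t in terms)


def apply_filter(items, boxes, logic="and", mode="include"):
    """items: list of dicts (field -> value); boxes: field -> text typed in
    that filter box. Returns the items that would remain displayed."""
    active = {f: text.split() for f, text in boxes.items() if text.strip()}
    if not active:
        return list(items)

    def matches(item):
        results = [term_match(item.get(f, ""), terms) for f, terms in active.items()]
        return all(results) if logic == "and" else any(results)

    if mode == "include":
        return [it for it in items if matches(it)]
    return [it for it in items if not matches(it)]   # mode == "exclude"


mrnas = [
    {"acc": "X1", "tissue": "liver", "organism": "mouse"},
    {"acc": "X2", "tissue": "brain", "organism": "mouse"},
    {"acc": "X3", "tissue": "fetal liver", "organism": "rat"},
]
print([m["acc"] for m in apply_filter(mrnas, {"tissue": "*liver*"})])  # ['X1', 'X3']
print([m["acc"] for m in apply_filter(mrnas, {"tissue": "*liver*", "organism": "mouse"})])  # ['X1']
print([m["acc"] for m in apply_filter(mrnas, {"tissue": "*liver*"}, mode="exclude")])  # ['X2']
```

With `logic="or"` an item needs to match only one box, and "exclude" simply inverts the selection made by the same match test.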

\ \

Credits

\

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, \ Wheeler DL. \ GenBank: update. Nucleic Acids Res. \ 2004 Jan 1;32(Database issue):D23-6.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ rna 1 baseColorUseCds genbank\ baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Non-$Organism mRNAs from GenBank Best in Genome Alignments\ priority 64\ shortLabel Other Best mRNAs\ showDiffBasesAllScales .\ spectrum on\ track xenoBestMrna\ type psl xeno\ visibility hide\ encodeUvaDnaRepSuper UVa DNA Rep University of Virginia DNA Replication Timing and Origins 0 64 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Overview

\

\ This super-track combines related tracks of DNA\ replication data from the University of Virginia.\ DNA replication is carefully coordinated, both across the genome and with\ respect to development. Earlier replication in S-phase is broadly correlated\ with gene density and transcriptional activity.

\

\ These tracks contain temporal profiles of DNA replication and maps of DNA replication origins in multiple cell lines, such as HeLa cells (cervix carcinoma). Replication timing was measured by analyzing BrdU-labeled fractions from synchronized cells on tiling arrays.

\ \

Credits

\

\ Data generation and analysis for this track were performed by the\ DNA replication group in the\ Dutta Lab\ at the University of Virginia: Neerja Karnani, Christopher Taylor,\ Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta.

\

\ Neerja Karnani and Christopher Taylor prepared the data for presentation in\ the UCSC Genome Browser.

\ \

References

\

\ Giacca M, Pelizon C, Falaschi A.\ Mapping replication origins by quantifying relative abundance\ of nascent DNA strands using competitive polymerase chain reaction.\ Methods. 1997 Nov;13(3):301-12.

\

\ Mesner LD, Crawford EL, Hamlin JL.\ Isolating apparently pure libraries of replication origins\ from complex genomes. Mol Cell. 2006 Mar 3;21(5):719-26.

\

\ Jeon Y, Bekiranov S, Karnani N, Kapranov P, Ghosh S, MacAlpine D, Lee C,\ Hwang DS, Gingeras TR, Dutta A.\ Temporal profile of replication of human chromosomes.\ Proc Natl Acad Sci U S A. 2005 May 3;102(18):6419-24.

\ encodeChrom 0 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ group encodeChrom\ longLabel University of Virginia DNA Replication Timing and Origins\ priority 64.0\ shortLabel UVa DNA Rep\ superTrack on\ track encodeUvaDnaRepSuper\ encodeUvaDnaRep UVa DNA Rep bed 3 . University of Virginia Temporal Profiling of DNA Replication 0 64 60 75 60 10 130 10 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ The five subtracks in this annotation correspond to five different time points\ relative to the start of the DNA synthesis phase (S-phase) of the cell cycle. \

\ \

Display Conventions and Configuration

\

\ Regions that are replicated during the given time interval are shown in green. \ Varying shades of green are used to distinguish one subtrack from another.\ To display only selected subtracks, uncheck the boxes next to the tracks you \ wish to hide.

\ \

Methods

\

\ The experimental strategy adopted to map this profile involved isolation of replication products from HeLa cells synchronized at the G1-S boundary by thymidine-aphidicolin double block. Cells released from the block were labeled with BrdU at every two-hour interval of the 10 hours of S-phase, and DNA was isolated from them. The heavy-light (H/L) DNA representing the pool of DNA replicated during each two-hour labeling period was separated from the unlabeled DNA by double cesium chloride density gradient centrifugation. The purified H/L DNA was then hybridized to a high-density genome-tiling Affymetrix array composed of all unique probes within the ENCODE regions.

\

\ The raw data generated by the microarray experiments were processed by computing the enrichment of signal in a particular part of S-phase relative to the entirety of S-phase (10 hours). High-confidence regions of replication (P-value threshold 1E-04) were mapped by applying the Wilcoxon rank-sum test in a sliding window of 10 kb, using the standard Affymetrix data analysis tools and the April 2003 (hg15) version of the human genome assembly. These coordinates were then mapped to the July 2003 (hg16) assembly by UCSC using the liftOver tool.

\ \

Verification

\

\ The submitted data are from two biological experimental sets. Regions of\ significant enrichment were included from both of the biological replicates.\

\ \

Credits

\

\ Data generation and analysis for this track were performed by the \ DNA replication group in the \ Dutta Lab\ at the University of Virginia: Neerja Karnani, Christopher Taylor, \ Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta.

\

\ Neerja Karnani and Christopher Taylor prepared the data for presentation in \ the UCSC Genome Browser.

\ \

References

\

\ Jeon, Y., Bekiranov, S., Karnani, N., Kapranov, P., Ghosh, S., MacAlpine, D., \ Lee, C., Hwang, D.S., Gingeras, T.R. and Dutta, A.\ Temporal profile of replication of human chromosomes.\ Proc Natl Acad Sci U S A 102(18), 6419-24 (2005).

\ encodeChrom 1 altColor 10,130,10\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 60,75,60\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChrom\ longLabel University of Virginia Temporal Profiling of DNA Replication\ origAssembly hg16\ priority 64.0\ shortLabel UVa DNA Rep\ superTrack encodeUvaDnaRepSuper dense\ track encodeUvaDnaRep\ type bed 3 .\ visibility hide\ hgIkmc IKMC Genes Mapped bed 12 International Knockout Mouse Consortium Genes Mapped to Human Genome 0 64.1 0 0 0 127 127 127 0 0 0 http://www.knockoutmouse.org/search?criteria=$$

Description

\

\ This track shows genes targeted by the International Knockout Mouse Consortium (IKMC), mapped to the human genome. IKMC is a collaboration to generate a public resource of mouse embryonic stem (ES) cells containing a null mutation in every gene in the mouse genome. Gene targets are color-coded by status:\

\

\

\ The KnockOut Mouse Project Data\ Coordination Center (KOMP DCC) is the central database resource\ for coordinating mouse gene targeting within IKMC and provides\ web-based query and display tools for IKMC data. In addition, the\ KOMP DCC website provides a tool for the scientific community to\ nominate genes of interest to be knocked out by the KOMP initiative.

\ \

\ IKMC members include\

\ \ KOMP includes two production centers: \ CSD, a collaborative team at the Children's Hospital Oakland Research Institute\ (CHORI), the Wellcome Trust Sanger Institute and the University\ of California at Davis School of Veterinary Medicine, and \ a team at the VelociGene division of Regeneron Pharmaceuticals, Inc.\ EUCOMM includes 9 participating institutions. \ NorCOMM includes several participating institutions.\

\ \

Methods

\

\ Using complementary targeting strategies, the IKMC centers \ design and create targeting vectors, mutant ES cell lines and, to some\ extent, mutant mice, embryos or sperm. Materials are distributed to\ the research community.

\

\ The KOMP Repository archives, maintains, and distributes IKMC products. Researchers can order products and get product information from the Repository. Researchers can also express interest in products that are still in the pipeline; they will then receive email notification as soon as KOMP-generated products are available for distribution.

\

\ The process for ordering EUCOMM materials can be found \ here.

\

\ The process for ordering TIGM materials can be found \ here.

\

\ Information on NorCOMM products and services can be found \ here.\

\ Genes were mapped to the human genome by IKMC.\

\ \

Credits

\

\ Thanks to the International Knockout Mouse Consortium, and Carol Bult in \ particular, for providing these data.

\ \

References

\

\ Collins FS, Finnell RH, Rossant J, Wurst W.\ A new partner for the international knockout mouse consortium.\ Cell. 2007 Apr 20;129(2):235.

\ \

\ International Mouse Knockout Consortium, Collins FS, Rossant J, Wurst W.\ A mouse for all reasons.\ Cell. 2007 Jan 12;128(1):9-13.

\ \

\ Austin CP, Battey JF, Bradley A, Bucan M, Capecchi M, Collins FS, Dove\ WF, Duyk G, Dymecki S, Eppig JT et al.\ The knockout mouse project.\ Nat Genet. 2004 Sep;36(9):921-4.

\ \ genes 1 group genes\ itemRgb on\ longLabel International Knockout Mouse Consortium Genes Mapped to Human Genome\ mgiUrl http://www.informatics.jax.org/searches/accession_report.cgi?id=$$\ mgiUrlLabel MGI Report:\ noScoreFilter .\ priority 64.1\ shortLabel IKMC Genes Mapped\ track hgIkmc\ type bed 12\ url http://www.knockoutmouse.org/search?criteria=$$\ urlLabel KOMP Data Coordination Center:\ visibility hide\ encodeUvaDnaRepSeg UVa DNA Rep Seg bed 3 . University of Virginia DNA Replication Temporal Segmentation 0 64.1 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ The four subtracks in this annotation correspond to replication timing categories for DNA synthesis. Replication is segregated into early specific (Early), mid specific (Mid), late specific (Late), and non-specific (PanS). The first three categories correspond to regions that replicated in a time point-specific manner; the last category encompasses regions that replicated in a temporally non-specific manner.

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ tracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide.

\ \

Methods

\

\ The experimental strategy adopted to map this profile involved isolation of replication products from HeLa cells synchronized at the G1-S boundary by thymidine-aphidicolin double block. Cells released from the block were labeled with BrdU at every two-hour interval of S-phase, and DNA was isolated from them. The heavy-light (H/L) DNA representing the pool of DNA replicated during each two-hour labeling period was separated from unlabeled DNA by double cesium chloride density gradient centrifugation. The purified H/L DNA was then hybridized to a high-density genome-tiling Affymetrix array composed of all unique probes within the ENCODE regions.

\

\ The time of replication of 50% (TR50) of each microarray probe was\ calculated by accumulating the sum over the five time points and\ linearly interpolating the time when 50% was reached. Each probe\ was also classified as temporally specific or non-specific based on\ whether or not at least 50% of the accumulated signal appeared in a single\ time point.
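The TR50 calculation above can be sketched as follows. This assumes the five signals are the per-interval values for consecutive two-hour intervals, that the cumulative fraction is zero at the start of S-phase and is evaluated at the end of each interval, and that "specific" means a single interval holds at least half of the total signal. The function name `tr50` is hypothetical.

```python
def tr50(signals, interval_hours=2.0):
    """signals: per-interval signal for one probe, in S-phase order
    (assumed to be consecutive two-hour intervals).
    Returns (TR50 in hours, is_temporally_specific)."""
    total = sum(signals)
    if total <= 0:
        raise ValueError("probe has no signal")
    fractions = [s / total for s in signals]
    specific = max(fractions) >= 0.5          # half the signal in one interval
    cum_prev, t_prev = 0.0, 0.0               # nothing replicated at t = 0
    for i, frac in enumerate(fractions):
        cum = cum_prev + frac
        t = (i + 1) * interval_hours          # end of this labeling interval
        if cum >= 0.5:
            # linear interpolation between (t_prev, cum_prev) and (t, cum)
            return t_prev + (0.5 - cum_prev) / (cum - cum_prev) * (t - t_prev), specific
        cum_prev, t_prev = cum, t
    return t_prev, specific                   # only reachable via round-off


print(tr50([0, 1, 6, 1, 0]))   # (5.0, True)
print(tr50([1, 1, 2, 2, 2]))   # (6.0, False)
```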

\

\ The TR50 data were then analyzed within a 20 kb sliding window to classify regions as specific versus non-specific based on the ratio of specific to non-specific probes within the window. Specific regions were further classified as early, mid, or late replicating based on the average TR50 of specific probes within the window. The resulting regions form a non-overlapping segmentation of the replication data into the four given categories of replication timing.
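A sketch of the window classification, with placeholder thresholds: the description gives neither the specific-to-non-specific ratio cutoff nor the TR50 boundaries between early, mid and late, so `min_specific_ratio`, `early_max` and `late_min` below are hypothetical values chosen only for illustration, as is the function name. Each probe is labeled from the 20 kb window centered on it, and runs of equal labels are merged into non-overlapping segments.

```python
def classify_regions(probes, window=20_000, min_specific_ratio=1.0,
                     early_max=3.5, late_min=6.5):
    """probes: list of (position, tr50_hours, is_specific), sorted by position.
    The three thresholds are hypothetical placeholders.
    Returns non-overlapping (start, end, label) segments."""
    half = window // 2
    labels = []
    for pos, _, _ in probes:
        in_win = [p for p in probes if abs(p[0] - pos) <= half]
        specific = [p for p in in_win if p[2]]
        n_nonspecific = len(in_win) - len(specific)
        ratio = len(specific) / n_nonspecific if n_nonspecific else float("inf")
        if not specific or ratio < min_specific_ratio:
            labels.append("PanS")              # temporally non-specific
            continue
        mean_tr50 = sum(p[1] for p in specific) / len(specific)
        if mean_tr50 <= early_max:
            labels.append("Early")
        elif mean_tr50 >= late_min:
            labels.append("Late")
        else:
            labels.append("Mid")
    segments = []                              # merge runs of equal labels
    for (pos, _, _), label in zip(probes, labels):
        if segments and segments[-1][2] == label:
            segments[-1][1] = pos
        else:
            segments.append([pos, pos, label])
    return [tuple(s) for s in segments]


probes = ([(p, 2.0, True) for p in range(0, 15_000, 5_000)]
          + [(p, 5.0, False) for p in range(100_000, 115_000, 5_000)]
          + [(p, 8.0, True) for p in range(200_000, 215_000, 5_000)])
print(classify_regions(probes))
# [(0, 10000, 'Early'), (100000, 110000, 'PanS'), (200000, 210000, 'Late')]
```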

\ \

Verification

\

\ The replication experiments were completed for two biological sets\ in the HeLa-adherent cell line.

\ \

Credits

\

\ Data generation and analysis for this track were performed by the\ DNA replication group in the Dutta Lab at the University of \ Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim,\ Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta.

\

\ Neerja Karnani and Christopher Taylor prepared the data for \ presentation in the UCSC Genome Browser.

\ \

References

\

\ Jeon, Y., Bekiranov, S., Karnani, N., Kapranov, P., Ghosh, S.,\ MacAlpine, D., Lee, C., Hwang, D.S., Gingeras, T.R. and Dutta, A.\ Temporal profile of replication of human chromosomes.\ Proc Natl Acad Sci U S A 102(18), 6419-24 (2005).

\ \ encodeChrom 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE Oct 2005 Freeze\ group encodeChrom\ longLabel University of Virginia DNA Replication Temporal Segmentation\ origAssembly hg16\ priority 64.1\ shortLabel UVa DNA Rep Seg\ superTrack encodeUvaDnaRepSuper dense\ track encodeUvaDnaRepSeg\ type bed 3 .\ visibility hide\ encodeUvaDnaRepOrigins UVa DNA Rep Ori bed 3 . University of Virginia DNA Replication Origins 0 64.2 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ The subtracks within this annotation show replication origins identified using \ the nascent strand method (Ori-NS), the bubble trapping method (Ori-Bubble) \ and the TR50 local minima method (Ori-TR50). \ Tracks are available for HeLa cells (cervix carcinoma) for all methods and \ GM06990 cells (lymphoblastoid) for Ori-NS.\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ tracks. To show only selected subtracks within this annotation, \ uncheck the boxes next to the tracks you wish to hide.\

\ \

Nascent Strand Method (Ori-NS)

\

Description

\

\ ENCODE region-wide mapping of replication origins was performed. Origin-centered nascent strands purified from HeLa and GM06990 cell lines were hybridized to Affymetrix ENCODE tiling arrays.\ \

Methods

\

\ Cells in their exponential stage of growth were labeled, in culture, with bromodeoxyuridine (BrdU) for 30 min. DNA was then isolated from the cells. Nascent strands of 0.5-2.5 kb synthesized with incorporation of BrdU, representing the replication origins, were purified using a sucrose gradient followed by immunoprecipitation with a BrdU antibody (Giacca et al., 1997). The purified nascent strands were amplified and then hybridized to Affymetrix ENCODE tiling arrays, which have 25-mer probes tiled every 22 bp, on average, in the non-repetitive sequence of the ENCODE regions. As an experimental control, genomic DNA was hybridized to arrays independently.

\

\ Replication origins were identified by estimating the significance of the enrichment of nascent-strand DNA (treatment) signal over genomic DNA (control) signal in a sliding window of 1000 bp. Significance in each window was estimated by computing a p-value with the Wilcoxon rank-sum test over all three biological replicates and the control signal estimates in that window. The origins (Ori-NS) shown in the subtrack are the genomic regions that showed signal enrichment with p-value <= 0.001.\ \
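A minimal sketch of the window test described above. It assumes that the replicate arrays are pooled as the treatment sample and the genomic DNA arrays as the control sample, that the test is one-sided (treatment greater than control) using the normal approximation, and that windows advance in 500 bp steps. None of these choices, nor the function names `rank_sum_pvalue` and `enriched_windows`, come from the submitters' actual pipeline.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_pvalue(treatment, control):
    """One-sided Wilcoxon rank-sum test (treatment > control), normal
    approximation without tie correction."""
    n1, n2 = len(treatment), len(control)
    ranks = average_ranks(list(treatment) + list(control))
    w = sum(ranks[:n1])                        # rank sum of the treatment sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))   # upper tail, P(Z >= z)


def enriched_windows(positions, treatment, control, window=1000, step=500,
                     alpha=1e-3):
    """positions: probe coordinates; treatment / control: lists of arrays,
    each holding one signal value per probe. Returns (start, end, p) for
    windows with p <= alpha."""
    hits = []
    start, last = min(positions), max(positions)
    while start <= last:
        idx = [i for i, p in enumerate(positions) if start <= p < start + window]
        if idx:
            t = [arr[i] for arr in treatment for i in idx]
            c = [arr[i] for arr in control for i in idx]
            p_value = rank_sum_pvalue(t, c)
            if p_value <= alpha:
                hits.append((start, start + window, p_value))
        start += step
    return hits


positions = list(range(0, 4000, 22))             # probes roughly every 22 bp
base = [1.0 + (i % 7) * 0.01 for i in range(len(positions))]
control = [base]                                  # genomic DNA array
treatment = [[b + (4.0 if 1000 <= p < 2000 else 0.0) for b, p in zip(base, positions)]
             for _ in range(3)]                   # three replicates, enriched at 1000-2000
for s, e, p in enriched_windows(positions, treatment, control):
    print(s, e, f"{p:.1e}")
```

Windows lying wholly outside the enriched stretch get p = 0.5 here, because their treatment values are exact copies of the control values.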

Verification

\

\ The origin mapping experiments were completed for three biological sets.\ \

Credits

\

\ Data generation and analysis for the subtracks using the Ori-NS method were \ performed by the DNA replication group in the Dutta Lab at the University of \ Virginia: Neerja Karnani, Christopher Taylor, Ankit Malhotra, Gabe Robins \ and Anindya Dutta. \

\ Christopher Taylor and Neerja Karnani prepared the data for presentation in \ the UCSC Genome Browser. \ \

References

\

\ Giacca M, Pelizon C, Falaschi A. Mapping replication origins by quantifying relative abundance \ of nascent DNA strands using competitive polymerase chain reaction. \ Methods. 1997;13(3):301-12.\

\ \

Bubble Trapping Method (Ori-Bubble)

\

Description

\

\ ENCODE region-wide mapping of replication origins in HeLa \ cells was performed by the bubble trapping method. Replication origins were\ identified by hybridization to Affymetrix ENCODE tiling arrays. \ \

Methods

\

\ The bubble trapping method works on the principle that circular plasmids can be\ trapped in gelling agarose followed by the application of electrical current\ for a prolonged period of time (see Mesner et al. 2006 for more \ details). Entrapment occurs by an apparent physical linkage of the circular \ DNA with the agarose matrix. The circular bubble component of the DNA \ replication intermediates was therefore enriched by agarose trapping. After \ recovery from the agarose gel, a library of the entrapped DNA was formed by DNA cloning. Subsequently, DNA from the library was labeled and hybridized to \ Affymetrix ENCODE tiling arrays, which have 25-mer probes tiled every 22 bp \ on average in the non-repetitive ENCODE regions. As an experimental control, \ genomic DNA was hybridized to arrays independently.

\

\ Replication origins were identified by estimating the significance of the enrichment of the bubble-trapped DNA (treatment) signal over genomic DNA (control) signal in a sliding window of 10,000 bp. Significance in each window was estimated by computing a p-value with the Wilcoxon rank-sum test over all three biological replicates and the control signal estimates in that window. The origins (Ori-Bubble) represented in the UCSC browser track are the genomic regions that showed signal enrichment with p-value <= 0.001.\
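The description does not say how overlapping significant windows become the regions shown in the subtrack. One plausible step, sketched here under that assumption (the function `merge_windows` and the 5,000 bp step in the example are hypothetical), is to union all windows that pass the p-value cutoff into non-overlapping intervals, similar to the BED 3 records this track stores:

```python
def merge_windows(windows):
    """windows: iterable of (start, end) half-open intervals that passed the
    significance cutoff. Returns merged, non-overlapping (start, end) regions;
    windows that touch end-to-start are merged as well."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


# 10,000 bp windows slid in (assumed) 5,000 bp steps
print(merge_windows([(0, 10_000), (5_000, 15_000), (40_000, 50_000)]))
# [(0, 15000), (40000, 50000)]
```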

\ \

Verification

\

\ The origin mapping experiments were completed for two biological sets. \

\ \

Credits

\

\ Data generation and analysis for the subtrack using the Ori-Bubble method were performed by the DNA replication group in the Dutta Lab and Hamlin Lab at the University of Virginia: Neerja Karnani, Larry Mesner, Christopher Taylor, Ankit Malhotra, Gabe Robins, Anindya Dutta and Joyce Hamlin.

\

\ Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser.

\ \

References

\

\ Mesner LD, Crawford EL, Hamlin JL. Isolating apparently pure libraries of replication origins \ from complex genomes. Mol Cell. 2006 Mar 3;21(5):719-26.\

\ \

TR50 Local Minima Method (Ori-TR50)

\

Description

\

\ ENCODE region-wide mapping of replication origins in HeLa \ cells was performed by the TR50 local minima method. Replication \ origins were identified by hybridization to Affymetrix ENCODE tiling arrays.\

\ \

Methods

\

\ The experimental strategy adopted to map this profile involved isolation of \ replication products from HeLa cells synchronized at the G1-S boundary by \ thymidine-aphidicolin double block. Cells released from the block were labeled \ with BrdU at every two-hour interval of the 10 hours of S-phase. Subsequently,\ DNA was isolated from the cells. The heavy-light (H/L) DNA representing the \ pool of DNA replicated during each two-hour labeling period was separated from \ the unlabeled DNA by double cesium chloride density gradient centrifugation. \ The purified H/L DNA was then hybridized to a high-density genome-tiling\ Affymetrix array comprised of all unique probes within the ENCODE regions.

\

\ The time of replication of 50% (TR50) of each microarray probe was calculated by accumulating the sum over the five time points and linearly interpolating the time when 50% was reached. Each probe was also classified as showing temporally specific replication (all alleles replicating together within a two-hour window) or temporally non-specific replication (at least one allele replicating apart from the others by at least two hours).

\

\ The TR50 data for the temporally specific probes were then smoothed within a 60 kb window using lowess smoothing. To locate possible origins of replication, local minima (within a 30 kb window) were identified on the smoothed TR50 curve, keeping only those with at least 30 probes in the window on each side of the minimum. A confidence value was calculated for each site as the average difference between the TR50 values falling into the 30 kb window and the value of the local minimum.
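The minimum-finding step can be sketched as below. The sketch assumes the smoothed curve has already been computed, that "within a 30 kb window" means ±15 kb around each candidate probe, and that the confidence is averaged over the unsmoothed TR50 values in that window. The function name `tr50_minima` is hypothetical. On a flat curve, ties would yield several adjacent minima, and the description does not say how those are resolved.

```python
def tr50_minima(positions, smoothed, raw, window=30_000, min_each_side=30):
    """positions: sorted probe coordinates; smoothed: lowess-smoothed TR50 per
    probe; raw: unsmoothed TR50 per probe. Returns (position, confidence)."""
    half = window / 2
    sites = []
    for i, pos in enumerate(positions):
        in_win = [j for j, p in enumerate(positions) if abs(p - pos) <= half]
        left = sum(1 for j in in_win if positions[j] < pos)
        right = sum(1 for j in in_win if positions[j] > pos)
        if left < min_each_side or right < min_each_side:
            continue                      # too few probes on one side
        if smoothed[i] > min(smoothed[j] for j in in_win):
            continue                      # not the lowest point of its window
        # confidence: mean difference between window TR50 values and the minimum
        confidence = sum(raw[j] - smoothed[i] for j in in_win) / len(in_win)
        sites.append((pos, confidence))
    return sites


positions = list(range(0, 60_001, 200))
smoothed = [3.0 + abs(p - 30_000) / 10_000 for p in positions]   # V-shaped profile
raw = [s + 0.5 for s in smoothed]
print(tr50_minima(positions, smoothed, raw))   # a single site, at position 30000
```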

\ \

Verification

\

\ The replication experiments were completed for two biological sets and a \ technical replicate in the HeLa adherent cell line.

\ \

Credits

\

\ Data generation and analysis for the subtrack using the Ori-TR50 method \ were performed by the DNA replication group in the Dutta Lab at the \ University of Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim, \ Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta.

\

\ Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser.

\ \

References

\

\ Jeon Y, Bekiranov S, Karnani N, Kapranov P, Ghosh S, MacAlpine D, Lee C, \ Hwang DS, Gingeras TR, Dutta A.\ Temporal profile of replication of human chromosomes.\ Proc Natl Acad Sci U S A. 2005 May 3;102(18):6419-24.\

\ encodeChrom 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE Oct 2005 Freeze, May 2007 data\ group encodeChrom\ longLabel University of Virginia DNA Replication Origins\ origAssembly hg17\ priority 64.2\ shortLabel UVa DNA Rep Ori\ superTrack encodeUvaDnaRepSuper dense\ track encodeUvaDnaRepOrigins\ type bed 3 .\ visibility hide\ encodeUvaDnaRepTr50 UVa DNA Rep TR50 wig 2.05 6.36 University of Virginia DNA Smoothed Timing at 50% Replication 0 64.3 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This annotation shows smoothed replication\ timing for DNA synthesis as the time of 50% replication (TR50).

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ tracks. The subtracks within this annotation \ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\ \

Methods

\

\ The experimental strategy adopted to map this profile involved \ isolation of replication products from HeLa cells synchronized \ at the G1-S boundary by thymidine-aphidicolin double block. Cells \ released from the block were labeled with BrdU at every two-hour \ interval of the 10 hours of S-phase and DNA was isolated from them. \ The heavy-light (H/L) DNA representing the pool of DNA replicated \ during each two-hour labeling period was separated from the unlabeled \ DNA by double cesium chloride density gradient centrifugation. The \ purified H/L DNA was then hybridized to a high-density\ genome-tiling Affymetrix array comprised of all unique probes within the\ ENCODE regions.

\

\ The time at which 50% of each microarray probe's sequence had replicated (TR50) was calculated by accumulating the signal over the five time points and linearly interpolating the time at which 50% of the total was reached. Each probe was also classified as temporally specific or non-specific according to whether at least 50% of its accumulated signal appeared in a single time point.
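The TR50 and specificity rules above can be sketched in Python as follows. This is a minimal illustration, not the Dutta lab's code. It assumes five non-negative signals per probe, one for each two-hour labeling window. It also assumes that each window's signal is credited to the end of its window (2, 4, ..., 10 h) and that nothing has replicated at release (t = 0). These timing conventions are assumptions.

```python
from typing import Sequence

# Assumption: five 2-hour BrdU labeling windows spanning a 10-hour S phase,
# with each window's signal credited to the end of that window.
WINDOW_ENDS_H = (2.0, 4.0, 6.0, 8.0, 10.0)


def tr50(signals: Sequence[float], window_ends: Sequence[float] = WINDOW_ENDS_H) -> float:
    """Interpolated time (hours) at which 50% of a probe's signal has accumulated."""
    if len(signals) != len(window_ends):
        raise ValueError("need one signal per labeling window")
    if any(s < 0 for s in signals):
        raise ValueError("signals must be non-negative")
    total = sum(signals)
    if total == 0:
        raise ValueError("probe has no signal")
    prev_t, prev_frac, cum = 0.0, 0.0, 0.0  # assumption: 0% replicated at t = 0
    for s, t in zip(signals, window_ends):
        cum += s
        frac = cum / total
        if frac >= 0.5:
            # Linear interpolation between the last point below 50% and this one.
            return prev_t + (0.5 - prev_frac) * (t - prev_t) / (frac - prev_frac)
        prev_t, prev_frac = t, frac
    raise RuntimeError("cumulative fraction never reached 50%")  # unreachable for valid input


def is_temporally_specific(signals: Sequence[float]) -> bool:
    """True if at least 50% of the accumulated signal falls in a single time point."""
    total = sum(signals)
    return total > 0 and max(signals) / total >= 0.5


print(tr50([0, 0, 10, 0, 0]), is_temporally_specific([0, 0, 10, 0, 0]))  # 5.0 True
```

A probe whose signal is spread evenly, such as `[1, 1, 1, 1, 1]`, also gets a TR50 of about 5 h. However, that probe is classified as non-specific.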

\

\ The TR50 data for all specific probes were then lowess-smoothed within\ a 60 kb window to provide the profile displayed in the annotation.
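As a rough illustration of the smoothing step, the sketch below fits a tricube-weighted local linear regression at each probe position. It assumes the 60 kb window is centered on each probe (plus or minus 30 kb). It is a simplified stand-in for LOWESS: full LOWESS also runs robustness iterations that down-weight outliers, and those iterations are omitted here. The function name is hypothetical.

```python
from typing import List, Sequence


def local_linear_smooth(positions: Sequence[float], values: Sequence[float],
                        window_bp: float = 60_000) -> List[float]:
    """Tricube-weighted local linear fit evaluated at each probe position.

    Assumption: the window is centered on each position (+/- window_bp / 2).
    No LOWESS robustness re-weighting is performed.
    """
    half = window_bp / 2.0
    smoothed = []
    for x0 in positions:
        sw = swx = swy = swxx = swxy = 0.0
        for x, y in zip(positions, values):
            d = abs(x - x0)
            if d >= half:
                continue
            w = (1.0 - (d / half) ** 3) ** 3  # tricube weight
            dx = x - x0
            sw += w
            swx += w * dx
            swy += w * y
            swxx += w * dx * dx
            swxy += w * dx * y
        det = sw * swxx - swx * swx
        if det > 1e-12:
            # Intercept of the weighted line y = a + b * (x - x0), i.e. the fit at x0.
            smoothed.append((swxx * swy - swx * swxy) / det)
        else:
            smoothed.append(swy / sw)  # too few distinct points: weighted mean
    return smoothed
```

A local linear fit reproduces a straight-line profile exactly. A global linear trend in TR50 therefore passes through unchanged, while probe-to-probe noise is averaged out.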

\ \

Verification

\

\ The replication experiments were performed on two biological replicates of the adherent HeLa cell line.

\ \

Credits

\

\ Data generation and analysis for this track were performed by the\ DNA replication group in the Dutta Lab at the University of \ Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim,\ Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta.

\

\ Neerja Karnani and Christopher Taylor prepared the data for \ presentation in the UCSC Genome Browser.

\ \

References

\

\ Jeon, Y., Bekiranov, S., Karnani, N., Kapranov, P., Ghosh, S.,\ MacAlpine, D., Lee, C., Hwang, D.S., Gingeras, T.R. and Dutta, A.\ Temporal profile of replication of human chromosomes.\ Proc Natl Acad Sci U S A 102(18), 6419-24 (2005).\ encodeChrom 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ dataVersion ENCODE Oct 2005 Freeze\ group encodeChrom\ longLabel University of Virginia DNA Smoothed Timing at 50% Replication\ maxHeightPixels 128:16:16\ origAssembly hg17\ priority 64.3\ shortLabel UVa DNA Rep TR50\ spanList 1\ superTrack encodeUvaDnaRepSuper dense\ track encodeUvaDnaRepTr50\ type wig 2.05 6.36\ viewLimits 2.2:5.2\ visibility hide\ windowingFunction mean\ encodeAffyChIpHl60PvalRaraHr00 Affy RARA RA 0h wig 0.0 534.54 Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 0hrs) P-Value 0 65 25 200 0 140 227 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 25,200,0\ longLabel Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 0hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 65\ shortLabel Affy RARA RA 0h\ subGroups factor=RARA time=0h\ track encodeAffyChIpHl60PvalRaraHr00\ xenoEst Other ESTs psl xeno Non-Human ESTs from GenBank 0 65 0 0 0 127 127 127 1 0 0 http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$

Description

\

\ This track displays translated BLAT alignments of expressed sequence tags (ESTs) in GenBank from organisms other than human. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The strand information (+/-) for this track is in two parts. The\ first + or - indicates the orientation of the query sequence whose\ translated protein produced the match. The second + or - indicates the\ orientation of the matching translated genomic sequence. Because the two\ orientations of a DNA sequence give different predicted protein sequences,\ there are four combinations. ++ is not the same as --, nor is +- the same\ as -+.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST\ display. For example, to apply the filter to all ESTs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of \ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "tissue" table contains all the types of tissues that can be \ entered into the tissue text box. Multiple terms may be entered at once, \ separated by a space. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all filter \ criteria will be highlighted. If "or" is selected, ESTs that \ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display ESTs that match the filter criteria. \ If "include" is selected, the browser will display only those \ ESTs that match the filter criteria.\
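The filter steps above combine three choices: wildcard matching per text box, "and"/"or" logic across boxes, and include/exclude. The sketch below models that logic. It is not the Genome Browser's implementation. The EST record fields and values are hypothetical, and case-insensitive matching is an assumption. Shell-style wildcards (`*`, `?`) are handled with Python's `fnmatch` module.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

EST = Dict[str, str]  # hypothetical record: field name -> value (e.g. "tissue")


def field_matches(value: str, terms: str) -> bool:
    """True if value matches any space-separated term; '*' and '?' act as wildcards."""
    return any(fnmatchcase(value.lower(), term.lower()) for term in terms.split())


def apply_est_filter(ests: Iterable[EST], filters: Dict[str, str],
                     logic: str = "and", mode: str = "include") -> List[EST]:
    """Keep (mode="include") or drop (mode="exclude") ESTs matching the filter."""
    if logic not in ("and", "or"):
        raise ValueError("logic must be 'and' or 'or'")
    active = {field: terms for field, terms in filters.items() if terms.strip()}
    ests = list(ests)
    if not active:
        return ests  # empty text boxes: nothing to filter on
    combine = all if logic == "and" else any

    def matches(est: EST) -> bool:
        return combine(field_matches(est.get(f, ""), t) for f, t in active.items())

    if mode == "include":
        return [e for e in ests if matches(e)]
    if mode == "exclude":
        return [e for e in ests if not matches(e)]
    raise ValueError("mode must be 'include' or 'exclude'")


ests = [
    {"id": "est1", "tissue": "liver", "organism": "Mus musculus"},
    {"id": "est2", "tissue": "brain", "organism": "Mus musculus"},
    {"id": "est3", "tissue": "liver", "organism": "Danio rerio"},
]
print([e["id"] for e in apply_est_filter(ests, {"tissue": "liver", "organism": "mus*"})])  # ['est1']
```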

\

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those \ that differ from the genomic sequence. For more information about this option,\ click \ here.\ Several types of alignment gap may also be colored; \ for more information, click \ here.\

\ \

Methods

\

\ To generate this track, the ESTs were aligned against the genome using BLAT. When a single EST aligned in multiple places, the alignment with the highest base identity was identified. Only alignments with a base identity within 0.5% of that best alignment and at least 96% base identity with the genomic sequence were kept.
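The alignment filter described above can be sketched as follows. This is an illustration, not UCSC's pipeline code. It assumes "within 0.5% of the best" means within 0.5 percentage points of the best identity for the same EST. The `Alignment` record and its locus strings are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Alignment(NamedTuple):
    est_id: str
    locus: str        # hypothetical, e.g. "chr1:100-600"
    identity: float   # percent base identity with the genomic sequence


def best_in_genome(alignments: List[Alignment], min_identity: float = 96.0,
                   near_best: float = 0.5) -> List[Alignment]:
    """Keep alignments >= min_identity and within near_best points of that EST's best."""
    best: Dict[str, float] = {}
    for a in alignments:
        best[a.est_id] = max(best.get(a.est_id, a.identity), a.identity)
    return [a for a in alignments
            if a.identity >= min_identity and a.identity >= best[a.est_id] - near_best]


alns = [
    Alignment("estA", "chr1:100-600", 99.0),
    Alignment("estA", "chr5:10-510", 98.6),   # within 0.5 of 99.0: kept
    Alignment("estA", "chr7:1-500", 97.0),    # too far below the best: dropped
    Alignment("estB", "chr2:1-500", 95.0),    # below 96%: dropped
]
print([a.locus for a in best_in_genome(alns)])  # ['chr1:100-600', 'chr5:10-510']
```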

\ \

Credits

\

\ This track was produced at UCSC from EST sequence data submitted to the \ international public sequence databases by scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, \ Wheeler DL. \ GenBank: update. Nucleic Acids Res. \ 2004 Jan 1;32(Database issue):D23-6.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ \ rna 1 baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Non-$Organism ESTs from GenBank\ priority 65\ shortLabel Other ESTs\ spectrum on\ track xenoEst\ type psl xeno\ url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$\ visibility hide\ encodeYaleChIPSTAT1Pval Yale ChIP pVal bedGraph 4 Yale ChIP/Chip (STAT1 ab, Hela cells, P-Values) 0 65 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track shows probable sites of STAT1 binding in HeLa cells, as determined by chromatin immunoprecipitation followed by microarray analysis (ChIP-chip). STAT1 (Signal Transducer and Activator of Transcription 1) is a transcription factor that moves to the nucleus and binds DNA only in response to a cytokine signal such as interferon-gamma. HeLa is a widely used cell line derived from a cervical cancer. Each of the four subtracks represents a different microarray platform, so the track as a whole can be used to compare results across platforms.

\

\ The first three platforms are custom maskless photolithographic arrays with oligonucleotides tiling most of the non-repetitive DNA sequence of the ENCODE regions. These arrays use 50-mers every 38 bp, 36-mers every 36 bp, and 50-mers every 50 bp, respectively.

\ The fourth array platform is an ENCODE PCR Amplicon array manufactured by \ Bing Ren's lab at UCSD.

\

\ For each of the four platforms, the subtracks show the significance of enrichment of immunoprecipitated DNA from cytokine-stimulated cells relative to unstimulated cells. This significance is reported as -log10(p-value), computed over a 501-base window. The data shown are the combined result of multiple biological replicates: five for the first maskless array (50-mer every 38 bp), two for the second maskless array (36-mer every 36 bp), three for the third maskless array (50-mer every 50 bp), and six for the PCR Amplicon array. \

\

\ These data are available at NCBI GEO as \ GSE2714, which also provides additional information about \ the experimental protocols.

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ "wiggle" tracks. The subtracks within this annotation \ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\ \

Methods

\

\ For all arrays, the STAT1 ChIP DNA was labeled with Cy5 and the control DNA \ was labeled with Cy3.

\ \

Maskless photolithographic arrays

\

\ The data from replicates were median-scaled and quantile-normalized to each other. After normalization, replicates were condensed to a single value. A 501 bp sliding window was centered on each oligonucleotide probe. Within each window, the pseudomedian (median of pairwise averages) of all log2(Cy5/Cy3) ratios, including replicates, was computed. The result is a signal map that estimates the fold enrichment [log2 scale] of ChIP DNA. The same procedure produced a -log10(p-value) map for all sliding windows, which measures the significance of enrichment of the oligonucleotide probes in each window. These P-values were computed with the Wilcoxon paired signed-rank test, comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). Binding sites were called by thresholding on both fold enrichment and -log10(p-value), and by requiring a maximum gap and a minimum run between oligonucleotide positions.
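The window statistics above can be sketched in Python as follows. This is not the Yale pipeline. The pseudomedian is computed as the Hodges-Lehmann estimate, which takes the median of all pairwise averages including each value paired with itself. The signed-rank test is one-sided (Cy5 > Cy3), uses the normal approximation, and applies no tie correction. The test direction, the approximation, and the idea of pooling replicates as repeated positions are all assumptions.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def pseudomedian(xs: Sequence[float]) -> float:
    """Hodges-Lehmann estimate: median of all (x_i + x_j) / 2 with i <= j."""
    n = len(xs)
    return median((xs[i] + xs[j]) / 2.0 for i in range(n) for j in range(i, n))


def signed_rank_neglog10p(cy5: Sequence[float], cy3: Sequence[float]) -> float:
    """-log10 P of a one-sided Wilcoxon signed-rank test (normal approximation)."""
    diffs = [math.log2(a) - math.log2(b) for a, b in zip(cy5, cy3)]
    diffs = [d for d in diffs if d != 0.0]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # tied |d| values share the average of their ranks
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    p = 0.5 * math.erfc((w_plus - mean) / sd / math.sqrt(2.0))  # upper normal tail
    return -math.log10(max(p, 1e-300))


def window_scores(positions: Sequence[int], cy5: Sequence[float], cy3: Sequence[float],
                  half_window: int = 250) -> List[Tuple[float, float]]:
    """(signal, -log10 P) for the probes within +/- half_window bp (a 501 bp window)."""
    out = []
    for x0 in positions:
        idx = [i for i, x in enumerate(positions) if abs(x - x0) <= half_window]
        ratios = [math.log2(cy5[i] / cy3[i]) for i in idx]
        out.append((pseudomedian(ratios),
                    signed_rank_neglog10p([cy5[i] for i in idx], [cy3[i] for i in idx])))
    return out
```

For example, `pseudomedian([0.0, 0.0, 10.0])` is 2.5 while the plain median is 0.0. The pseudomedian is therefore less sensitive to the exact middle value, but also less extreme than the mean.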

\

\ For the first maskless array (50-mer every 38 bp):\
\    log2(Cy5/Cy3) >= 1.25, -log10(p-value) >= 8.0, MaxGap <= 100 bp, MinRun >= 180 bp

\

\ For the second maskless array (36-mer every 36 bp): \
\    log2(Cy5/Cy3) >= 0.25, -log10(p-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp

\

\ For the third maskless array (50-mer every 50 bp): \
\    log2(Cy5/Cy3) >= 0.25, -log10(p-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp
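The threshold-based site calling can be sketched as follows. The example call uses the first maskless array's cutoffs. This is an illustrative interpretation, not the Yale pipeline. It assumes MaxGap is the distance between consecutive passing probe positions, and that MinRun is the distance from the first to the last passing probe in a run. Both definitions are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def call_sites(positions: Sequence[int], log2_ratio: Sequence[float],
               neglog10p: Sequence[float], min_ratio: float, min_neglog10p: float,
               max_gap: int, min_run: int) -> List[Tuple[int, int]]:
    """Return (first, last) probe positions of runs that pass both thresholds."""
    # Keep only probes passing both the fold-enrichment and significance cutoffs.
    passing = sorted(x for x, r, p in zip(positions, log2_ratio, neglog10p)
                     if r >= min_ratio and p >= min_neglog10p)
    sites: List[Tuple[int, int]] = []
    start: Optional[int] = None
    prev: Optional[int] = None
    for x in passing:
        if start is not None and x - prev <= max_gap:
            prev = x  # gap is small enough: extend the current run
            continue
        if start is not None and prev - start >= min_run:
            sites.append((start, prev))  # close a run that is long enough
        start = prev = x  # begin a new run
    if start is not None and prev - start >= min_run:
        sites.append((start, prev))
    return sites


# First maskless array: log2 >= 1.25, -log10(p) >= 8.0, MaxGap <= 100 bp, MinRun >= 180 bp.
probes = [0, 38, 76, 114, 152, 190, 500]
print(call_sites(probes, [2.0] * 7, [9.0] * 7, 1.25, 8.0, 100, 180))  # [(0, 190)]
```

With the second and third arrays' settings (MaxGap <= 250 bp, MinRun >= 0 bp), the isolated probe at 500 would also be reported as its own site.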

\ \

PCR Amplicon Arrays

\

\ The Cy5 and Cy3 array data were loess-normalized between channels \ on the same slide and then between slides. A z-score was then \ determined for each PCR amplicon from the distribution of \ log(Cy5/Cy3) in a local log(Cy5*Cy3) intensity window (see \ Quackenbush, 2002 and the \ Express \ Yourself website for more details). From the z-score, a P-value was then \ associated with each PCR amplicon. Hits were determined using a 3 sigma \ threshold and requiring a spot to be present on three out of six arrays.
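The intensity-dependent z-score idea can be sketched as follows, in the spirit of Quackenbush (2002). This is a sketch under stated assumptions, not the ExpressYourself code. The sketch assumes log2 for the ratio and log10 for the intensity product. It defines the local window as a fixed number of neighbors in intensity rank (`half_window`, a hypothetical parameter) and computes a one-sided P-value from z. Inputs are assumed to be already loess-normalized.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def intensity_dependent_z(cy5: Sequence[float], cy3: Sequence[float],
                          half_window: int = 10) -> List[float]:
    """z-score of each spot's log2(Cy5/Cy3) among spots of similar log10(Cy5*Cy3)."""
    m = [math.log2(a / b) for a, b in zip(cy5, cy3)]
    intensity = [math.log10(a * b) for a, b in zip(cy5, cy3)]
    order = sorted(range(len(m)), key=lambda i: intensity[i])
    z = [0.0] * len(m)
    for rank, i in enumerate(order):
        lo, hi = max(0, rank - half_window), min(len(order), rank + half_window + 1)
        local = [m[order[k]] for k in range(lo, hi)]  # spots of similar intensity
        sd = pstdev(local)
        z[i] = (m[i] - mean(local)) / sd if sd > 0 else 0.0
    return z


def one_sided_p(z: float) -> float:
    """Upper-tail standard normal P-value for a z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def consistent_hits(z_by_array: Sequence[Sequence[float]], sigma: float = 3.0,
                    min_arrays: int = 3) -> List[int]:
    """Spots with z >= sigma on at least min_arrays arrays (e.g. 3 of 6)."""
    n_spots = len(z_by_array[0])
    return [s for s in range(n_spots)
            if sum(1 for z in z_by_array if z[s] >= sigma) >= min_arrays]
```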

\ \

Verification

\

\ ChIP-chip binding sites were verified by comparing "hit lists" \ generated from combinations of different biological replicates. \ Only experiments that yielded a significant overlap (greater than \ 50 percent) were accepted. As an independent check (for maskless \ arrays), data on the microarray were randomized with respect to \ position and re-scored; significantly fewer hits (consistent \ with random noise) were generated this way.
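The replicate-overlap check can be sketched as follows. The source does not define "overlap" exactly. This sketch assumes it means the fraction of hits in one list that overlap at least one hit in the other, and it uses a hypothetical half-open `(chrom, start, end)` interval representation.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # hypothetical: (chrom, start, end), half-open


def overlap_fraction(a: List[Interval], b: List[Interval]) -> float:
    """Fraction of hits in list a that overlap at least one hit in list b."""
    if not a:
        return 0.0
    hits = sum(1 for c, s, e in a
               if any(c == c2 and s < e2 and s2 < e for c2, s2, e2 in b))
    return hits / len(a)


def replicate_combination_accepted(a: List[Interval], b: List[Interval],
                                   min_overlap: float = 0.5) -> bool:
    """Accept when the overlap exceeds 50 percent, as stated in the text."""
    return overlap_fraction(a, b) > min_overlap


list_a = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 50)]
list_b = [("chr1", 50, 150), ("chr2", 10, 20)]
print(replicate_combination_accepted(list_a, list_b))  # True (2 of 3 hits overlap)
```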

\ \

Credits

\

\ These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. The PCR Amplicon arrays were manufactured by Bing Ren's lab at UCSD.

\ \

References

\

\ Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., \ Piccolboni, A., Sementchenko, V., Cheng, J. et al.\ Unbiased mapping of transcription factor binding sites along \ human chromosomes 21 and 22 points to widespread regulation of noncoding \ RNAs. Cell 116(4), 499-509 (2004).

\

\ Euskirchen, G., Royce, T.E., Bertone, P., Martone, R., Rinn, J.L., Nelson, \ F.K., Sayward, F., Luscombe, N.M., Miller, P. et al.\ CREB binds to multiple loci on human chromosome 22, \ Mol Cell Biol. 24(9), 3804-14 (2004).

\

\ Luscombe, N.M., Royce, T.E., Bertone, P., Echols, N., Horak, C.E., Chang, \ J.T., Snyder, M. and Gerstein, M.\ ExpressYourself: A modular platform for processing and \ visualizing microarray data.\ Nucleic Acids Res. 31(13), 3477-82 (2003).

\

\ Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T.E., \ Luscombe, N.M., Rinn, J.L., Nelson, F.K., Miller, P. et al.\ Distribution of NF-kappaB-binding sites across human chromosome \ 22.\ Proc Natl Acad Sci U S A. 100(21), 12247-52 (2003).

\

\ Quackenbush, J.\ Microarray data normalization and transformation.\ Nat Genet. 32(Suppl), 496-501 (2002).

\ \ encodeChip 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells, P-Values)\ maxHeightPixels 128:16:16\ maxLimit 18.2\ minLimit 0\ origAssembly hg16\ priority 65.0\ shortLabel Yale ChIP pVal\ track encodeYaleChIPSTAT1Pval\ type bedGraph 4\ viewLimits 0:10\ visibility hide\ windowingFunction mean\ encodeYaleChIPSTAT1Sig Yale ChIP Sig bedGraph 4 Yale ChIP/Chip: STAT1 ab, Hela cells, Signal 0 65.1 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ Each of the four subtracks shows the map of signal intensity (estimating the fold enrichment [log2 scale] of ChIP DNA vs. unstimulated DNA) for STAT1 ChIP-chip using human HeLa S3 cells hybridized to one of four different array designs/platforms. The first three platforms are custom maskless photolithographic arrays with oligonucleotides tiling most of the non-repetitive DNA sequence of the ENCODE regions. These arrays use 50-mers every 38 bp, 36-mers every 36 bp, and 50-mers every 50 bp, respectively.

\ The fourth array platform is an ENCODE PCR \ Amplicon array manufactured by Bing Ren's lab at UCSD.

\

\ Each track shows the combined results of multiple biological replicates: five \ for the first maskless array (50-mer every 38 bp), two for the \ second maskless array (36-mer every 36 bp), three for the third \ maskless array (50-mer every 50 bp) and six for the PCR Amplicon \ array. For all arrays, the STAT1 ChIP DNA was labeled with Cy5 and \ the control DNA was labeled with Cy3.

\

\ These data are available at NCBI GEO as \ GSE2714, which also provides additional information about\ the experimental protocols.

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ "wiggle" tracks. The subtracks within this annotation \ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\ \

Methods

\ \

Maskless photolithographic arrays

\

\ The data from replicates were median-scaled and quantile-normalized to each other (both Cy3 and Cy5 channels). A 501 bp sliding window was centered on each oligonucleotide probe. Within each window, the pseudomedian (median of pairwise averages) of all log2(Cy5/Cy3) ratios, including replicates, was computed. The result is a signal map that estimates the fold enrichment [log2 scale] of ChIP DNA. The same procedure produced a -log10(P-value) map for all sliding windows, which measures the significance of enrichment of the oligonucleotide probes in each window. These P-values were computed with the Wilcoxon paired signed-rank test, comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). Binding sites were called by thresholding on both fold enrichment and -log10(P-value), and by requiring a maximum gap and a minimum run between oligonucleotide positions.

\

\ For the first maskless array (50-mer every 38 bp):\
\    log2(Cy5/Cy3) >= 1.25, -log10(P-value) >= 8.0, MaxGap <= 100 bp, MinRun >= 180 bp

\

\ For the second maskless array (36-mer every 36 bp): \
\    log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp

\

\ For the third maskless array (50-mer every 50 bp): \
\    log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp

\ \

PCR Amplicon Arrays

\

\ The Cy5 and Cy3 array data were loess-normalized between channels \ on the same slide and then between slides. A z-score was then \ determined for each PCR amplicon from the distribution of \ log(Cy5/Cy3) in a local log(Cy5*Cy3) intensity window (see \ Quackenbush, 2002 and the \ Express \ Yourself website for more details). From the z-score, a P-value was then \ associated with each PCR amplicon. Hits were determined using a 3 sigma \ threshold and requiring a spot to be present on three out of six arrays.

\ \

Verification

\

\ ChIP-chip binding sites were verified by comparing "hit lists" \ generated from combinations of different biological replicates. \ Only experiments that yielded a significant overlap (greater than \ 50 percent) were accepted. As an independent check (for maskless \ arrays), data on the microarray were randomized with respect to \ position and re-scored; significantly fewer hits (consistent \ with random noise) were generated this way.

\ \

Credits

\

\ These data were generated and analyzed by the labs of Michael \ Snyder, Mark Gerstein and Sherman Weissman at Yale University. The PCR \ Amplicon arrays were manufactured by Bing Ren's lab at UCSD.

\ \

References

\

\ Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., \ Piccolboni, A., Sementchenko, V., Cheng, J. et al.\ Unbiased mapping of transcription factor binding sites along \ human chromosomes 21 and 22 points to widespread regulation of noncoding \ RNAs. Cell 116(4), 499-509 (2004).

\

\ Euskirchen, G., Royce, T.E., Bertone, P., Martone, R., Rinn, J.L., Nelson, \ F.K., Sayward, F., Luscombe, N.M., Miller, P. et al.\ CREB binds to multiple loci on human chromosome 22, \ Mol Cell Biol. 24(9), 3804-14 (2004).

\

\ Luscombe, N.M., Royce, T.E., Bertone, P., Echols, N., Horak, C.E., Chang, \ J.T., Snyder, M. and Gerstein, M.\ ExpressYourself: A modular platform for processing and \ visualizing microarray data.\ Nucleic Acids Res. 31(13), 3477-82 (2003).

\

\ Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T.E., \ Luscombe, N.M., Rinn, J.L., Nelson, F.K., Miller, P. et al.\ Distribution of NF-kappaB-binding sites across human chromosome \ 22.\ Proc Natl Acad Sci U S A. 100(21), 12247-52 (2003).

\

\ Quackenbush, J.\ Microarray data normalization and transformation.\ Nat Genet. 32(Suppl), 496-501 (2002).

\ \ encodeChip 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel Yale ChIP/Chip: STAT1 ab, Hela cells, Signal\ maxHeightPixels 128:16:16\ maxLimit 3.69\ minLimit -4.19\ origAssembly hg16\ priority 65.1\ shortLabel Yale ChIP Sig\ track encodeYaleChIPSTAT1Sig\ type bedGraph 4\ viewLimits 0:2\ visibility hide\ windowingFunction mean\ encodeYaleChIPSTAT1Sites Yale ChIP Sites bed . Yale ChIP/Chip (STAT1 ab, Hela cells, Binding Sites) 0 65.2 0 0 0 127 127 127 0 0 18 chr1,chr10,chr11,chr13,chr14,chr15,chr16,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ Each of the four subtracks shows the binding sites for STAT1 ChIP-chip using human HeLa S3 cells hybridized to one of four different array designs/platforms. The first three platforms are custom maskless photolithographic arrays with oligonucleotides tiling most of the non-repetitive DNA sequence of the ENCODE regions. These arrays use 50-mers every 38 bp, 36-mers every 36 bp, and 50-mers every 50 bp, respectively.

\ The fourth array platform is an ENCODE PCR \ Amplicon array manufactured by Bing Ren's lab at UCSD.

\

\ Each track shows the combined results of multiple biological replicates: five \ for the first maskless array (50-mer every 38 bp), two for the \ second maskless array (36-mer every 36 bp), three for the third \ maskless array (50-mer every 50 bp) and six for the PCR Amplicon \ array. For all arrays, the STAT1 ChIP DNA was labeled with Cy5 and \ the control DNA was labeled with Cy3. See NCBI GEO \ GSE2714 for details of the experimental protocols.

\ \

Methods

\ \

Maskless photolithographic arrays

\

\ The data from replicates were median-scaled and quantile-normalized to each other (both Cy3 and Cy5 channels). A 501 bp sliding window was centered on each oligonucleotide probe. Within each window, the pseudomedian (median of pairwise averages) of all log2(Cy5/Cy3) ratios, including replicates, was computed. The result is a signal map that estimates the fold enrichment [log2 scale] of ChIP DNA. The same procedure produced a -log10(P-value) map for all sliding windows, which measures the significance of enrichment of the oligonucleotide probes in each window. These P-values were computed with the Wilcoxon paired signed-rank test, comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). Binding sites were called by thresholding on both fold enrichment and -log10(P-value), and by requiring a maximum gap and a minimum run between oligonucleotide positions.

\

\ For the first maskless array (50-mer every 38 bp):\
\    log2(Cy5/Cy3) >= 1.25, -log10(P-value) >= 8.0, MaxGap <= 100 bp, MinRun >= 180 bp

\

\ For the second maskless array (36-mer every 36 bp): \
\    log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp

\

\ For the third maskless array (50-mer every 50 bp): \
\    log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp

\ \

PCR Amplicon Arrays

\

\ The Cy5 and Cy3 array data were loess-normalized between channels \ on the same slide and then between slides. A z-score was then \ determined for each PCR amplicon from the distribution of \ log(Cy5/Cy3) in a local log(Cy5*Cy3) intensity window (see \ Quackenbush, 2002 and the \ Express \ Yourself website for more details). From the z-score, a P-value was then \ associated with each PCR amplicon. Hits were determined using a 3 sigma \ threshold and requiring a spot to be present on three out of six arrays.

\ \

Verification

\

\ ChIP-chip binding sites were verified by comparing "hit lists" \ generated from combinations of different biological replicates. \ Only experiments that yielded a significant overlap (greater than \ 50 percent) were accepted. As an independent check (for maskless \ arrays), data on the microarray were randomized with respect to \ position and re-scored; significantly fewer hits (consistent \ with random noise) were generated this way.

\ \

Credits

\

\ These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. The PCR Amplicon arrays were manufactured by Bing Ren's lab at UCSD.\ \

References

\

\ Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., \ Piccolboni, A., Sementchenko, V., Cheng, J. et al.\ Unbiased mapping of transcription factor binding sites along \ human chromosomes 21 and 22 points to widespread regulation of noncoding \ RNAs. Cell 116(4), 499-509 (2004).

\

\ Euskirchen, G., Royce, T.E., Bertone, P., Martone, R., Rinn, J.L., Nelson, \ F.K., Sayward, F., Luscombe, N.M., Miller, P. et al.\ CREB binds to multiple loci on human chromosome 22, \ Mol Cell Biol. 24(9), 3804-14 (2004).

\

\ Luscombe, N.M., Royce, T.E., Bertone, P., Echols, N., Horak, C.E., Chang, \ J.T., Snyder, M. and Gerstein, M.\ ExpressYourself: A modular platform for processing and \ visualizing microarray data.\ Nucleic Acids Res. 31(13), 3477-82 (2003).

\

\ Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T.E., \ Luscombe, N.M., Rinn, J.L., Nelson, F.K., Miller, P. et al.\ Distribution of NF-kappaB-binding sites across human chromosome \ 22.\ Proc Natl Acad Sci U S A. 100(21), 12247-52 (2003).

\

\ Quackenbush, J.\ Microarray data normalization and transformation.\ Nat Genet. 32(Suppl), 496-501 (2002).

\ \ encodeChip 1 chromosomes chr1,chr10,chr11,chr13,chr14,chr15,chr16,chr19,chr2,chr20,chr21,chr22,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChip\ longLabel Yale ChIP/Chip (STAT1 ab, Hela cells, Binding Sites)\ origAssembly hg16\ priority 65.2\ shortLabel Yale ChIP Sites\ track encodeYaleChIPSTAT1Sites\ type bed .\ visibility hide\ encodeAffyChIpHl60SitesRaraHr00 Affy RARA RA 0h bed 3 . Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 0hrs) Sites 0 66 25 200 0 140 227 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 25,200,0\ longLabel Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 0hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 66\ shortLabel Affy RARA RA 0h\ subGroups factor=RARA time=0h\ track encodeAffyChIpHl60SitesRaraHr00\ anyCovBed mRNA/EST/Pseud bed 3 . Blastz Alignments of GenBank mRNA Including Pseudogenes and All ESTs 0 66 170 128 128 212 191 191 0 0 0 rna 1 color 170,128,128\ group rna\ longLabel Blastz Alignments of GenBank mRNA Including Pseudogenes and All ESTs\ priority 66\ shortLabel mRNA/EST/Pseud\ track anyCovBed\ type bed 3 .\ visibility hide\ encodeRegulomeBase UW/Reg DNaseI Sens wig 0.0 3.0 ENCODE UW/Regulome Mean DNaseI Sensitivity 0 66.1 0 0 0 127 127 127 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX,

Description

\

\ This track shows the moving baseline of mean DNaseI sensitivity, computed over each PCR amplicon using a locally weighted least squares (LOWESS)-based algorithm described in Dorschner et al. (2004). The track is one of a set of tracks that annotate continuous DNaseI sensitivity measurements and DNaseI hypersensitive sites (HSs) over ENCODE regions. DNaseI has long been used to map general chromatin accessibility and the DNaseI "hyperaccessibility" or "hypersensitivity" that is a universal feature of active cis-regulatory sequences. The data were produced using quantitative chromatin profiling (QCP) (Dorschner et al., 2004).

\

\ See the UW/Reg Amplicon track for a list of the cell lines/phenotypes studied\ in these experiments.

\ \

Display Conventions and Configuration

\

\ The displayed values are calculated as (copies in DNaseI-untreated / copies \ in DNaseI-treated). Thus, increasing values represent increasing \ sensitivity.

\

\ The subtracks within this composite annotation track\ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\

\ Color differences among the subtracks are arbitrary; they provide a\ visual cue for distinguishing the different cell lines/phenotypes.

\ \

Methods

\

\ QCP was performed as described in Dorschner et al. See the UW/Reg\ Amplicon track description for more information.

\ \

Verification

\ See the UW/Reg Amplicon track description for verification information.

\ \

Credits

\

\ Data generation, analysis, and validation were performed jointly by groups at\ Regulome Corporation and the University of Washington (UW) in Seattle.

\

\ Regulome Corp.: Michael O.\ Dorschner, Richard Humbert, Peter J. Sabo, Anthony Shafer, Jeff Goldy,\ Molly Weaver, Kristin Lee, Fidencio Neri, Brendan Henry, Mike Hawrylycz, Paul\ Tittel, Jim Wallace, Josh Mack, Janelle Kawamoto, John A. Stamatoyannopoulos.\

\

\ UW Medical\ Genetics: Patrick Navas, Man Yu, Hua Cao, Brent Johnson, Ericka\ Johnson, George Stamatoyannopoulos.

\

\ UW Genome Sciences: \ Scott Kuehn, Robert Thurman, William S. Noble.

\ \

References

\

\ Dorschner, M.O., Hawrylycz, M., Humbert, R., Wallace, J.C., Shafer, A., \ Kawamoto, J., Mack, J., Hall, R., Goldy, J., Sabo, P.J. et al.\ High-throughput localization of functional elements by \ quantitative chromatin profiling.\ Nat Methods 1(3), 219-25 (2004).

\ \ encodeChrom 0 autoScale Off\ chromosomes chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChrom\ longLabel ENCODE UW/Regulome Mean DNaseI Sensitivity\ maxHeightPixels 128:16:16\ origAssembly hg16\ priority 66.1\ shortLabel UW/Reg DNaseI Sens\ smoothingWindow 2\ spanList 250\ track encodeRegulomeBase\ type wig 0.0 3.0\ visibility hide\ encodeRegulomeProb UW/Reg DNaseI HSs bedGraph 4 ENCODE UW/Regulome DNaseI hypersensitive sites/scores 0 66.2 0 0 0 127 127 127 0 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX,

Description

\

\ This track identifies amplicons overlying DNaseI hypersensitive sites (HSs)\ and provides an empirical P-value for each. \ The track is one of a set of tracks that annotate continuous DNaseI \ sensitivity measurements and DNaseI hypersensitive sites (HSs) over ENCODE \ regions. \ DNaseI has long been used to map \ general chromatin accessibility and the DNaseI "hyperaccessibility" \ or "hypersensitivity" that is a universal feature of active \ cis-regulatory sequences.\ The data were produced using quantitative chromatin profiling (QCP) \ (Dorschner et al., 2004).\

\ See the UW/Reg Amplicon track for a list of the cell lines/phenotypes studied\ in these experiments.

\ \

Display Conventions and Configuration

\

\ Data values are represented on a vertical axis as a score between 0 and 3, corresponding to -log10(P-value); for example, a score of 3 indicates a P-value of 0.001 or less. Note that these are empirically determined P-values, not binomial/Gaussian P-values. Also, HSs are called only on plates whose quality score passes an acceptable (though conservative) threshold (see the Regulome Quality track).
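The score/P-value relationship above is a base-10 logarithm, shown in the sketch below. The P-values themselves are empirical and are not computed here. Clipping scores to the 0 to 3 display range is an assumption based on the stated axis range.

```python
import math

MAX_SCORE = 3.0  # the vertical axis described above runs from 0 to 3


def p_to_score(p: float) -> float:
    """-log10(P), clipped to the 0..3 display range (clipping is an assumption)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P-value must be in (0, 1]")
    return min(MAX_SCORE, -math.log10(p))


def score_to_p(score: float) -> float:
    """Invert the transform: a score of s corresponds to P = 10 ** -s."""
    return 10.0 ** (-score)


print(p_to_score(0.01), p_to_score(1e-6))  # 2.0 3.0
```

Because of the clipping, every P-value of 0.001 or smaller is drawn at the top of the axis.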

\

\ The subtracks within this composite annotation track\ may be configured in a variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options are shown at the top of \ the track description page, followed by a list of subtracks. To display only \ selected subtracks, uncheck the boxes next to the tracks you wish to hide. \ For more information about the graphical configuration options, click the \ Graph\ configuration help link.

\

\ Color differences among the subtracks are arbitrary; they provide a\ visual cue for distinguishing the different cell lines/phenotypes.

\ \

Methods

\

\ QCP was performed as described in Dorschner et al. See the UW/Reg\ Amplicon track description for more information.

\ \

Verification

\

\ See the UW/Reg Amplicon track description for verification information.

\ \

Credits

\

\ Data generation, analysis, and validation were performed jointly by groups at\ Regulome Corporation and the University of Washington (UW) in Seattle.

\

\ Regulome Corp.: Michael O.\ Dorschner, Richard Humbert, Peter J. Sabo, Anthony Shafer, Jeff Goldy,\ Molly Weaver, Kristin Lee, Fidencio Neri, Brendan Henry, Mike Hawrylycz, Paul\ Tittel, Jim Wallace, Josh Mack, Janelle Kawamoto, John A. Stamatoyannopoulos.\

\

\ UW Medical\ Genetics: Patrick Navas, Man Yu, Hua Cao, Brent Johnson, Ericka\ Johnson, George Stamatoyannopoulos.

\

\ UW Genome Sciences: \ Scott Kuehn, Robert Thurman, William S. Noble.

\ \

References

\

\ Dorschner, M.O., Hawrylycz, M., Humbert, R., Wallace, J.C., Shafer, A., \ Kawamoto, J., Mack, J., Hall, R., Goldy, J., Sabo, P.J. et al.\ High-throughput localization of functional elements by \ quantitative chromatin profiling.\ Nat Methods 1(3), 219-25 (2004).

\ \ encodeChrom 0 autoScale Off\ chromosomes chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChrom\ longLabel ENCODE UW/Regulome DNaseI hypersensitive sites/scores\ maxHeightPixels 128:16:16\ maxLimit 3\ minLimit 0\ priority 66.2\ shortLabel UW/Reg DNaseI HSs\ track encodeRegulomeProb\ type bedGraph 4\ visibility hide\ encodeRegulomeQuality UW/Reg Plate Q/A bed 5 . ENCODE UW/Regulome Plate Quality Score 0 66.3 0 0 0 127 127 127 1 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX,

Description

\

\ This track provides a visual representation of data quality scores, which range\ from 0 to 1, for each plate in the UW/Regulome experiments. \ It is one of a set of tracks that annotate continuous DNaseI \ sensitivity measurements and DNaseI hypersensitive sites (HSs) over ENCODE \ regions. \ DNaseI has long been used to map general chromatin accessibility and the \ DNaseI "hyperaccessibility" or "hypersensitivity" that is a \ universal feature of active cis-regulatory sequences.\ The data were produced using quantitative chromatin profiling (QCP) \ (Dorschner et al., 2004).

\

\ Quality scores are available on the following cell lines/phenotypes and \ chromosomes:\

Cell Line/Phenotype   Chromosomes
CACO2                 5, 7, 9, 11, 12, 16, X
GM06990               2, 5, 7, 8, 9, 11, 12, 16, 18, X
SKNSH                 5, 7, 9, 11, 12, 16, X
Huh7                  2, 8, 11, 18
HepG2                 11
K562                  11
Adult Erythroblast    11
\

\

\ See the UW/Reg Amplicon track for more information on the cell lines/phenotypes \ studied in these experiments.

\ \

Display Conventions and Configuration

\

\ Plates with scores \ greater than or equal to 0.5 were conservatively considered acceptable for \ reliable scoring of HSs. Scores are shown in greyscale, with darker colors \ indicating higher scores.

\

\ This composite annotation track consists of several subtracks that \ show the quality scores for each cell line/phenotype. \ To show only selected subtracks, uncheck the boxes next to the tracks \ you wish to hide. The display may also be filtered to show only those items \ with unnormalized scores that meet or exceed a certain threshold. To set a \ threshold, type the minimum score into the text box at the top of the \ description page.
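The acceptance rule and the threshold filter can be sketched as follows. The 0.5 cutoff comes from the text. The `Plate` record and the linear grey mapping (white at 0, black at 1) are assumptions for illustration, and the browser's actual shading steps may differ.

```python
from dataclasses import dataclass

ACCEPT_THRESHOLD = 0.5  # plates scoring >= 0.5 were treated as acceptable

@dataclass
class Plate:
    name: str       # hypothetical plate identifier
    quality: float  # plate quality score in [0, 1]

def filter_plates(plates, threshold=ACCEPT_THRESHOLD):
    """Keep plates whose score meets or exceeds the threshold."""
    return [p for p in plates if p.quality >= threshold]

def grey_level(quality: float) -> int:
    """Map a 0..1 score to an 8-bit grey value; higher scores are darker."""
    q = min(max(quality, 0.0), 1.0)
    return round(255 * (1.0 - q))

plates = [Plate("plate_A", 0.42), Plate("plate_B", 0.50), Plate("plate_C", 0.91)]
print([p.name for p in filter_plates(plates)])
```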

\

\ Color differences among the subtracks are arbitrary; they provide a\ visual cue for distinguishing the different cell lines/phenotypes.

\ \

Methods

\

\ QCP was performed as described in Dorschner et al. See the UW/Reg\ Amplicon track description for more information.\ QCP assays were formatted into 384-well plates for high-throughput real-time \ PCR. Each plate was treated as a separate experiment. \

\ Plate quality scores were computed using a Support Vector Machine (SVM). \ Trained operators manually scored 500 plates, classifying each on a scale of \ 1 to 5 to rank the degree of experimental noise. The unified set was then used \ to train an SVM to classify and score "good" and "bad" \ plates. Good plates were conservatively assigned noise scores of 1 - 3;\ bad plates received scores of 4 - 5. By performing cross validation on a 90% \ subsample of the training set, the SVM achieved an ROC (receiver\ operating characteristic) score of 0.93.
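The SVM itself is not reproduced here. The sketch below illustrates only two pieces of the procedure, under stated assumptions. First, it collapses the operators' 1 to 5 noise scores into good/bad labels. Second, it computes the ROC score, read here as the area under the ROC curve, from classifier outputs. The example scores are hypothetical.

```python
def is_good(noise_score: int) -> bool:
    """Operator noise scores 1-3 count as 'good' plates, 4-5 as 'bad'."""
    if noise_score not in range(1, 6):
        raise ValueError("noise score must be an integer from 1 to 5")
    return noise_score <= 3

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count one half).

    Equals the probability that a randomly chosen positive ('good') example
    receives a higher classifier score than a randomly chosen negative one.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

operator_scores = [1, 2, 4, 5, 3]       # hypothetical manual noise scores
svm_scores = [0.9, 0.8, 0.3, 0.6, 0.4]  # hypothetical classifier outputs
labels = [is_good(s) for s in operator_scores]
print(round(roc_auc(svm_scores, labels), 3))
```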

\ \

Verification

\

\ See the UW/Reg Amplicon track description for verification information.

\ \

Credits

\

\ Data generation, analysis, and validation were performed jointly by groups at\ Regulome Corporation and the University of Washington (UW) in Seattle.

\

\ Regulome Corp.: Michael O.\ Dorschner, Richard Humbert, Peter J. Sabo, Anthony Shafer, Jeff Goldy,\ Molly Weaver, Kristin Lee, Fidencio Neri, Brendan Henry, Mike Hawrylycz, Paul\ Tittel, Jim Wallace, Josh Mack, Janelle Kawamoto, John A. Stamatoyannopoulos.\

\

\ UW Medical\ Genetics: Patrick Navas, Man Yu, Hua Cao, Brent Johnson, Ericka\ Johnson, George Stamatoyannopoulos.

\

\ UW Genome Sciences: \ Scott Kuehn, Robert Thurman, William S. Noble.

\ \

References

\

\ Dorschner, M.O., Hawrylycz, M., Humbert, R., Wallace, J.C., Shafer, A., \ Kawamoto, J., Mack, J., Hall, R., Goldy, J., Sabo, P.J. et al.\ High-throughput localization of functional elements by \ quantitative chromatin profiling.\ Nat Methods 1(3), 219-25 (2004).

\ \ encodeChrom 1 chromosomes chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChrom\ longLabel ENCODE UW/Regulome Plate Quality Score\ maxHeightPixels 128:16:16\ priority 66.3\ shortLabel UW/Reg Plate Q/A\ track encodeRegulomeQuality\ type bed 5 .\ useScore 1\ encodeRegulomeAmplicon UW/Reg Amplicon bed 5 . ENCODE UW/Regulome Amplicon 0 66.4 0 0 0 127 127 127 1 0 10 chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX,

Description

\

\ This track shows a tiling path of PCR amplicons, along with their raw DNaseI\ sensitivity scores, across all ENCODE \ regions. It is one of a set of tracks that annotate continuous DNaseI \ sensitivity measurements and DNaseI hypersensitive sites (HSs) over the ENCODE \ regions. \ DNaseI has long been used to map general chromatin accessibility and the \ DNaseI "hyperaccessibility" or "hypersensitivity" that is a \ universal feature of active cis-regulatory sequences.\ The data were produced using quantitative chromatin profiling (QCP) \ (Dorschner et al., 2004).\

\

\ DNaseI-treated and untreated chromatin samples from the following cell \ lines/phenotypes were studied: \

\ \

Display Conventions and Configuration

\

\ The display is split into "odd" and "even" amplicon subtracks \ so that adjacent amplicons always fall in different subtracks and are\ visually distinct. The details page \ for each amplicon shows its \ start/stop coordinates and its raw DNaseI sensitivity score.\ The score is calculated by the formula (copies in DNaseI-treated / copies \ in DNaseI-untreated) * 1000.
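This small sketch shows the score formula and the odd/even split, using hypothetical copy numbers and coordinates. Under this formula, a score near 1000 means little template was lost to DNaseI digestion, and lower scores mean more digestion. Treating the first amplicon in coordinate order as "odd" is an assumption.

```python
def amplicon_score(copies_treated: float, copies_untreated: float) -> float:
    """Raw DNaseI sensitivity score: (treated / untreated) * 1000."""
    if copies_untreated <= 0:
        raise ValueError("untreated copy number must be positive")
    return copies_treated / copies_untreated * 1000

def split_odd_even(amplicons):
    """Alternate amplicons, in coordinate order, between two subtracks."""
    odd, even = [], []
    for i, amp in enumerate(sorted(amplicons)):
        (odd if i % 2 == 0 else even).append(amp)  # 1st, 3rd, ... -> odd
    return odd, even

print(amplicon_score(250, 1000))
print(split_odd_even([(500, 750), (0, 250), (250, 500)]))
```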

\

\ The graphical display may be filtered to show only those items \ with unnormalized scores that meet or exceed a certain threshold. To set a \ threshold, type the minimum score into the text box at the top of the \ description page.

\ \

Methods

\

\ QCP was performed as described in Dorschner et al. \ PCR amplicons of ~250 bp in size were tiled end-to-end across the study regions.\ An amplicon tiling path has been computed over all regions and is available \ through UniSTS.
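This is an idealized sketch of end-to-end tiling with ~250 bp amplicons. Real amplicon design must also satisfy primer constraints, so actual amplicons vary in size and position. The coordinates here are hypothetical and half-open.

```python
def tile_region(start: int, end: int, size: int = 250):
    """Split [start, end) into consecutive windows of `size` bp; the last may be shorter."""
    tiles = []
    pos = start
    while pos < end:
        tiles.append((pos, min(pos + size, end)))
        pos += size
    return tiles

print(tile_region(1000, 1600))
```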

\

\ Chromatin preparation and DNaseI treatment were\ performed on the cell types listed above as described in \ Dorschner et al. High-throughput real-time PCR was used to quantify \ DNaseI sensitivity at each amplicon by measuring the copies remaining in DNaseI-treated \ vs. untreated samples. The results were then analyzed with a \ statistical algorithm to compute the moving baseline of mean DNaseI sensitivity \ and to identify outliers that correspond to DNaseI hypersensitive sites.
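The description does not specify the statistical algorithm, so the following is only one plausible sketch, not the method actually used. A moving median serves as the baseline, a median absolute deviation (MAD) estimates local spread, and amplicons whose score falls far below the baseline are flagged. The direction follows from the score formula: a lower score means fewer surviving copies, hence higher sensitivity. The window size, `k`, and `min_spread` values are arbitrary choices.

```python
from statistics import median

def call_outliers(scores, window=11, k=3.0, min_spread=1.0):
    """Indices whose score falls more than k robust SDs below a moving median."""
    half = window // 2
    flagged = []
    for i, s in enumerate(scores):
        neighbours = scores[max(0, i - half): i + half + 1]
        baseline = median(neighbours)
        mad = median(abs(x - baseline) for x in neighbours)
        spread = max(1.4826 * mad, min_spread)  # 1.4826 * MAD ~ SD for normal data
        if (baseline - s) / spread > k:
            flagged.append(i)
    return flagged

scores = [980, 1010, 995, 1005, 990, 300, 1000, 1020, 985, 1000, 995]  # hypothetical
print(call_outliers(scores))
```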

\ \

Verification

\

\ QCP measurements were performed in replicate (6X) on pooled biological \ replicate samples. Validation of the results was carried out by conventional \ DNaseI hypersensitivity assays using end-labeling/Southern blotting. A total of \ 1.17 Mb has been evaluated by conventional assay.

\

\ The specificity was defined as the number of true negative evaluable QCP \ amplicons divided by the sum of the true negatives plus false positives. Using \ 246.2 kb from ENm002, the specificity was calculated to be 0.997. The \ sensitivity of the QCP assay was calculated as the true positives divided by \ the sum of the true positives plus false negatives. The sensitivity measured \ for ENm002 was 0.9487.
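The two definitions above translate directly into code. The counts in the example are hypothetical and are not the ENm002 data.

```python
def specificity(true_neg: int, false_pos: int) -> float:
    """TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)

def sensitivity(true_pos: int, false_neg: int) -> float:
    """TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)

# Hypothetical amplicon counts for illustration only
print(round(specificity(990, 10), 3), round(sensitivity(18, 2), 3))
```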

\ \

Credits

\

\ Data generation, analysis, and validation were performed jointly by groups at \ Regulome Corporation and the University of Washington (UW) in Seattle.

\

\ Regulome Corp.: Michael O.\ Dorschner, Richard Humbert, Peter J. Sabo, Anthony Shafer, Jeff Goldy, \ Molly Weaver, Kristin Lee, Fidencio Neri, Brendan Henry, Mike Hawrylycz, Paul \ Tittel, Jim Wallace, Josh Mack, Janelle Kawamoto, John A. Stamatoyannopoulos.\

\

\ UW Medical \ Genetics: Patrick Navas, Man Yu, Hua Cao, Brent Johnson, Ericka \ Johnson, George Stamatoyannopoulos.

\

\ UW Genome Sciences: \ Scott Kuehn, Robert Thurman, William S. Noble.

\ \

References

\

\ Dorschner, M.O., Hawrylycz, M., Humbert, R., Wallace, J.C., Shafer, A., \ Kawamoto, J., Mack, J., Hall, R., Goldy, J., Sabo, P.J. et al.\ High-throughput localization of functional elements by \ quantitative chromatin profiling.\ Nat Methods 1(3), 219-25 (2004).

\ \ encodeChrom 1 chromosomes chr2,chr5,chr7,chr8,chr9,chr11,chr12,chr16,chr18,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeChrom\ longLabel ENCODE UW/Regulome Amplicon\ origAssembly hg16\ priority 66.4\ shortLabel UW/Reg Amplicon\ track encodeRegulomeAmplicon\ type bed 5 .\ useScore 1\ visibility hide\ encodeAffyChIpHl60PvalRaraHr02 Affy RARA RA 2h wig 0.0 534.54 Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 2hrs) P-Value 0 67 25 200 0 140 227 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 25,200,0\ longLabel Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 2hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 67\ shortLabel Affy RARA RA 2h\ subGroups factor=RARA time=2h\ track encodeAffyChIpHl60PvalRaraHr02\ anyMrnaCov mRNA/Pseud bed 3 . Blastz Alignments of GenBank mRNA Including Pseudogenes 0 67 170 128 128 212 191 191 0 0 0 rna 1 color 170,128,128\ group rna\ longLabel Blastz Alignments of GenBank mRNA Including Pseudogenes\ priority 67\ shortLabel mRNA/Pseud\ track anyMrnaCov\ type bed 3 .\ visibility hide\ HInvGeneMrna H-Inv psl . H-Invitational Genes mRNA Alignments 0 67.5 0 100 100 127 177 177 0 0 0

Description

\

\ This track shows alignments of full-length cDNAs that were used as the basis \ of the H-Invitational Gene Database (HInv-DB). \ The HInv-DB is a human gene database containing human-curated annotation of \ 41,118 full-length cDNA clones representing 21,037 cDNA clusters.\ The project was initiated in 2002 and the database became publicly\ available in April 2004.

\

\ HInv-DB entries describe the following entities:\

\ \

Methods

\

\ To cluster redundant cDNAs and alternative splicing variants within the H-Inv \ cDNAs, a total of 41,118 H-Inv cDNAs were mapped to the human genome using \ the mapping pipeline developed by the Japan Biological Information Research\ Center (JBIRC). The mapping yielded 40,140 cDNAs that \ aligned to the genome under the stringent criteria of at least 95% \ identity and 90% length coverage. These 40,140 cDNAs were clustered into 20,190 \ loci, resulting in an average of 2.0 cDNAs per locus. For the remaining 978 \ unmapped cDNAs, cDNA-based clustering was applied, yielding 847 clusters. \ In total, 21,037 clusters (20,190 mapped and 847 unmapped) were identified \ and integrated into H-InvDB. H-Inv cluster IDs (e.g., HIX0000001) were\ assigned to these clusters. A representative sequence was selected from each \ cluster and used for further analyses and annotation.
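The mapping thresholds and the cluster counts reported above can be checked with a few lines of code. `passes_mapping` is a hypothetical helper that expresses the "at least 95% identity and 90% length coverage" rule. It is not part of the JBIRC pipeline.

```python
MIN_IDENTITY_PCT = 95.0
MIN_COVERAGE_PCT = 90.0

def passes_mapping(identity_pct: float, coverage_pct: float) -> bool:
    """Stringent mapping criteria: >= 95% identity and >= 90% length coverage."""
    return identity_pct >= MIN_IDENTITY_PCT and coverage_pct >= MIN_COVERAGE_PCT

# Counts taken from the text
total_cdnas, mapped_cdnas = 41_118, 40_140
mapped_loci, unmapped_clusters = 20_190, 847

unmapped_cdnas = total_cdnas - mapped_cdnas
total_clusters = mapped_loci + unmapped_clusters
avg_per_locus = round(mapped_cdnas / mapped_loci, 1)
print(unmapped_cdnas, total_clusters, avg_per_locus)
```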

\

\ A full description of the construction of the HInv-DB is contained in the \ report by the H-Inv Consortium (see References section).

\ \

Credits

\

\ The H-InvDB is hosted at the JBIRC.\ The human-curated annotations were produced during invitational annotation\ meetings held in Japan during the summer of 2002, with a follow-up\ meeting in November 2004. Participants included 158 scientists \ representing 67 institutions from 12 countries.

\

\ The full-length cDNA clones and sequences were produced by the\ Chinese National Human Genome Center (CHGC), \ the Deutsches Krebsforschungszentrum (DKFZ/MIPS), \ Helix Research Institute, Inc. (HRI), \ the Institute of Medical Science in the University of Tokyo (IMSUT), \ the Kazusa DNA Research Institute (KDRI), \ the Mammalian Gene Collection (MGC/NIH) and the\ Full-Length Long Japan (FLJ) project.

\ \

References

\

\ Imanishi, T. et al. \ Integrative annotation of 21,037 human genes validated by full-length cDNA clones.\ PLoS Biol 2(6), e162 (2004).

\ rna 1 color 0,100,100\ group rna\ longLabel H-Invitational Genes mRNA Alignments\ priority 67.5\ shortLabel H-Inv\ track HInvGeneMrna\ type psl .\ visibility hide\ encodeAffyChIpHl60SitesRaraHr02 Affy RARA RA 2h bed 3 . Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 2hrs) Sites 0 68 25 200 0 140 227 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 25,200,0\ longLabel Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 2hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 68\ shortLabel Affy RARA RA 2h\ subGroups factor=RARA time=2h\ track encodeAffyChIpHl60SitesRaraHr02\ tigrGeneIndex TIGR Gene Index genePred Alignment of TIGR Gene Index TCs Against the Human Genome 0 68 100 0 0 177 127 127 0 0 0 http://www.tigr.org/tigr-scripts/tgi/tc_report.pl?$$

Description

\

This track displays alignments of the TIGR Gene Index (TGI)\ against the human genome. The TIGR Gene Index is based\ largely on assemblies of EST sequences in the public databases.\ See \ www.tigr.org for more information about TIGR and the Gene Index.

\ \

Credits

\

Thanks to Foo Cheung and Razvan Sultana of The Institute for Genomic Research (TIGR) for converting these data into a track for the browser.

\ rna 1 autoTranslate 0\ color 100,0,0\ group rna\ longLabel Alignment of TIGR Gene Index TCs Against the $Organism Genome\ priority 68\ shortLabel TIGR Gene Index\ track tigrGeneIndex\ type genePred\ url http://www.tigr.org/tigr-scripts/tgi/tc_report.pl?$$\ visibility hide\ encodeAffyChIpHl60PvalRaraHr08 Affy RARA RA 8h wig 0.0 534.54 Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 8hrs) P-Value 0 69 25 200 0 140 227 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 25,200,0\ longLabel Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 8hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 69\ shortLabel Affy RARA RA 8h\ subGroups factor=RARA time=8h\ track encodeAffyChIpHl60PvalRaraHr08\ uniGene_2 UniGene bed 12 . UniGene Hs 162 Alignments and SAGEmap Info 0 69 0 0 0 127 127 127 1 0 0

Description

\

\ Serial analysis of gene expression (SAGE)\ is a method for quantitatively measuring gene expression. Data are presented for every\ cluster contained in the browser window, and the selected cluster name is \ highlighted in the table. All data are from the repository at the \ SageMap project\ built on UniGene version Hs 162. Click on a UniGene cluster name on the track\ details page to display SageMap's page for that cluster. Please note that data \ are not available for every cluster. No data are available for clusters\ that lie entirely within the bounds of larger clusters.

\ \

Methods

\

\ SAGE counts are produced by sequencing small "tags" of DNA believed \ to be associated with a gene. These tags were generated by attaching \ poly-A RNA to oligo-dT beads. After synthesis of double-stranded cDNA, \ transcripts were cleaved by an anchoring enzyme (usually NlaIII). Then, small \ tags were produced by ligation with a linker containing a type IIS restriction\ enzyme site and cleavage with the tagging enzyme (usually BsmFI). The \ tags were concatenated together and sequenced. The frequency of each \ tag was counted and used to infer expression level of transcripts that could\ be matched to that tag.
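This simplified sketch shows how tag counts relate to transcripts. It assumes classic SAGE conventions. The tag is the 10 bases immediately 3' of the 3'-most NlaIII site (CATG) in a transcript, and observed tags are matched to transcripts by exact string comparison. Real tag-to-gene mapping, such as SAGEmap's, also handles sequencing errors and ambiguous tags, which this sketch ignores. Sequences and names are hypothetical.

```python
from collections import Counter

ANCHOR = "CATG"  # NlaIII recognition site (anchoring enzyme)
TAG_LEN = 10     # classic SAGE tag length (assumed)

def sage_tag(transcript: str):
    """Return the TAG_LEN bases just 3' of the 3'-most CATG, or None."""
    seq = transcript.upper()
    i = seq.rfind(ANCHOR)
    if i == -1:
        return None
    start = i + len(ANCHOR)
    tag = seq[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None

def expression_by_transcript(observed_tags, transcripts):
    """Count sequenced tags and assign each transcript the count of its matching tag."""
    counts = Counter(observed_tags)
    return {name: counts.get(sage_tag(seq), 0) for name, seq in transcripts.items()}

transcripts = {
    "geneA": "AA" + "CATG" + "A" * 10 + "CATG" + "C" * 10 + "AAAA",
    "geneB": "TTTTGGGG",  # no NlaIII site, so no tag
}
observed = ["C" * 10, "C" * 10, "G" * 10]
print(expression_by_transcript(observed, transcripts))
```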

\ \

Credits

\

\ All SAGE data presented here were mapped to UniGene transcripts by the \ SageMap project at NCBI.

\ \

\ rna 1 group rna\ longLabel UniGene Hs 162 Alignments and SAGEmap Info\ priority 69\ shortLabel UniGene\ spectrum on\ track uniGene_2\ type bed 12 .\ visibility hide\ uniGene_3 UniGene psl UniGene Alignments 0 69 0 0 0 127 127 127 1 0 0 http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=

Description

\

\ This track shows the UniGene genes from NCBI.\ Each UniGene entry is a set of transcript sequences that appear to come from the same transcription locus (gene or expressed pseudogene), together with information on protein similarities, gene expression, cDNA clone reagents, and genomic location. \

\

\ Coding exons are represented by \ blocks connected by horizontal lines representing introns. \ In full display mode, arrowheads \ on the connecting intron lines indicate the direction of transcription.

\ \

Methods

\

\ The UniGene sequence file, Hs.seq.uniq.gz, is downloaded from NCBI.\ Sequences are aligned to the genome using BLAT to create this track.

\

\ When a single UniGene sequence aligned in multiple places, \ the alignment with the highest base identity was identified. \ Only alignments having a base identity level within 0.2% of the best and \ at least 96.5% base identity with the genomic sequence were kept. \
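This sketch of the near-best filter assumes that "within 0.2%" means within 0.2 percentage points of the best identity. Locus names and identities are hypothetical.

```python
def near_best(alignments, min_identity=96.5, within=0.2):
    """Keep alignments of one sequence that are close to its best hit.

    alignments: list of (locus, identity_pct) tuples for a single sequence.
    """
    if not alignments:
        return []
    best = max(identity for _, identity in alignments)
    return [
        (locus, identity)
        for locus, identity in alignments
        if identity >= min_identity and identity >= best - within
    ]

hits = [("chr1:1000", 99.5), ("chr7:5000", 99.4), ("chr3:200", 99.1), ("chr9:800", 96.0)]
print(near_best(hits))
```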

\

Credits

\

\ Thanks to UniGene for \ providing this annotation. \

\ rna 1 group rna\ longLabel UniGene Alignments\ priority 69\ shortLabel UniGene\ spectrum on\ track uniGene_3\ type psl\ url http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=\ visibility hide\ encodeAffyChIpHl60SitesRaraHr08 Affy RARA RA 8h bed 3 . Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 8hrs) Sites 0 70 25 200 0 140 227 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 25,200,0\ longLabel Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 8hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 70\ shortLabel Affy RARA RA 8h\ subGroups factor=RARA time=8h\ track encodeAffyChIpHl60SitesRaraHr08\ encodeAllElements Consens Elements bed 5 . NHGRI/PSU/UCSC/Stanford TBA and MLAGAN Consensus Conserved Elements 0 70 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ These tracks represent conserved elements obtained by combining the elements produced by binCons, phastCons, and \ GERP conservation scoring methods applied to TBA and MLAGAN sequence alignments \ of 23 vertebrates in the ENCODE regions, either as their union (elements detected by any\ method) or as their intersection (elements detected by all methods). \

\ For more information on the individual subtracks, see the\ description pages for the TBA Elements and MLAGAN Elements tracks.
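Union and intersection of element sets can be computed with simple interval arithmetic. This sketch assumes elements on a single chromosome, stored as half-open (start, end) intervals, and computes base-level overlap. Whether the original procedure used base-level or element-level overlap is not stated here. The three input sets are hypothetical.

```python
def merge(intervals):
    """Union of half-open (start, end) intervals, returned sorted and disjoint."""
    out = []
    for start, end in sorted(intervals):
        if out and start <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out

def intersect(a, b):
    """Bases covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def union_all(*sets):
    return merge([iv for s in sets for iv in s])

def intersect_all(*sets):
    result = merge(sets[0])
    for s in sets[1:]:
        result = intersect(result, s)
    return result

bincons = [(0, 10), (20, 30)]
phastcons = [(5, 25)]
gerp = [(8, 22)]
print(union_all(bincons, phastcons, gerp))
print(intersect_all(bincons, phastcons, gerp))
```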

\ \

Display Conventions and Configuration

\

\ The locations of conserved elements are indicated by blocks in the graphical \ display. The display may be filtered to show only those items \ with unnormalized scores that meet or exceed a certain threshold. To set a \ threshold, type the minimum score into the text box at the top of the \ description page. To show only selected subtracks within this annotation, \ uncheck the boxes next to the tracks you wish to hide.

\ \

Methods

\ In these annotations, "non-coding" refers to those regions\ not overlapping with CDS regions in any of the following UCSC Genome Browser\ tables: refFlat, knownGene, mgcGenes, vegaGene, or ensGene.\
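The sketch below implements one reading of this definition: it keeps whole elements that do not overlap any CDS interval from the listed tables. An alternative reading would trim only the overlapping bases. Loading the tables is omitted, and the records are hypothetical `(chrom, start, end)` tuples with half-open coordinates.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def non_coding(elements, cds_regions):
    """Elements with no overlap against any CDS region."""
    return [e for e in elements if not any(overlaps(e, c) for c in cds_regions)]

elements = [("chr7", 0, 10), ("chr7", 15, 20), ("chr7", 30, 40), ("chr11", 0, 10)]
cds = [("chr7", 9, 16)]
print(non_coding(elements, cds))
```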

\ See the description pages for the TBA Elements and MLAGAN Elements \ tracks for additional information about the methods used to generate these data.

\ \

Verification

\

\ See the description pages for the TBA Elements and MLAGAN Elements \ tracks for information about the techniques used to verify these data.

\ \

Credits

\

\ BinCons and phastCons MCS data were contributed by Elliott Margulies in the \ Eric Green lab at \ NHGRI, with assistance from Adam Siepel of UCSC.

\

\ GERP was developed primarily by Greg Cooper in the lab of \ Arend Sidow \ at Stanford University (Depts of Pathology and Genetics), in close collaboration\ with Eric Stone (Biostatistics, NC State), and George Asimenos and \ Eugene Davydov in the lab of \ Serafim Batzoglou \ (Dept. of Computer Science, Stanford).

\

\ The intersection and union data shown in these subtracks were contributed by\ Elliott Margulies.

\ \

References

\

\ See the TBA/MLAGAN Alignment and TBA/MLAGAN Cons tracks for \ references.

\ encodeCompGeno 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ exonArrows off\ group encodeCompGeno\ longLabel NHGRI/PSU/UCSC/Stanford TBA and MLAGAN Consensus Conserved Elements\ priority 70.0\ shortLabel Consens Elements\ track encodeAllElements\ type bed 5 .\ visibility hide\ uniGene UniGene psl . UniGene Alignments and SAGE Info 0 70 0 0 0 127 127 127 0 0 0 rna 1 group rna\ longLabel UniGene Alignments and SAGE Info\ priority 70\ shortLabel UniGene\ track uniGene\ type psl .\ visibility hide\ encodeTbaAlign TBA Alignment wigMaf 0.0 1.0 NHGRI/PSU TBA Alignments 0 70.1 0 10 100 1 128 0 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays human-centric multiple sequence alignments in the ENCODE \ regions for the 23 vertebrates in the May 2005 ENCODE MSA freeze, based on \ comparative sequence data generated for the ENCODE project.\ The alignments in this track were generated using the\ Threaded Blockset Aligner (TBA).\ A complete list of the vertebrates included in the May 2005 freeze may be found\ at the top of the description page for this track.

\

\ The Genome Browser companion tracks, TBA Cons and TBA Elements, display \ conservation scoring and conserved elements for these alignments based on \ various conservation methods.

\ \

Display Conventions and Configuration

\

\ In full display mode, this track shows pairwise alignments\ of each species aligned to the human genome. \ The alignments are shown in dense display mode using a gray-scale\ density gradient. The checkboxes in the track configuration section allow\ the exclusion of species from the pairwise display.\

\ When zoomed-in to the base-display level, the track shows the base\ composition of each alignment. The numbers and symbols on the "human\ gap" line indicate the lengths of gaps in the human sequence at those\ alignment positions relative to the longest non-human sequence. \ If there is sufficient space in the display, the size of the gap is shown; \ if not, and if the gap size is a multiple of 3, a "*" is displayed, \ otherwise "+" is shown. \ To view detailed information about the\ alignments at a specific position, zoom in the display to 30,000 or fewer \ bases, then click on the alignment.
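The gap-labelling rule can be expressed as a small function. The description does not say how the browser measures available space, so here it is assumed to be a count of character cells. The function name is hypothetical.

```python
def gap_label(gap_len: int, available_cells: int) -> str:
    """Label for a gap in the human sequence on the 'human gap' line."""
    text = str(gap_len)
    if len(text) <= available_cells:
        return text  # enough room: show the gap size itself
    return "*" if gap_len % 3 == 0 else "+"  # codon-multiple gaps get '*'

print(gap_label(12, 2), gap_label(123, 1), gap_label(124, 1))
```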

\ \

Methods

\

\ TBA was used to align sequences in the May 2005 ENCODE sequence data \ freeze. Multiple alignments were seeded from a series of combinatorial pairwise \ blastz alignments (not referenced to any one species). The specific \ combinations were determined by the\ species guide tree. Additionally, a \ blastz.specs file\ was used to fine-tune the blastz parameters, based on the evolutionary\ distance of the species being compared.\ The resulting multiple alignments were projected onto the human reference\ sequence.

\ \

Credits

\

\ The TBA multiple alignments were created by Elliott Margulies of the \ Green Lab at NHGRI. \

\

\ The programs Blastz and TBA, which were used to generate the alignments, were\ provided by Minmei Hou, Scott Schwartz and Webb Miller of the \ Penn State Bioinformatics \ Group.

\

The phylogenetic tree is based on Murphy et al. (2001) and general\ consensus in the vertebrate phylogeny community.\

\ \

References

\

\ Blanchette, M., Kent, W.J., Reimer, C., Elnitski, L., Smit, A.,\ Roskin, K., Baertsch, R., Rosenbloom, K.R., Clawson, H. et al.\ Aligning Multiple Genomic Sequences With the Threaded Blockset \ Aligner.\ Genome Res 14, 708-15 (2004).

\

\ Chiaromonte, F., Yap, V.B., and Miller, W. \ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D. and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res 13(1), 103-7 (2003).

\

Murphy, W.J., et al.\ Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294(5550), 2348-51 (2001).

\ encodeCompGeno 1 altColor 1,128,0\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 0, 10, 100\ dataVersion ENCODE June 2005 Freeze\ group encodeCompGeno\ longLabel NHGRI/PSU TBA Alignments\ priority 70.1\ sGroup_mammal monDom1 platypus\ sGroup_placental rn3 mm6 rabbit cow canFam1 rfbat hedgehog armadillo elephant tenrec\ sGroup_primate panTro1 baboon rheMac1 marmoset galago\ sGroup_vertebrate galGal2 xenTro1 danRer2 tetNig1 fr1\ shortLabel TBA Alignment\ speciesGroups primate placental mammal vertebrate\ summary encodeTbaSummary\ track encodeTbaAlign\ treeImage phylo/hg16_23way.gif\ type wigMaf 0.0 1.0\ visibility hide\ encodeAffyChIpHl60PvalRaraHr32 Affy RARA RA 32h wig 0.0 534.54 Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 32hrs) P-Value 0 71 25 200 0 140 227 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 25,200,0\ longLabel Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 32hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 71\ shortLabel Affy RARA RA 32h\ subGroups factor=RARA time=32h\ track encodeAffyChIpHl60PvalRaraHr32\ rnaCluster Gene Bounds bed 12 . Gene Boundaries as Defined by RNA and Spliced EST Clusters 0 71 200 0 50 227 127 152 0 0 0

Description

\

\ This track shows the boundaries of genes and the direction of\ transcription as deduced from clustering spliced ESTs and mRNAs\ against the genome. When many spliced variants of the same gene exist, \ this track shows the variant that spans the greatest distance in the \ genome.

\ \

Methods

\

\ ESTs and mRNAs from \ GenBank were aligned against the genome using BLAT.\ Alignments with less than 97.5% base identity within the aligning blocks \ were filtered out. When multiple alignments occurred, only those\ alignments with a percentage identity within 0.2% of the\ best alignment were kept. The following alignments were also discarded: \ ESTs that aligned without any introns, blocks smaller than 10 bases, and \ blocks smaller than 130 bases that were not located next to an intron. \ The orientations of the ESTs and mRNAs were deduced from the GT/AG splice \ sites at the introns; ESTs and mRNAs with overlapping blocks\ on the same strand were merged into clusters. Only the\ extent and orientation of the clusters are shown in this track.
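The final clustering step can be sketched as follows. Alignments on the same strand that share at least one overlapping block are joined, transitively, via union-find, and each cluster is reported only by strand and extent. The filtering steps above are assumed to have run already, and the input alignments are hypothetical. An alignment lying entirely within another's intron shares no block with it, so it stays in its own cluster.

```python
def cluster_extents(alignments):
    """Merge same-strand alignments that share an overlapping block.

    alignments: list of (strand, [(start, end), ...]) with half-open blocks.
    Returns sorted (strand, start, end) extents, one per cluster.
    """
    parent = list(range(len(alignments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (strand_i, blocks_i) in enumerate(alignments):
        for j in range(i + 1, len(alignments)):
            strand_j, blocks_j = alignments[j]
            if strand_i == strand_j and any(
                s1 < e2 and s2 < e1 for s1, e1 in blocks_i for s2, e2 in blocks_j
            ):
                parent[find(i)] = find(j)

    extents = {}
    for i, (strand, blocks) in enumerate(alignments):
        root = find(i)
        lo = min(s for s, _ in blocks)
        hi = max(e for _, e in blocks)
        if root in extents:
            _, cur_lo, cur_hi = extents[root]
            lo, hi = min(lo, cur_lo), max(hi, cur_hi)
        extents[root] = (strand, lo, hi)
    return sorted(extents.values())

alns = [
    ("+", [(100, 200), (300, 400)]),
    ("+", [(350, 450), (600, 700)]),  # shares a block with the first alignment
    ("-", [(120, 180), (250, 260)]),  # opposite strand: separate cluster
    ("+", [(210, 240), (260, 290)]),  # lies in the first alignment's intron
]
print(cluster_extents(alns))
```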

\

\ Scores for individual gene boundaries were assigned based on the number of \ cDNA alignments used:\

\ \

Credits

\

\ This track, which was originally developed by Jim Kent,\ was generated at UCSC and uses data submitted to GenBank by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. \ GenBank: update. Nucleic Acids Res. \ 2004 Jan 1;32:D23-6.

\

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.

\ rna 1 color 200,0,50\ group rna\ longLabel Gene Boundaries as Defined by RNA and Spliced EST Clusters\ priority 71\ shortLabel Gene Bounds\ track rnaCluster\ type bed 12 .\ visibility hide\ encodeTbaCons TBA Cons wig 0.0 1.0 NHGRI/PSU/UCSC TBA Conservation 0 71 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays different measurements of conservation based on \ the Threaded Blockset Aligner (TBA) multiple sequence alignments of \ ENCODE regions shown in the TBA Alignment track. Three programs — binCons\ (binomial-based conservation method), phastCons (phylogenetic hidden-Markov\ model method), and \ GERP (Genomic Evolutionary Rate Profiling)\ — generated the conservation scoring used to create this track. A related \ track, TBA Elements, shows multi-species conserved sequences (MCSs) based on\ the conservation measurements displayed in this track.

\

\ For details on the conservation scores generated by each program, refer to the \ individual Methods subsections.

\ \

Display Conventions and Configuration

\

\ The subtracks within this composite annotation track, which\ show data from the binCons, phastCons and GERP programs, may be configured in a \ variety of ways to highlight different aspects of the \ displayed data. The graphical configuration options \ are shown at the top of the track description page, followed by a list of \ the subtracks. A subtrack may be hidden from view by unchecking the box to the\ left of the track name in the list. For more information about the \ graphical configuration options, click the \ Graph\ configuration help link.

\

\ Color differences among the subtracks are arbitrary; they provide a\ visual cue for distinguishing the different conservation methods. See the\ Methods section for display information specific to each subtrack.

\ \

Methods

\

\ The methods used to create the TBA alignments in the ENCODE\ regions are described in the TBA Alignment track description.

\ \

BinCons

\

\ The binCons score is based on the cumulative binomial probability of \ detecting the observed number of identical bases (or greater) in \ sliding 25 bp windows (moving one bp at a time) between the \ reference sequence and each other species, given the neutral rate\ at four-fold degenerate sites. Neutral rates are calculated\ separately at each targeted region. For targets with no gene annotations,\ the average percent identity across all alignable sequence was instead used\ to weight the individual species binomial scores (this latter\ weighting scheme was found to closely match 4D weights).

\

\ The negative log of these P-values was then averaged across all \ human-referenced pairwise combinations, and each base was assigned the score of the highest-scoring \ 25 bp window overlapping it. This track plots \ a ranked percentile score normalized between 0 and 1 across all \ ENCODE regions, such that the top 5% most conserved sequence across all ENCODE\ regions has a score of 0.95 or greater (the top 10% has a score of 0.9 or \ greater, and so on).
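This minimal sketch covers the per-window and per-base steps. It assumes the neutral rate is expressed as the probability `p` that a base is identical under neutral evolution. It also uses an unweighted mean across species; the species weighting mentioned in the text is omitted. Species names and counts are hypothetical.

```python
import math

def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def window_score(identical_by_species, neutral_p_by_species, window=25):
    """Mean -log10 P-value across species for one window of `window` bases."""
    logs = [
        -math.log10(binom_tail(k, window, neutral_p_by_species[sp]))
        for sp, k in identical_by_species.items()
    ]
    return sum(logs) / len(logs)

def per_base_scores(window_scores, seq_len, window=25):
    """Give each base the best score among the windows that cover it.

    window_scores[i] is the score of the window starting at base i, so there
    are seq_len - window + 1 windows (seq_len >= window is assumed).
    """
    n_windows = len(window_scores)
    out = []
    for j in range(seq_len):
        first = max(0, j - window + 1)
        last = min(j, n_windows - 1)
        out.append(max(window_scores[first:last + 1]))
    return out

print(round(window_score({"mouse": 22, "dog": 20}, {"mouse": 0.6, "dog": 0.65}), 3))
```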

\

\ BinCons scores were normalized to represent a percentile raised to the power of\ 10. For example, the top 1 percent most conserved\ sequence (the 99th percentile) has a score greater than or equal to 0.99^10\ = 0.904. Transforming scores to the power of 10 was done for visual\ purposes only, in order to accentuate and distinguish the peaks of more\ highly conserved regions.
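The percentile transform can be sketched as below. The percentile definition used here (the fraction of values less than or equal to a given value) is an assumption. The power-of-10 step follows the text, and 0.99 raised to the 10th power rounds to 0.904, as stated.

```python
from bisect import bisect_right

def percentile_ranks(values):
    """Fraction of all values that are <= each value (0 < rank <= 1)."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]

def display_score(percentile: float) -> float:
    """Raise the percentile to the 10th power to accentuate the highest peaks."""
    return percentile ** 10

print(percentile_ranks([0.2, 1.7, 0.9, 3.1]))
print(round(display_score(0.99), 3))
```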

\

\ More details on binCons can be found in Margulies et al. (2003)\ cited below.

\ \

PhastCons

\

\ The phastCons program predicts conserved elements and produces base-by-base\ conservation scores using a two-state phylogenetic hidden Markov model.\ The model consists of a state for conserved regions and a \ state for nonconserved regions, each of which is associated with a \ phylogenetic model. These two models are identical\ except that the branch lengths of the conserved phylogeny are \ multiplied by a scaling parameter rho (0 < rho < 1).

\

\ For determining the conservation for the ENCODE TBA\ alignments, the nonconserved model was estimated \ from four-fold degenerate coding sites within the ENCODE regions using \ the program phyloFit. The parameter rho was then estimated by \ maximum likelihood, conditional on the nonconserved model, using the EM \ algorithm implemented in phastCons. Parameter estimation was based on \ a single large alignment, constructed by concatenating the \ alignments for all conserved regions.

\

\ PhastCons was run with the options --expected-lengths 15 and\ --target-coverage 0.05 to obtain the desired level of \ "smoothing" and a final coverage by conserved elements of 5%.
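In the two-state phylo-HMM, these two options can be translated into transition probabilities. This assumes the parameterization described for phastCons by Siepel et al. (2005): ω is the expected element length, γ is the target coverage, μ is the probability of leaving the conserved state, and ν is the probability of entering it. Under that assumption, the settings above imply approximately:

```latex
\omega = \frac{1}{\mu}, \qquad \gamma = \frac{\nu}{\mu + \nu}
\;\Longrightarrow\;
\mu = \frac{1}{15} \approx 0.0667, \qquad
\nu = \frac{\gamma\,\mu}{1-\gamma} = \frac{0.05 \times (1/15)}{0.95} = \frac{1}{285} \approx 0.00351
```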

\

\ The conservation score at each base is the posterior probability that the\ base was generated by the conserved state of the phylo-HMM. It can\ be interpreted as the probability that the base is in a conserved\ element, given the assumptions of the model and the estimated parameters.\ Scores range from 0 to 1, with higher scores corresponding to\ higher levels of conservation.
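The posterior probability described above comes from the standard forward-backward algorithm on the two-state HMM. The sketch below assumes the per-column likelihoods under the conserved and nonconserved phylogenetic models have already been computed (phastCons derives them from the two trees; here they are simply inputs). The function name and the uniform initial distribution are illustrative assumptions.

```python
def posterior_conserved(emit, trans, init=(0.5, 0.5)):
    """Posterior P(conserved state | all columns) for each alignment column.

    emit[t]     = (likelihood of column t under the conserved model,
                   likelihood of column t under the nonconserved model)
    trans[r][s] = P(state s at t+1 | state r at t); state 0 = conserved.
    Forward and backward vectors are rescaled at every step to avoid
    underflow; the per-step scale factors cancel in the posterior.
    """
    n = len(emit)
    fwd = []
    for t in range(n):
        if t == 0:
            f = [init[s] * emit[0][s] for s in (0, 1)]
        else:
            prev = fwd[-1]
            f = [emit[t][s] * sum(prev[r] * trans[r][s] for r in (0, 1)) for s in (0, 1)]
        total = sum(f)
        fwd.append([x / total for x in f])

    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emit[t + 1][r] * bwd[t + 1][r] for r in (0, 1)) for s in (0, 1)]
        total = sum(b)
        bwd[t] = [x / total for x in b]

    post = []
    for t in range(n):
        joint = [fwd[t][s] * bwd[t][s] for s in (0, 1)]
        post.append(joint[0] / sum(joint))
    return post


trans = [[0.9, 0.1], [0.01, 0.99]]
emit = [(0.1, 0.9)] * 5 + [(0.9, 0.1)] * 5 + [(0.1, 0.9)] * 5
print([round(p, 2) for p in posterior_conserved(emit, trans)])
```

Each score lies between 0 and 1, matching the range stated in the description.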

\

\ More details on phastCons can be found in Siepel et al. (2005)\ cited below.

\ \

GERP

\

\ The GERP score is the expected substitution rate divided by\ the observed substitution rate at a particular human base.\ Scores are estimated on a column-by-column basis using multiple sequence\ alignments of mammalian genomic DNA generated by MLAGAN.\ The scores range from 0 to 3; those greater than 3 are clipped to 3. \ The expected and observed rates are\ both calculated on a phylogenetic tree using the same fixed topology.\ The branch lengths of the expected tree are based on the average\ substitutions at neutral sites. The branch lengths of the observed\ tree, which is calculated separately for each human base, are based on the\ substitutions seen at the column of the\ multiple alignment at that base. Species that have gaps at\ a particular column are not considered in the scoring for that column.
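The column score and its clipping can be sketched in a few lines of Python. This is a deliberate simplification. It assumes a star tree, so the expected rate for a column is the sum of the neutral branch lengths of the species without a gap. It also takes the observed rate as an input instead of estimating it by maximum likelihood. The function name and the zero-expected-rate guard are illustrative assumptions, not part of GERP.

```python
def gerp_column_score(neutral_branch, observed_rate, present, cap=3.0):
    """Expected / observed substitution rate for one column, clipped to cap.

    neutral_branch: species -> neutral branch length (star-tree assumption)
    observed_rate:  substitution rate estimated for this column (given)
    present:        species without a gap in this column
    """
    expected = sum(neutral_branch[sp] for sp in present)  # gapped species dropped
    if expected == 0.0:
        return 0.0  # assumption: no informative species, no evidence of constraint
    if observed_rate <= 0.0:
        return cap  # no observed substitutions: ratio is unbounded, so clip
    return min(expected / observed_rate, cap)


tree = {"mouse": 0.1, "dog": 0.2, "chicken": 0.3}
print(gerp_column_score(tree, 0.15, ["mouse", "dog"]))  # chicken gapped in this column
```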

\

\ Higher scores correspond to human\ bases in alignment columns with higher degrees of similarity, i.e.\ bases that have evolved slowly, some of which have been under purifying \ selection. The opposite holds true for swiftly evolving (low similarity) \ columns.

\

\ Scores are deterministic, given a maximum-likelihood model of\ nucleotide substitution, species topology, neutral tree, and alignment.

\ \

Credits

\

\ BinCons was developed by Elliott Margulies of the \ Eric Green lab at \ NHGRI.

\

\ PhastCons was developed by Adam Siepel in the \ Haussler lab at UCSC.

\

\ GERP was developed primarily by Greg Cooper in the lab of\ Arend Sidow\ at Stanford University\ (Depts of Pathology and Genetics), in close collaboration with\ Eric Stone (Biostatistics, NC State), and George Asimenos and\ Eugene Davydov in the lab of\ Serafim Batzoglou\ (Dept. of Computer Science, Stanford).

\

\

\ TBA was provided by Minmei Hou, Scott Schwartz and Webb Miller of the \ Penn State Bioinformatics \ Group.

\

\ The data for this track were generated by Elliott Margulies, \ with assistance from Adam Siepel.

\ \

References

\

\ Blanchette, M., Kent, W.J., Reimer, C., Elnitski, L., Smit, A.,\ Roskin, K., Baertsch, R., Rosenbloom, K.R., Clawson, H. et al.\ Aligning multiple genomic sequences with the threaded blockset \ aligner.\ Genome Res. 14, 708-15 (2004).

\

\ Cooper, G.M., Stone, E.A., Asimenos, G., NISC Comparative Sequencing Program,\ Green, E.D., Batzoglou, S. and Sidow, A.\ Distribution and intensity of constraint in mammalian genomic \ sequence.\ Genome Res. 15, 901-13 (2005).

\

\ Margulies, E.H., Blanchette, M., NISC Comparative Sequencing Program, \ Haussler, D. and Green, E.D. \ Identification and characterization of multi-species conserved \ sequences. \ Genome Res. 13, 2507-18 (2003).

\

\ Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M.,\ Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W. et al.\ Evolutionarily conserved elements in vertebrate,\ insect, worm, and yeast genomes. \ Genome Res. 15, 1034-50 (2005).

\ encodeCompGeno 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeCompGeno\ longLabel NHGRI/PSU/UCSC TBA Conservation\ maxHeightPixels 100:25:11\ priority 71.0\ shortLabel TBA Cons\ track encodeTbaCons\ type wig 0.0 1.0\ visibility hide\ windowingFunction mean\ encodeAffyChIpHl60SitesRaraHr32 Affy RARA RA 32h bed 3 . Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 32hrs) Sites 0 72 25 200 0 140 227 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 25,200,0\ longLabel Affymetrix ChIP/Chip (RARA retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 72\ shortLabel Affy RARA RA 32h\ subGroups factor=RARA time=32h\ track encodeAffyChIpHl60SitesRaraHr32\ genieBounds Clone Bounds bed 9 . Clone Boundaries from EST Mate Pairs 0 72 178 34 34 216 144 144 0 0 0

Description & Credits

\ \

These clone bounds are based on EST mate pairs from \ Affymetrix's \ Genie gene finding software. \

\ rna 1 color 178,34,34\ group rna\ longLabel Clone Boundaries from EST Mate Pairs\ priority 72\ shortLabel Clone Bounds\ track genieBounds\ type bed 9 .\ visibility hide\ exonWalk ExonWalk genePred ExonWalk Alt-Splicing Transcripts 0 72 23 58 58 139 156 156 0 0 0

Description

\ \

The ExonWalk program merges cDNA evidence to predict full-length\ isoforms, including alternative transcripts. To predict\ transcripts that are biologically functional, rather than the result\ of technical or biological noise, ExonWalk requires that every intron\ and exon either: 1) be present in cDNA libraries of another organism\ (i.e. also present in mouse), 2) be supported by three separate cDNA GenBank\ entries, or 3) be evolving like a coding exon as\ determined by Exoniphy.\ Once the transcripts are predicted, an ORF finder (BESTORF from\ Softberry) is used to find the\ best open reading frame. By default, transcripts that are targets for\ nonsense-mediated decay (NMD) are filtered out, as they are less likely\ to be translated into proteins.\ \
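The three-way evidence rule above amounts to a simple predicate applied to every exon and intron of a candidate transcript. The following Python sketch encodes that rule. The class, its field names, and the function names are hypothetical stand-ins for whatever evidence records ExonWalk actually uses.

```python
from dataclasses import dataclass


@dataclass
class SpliceFeature:
    """An exon or intron with the evidence the filter looks at (hypothetical)."""
    kind: str              # "exon" or "intron"
    in_mouse: bool         # also present in mouse cDNA libraries
    genbank_entries: int   # number of separate cDNA GenBank entries supporting it
    exoniphy_coding: bool  # evolving like a coding exon according to Exoniphy


def feature_passes(f, min_entries=3):
    """Any one of the three lines of evidence is enough."""
    return f.in_mouse or f.genbank_entries >= min_entries or f.exoniphy_coding


def transcript_passes(features):
    """A predicted transcript is kept only if every exon and intron passes."""
    return all(feature_passes(f) for f in features)


exon = SpliceFeature("exon", in_mouse=False, genbank_entries=3, exoniphy_coding=False)
intron = SpliceFeature("intron", in_mouse=False, genbank_entries=1, exoniphy_coding=False)
print(transcript_passes([exon, intron]))  # the weakly supported intron rejects it
```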

Methods

\ \

The input to the ExonWalk program is the AltSplice track, from which\ exons and introns have been removed unless they are: 1) present in cDNA\ libraries of another organism (i.e. also present in mouse), 2) supported by\ three separate cDNA GenBank entries, or 3) evolving\ like a coding exon as determined by Exoniphy.\ \

The ExonWalk algorithm takes these filtered sequences and\ constructs a graph where the exons are the nodes and the introns are\ the edges. The goal of the program is to produce all full-length\ transcripts implied by the input transcripts. Full-length transcripts are\ defined as transcripts that are not a subsequence of another\ transcript. The algorithm can be divided into three\ steps, as illustrated in Figure 1 below:\ \

    \
  1. Detection and connection of compatible transcripts (Figure 1B).
  2. Merging of vertices that are identical in terms of splicing (Figure 1C).
  3. Exploration of all paths in the resulting graph (Figure 1D).
\ \ \ \
\ \
\ Figure 1. Different stages of the ExonWalk program. A. Different\ transcripts for a particular gene have been aligned to the genome to\ give an order and orientation. B. Exons in the overlapping\ section of compatible transcripts are joined to form new\ edges. C. Redundant vertices are pruned from the\ graph and replaced by edges from other, equivalent vertices. This\ simplifies the initial graph while retaining splicing-specific\ information. D. The maximal paths through the graph are\ explored to produce a set of maximal (full-length) transcripts.\
\ \

Initially, each transcript is an independent sub-graph in the\ exon graph. Individual transcripts are then compared pairwise to\ determine whether they are compatible. If they are compatible, an edge, called a\ compatibility edge, is created between the exons in the overlap.\ This results in a directed graph where overlapping exons are connected\ together, and thus compatible transcripts are connected as well\ (Figure 1B). The algorithm then uses the\ implicit order provided by the genome sequence, and the fact that\ splicing occurs in order, to explore all of the paths present in the\ graph.\ \
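The final path-exploration step (Figure 1D) can be sketched in Python as an exhaustive walk over the exon graph. The sketch assumes compatibility edges have already been added and redundant vertices merged, so it starts from a finished graph given as exon ids in genomic order plus (upstream, downstream) edges. The function name and input shapes are hypothetical.

```python
def maximal_paths(exons, edges):
    """Enumerate every source-to-sink path in an exon graph.

    exons: exon ids in genomic order; edges: (upstream, downstream) pairs.
    Because splicing follows genomic order, the graph is acyclic, and a path
    that starts at an exon with no incoming edge and ends at one with no
    outgoing edge cannot be extended, i.e. it is maximal (full length).
    """
    order = {exon: i for i, exon in enumerate(exons)}
    succ = {exon: [] for exon in exons}
    has_pred = set()
    for u, v in edges:
        succ[u].append(v)
        has_pred.add(v)

    paths = []

    def walk(path):
        nexts = sorted(succ[path[-1]], key=order.get)  # downstream exons in genomic order
        if not nexts:
            paths.append(path)
            return
        for nxt in nexts:
            walk(path + [nxt])

    for exon in exons:
        if exon not in has_pred:
            walk([exon])
    return paths


# A cassette-style choice between e2 and e3 yields two full-length isoforms.
print(maximal_paths(["e1", "e2", "e3", "e4"],
                    {("e1", "e2"), ("e1", "e3"), ("e2", "e4"), ("e3", "e4")}))
```

Exhaustive enumeration can grow exponentially with the number of alternative events, which is one reason the vertex-merging step that precedes it matters.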

Comments/Questions? Email sugnet@soe.ucsc.edu\ genes 1 color 23,58,58\ group genes\ longLabel ExonWalk Alt-Splicing Transcripts\ priority 72\ shortLabel ExonWalk\ track exonWalk\ type genePred\ visibility hide\ exonWalk2 ExonWalk2 genePred ExonWalk Alt-Splicing Transcripts - take 2 0 72.01 23 58 58 139 156 156 0 0 0 genes 1 color 23,58,58\ group genes\ longLabel ExonWalk Alt-Splicing Transcripts - take 2\ priority 72.01\ shortLabel ExonWalk2\ track exonWalk2\ type genePred\ visibility hide\ exonWalkRna ExonWalkRna genePred ExonWalk Alt-Splicing Transcripts mRNA only, no orthology 0 72.02 23 58 58 139 156 156 0 0 0 genes 1 color 23,58,58\ group genes\ longLabel ExonWalk Alt-Splicing Transcripts mRNA only, no orthology\ priority 72.02\ shortLabel ExonWalkRna\ track exonWalkRna\ type genePred\ visibility hide\ exonWalkRnaNoCds ExonWalkRnaNoCds bed 12 Exonwalk on Rna only, no orthology, no CDS mapping 0 72.03 23 58 58 139 156 156 0 0 0 genes 1 color 23,58,58\ group genes\ longLabel Exonwalk on Rna only, no orthology, no CDS mapping\ priority 72.03\ shortLabel ExonWalkRnaNoCds\ track exonWalkRnaNoCds\ type bed 12\ visibility hide\ agxMapped agxMapped bed 12 . Condensed version of AltGraphX Mapped from Mouse 0 72.1 153 26 42 204 140 148 0 0 0 rna 1 color 153,26,42,\ group rna\ longLabel Condensed version of AltGraphX Mapped from Mouse\ priority 72.1\ shortLabel agxMapped\ track agxMapped\ type bed 12 .\ visibility hide\ orthoIntrons orthoIntrons bed 12 . 
Bed version of AltGraphX Mapped Introns from Mouse 0 72.2 107 74 34 181 164 144 0 0 0 rna 1 color 107,74,34,\ group rna\ longLabel Bed version of AltGraphX Mapped Introns from Mouse\ priority 72.2\ shortLabel orthoIntrons\ track orthoIntrons\ type bed 12 .\ visibility hide\ encodeAffyChIpHl60PvalSirt1Hr00 Affy SIRT1 RA 0h wig 0.0 534.54 Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 0hrs) P-Value 0 73 0 225 0 127 240 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 0,225,0\ longLabel Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 0hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 73\ shortLabel Affy SIRT1 RA 0h\ subGroups factor=SIRT1 time=0h\ track encodeAffyChIpHl60PvalSirt1Hr00\ altGraph AltGraph psl . AltGraph 0 73 0 0 0 127 127 127 0 0 0 rna 1 group rna\ longLabel AltGraph\ priority 73\ shortLabel AltGraph\ track altGraph\ type psl .\ visibility hide\ encodeTbaElements TBA Elements bed 5 . NHGRI/PSU/UCSC TBA Conserved Elements 0 73 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays multi-species conserved sequences (MCSs)\ derived from binCons, phastCons, and genomic evolutionary rate profiling \ (GERP) conservation scoring\ of Threaded Blockset Aligner (TBA) multiple sequence alignments in the \ ENCODE regions. The combined-methods subtracks show the union/intersection\ of conserved elements produced by the three conservation methods.\

\ The multiple sequence alignments may be viewed in the TBA Alignment\ track. Another related track, TBA Cons, shows the conservation scoring. \ The descriptions accompanying these tracks detail\ the methods used to create the alignments and conservation.

\ \

Display Conventions and Configuration

\

\ The locations of conserved elements are indicated by blocks in the graphical\ display. This composite annotation track consists of several subtracks that\ show conserved elements derived by the three methods listed above, as well as \ both unions and intersections of the sets of conserved and non-coding conserved \ elements. To show only selected subtracks, uncheck the boxes next to the tracks\ you wish to hide.

\

The display may also be filtered to show only those items\ with unnormalized scores that meet or exceed a certain threshold. To set a\ threshold, type the minimum score into the text box at the top of the \ description page.

\

\ Display characteristics specific to certain subtracks are described in the\ respective Methods sections below.

\ \

Methods

\

\

BinCons-based Elements

\

\ For each ENCODE target, a conservation score threshold was picked to match\ the number of conserved bases predicted by phastCons, an alternative method\ for measuring conservation. This latter method has been found slightly more\ reliable for predicting the expected fraction of conserved sequence \ in each target. Clusters of bases\ that exceeded the given conservation score threshold were designated \ as MCSs. The minimum length of an MCS is 25\ bases. Strict cutoffs were used: if even one base fell below the\ conservation score threshold, it separated an MCS into two distinct\ regions.
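The thresholding and strict segmentation described above can be sketched in Python as follows. The sketch assumes the number of bases phastCons predicts as conserved for the target (`target_bases`) is already known. It also assumes "exceeded" means strictly greater than the threshold. The function name is hypothetical.

```python
def bincons_elements(scores, target_bases, min_len=25):
    """Threshold per-base scores and return (threshold, [(start, end), ...]).

    The threshold is set so that at most target_bases bases score strictly
    above it (exactly target_bases when there are no ties). Runs of
    consecutive bases above the threshold become MCSs if they are at least
    min_len long; a single base at or below the threshold ends a run.
    Intervals are half-open: start is included, end is not.
    """
    ranked = sorted(scores, reverse=True)
    threshold = ranked[target_bases] if target_bases < len(ranked) else float("-inf")
    elements, start = [], None
    for i, s in enumerate(list(scores) + [float("-inf")]):  # sentinel closes the last run
        if s > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                elements.append((start, i))
            start = None
    return threshold, elements


scores = [0.1] * 10 + [0.9] * 30 + [0.1] * 5 + [0.8] * 20 + [0.1] * 10
print(bincons_elements(scores, target_bases=50))  # the 20-base run is too short
```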

\ \

PhastCons-based Elements

\

\ The predicted MCSs are segments of the alignment that are likely to\ have been "generated" by the conserved state of the phylo-HMM,\ i.e., maximal segments in which the maximum-likelihood (Viterbi)\ path remains in the conserved state.
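The Viterbi-based element definition above can be sketched in Python for a two-state HMM. As in any such sketch, the per-column likelihoods under the conserved and nonconserved models are assumed to be precomputed inputs. The function name, the uniform initial distribution, and the half-open interval convention are illustrative choices.

```python
import math


def viterbi_elements(emit, trans, init=(0.5, 0.5)):
    """Maximal runs of the conserved state (state 0) on the Viterbi path.

    emit[t] = (likelihood under conserved model, under nonconserved model);
    trans[r][s] = P(state s at t+1 | state r at t). Works in log space.
    Returns half-open (start, end) intervals.
    """
    n = len(emit)
    log_t = [[math.log(trans[r][s]) for s in (0, 1)] for r in (0, 1)]
    score = [math.log(init[s]) + math.log(emit[0][s]) for s in (0, 1)]
    back = []  # back[t-1][s] = best previous state for state s at column t
    for t in range(1, n):
        ptr, new = [], []
        for s in (0, 1):
            best_r = max((0, 1), key=lambda r: score[r] + log_t[r][s])
            ptr.append(best_r)
            new.append(score[best_r] + log_t[best_r][s] + math.log(emit[t][s]))
        back.append(ptr)
        score = new

    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace back from the last column
        state = ptr[state]
        path.append(state)
    path.reverse()

    elements, start = [], None
    for t, s in enumerate(path + [1]):  # sentinel closes a trailing run
        if s == 0 and start is None:
            start = t
        elif s != 0 and start is not None:
            elements.append((start, t))
            start = None
    return elements


emit = [(0.1, 0.9)] * 10 + [(0.9, 0.1)] * 10 + [(0.1, 0.9)] * 10
print(viterbi_elements(emit, [[0.9, 0.1], [0.01, 0.99]]))
```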

\ \

GERP-based Elements

\

\ GERP elements are scored according to the inferred intensity\ of purifying selection\ and are measured as "rejected substitutions" (RSs). RSs capture the\ magnitude of difference between the number of "observed" substitutions\ (estimated using maximum likelihood) and the number that would be\ "expected" under a neutral model of evolution. \ The RS is displayed as part of the item name.\ Items with higher RSs are displayed in a darker shade of blue. The score shown \ on the details page, which has been scaled by 300 for display purposes, is \ generally not as accurate as the RS count that is part of the item name.

\

\ "Constrained elements" are identified as those groups\ of consecutive human bases that have an observed rate of evolution that is\ smaller than the expected rate. These groups of columns are merged if they\ are less than a few nucleotides apart and are scored according to the sum of\ the site-by-site difference between observed and expected rates (RS).
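The construction of constrained elements in the paragraph above (runs of bases where the observed rate is below the expected rate, merged across small gaps, scored by summed RS) can be sketched in Python. The description says only "a few nucleotides", so the merge distance `max_gap` is a hypothetical parameter. Likewise, summing RS over every base of the merged span, including the gap bases, is one reading of the text and not confirmed by it.

```python
def gerp_elements(expected, observed, max_gap=3):
    """Return (start, end, rs) for constrained elements (half-open intervals).

    A base is constrained when observed[i] < expected[i]. Runs separated by
    at most max_gap unconstrained bases are merged, and each element is
    scored by the sum of (expected - observed) over all bases in its span.
    """
    runs, start = [], None
    for i in range(len(expected) + 1):
        constrained = i < len(expected) and observed[i] < expected[i]
        if constrained and start is None:
            start = i
        elif not constrained and start is not None:
            runs.append([start, i])
            start = None

    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1][1] = e  # close enough to the previous run: merge
        else:
            merged.append([s, e])

    return [(s, e, sum(expected[i] - observed[i] for i in range(s, e)))
            for s, e in merged]


exp = [1.0] * 12
obs = [2.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0]
print(gerp_elements(exp, obs, max_gap=2))
```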

\

\ Permutations of the actual alignments were analyzed, and the "constrained\ elements" identified in these permuted alignments were treated as\ "false positives". Subsequently, an RS threshold was picked such\ that the total length of "false positive" constrained elements\ (identified in the permuted alignments) was less than 5% of the length of\ constrained elements identified in the actual alignment.\ Thus, all annotated constrained elements are significant at better\ than 95% confidence, and the total fraction of the ENCODE regions\ annotated as constrained is 5-7%.
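The permutation-based cutoff described above can be sketched in Python. It assumes elements have already been called on both the real and the permuted alignments. It also assumes the chosen cutoff is the smallest RS value at which the false-positive length falls below 5% of the real length. The function name and that selection rule are assumptions, not the documented GERP procedure.

```python
def pick_rs_threshold(real_elements, shuffled_elements, max_fp_ratio=0.05):
    """Smallest RS cutoff whose false-positive length is < max_fp_ratio of real.

    Elements are (start, end, rs) tuples with half-open coordinates.
    Returns None if no candidate cutoff qualifies.
    """
    for cutoff in sorted({rs for _, _, rs in real_elements}):
        real_len = sum(e - s for s, e, rs in real_elements if rs >= cutoff)
        fp_len = sum(e - s for s, e, rs in shuffled_elements if rs >= cutoff)
        if real_len > 0 and fp_len < max_fp_ratio * real_len:
            return cutoff
    return None


real = [(0, 100, 50.0), (200, 220, 10.0), (300, 310, 2.0)]
permuted = [(0, 8, 9.0), (10, 20, 3.0), (30, 40, 1.5)]
print(pick_rs_threshold(real, permuted))
```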

\ \

PhastCons/BinCons/GERP Union/Intersection of Conserved Elements

\

\ These subtracks were produced by creating unions and intersections of the\ constrained element data detected by binCons, phastCons, and GERP on TBA \ alignments. In these annotations, "non-coding" is defined as those \ regions not overlapping with CDS regions in any of the following UCSC gene \ tables: refFlat, knownGene, mgcGenes, vegaGene, or ensGene.
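The union, intersection, and non-coding filtering above are plain interval operations. The Python sketch below uses half-open intervals on a single chromosome. It treats "non-coding" as dropping any element that overlaps a CDS interval (one reading of "regions not overlapping with CDS regions"). It also assumes the CDS intervals from the listed gene tables have already been pooled. All function names are hypothetical.

```python
def merge(intervals):
    """Union of half-open intervals, returned sorted and non-overlapping."""
    out = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1]:
            out[-1][1] = max(out[-1][1], e)
        else:
            out.append([s, e])
    return [tuple(x) for x in out]


def intersect(a, b):
    """Bases covered by both interval sets, via a linear two-pointer sweep."""
    a, b = merge(a), merge(b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        s, e = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if s < e:
            out.append((s, e))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def non_coding(elements, cds):
    """Keep only elements that overlap no CDS interval."""
    return [(s, e) for s, e in elements if not any(s < ce and cs < e for cs, ce in cds)]


bincons, phastcons, gerp = [(0, 10), (20, 30)], [(5, 25)], [(8, 22)]
print(merge(bincons + phastcons + gerp))                   # union of all three
print(intersect(intersect(bincons, phastcons), gerp))      # intersection
```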

\ \

Credits

\

\ BinCons and phastCons MCS data were contributed by Elliott Margulies in the \ Eric Green lab at \ NHGRI, with assistance from Adam Siepel of UCSC.

\

\ GERP was developed primarily by Greg Cooper in the lab of\ Arend Sidow\ at Stanford University (Depts of Pathology and Genetics), in close collaboration\ with Eric Stone (Biostatistics, NC State), and George Asimenos and\ Eugene Davydov in the lab of\ Serafim Batzoglou\ (Dept. of Computer Science, Stanford).

\

\ TBA was provided by Minmei Hou, Scott Schwartz and Webb Miller of the \ Penn State Bioinformatics \ Group.

\ \

References

\

\ See the TBA Alignment and TBA Cons tracks for references.

\ encodeCompGeno 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ exonArrows off\ group encodeCompGeno\ longLabel NHGRI/PSU/UCSC TBA Conserved Elements\ priority 73.0\ shortLabel TBA Elements\ track encodeTbaElements\ type bed 5 .\ visibility hide\ encodeAffyChIpHl60SitesSirt1Hr00 Affy SIRT1 RA 0h bed 3 . Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 0hrs) Sites 0 74 0 225 0 127 240 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 0,225,0\ longLabel Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 0hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 74\ shortLabel Affy SIRT1 RA 0h\ subGroups factor=SIRT1 time=0h\ track encodeAffyChIpHl60SitesSirt1Hr00\ altGraphX Alt-Splicing altGraphX Alternative Splicing from ESTs/mRNAs 0 74 0 0 0 127 127 127 0 0 0

Description

\

\ This track summarizes alternative splicing shown in the mRNA and\ EST tracks. The blocks represent exons; lines indicate possible\ splice junctions. The graphical display is drawn such that no exons\ overlap, making alternative events easier to view when the track\ is in full display mode and the resolution is set to approximately gene-level.\

\

\ To help reduce the noise present in the EST libraries, \ exons and splice junctions are filtered based on orthologous mouse\ transcripts and the frequency with which an exon or intron appears in human\ transcript libraries. Only those exons and splice junctions that have\ an orthologous exon or splice junction in the mouse\ transcriptome or are present three or more times in the human transcriptome are \ kept. Transcripts labeled as mRNA in GenBank are weighted more heavily, \ reflecting their typically higher quality. This process is similar\ to that presented in Sugnet, C.W. et al.,\ Transcriptome and genome conservation of alternative splicing \ events in humans and mice. \ Pacific Symposium on Biocomputing (PSB) 2004 Online Proceedings.

\ \

Methods

\

\ The splicing graphs for each genome were generated separately\ from their native EST and mRNA transcripts using the following\ process: \

\

\ After the splicing graphs were constructed independently for both\ human and mouse, they were mapped to each other using the genome-wide\ mouse net alignments (viewable on the browser as\ the Mouse Net track). Only those exons and splice junctions that were common \ to both species or occurred three or more times in the human transcripts were kept \ in the splicing graph. When counting the number of times an\ exon or splice junction was included in the human transcripts, those\ designated as mRNA were weighted more heavily than those designated as \ EST.
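The keep-or-drop rule above can be sketched as a weighted count in Python. The description says only that mRNAs are weighted more heavily than ESTs. The specific weights (2.0 for mRNA, 1.0 for EST) and the function name are therefore hypothetical placeholders, not the values used for this track.

```python
def keep_splice_feature(sources, in_mouse, mrna_weight=2.0, est_weight=1.0, min_count=3.0):
    """Decide whether an exon or splice junction stays in the splicing graph.

    sources:  "mRNA" or "EST" label for each human transcript containing it
    in_mouse: True if the feature maps to an orthologous mouse feature
    The weights are illustrative; only their ordering comes from the text.
    """
    count = sum(mrna_weight if src == "mRNA" else est_weight for src in sources)
    return in_mouse or count >= min_count


print(keep_splice_feature(["EST", "EST"], in_mouse=False))   # too little support
print(keep_splice_feature(["mRNA", "EST"], in_mouse=False))  # mRNA weight tips it over
```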

\ \

References

\

\ For more information on the mouse net alignments, see \ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: \ Duplication, deletion, and rearrangement in the mouse and human genomes. \ Proc. Natl. Acad. Sci. USA 100(20), 11484-11489 (2003).

\ \

Credits

\

\ This annotation was generated by Chuck \ Sugnet of the UCSC Genome Bioinformatics Group.

\ rna 1 group rna\ longLabel Alternative Splicing from ESTs/mRNAs\ priority 74\ shortLabel Alt-Splicing\ track altGraphX\ type altGraphX\ visibility hide\ encodeMlaganAlign MLAGAN Alignment wigMaf 0.0 1.0 Stanford MLAGAN Alignments 0 74 0 10 100 1 128 0 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays human-centric multiple sequence alignments in the \ ENCODE regions for the 23 vertebrates included in the May 2005 ENCODE MSA \ freeze, based on comparative sequence data generated for the ENCODE project. \ The alignments in this track were generated using the \ LAGAN Alignment Toolkit.\ A complete list of the vertebrates included in the May 2005 freeze may be found\ at the top of the description page for this track.

\

\ The Genome Browser companion tracks, MLAGAN Cons and MLAGAN Elements, \ display conservation scoring and conserved elements for these alignments based \ on various conservation methods.

\ \

Display Conventions and Configuration

\

\ In full display mode, this track shows pairwise alignments\ of each species aligned to the human genome.\ The alignments are shown in dense display mode using a gray-scale\ density gradient. The checkboxes in the track configuration section allow\ the exclusion of species from the pairwise display.\

\ When zoomed in to the base-display level, the track shows the base\ composition of each alignment. The numbers and symbols on the "human\ gap" line indicate the lengths of gaps in the human sequence at those\ alignment positions relative to the longest non-human sequence. \ If there is sufficient space in the display, the size of the gap is shown; \ if not, and if the gap size is a multiple of 3, a "*" is displayed, \ otherwise "+" is shown. \ To view detailed information about the\ alignments at a specific position, zoom in the display to 30,000 or fewer\ bases, then click on the alignment.
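The labeling rule for the "human gap" line can be written as a tiny Python function. The width available for the label (`available_chars`) is a hypothetical stand-in for however the browser measures "sufficient space"; the function name is also illustrative.

```python
def gap_label(gap_len, available_chars):
    """Label for a gap on the human gap line.

    Show the gap length if it fits; otherwise "*" for a multiple of 3
    (a gap that preserves the reading frame) and "+" for anything else.
    """
    text = str(gap_len)
    if len(text) <= available_chars:
        return text
    return "*" if gap_len % 3 == 0 else "+"


print(gap_label(12, 2), gap_label(12, 1), gap_label(10, 1))
```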

\ \

Methods

\

\ To create the alignments, the sequence of each non-human species was first \ "rearranged" to be orthologously collinear with respect to the human \ sequence. The rearrangements were generated using a suite of tools and \ algorithms based on Shuffle-LAGAN and SuperMap.\ For each pairing of human sequence with that of another species, Shuffle-LAGAN \ was used to find the best-scoring chain of local similarities according to \ a scoring scheme that penalized evolutionary rearrangements. SuperMap was then\ used to aggregate parts of the chain into a human-monotonic map of syntenic \ blocks. This mapping was used to undo the genomic rearrangements of the other \ sequence and convert it to a form that was directly alignable to the human \ sequence.\

\ A multiple global alignment was created for every region using \ MLAGAN. The alignments \ were then refined using \ MUSCLE, which \ processes small non-overlapping windows of an alignment and attempts to realign \ them in an iterative fashion, keeping the refined alignment\ if it has a better sum-of-pairs score than the original.
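The acceptance rule in the refinement step above (keep a realigned window only if its sum-of-pairs score improves) can be sketched in Python. The match, mismatch, and gap scores are illustrative placeholders, not MUSCLE's scoring scheme. The `realign` callback stands in for the actual realignment and is hypothetical.

```python
from itertools import combinations


def sum_of_pairs(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length strings."""
    score = 0
    for a, b in combinations(rows, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue  # gap-gap pairs are ignored
            if x == "-" or y == "-":
                score += gap
            elif x == y:
                score += match
            else:
                score += mismatch
    return score


def refine_window(rows, start, end, realign):
    """Realign columns [start, end) and keep the result only if it scores better."""
    window = [r[start:end] for r in rows]
    candidate = realign(window)
    if sum_of_pairs(candidate) > sum_of_pairs(window):
        return [r[:start] + c + r[end:] for r, c in zip(rows, candidate)]
    return rows


rows = ["AC-G", "A-CG"]
print(refine_window(rows, 1, 3, lambda w: ["C-", "C-"]))  # shifts the gap to line up the Cs
```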

\ \

Credits

\

\ The MLAGAN alignments were generated by George Asimenos from Stanford's \ ENCODE group.

\

\ Shuffle-LAGAN, SuperMap and MLAGAN were written by Mike Brudno.

\

\ MUSCLE was authored by Bob Edgar.

\

The phylogenetic tree is based on Murphy et al. (2001) and general\ consensus in the vertebrate phylogeny community.\

\ \

References

\

\ Brudno, M., Do, C., Cooper, G., Kim, M.F., Davydov, E., Green, E.D., Sidow, A.\ and Batzoglou, S. \ LAGAN and Multi-LAGAN: efficient tools for large-scale multiple \ alignment of genomic DNA.\ Genome Res. 13(4), 721-31 (2003).

\

\ Brudno, M., Malde, S., Poliakov, A., Do, C., Couronne, O., Dubchak, I. and \ Batzoglou, S.\ Glocal alignment: finding rearrangements during alignment.\ Bioinformatics 19(Suppl. 1), i54-i62 (2003).

\

\ Edgar, R.C. \ MUSCLE: multiple sequence alignment with high\ accuracy and high throughput.\ Nucl. Acids Res. 32(5), 1792-97 (2004).

\

\ Murphy, W.J., et al.\ Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294(5550), 2348-51 (2001).

\ encodeCompGeno 1 altColor 1,128,0\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 0, 10, 100\ dataVersion ENCODE June 2005 Freeze\ group encodeCompGeno\ longLabel Stanford MLAGAN Alignments\ maxHeightPixels 100:40:11\ priority 74.0\ sGroup_mammal opossum platypus\ sGroup_placental rat mouse rabbit cow dog rfbat hedgehog armadillo elephant tenrec\ sGroup_primate chimp baboon rhesus marmoset galago\ sGroup_vertebrate chicken x_tropicalis zebrafish tetraodon fugu\ shortLabel MLAGAN Alignment\ speciesGroups primate placental mammal vertebrate\ summary encodeMlaganSummary\ track encodeMlaganAlign\ treeImage phylo/hg16_23way.gif\ type wigMaf 0.0 1.0\ visibility hide\ altGraphX2 Alt-Splicing2 altGraphX Alternative Splicing from ESTs/mRNAs - test take 2 0 74.1 0 0 0 127 127 127 0 0 0 rna 1 group rna\ longLabel Alternative Splicing from ESTs/mRNAs - test take 2\ priority 74.1\ shortLabel Alt-Splicing2\ track altGraphX2\ type altGraphX\ visibility hide\ encodeMlaganCons MLAGAN Cons wig 0.0 1.0 Stanford MLAGAN Conservation 0 74.1 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

\ This track displays different measurements of conservation based on\ the MLAGAN multiple sequence alignments of ENCODE regions \ shown in the MLAGAN Alignment track. Three programs — binCons\ (binomial-based conservation method), phastCons (phylogenetic hidden-Markov\ model method), and\ GERP (Genomic Evolutionary Rate Profiling)\ — generated the conservation scoring used to create this track. A related \ track, MLAGAN Elements, shows multi-species conserved sequences (MCSs) based on\ the conservation measurements displayed in this track.

\

\ For details on the conservation scores generated by each program, refer to the \ individual Methods subsections.

\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ tracks. The subtracks within this annotation may be configured in a variety of \ ways to highlight different aspects of the displayed data. The graphical \ configuration options are shown at the top of the track description page, \ followed by a list of subtracks. To display only selected subtracks, uncheck \ the boxes next to the tracks you wish to hide. For more information about the \ graphical configuration options, click the \ Graph configuration \ help link.

\

\ Color differences among the subtracks are arbitrary; they provide a\ visual cue for distinguishing the different conservation methods. See the\ Methods section for display information specific to each subtrack.

\ \

Methods

\

\ The methods used to create the MLAGAN alignments in the ENCODE\ regions are described in the MLAGAN Alignment track description.

\ \

BinCons

\

\ The binCons score is based on the cumulative binomial probability of\ detecting the observed number of identical bases (or greater) in\ sliding 25 bp windows (moving one bp at a time) between the\ reference sequence and each other species, given the neutral rate\ at four-fold degenerate sites. Neutral rates are calculated\ separately for each target region. For targets with no gene annotations,\ the average percent identity across all alignable sequence was instead used\ to weight the individual species binomial scores (this latter\ weighting scheme was found to closely match weights derived from four-fold degenerate sites).

\

\ The negative log of these P-values was then averaged across all\ human-referenced pairwise combinations, and each base was assigned the score\ of the highest-scoring 25 bp window overlapping it. This track plots a\ ranked percentile score normalized between 0 and 1 across all\ ENCODE regions, such that the top 5% most conserved sequence across all ENCODE\ regions has a score of 0.95 or greater (the top 10% has a score of 0.9 or\ greater, and so on).
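The window scoring described in the two paragraphs above can be sketched in Python. The sketch assumes each species' alignment to the reference has been reduced to a 0/1 identity vector (1 = identical base), with no gap handling. It assumes each species has a known neutral probability of identity, and it uses log base 10 because the description does not name a base. The function names are hypothetical; per-species weighting is simplified to a plain average.

```python
import math


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bincons_raw(identity, neutral_identity, window=25):
    """Per-base raw binCons-style score (illustrative sketch).

    identity:         species -> list of 0/1 flags, all the same length (>= window)
    neutral_identity: species -> probability a neutral base is identical
    Each window gets the average over species of -log10 P(>= observed
    identities); each base takes the best window that overlaps it.
    """
    length = len(next(iter(identity.values())))
    n_windows = length - window + 1
    win_scores = []
    for start in range(n_windows):
        logs = []
        for sp, flags in identity.items():
            k = sum(flags[start:start + window])
            logs.append(-math.log10(binom_tail(k, window, neutral_identity[sp])))
        win_scores.append(sum(logs) / len(logs))

    per_base = []
    for pos in range(length):
        lo = max(0, pos - window + 1)   # first window covering pos
        hi = min(pos, n_windows - 1)    # last window covering pos
        per_base.append(max(win_scores[lo:hi + 1]))
    return per_base


flags = [1] * 25 + [0] * 25
scores = bincons_raw({"mouse": flags, "dog": flags}, {"mouse": 0.5, "dog": 0.5})
print(round(scores[0], 3), round(scores[49], 3))
```

Because every base inherits the best overlapping window, scores stay high up to 24 bases past the end of a conserved stretch, which is the intended smoothing effect of the sliding 25 bp windows.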

\

\ BinCons scores were normalized to represent a percentile to the power of\ 10. For example, scores representing the top 1 percent most conserved\ sequence, 99th percentile, have a score greater than or equal to 0.99^10\ = 0.904. Transforming scores to the power of 10 was done for visual\ purposes only, in order to accentuate and distinguish the peaks of more\ highly conserved regions.

\

\ More details on binCons can be found in Margulies et al. (2003)\ cited below.

\ \

PhastCons

\

\ The phastCons program predicts conserved elements and produces base-by-base\ conservation scores using a two-state phylogenetic hidden Markov model.\ The model consists of a state for conserved regions and a\ state for nonconserved regions, each of which is associated with a\ phylogenetic model. These two models are identical\ except that the branch lengths of the conserved phylogeny are\ multiplied by a scaling parameter rho (0 < rho < 1).

\

\ To determine conservation in the ENCODE MLAGAN\ alignments, the nonconserved model was estimated\ from four-fold degenerate coding sites within the ENCODE regions using\ the program phyloFit. The parameter rho was then estimated by\ maximum likelihood, conditional on the nonconserved model, using the EM\ algorithm implemented in phastCons. Parameter estimation was based on\ a single large alignment, constructed by concatenating the\ alignments for all conserved regions.

\

\ PhastCons was run with the options --expected-lengths 15 and\ --target-coverage 0.05 to obtain the desired level of\ "smoothing" and a final coverage by conserved elements of 5%.

\

\ The conservation score at each base is the posterior probability that the\ base was generated by the conserved state of the phylo-HMM. It can\ be interpreted as the probability that the base is in a conserved\ element, given the assumptions of the model and the estimated parameters.\ Scores range from 0 to 1, with higher scores corresponding to\ higher levels of conservation.

\

\ More details on phastCons can be found in Siepel et al. (2005)\ cited below.

\ \

GERP

\

\ The GERP score is the expected substitution rate divided by\ the observed substitution rate at a particular human base.\ Scores are estimated on a column-by-column basis using multiple sequence\ alignments of mammalian genomic DNA generated by MLAGAN.\ The scores range from 0 to 3; those greater than 3 are clipped to 3. \ The expected and observed rates are\ both calculated on a phylogenetic tree using the same fixed topology.\ The branch lengths of the expected tree are based on the average\ substitutions at neutral sites. The branch lengths of the observed\ tree, which is calculated separately for each human base, are based on the\ substitutions seen at the column of the\ multiple alignment at that base. Species that have gaps at\ a particular column are not considered in the scoring for that column.

\

\ Higher scores correspond to human\ bases in alignment columns with higher degrees of similarity, i.e.\ bases that have evolved slowly, some of which have been under purifying \ selection. The opposite holds true for swiftly evolving (low similarity) \ columns.

\

\ Scores are deterministic, given a maximum-likelihood model of \ nucleotide substitution, species topology, neutral tree, and alignment.

\ \

Credits

\

\ BinCons was developed by Elliott Margulies of the \ Eric Green lab at \ NHGRI.

\

\ PhastCons was developed by Adam Siepel in the \ Haussler lab at UCSC.

\

\ GERP was developed primarily by Greg Cooper in the lab of \ Arend Sidow \ at Stanford University\ (Depts of Pathology and Genetics), in close collaboration with \ Eric Stone (Biostatistics, NC State), and George Asimenos and \ Eugene Davydov in the lab of \ Serafim Batzoglou \ (Dept. of Computer Science, Stanford).

\

\ The data for this track were generated by Elliott Margulies, with assistance \ from Adam Siepel.

\ \

References

\

\ Cooper, G.M., Stone, E.A., Asimenos, G., NISC Comparative Sequencing Program, \ Green, E.D., Batzoglou, S. and Sidow, A. \ Distribution and intensity of constraint in mammalian genomic \ sequence. \ Genome Res. 15(7), 901-13 (2005).

\

\ Margulies, E.H., Blanchette, M., NISC Comparative Sequencing Program,\ Haussler, D. and Green, E.D.\ Identification and characterization of multi-species conserved\ sequences.\ Genome Res. 13, 2507-18 (2003).

\

\ Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M.,\ Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W. et al.\ Evolutionarily conserved elements in vertebrate,\ insect, worm, and yeast genomes.\ Genome Res. 15, 1034-50 (2005).

\ encodeCompGeno 0 autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeCompGeno\ longLabel Stanford MLAGAN Conservation\ maxHeightPixels 100:25:11\ priority 74.1\ shortLabel MLAGAN Cons\ track encodeMlaganCons\ type wig 0.0 1.0\ visibility hide\ windowingFunction mean\ encodeMlaganElements MLAGAN Elements bed 5 . Stanford MLAGAN Conserved Elements 0 74.2 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

This track displays multi-species conserved sequences (MCSs) derived from binCons, phastCons, and genomic evolutionary rate profiling (GERP) conservation scoring of human ENCODE genomic DNA alignments to 22 other vertebrates using the MLAGAN alignment package. The combined-methods subtracks show the union/intersection of conserved elements produced by the three conservation methods.

The multiple sequence alignments may be viewed in the MLAGAN Alignments track. Another related track, MLAGAN Cons, shows the conservation scoring. The descriptions accompanying these tracks detail the methods used to create the alignments and conservation.

Display Conventions and Configuration

The locations of conserved elements are indicated by blocks in the graphical display. This composite annotation track consists of several subtracks that show conserved elements derived by the three methods listed above, as well as both unions and intersections of the sets of coding and noncoding conserved elements. To show only selected subtracks, uncheck the boxes next to the tracks you wish to hide. The display may also be filtered to show only those items with unnormalized scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page.

Display characteristics specific to certain subtracks are described in the respective Methods sections below.

Methods

BinCons-based Elements

For each ENCODE target, a conservation score threshold was picked to match the number of conserved bases predicted by phastCons, an alternative method for measuring conservation that has been found to be slightly more reliable at predicting the expected fraction of conserved sequence in each target. Clusters of bases that exceeded the given conservation score threshold were designated as MCSs. The minimum length of an MCS is 25 bases. Strict cutoffs were used: if even one base fell below the conservation score threshold, it separated an MCS into two distinct regions.
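The strict-cutoff clustering described above can be sketched in Python. This is a minimal illustration, not the binCons implementation: it assumes the per-base conservation scores are already available as a list and that the per-target threshold has already been chosen; the function name `call_mcs` is hypothetical.

```python
from typing import List, Tuple


def call_mcs(scores: List[float], threshold: float,
             min_len: int = 25) -> List[Tuple[int, int]]:
    """Return MCSs as half-open (start, end) base ranges.

    Bases whose score exceeds `threshold` are clustered; a single base at or
    below the threshold ends the current cluster (strict cutoff), and
    clusters shorter than `min_len` bases are discarded.
    """
    mcss = []
    start = None  # start of the current above-threshold run, if any
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                mcss.append((start, i))
            start = None
    # Close a run that extends to the end of the region.
    if start is not None and len(scores) - start >= min_len:
        mcss.append((start, len(scores)))
    return mcss


# One low-scoring base splits 60 conserved bases into two MCSs.
print(call_mcs([1.0] * 30 + [0.0] + [1.0] * 30, threshold=0.5))  # [(0, 30), (31, 61)]
```

Because the cutoff is strict, no gap tolerance is applied; a single dip below the threshold always produces two separate MCSs, and each piece must independently reach 25 bases.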

PhastCons-based Elements

The predicted MCSs are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM, i.e., maximal segments in which the maximum-likelihood (Viterbi) path remains in the conserved state.
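As an illustration of the Viterbi step only, the sketch below decodes a two-state (nonconserved/conserved) HMM and returns the maximal runs of the conserved state. It is a simplification under stated assumptions: in phastCons the per-column emission probabilities come from phylogenetic models of the alignment columns, whereas here the per-column log-likelihoods under each state are simply passed in. The transition parameters `mu` (conserved to nonconserved) and `nu` (nonconserved to conserved) and the function name are chosen for illustration.

```python
import math
from typing import List, Tuple


def conserved_segments(loglik_cons: List[float], loglik_noncons: List[float],
                       mu: float, nu: float,
                       p_init_cons: float = 0.5) -> List[Tuple[int, int]]:
    """Viterbi-decode a 2-state HMM and return half-open conserved runs.

    State 0 = nonconserved, state 1 = conserved. mu is P(1 -> 0) and
    nu is P(0 -> 1). Inputs are per-column emission log-likelihoods.
    """
    n = len(loglik_cons)
    if n == 0:
        return []
    emit = (loglik_noncons, loglik_cons)
    log_t = ((math.log(1 - nu), math.log(nu)),   # transitions from state 0
             (math.log(mu), math.log(1 - mu)))   # transitions from state 1
    v = [math.log(1 - p_init_cons) + emit[0][0],
         math.log(p_init_cons) + emit[1][0]]
    backptrs = []  # backptrs[t - 1][s] = best predecessor of state s at column t
    for t in range(1, n):
        ptrs, new_v = [], []
        for s in (0, 1):
            cand = [v[p] + log_t[p][s] for p in (0, 1)]
            best = 0 if cand[0] >= cand[1] else 1
            ptrs.append(best)
            new_v.append(cand[best] + emit[s][t])
        backptrs.append(ptrs)
        v = new_v
    # Trace back the maximum-likelihood path.
    state = 0 if v[0] >= v[1] else 1
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    # Collect maximal segments in which the path stays in the conserved state.
    segments, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, n))
    return segments
```

Working in log space avoids underflow over long alignments; the segment extraction at the end is exactly the "maximal segments in the conserved state" rule stated above.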

GERP-based Elements

GERP constrained elements exhibit significant evidence of the effects of purifying selection. Elements are scored according to the inferred intensity of purifying selection, measured as "rejected substitutions" (RSs). RSs capture the magnitude of difference between the number of "observed" substitutions (estimated using maximum likelihood) and the number that would be "expected" under a neutral model of evolution. The RS is displayed as part of the item name. Items with higher RSs are displayed in a darker shade of blue. The score shown on the details page, which has been scaled by 300 for display purposes, is generally not as accurate as the RS count that is part of the item name.

"Constrained elements" are identified as groups of consecutive human bases that have an observed rate of evolution smaller than the expected rate. These groups of columns are merged if they are less than a few nucleotides apart, and are scored according to the sum of the site-by-site differences between observed and expected rates (RS).
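A minimal sketch of the element-calling rule just described, assuming per-site observed and expected substitution counts are already available as lists (estimating them requires the neutral tree and maximum-likelihood machinery, which is omitted). The merge distance `max_gap` stands in for "a few nucleotides" and its value is hypothetical; the RS score is taken here as the sum over every site in the merged span, including the short gaps.

```python
from typing import List, Tuple


def constrained_elements(observed: List[float], expected: List[float],
                         max_gap: int = 3) -> List[Tuple[int, int, float]]:
    """Return (start, end, RS) for half-open constrained elements."""
    # Site-by-site rejected substitutions: expected minus observed.
    rs = [e - o for o, e in zip(observed, expected)]
    # Runs of consecutive sites evolving slower than expected (RS > 0).
    runs, start = [], None
    for i, r in enumerate(rs):
        if r > 0:
            if start is None:
                start = i
        elif start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(rs)])
    # Merge runs separated by fewer than max_gap sites.
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < max_gap:
            merged[-1][1] = e
        else:
            merged.append([s, e])
    # Score each element by the summed RS over its span.
    return [(s, e, sum(rs[s:e])) for s, e in merged]


obs = [0.0] * 5 + [2.0] * 2 + [0.0] * 5 + [2.0] * 10
exp = [1.0] * 22
print(constrained_elements(obs, exp))  # [(0, 12, 8.0)]
```

In the example, two slowly evolving runs separated by a 2-site gap are merged into one element whose RS (5 - 2 + 5 = 8) is reduced by the faster-evolving gap sites.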

Permutations of the actual alignments were analyzed, and the "constrained elements" identified in these permuted alignments were treated as "false positives". An RS threshold was then picked such that the total length of "false positive" constrained elements (identified in the permuted alignments) was less than 5% of the length of constrained elements identified in the actual alignment. Thus, all annotated constrained elements are significant at better than 95% confidence, and the total fraction of the ENCODE regions annotated as constrained is 5-7%.
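The threshold search can be sketched as a scan over candidate RS cutoffs. Assumptions for this illustration: each element is summarized as a `(length, rs)` pair, the candidate cutoffs are the distinct RS values of the real elements, and the smallest cutoff meeting the 5% criterion is returned; the original analysis may have searched differently, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

Element = Tuple[int, float]  # (length in bases, RS score)


def pick_rs_threshold(real: List[Element], permuted: List[Element],
                      max_ratio: float = 0.05) -> Optional[float]:
    """Smallest RS cutoff at which permuted ("false positive") element length
    is below max_ratio times the real constrained-element length."""
    for cutoff in sorted({rs for _, rs in real}):
        real_len = sum(length for length, rs in real if rs >= cutoff)
        false_len = sum(length for length, rs in permuted if rs >= cutoff)
        if real_len > 0 and false_len < max_ratio * real_len:
            return cutoff
    return None  # no cutoff satisfies the criterion


real = [(100, 10.0), (50, 5.0), (20, 1.0)]
permuted = [(30, 1.0), (5, 4.0)]
print(pick_rs_threshold(real, permuted))  # 5.0
```

At a cutoff of 1.0 the permuted elements contribute 35 bases against 170 real ones (over 5%), while at 5.0 none survive, so 5.0 is returned.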

PhastCons/BinCons/GERP Union/Intersection of Coding/NonCoding Elements

These subtracks were produced by creating unions and intersections of the constrained element data detected by binCons, phastCons, and GERP on MLAGAN alignments. In these annotations, "non-coding" is defined as those regions not overlapping with CDS regions in any of the following UCSC gene tables: refFlat, knownGene, mgcGenes, vegaGene, or ensGene.
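The set operations behind these subtracks can be sketched with half-open `(start, end)` intervals on one chromosome. This is an illustration under stated assumptions, not the pipeline that produced the track: "non-coding" is interpreted here as an element-level filter (an element is kept only if it overlaps no CDS interval), and the CDS intervals are assumed to have already been collected from the gene tables listed above. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on one chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or abutting intervals."""
    out: List[Interval] = []
    for start, end in sorted(intervals):
        if out and start <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out


def union(*sets: List[Interval]) -> List[Interval]:
    return merge([iv for s in sets for iv in s])


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Bases covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def non_coding(elements: List[Interval], cds: List[Interval]) -> List[Interval]:
    """Keep elements that overlap no CDS interval (element-level filter)."""
    return [(s, e) for s, e in elements
            if not any(cs < e and s < ce for cs, ce in cds)]


binc = [(0, 10), (20, 30)]
phast = [(5, 25)]
print(union(binc, phast))      # [(0, 30)]
print(intersect(binc, phast))  # [(5, 10), (20, 25)]
```

A three-way intersection is `intersect(intersect(a, b), c)`; a base-level alternative to `non_coding` would subtract CDS bases from each element instead of dropping whole elements.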

Credits

BinCons and phastCons MCS data were contributed by Elliott Margulies in the Eric Green lab at NHGRI, with assistance from Adam Siepel of UCSC.

GERP was developed primarily by Greg Cooper in the lab of Arend Sidow at Stanford University (Departments of Pathology and Genetics), in close collaboration with Eric Stone (Biostatistics, NC State), and George Asimenos and Eugene Davydov in the lab of Serafim Batzoglou (Department of Computer Science, Stanford).

References

See the MLAGAN Alignments and MLAGAN Cons tracks for references.

\ encodeCompGeno 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ exonArrows off\ group encodeCompGeno\ longLabel Stanford MLAGAN Conserved Elements\ priority 74.2\ shortLabel MLAGAN Elements\ track encodeMlaganElements\ type bed 5 .\ visibility hide\ sibTxGraph SIB Alt-Splicing altGraphX Alternative Splicing Graph from Swiss Institute of Bioinformatics 0 74.5 0 0 0 127 127 127 0 0 0 http://www.isrec.isb-sib.ch/cgi-bin/tromer/tromergraph2draw.pl?species=H.+sapiens&tromer=$$

Description

This track shows the graphs constructed by analyzing experimental RNA transcripts, and serves as the basis for the predicted alternative splicing transcripts shown in the SIB Genes track. The blocks represent exons; lines indicate introns. The graphical display is drawn such that no exons overlap, making alternative events easier to view when the track is in full display mode and the resolution is set to approximately gene-level.

Further information on the graphs can be found on the Transcriptome Web interface.

Methods

The splicing graphs were generated using a multi-step pipeline:

  1. RefSeq and GenBank RNAs and ESTs are aligned to the genome with SIBsim4, keeping only the best alignments for each RNA.
  2. Alignments are broken up at non-intronic gaps, with small isolated fragments thrown out.
  3. A splicing graph is created for each set of overlapping alignments. This graph has an edge for each exon or intron, and a vertex for each splice site, start, and end. Each RNA that contributes to an edge is kept as evidence for that edge.
  4. Graphs consisting solely of unspliced ESTs are discarded.
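Steps 3 and 4 can be illustrated with a small sketch for one set of overlapping alignments. The input representation (a dict from RNA accession to an `is_est` flag and a list of exon blocks), the accession names, and the function names are hypothetical, and vertices are simplified to the genomic positions of exon boundaries; this is not the SIB pipeline code.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]
Alignments = Dict[str, Tuple[bool, List[Exon]]]  # accession -> (is_est, exons)


def build_splice_graph(alignments: Alignments):
    """Return (vertices, edges) for one set of overlapping alignments.

    Each edge key is ('exon' | 'intron', start, end) and maps to the set of
    RNAs that support it; vertices are the exon boundary positions.
    """
    edges: Dict[Tuple[str, int, int], Set[str]] = defaultdict(set)
    for acc, (_, exons) in alignments.items():
        exons = sorted(exons)
        for start, end in exons:
            edges[("exon", start, end)].add(acc)
        # Consecutive exons of the same RNA imply an intron between them.
        for (_, end1), (start2, _) in zip(exons, exons[1:]):
            edges[("intron", end1, start2)].add(acc)
    vertices = sorted({pos for (_, s, e) in edges for pos in (s, e)})
    return vertices, dict(edges)


def keep_graph(alignments: Alignments) -> bool:
    """Step 4: discard graphs built only from unspliced (single-exon) ESTs."""
    return not all(is_est and len(exons) == 1
                   for is_est, exons in alignments.values())


alns = {"NM_A": (False, [(100, 200), (300, 400)]),
        "EST_B": (True, [(100, 200)])}
vertices, edges = build_splice_graph(alns)
print(vertices)                            # [100, 200, 300, 400]
print(sorted(edges[("exon", 100, 200)]))   # ['EST_B', 'NM_A']
```

Keeping the supporting accessions on every edge corresponds to the "evidence" bookkeeping in step 3: here the first exon is supported by both RNAs, while the intron is supported only by the spliced mRNA.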

Credits

The SIB Alternative Splicing Graphs track was produced on the Vital-IT high-performance computing platform using a computational pipeline developed by Christian Iseli with help from colleagues at the Ludwig Institute for Cancer Research and the Swiss Institute of Bioinformatics. It is based on data from NCBI RefSeq and GenBank/EMBL. Our thanks to the people running these databases and to the scientists worldwide who have made contributions to them.

\ rna 1 group rna\ idInUrlSql select name from sibTxGraph where id=%s\ longLabel Alternative Splicing Graph from Swiss Institute of Bioinformatics\ priority 74.5\ shortLabel SIB Alt-Splicing\ track sibTxGraph\ type altGraphX\ url http://www.isrec.isb-sib.ch/cgi-bin/tromer/tromergraph2draw.pl?species=H.+sapiens&tromer=$$\ urlLabel SIB link:\ visibility hide\ sibAltEvents SIB Alt Events bed 6 . Alt-Splicing, Alternative Promoters, Alternative Poly-A etc from SIB 0 74.6 0 0 0 127 127 127 0 0 0 rna 1 group rna\ longLabel Alt-Splicing, Alternative Promoters, Alternative Poly-A etc from SIB\ priority 74.6\ shortLabel SIB Alt Events\ track sibAltEvents\ type bed 6 .\ visibility hide\ encodeAffyChIpHl60PvalSirt1Hr02 Affy SIRT1 RA 2h wig 0.0 534.54 Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 2hrs) P-Value 0 75 0 225 0 127 240 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 0,225,0\ longLabel Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 2hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 75\ shortLabel Affy SIRT1 RA 2h\ subGroups factor=SIRT1 time=2h\ track encodeAffyChIpHl60PvalSirt1Hr02\ encodeAffyChIpHl60SitesSirt1Hr02 Affy SIRT1 RA 2h bed 3 . Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 2hrs) Sites 0 76 0 225 0 127 240 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 0,225,0\ longLabel Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 2hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 76\ shortLabel Affy SIRT1 RA 2h\ subGroups factor=SIRT1 time=2h\ track encodeAffyChIpHl60SitesSirt1Hr02\ mgcIntrons mgcIntronPicks bed 12 . 
Introns and Flanking Exons for RACE PCR 0 76 0 0 0 127 127 127 0 0 0 rna 1 group rna\ longLabel Introns and Flanking Exons for RACE PCR\ priority 76\ shortLabel mgcIntronPicks\ track mgcIntrons\ type bed 12 .\ visibility hide\ binMCS95_encode TBA MCSs Nov04 bed 5 . Binomial-based MCSs from ENCODE Nov. 2004 TBA alignments (top 5%) 0 76 0 60 120 127 157 187 1 0 0 encodeCompGeno 1 color 0,60,120\ dataVersion ENCODE June 2005 Freeze\ group encodeCompGeno\ longLabel Binomial-based MCSs from ENCODE Nov. 2004 TBA alignments (top 5%)\ priority 76.0\ shortLabel TBA MCSs Nov04\ track binMCS95_encode\ type bed 5 .\ useScore 1\ visibility hide\ encode_tba TBA Nov04 wigMaf 0.0 10.0 TBA Nov. 2004 Alignments of ENCODE Regions 0 76.1 0 10 100 1 128 0 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 altColor 1,128,0\ autoScale Off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 0, 10, 100\ dataVersion ENCODE June 2005 Freeze\ group encodeCompGeno\ longLabel TBA Nov. 
2004 Alignments of ENCODE Regions\ maxHeightPixels 100:40:11\ pairwise encode_tba 15\ priority 76.1\ sGroup_mammal platypus\ sGroup_placental rat mouse cow dog armadillo\ sGroup_primate chimp baboon marmoset galago\ sGroup_vertebrate chicken\ shortLabel TBA Nov04\ spanList 1\ speciesGroups primate placental mammal vertebrate\ track encode_tba\ type wigMaf 0.0 10.0\ visibility hide\ wiggle binCons_encode\ yLineOnOff Off\ encodeAffyChIpHl60PvalSirt1Hr08 Affy SIRT1 RA 8h wig 0.0 534.54 Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 8hrs) P-Value 0 77 0 225 0 127 240 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 0,225,0\ longLabel Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 8hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 77\ shortLabel Affy SIRT1 RA 8h\ subGroups factor=SIRT1 time=8h\ track encodeAffyChIpHl60PvalSirt1Hr08\ encode_MSA2_MLAGAN MLAGAN Nov04 wigMaf 0.0 10.0 Stanford MLAGAN Alignments of November 2004 ENCODE MSA Sequences 0 77 0 10 100 1 128 0 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

We have generated a first version of alignments for the ENCODE regions. These alignments are in beta stage; future versions will incorporate newer alignment techniques that are currently under research and development in our group. We think that this early set of alignments will be useful for those who are eager to perform analysis on aligned data. For each region, we have used a human-centric methodology that comprises the following steps:

  1. In the first step, the sequence of each species is "rearranged" so that it is orthologously collinear with the human sequence. In other words, the sequence of each species is mapped to the human sequence: first, a human-monotonic map is created based on local similarities between the two sequences (using an algorithm based on Shuffle-LAGAN), and then a new sequence is produced for the species by gluing together different pieces of the original sequence according to the mapping. The mapping allows for any rearrangements, such as inversions, translocations, and duplications. Therefore, a new FASTA file is created for each species other than human, containing a sequence that is directly alignable to human using standard global alignment techniques.
  2. In the second step, a multiple global alignment is created for every region using MLAGAN.

Moreover, for each region we report how the sequences have been rearranged, so that people who want to do comparative analysis on the alignment can later map the coordinates of the rearranged sequences back to the original ones. For each region, a subdirectory named "rearrangements" contains a compressed tar archive with .info files. A .info file contains the map between the original and the rearranged sequence of one species. For example, here is the info file for the galago sequence in region ENr133:

  galago 2 165283 244112      1  58088  0 33 + 0     73  58160
  galago 2 244113 348408  58089 174393 85  0 + 0  58044 174348
  galago 2 353199 357763 174394 180830  0 69 - 0 175801 182237
  galago 2 357764 369080 180831 195593 18  0 + 0 182613 197375
  galago 2 369569 498714 195594 301899  4  0 + 0 197246 303551

The info file tries to follow the conventions of AVID's draft sequence info file format. The first field contains the species name; the third and fourth fields contain the human coordinates, and the last two fields contain the species' coordinates. For example, in the first line of the example, the part of galago's sequence from the 73rd to the 58,160th base is mapped to the corresponding part (from 165,283 to 244,112) of the human sequence. The file is always sorted according to human coordinates, since it is a human-monotonic map. Fields five and six correspond to the coordinates of the rearranged sequence that is created from the map; in the example, the first 58,088 bases of the rearranged galago sequence are copied from positions 73 - 58,160 of the original sequence. The ninth field contains a sign that distinguishes positive strands (+) from negative ones (-). In the example, positions 175,801 to 182,237 are reverse complemented and then put into positions 174,394 to 180,830 in the rearranged sequence. The remaining fields are unimportant or irrelevant. Notice that not all of the original sequence is present in the rearranged one; the algorithm may discard parts of the original sequence that could not be mapped to any place in the human sequence.
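The .info fields described above are enough to map a position in a rearranged sequence back to the original input. The sketch below assumes 1-based, inclusive coordinates and whitespace-separated fields as in the galago example, and assumes that on a "-" line the first rearranged base corresponds to the last original base of the block (following from the reverse complementing described above); the function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class InfoRecord(NamedTuple):
    species: str
    human_start: int   # field 3
    human_end: int     # field 4
    rearr_start: int   # field 5
    rearr_end: int     # field 6
    strand: str        # field 9
    orig_start: int    # field 11
    orig_end: int      # field 12


def parse_info_line(line: str) -> InfoRecord:
    f = line.split()
    return InfoRecord(f[0], int(f[2]), int(f[3]), int(f[4]), int(f[5]),
                      f[8], int(f[10]), int(f[11]))


def rearranged_to_original(records: List[InfoRecord], pos: int) -> Optional[int]:
    """Map a rearranged-sequence position to the original input position."""
    for r in records:
        if r.rearr_start <= pos <= r.rearr_end:
            offset = pos - r.rearr_start
            if r.strand == "+":
                return r.orig_start + offset
            # Reverse-complemented block: count back from the block's end.
            return r.orig_end - offset
    return None  # position not covered by any block


info = """\
galago 2 165283 244112      1  58088  0 33 + 0     73  58160
galago 2 353199 357763 174394 180830  0 69 - 0 175801 182237"""
records = [parse_info_line(ln) for ln in info.splitlines()]
print(rearranged_to_original(records, 1))       # 73
print(rearranged_to_original(records, 174394))  # 182237
```

Where several input sequences were concatenated before rearrangement, the "original" positions returned here still refer to the concatenated input file, not to the individual source sequences.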

The info files have also been drawn into linear plots. Here is the linear plot of the previous example:

[Figure: ENr133 galago linear plot]

The first grey horizontal line represents the galago sequence, and the second line represents the human sequence. Black arrows are drawn in rearranged regions (showing the direction within the strand), and grey lines connect the two sequences to indicate that the regions are linked. The same info file can also be represented in a pseudo-dotplot, like the following:

[Figure: ENr133 galago dotplot]

The horizontal axis represents the human sequence, and the vertical axis represents the other species' sequence. A black line is drawn to indicate a rearranged piece, and grey dotted lines indicate its boundaries on the human-monotonic axis. The figure looks like a dotplot, but bear in mind that it is actually a rearrangement visualization; it does not directly depict any local alignment hits or aligned regions, as usual dotplots do. Also notice that in all plots, the line that represents the sequence of the other species always ends at the position of the furthest rearrangement (rather than the position of the last nucleotide in the sequence).

Linear plots and dotplots are available for all regions, and they are located in the "rearrangements" subdirectory, in PNG format.

The actual data location is: http://ai.stanford.edu/~asimenos/beta_encode_Nov_2004/data/

Finally, notice that we have used "rat" instead of "ratB" in these alignments. Also, our rearrangement algorithm was not fine-tuned for each region, and so it may behave differently from region to region. A few of the regions (especially the randomly picked ones) seem to contain sequences from other species that show very weak homology or too many repeated elements. For that reason, we decided to exclude the marmoset sequence from region ENr132 and the cow sequence from region ENr213. Lastly, in some regions, the original FASTA files of some of the species contained more than one sequence, usually from two or more different chromosomes of the species; in this case we concatenated the sequences into a single one before feeding them to the rearrangement algorithm. Therefore, in such cases, the MAF and .info coordinates (and the first horizontal line in the linear plot or the vertical axis in the dotplot) refer to the concatenated input file. Thus, extra care should be taken if one needs to map these coordinates back to the original sequences.

Feel free to download the alignments or browse through some interesting plots! Be sure to email me any comments!

\ \ \ encodeCompGeno 1 altColor 1,128,0\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 0, 10, 100\ dataVersion ENCODE June 2005 Freeze\ group encodeCompGeno\ longLabel Stanford MLAGAN Alignments of November 2004 ENCODE MSA Sequences\ priority 77.0\ sGroup_mammal platypus\ sGroup_placental rat mouse cow dog armadillo\ sGroup_primate chimp baboon marmoset galago\ sGroup_vertebrate chicken\ shortLabel MLAGAN Nov04\ speciesGroups primate placental mammal vertebrate\ track encode_MSA2_MLAGAN\ type wigMaf 0.0 10.0\ visibility hide\ orthoMrna Mouse mRNAs bed 12 . Mouse mRNAs Mapped from mm3 via Nets/Chains 0 77 16 107 44 135 181 149 0 0 0 rna 1 color 16,107,44\ group rna\ longLabel Mouse mRNAs Mapped from mm3 via Nets/Chains\ priority 77\ shortLabel Mouse mRNAs\ track orthoMrna\ type bed 12 .\ visibility hide\ encodeAffyChIpHl60SitesSirt1Hr08 Affy SIRT1 RA 8h bed 3 . Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 8hrs) Sites 0 78 0 225 0 127 240 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 0,225,0\ longLabel Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 8hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 78\ shortLabel Affy SIRT1 RA 8h\ subGroups factor=SIRT1 time=8h\ track encodeAffyChIpHl60SitesSirt1Hr08\ encodeAffyChIpHl60PvalSirt1Hr32 Affy SIRT1 RA 32h wig 0.0 534.54 Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 32hrs) P-Value 0 79 0 225 0 127 240 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 0,225,0\ longLabel Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 32hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 79\ shortLabel Affy SIRT1 RA 32h\ subGroups factor=SIRT1 time=32h\ track encodeAffyChIpHl60PvalSirt1Hr32\ encodeReseqRegions 
Reseq Regions bed 4 . ENCODE Resequencing Regions 0 79.9 150 100 30 202 177 142 0 0 7 chr12,chr18,chr2,chr4,chr7,chr8,chr9,

Description

This track depicts the 10 ENCODE resequencing regions for the NHGRI ENCODE project. The long-term goal of this project is to identify all functional elements in the human genome sequence to facilitate a better understanding of human biology and disease.

These regions were chosen from the 44 ENCODE regions. The resequencing was done by the Broad Institute and Baylor College of Medicine.

See the NHGRI target selection process web page for a description of how the target regions were selected.

To open a UCSC Genome Browser with a menu for selecting ENCODE regions on the Build 34 human genome, use ENCODE Regions in the UCSC Browser. The UCSC resources provided for the ENCODE project are described on the UCSC ENCODE Portal.

Credits

Thanks to the NHGRI ENCODE project for providing this initial set of data.

\ \ encodeVariation 1 chromosomes chr12,chr18,chr2,chr4,chr7,chr8,chr9\ color 150,100,30\ group encodeVariation\ longLabel ENCODE Resequencing Regions\ priority 79.9\ shortLabel Reseq Regions\ track encodeReseqRegions\ type bed 4 .\ visibility hide\ encodeAffyChIpHl60SitesSirt1Hr32 Affy SIRT1 RA 32h bed 3 . Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 32hrs) Sites 0 80 0 225 0 127 240 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 0,225,0\ longLabel Affymetrix ChIP/Chip (SIRT1 retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 80\ shortLabel Affy SIRT1 RA 32h\ subGroups factor=SIRT1 time=32h\ track encodeAffyChIpHl60SitesSirt1Hr32\ encodeIndels NHGRI DIPs bed 9 + NHGRI Deletion/Insertion Polymorphisms in ENCODE regions 0 80 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

This track shows deletion/insertion polymorphisms (DIPs). In packed and full modes, the sequence variation is shown to the left of the DIP. The naming convention "-/sequence" is used for deletions; "sequence/-" is used for insertions. The details page shows the name of the trace used to define the polymorphism, the quality score, and the strand on which the trace aligns to the reference sequence.

The quality score reflects the minimum PHRED quality value over the entire range of the DIP within the trace, plus 5 flanking bases. PHRED quality scores are expressed as log probabilities using the formula:

  Q = -10 * log10(Pe)

where Pe is the estimated probability of an error at that base. PHRED quality scores typically vary from 0 to 40, where 0 indicates complete uncertainty about the base and 40 implies odds of 10,000 to 1 that the base is correct. Sometimes a PHRED value of 50 or higher is used to denote finished sequence. A color gradient is used to distinguish quality scores in the browser display: brighter shading indicates higher scores.
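The PHRED relationship and the DIP quality rule can be written out directly. The two conversion functions follow the formula above; `dip_quality` is a hypothetical helper that assumes the 5 flanking bases are taken on each side of the DIP and that trace positions are 0-based and half-open.

```python
import math
from typing import List


def phred_from_error(pe: float) -> float:
    """Q = -10 * log10(Pe)."""
    return -10 * math.log10(pe)


def error_from_phred(q: float) -> float:
    """Inverse of the formula above: Pe = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def dip_quality(quals: List[int], dip_start: int, dip_end: int,
                flank: int = 5) -> int:
    """Minimum quality over the DIP plus `flank` bases on each side."""
    lo = max(0, dip_start - flank)
    hi = min(len(quals), dip_end + flank)
    return min(quals[lo:hi])


print(round(phred_from_error(0.001)))  # 30
```

Taking the minimum over the window means a single low-quality base next to the DIP is enough to lower the reported score, which is the conservative reading of the rule above.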

The "Trace Pos" value on the details page indicates the 3' position of the DIP within the trace. The alleles are reported relative to the "+" strand of the reference sequence; however, the trace may actually align to the "-" strand. When viewing the chromatogram using the URL provided, if the trace aligned to the "-" strand, the DIP bases in the trace will be the reverse complement of the variant allele given.

Methods

All human trace data from NCBI's trace archive were aligned to hg17 with ssahaSNP, followed by ssahaDIP post-processing to detect deletion/insertion polymorphisms. DIPs within ENCODE regions were extracted.

Verification

For verification, 500k traces from the mouse whole genome shotgun (WGS) sequencing effort were compared to mm6 using ssahaSNP and ssahaDIP. Because mm6 and these traces are from the same mouse strain, C57BL/6J, the DIP rate should be very low. Applying a quality threshold of Q23, the detected DIP rate was one DIP per 140k Neighborhood Quality Standard (NQS) bases. This rate was ten-fold lower than the SNP rate for the same data set using ssahaSNP, which has been validated as having a 5% false positive rate. The detected DIP rate for human traces against hg17 is one DIP per 12k NQS bases, indicating a false positive rate of 12k/140k, or about 8.6%.
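The false-positive estimate above is a ratio of rates: DIPs detected in the same-strain mouse comparison are treated as entirely false, so dividing that rate by the human rate estimates the false fraction among human calls. A short worked version of the arithmetic, using only the numbers given above:

```python
# Detected DIP rates, expressed as DIPs per NQS base.
control_rate = 1 / 140_000  # mouse WGS traces vs. mm6 (same strain): treated as all false
human_rate = 1 / 12_000     # human traces vs. hg17

false_positive_fraction = control_rate / human_rate  # = 12,000 / 140,000
print(f"{false_positive_fraction:.1%}")  # 8.6%
```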

Further validation experiments are in progress.

Credits

All analyses were performed by Jim Mullikin using ssahaSNP and ssahaDIP. The trace data were contributed to the trace archive by many sequencing centers.

References

Ning Z, Cox AJ, Mullikin JC. SSAHA: A fast search method for large DNA databases. Genome Res. 2001 Oct;11(10):1725-9.

The International SNP Map Working Group. A map of human genome sequence variation containing 1.4 million single nucleotide polymorphisms. Nature. 2001 Feb 15;409(6822):928-33.

\ encodeVariation 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ dataVersion ENCODE June 2005 Freeze\ group encodeVariation\ itemRgb on\ longLabel NHGRI Deletion/Insertion Polymorphisms in ENCODE regions\ origAssembly hg17\ priority 80\ shortLabel NHGRI DIPs\ track encodeIndels\ type bed 9 +\ visibility hide\ encodeUnimiCst UNIMI Cons Tags bed 5 + UNIMI Cons Tags 0 80 0 60 120 127 157 187 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeCompGeno 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 0,60,120\ dataVersion ENCODE June 2005 Freeze\ group encodeCompGeno\ priority 80.0\ shortLabel UNIMI Cons Tags\ track encodeUnimiCst\ type bed 5 +\ visibility hide\ encodeHapMapCov HapMap Coverage wig 0.0 100.0 ENCODE HapMap (16c.1) Resequencing Coverage 0 80.8 0 0 0 127 127 127 0 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18,

Description

This track shows sequencing depth of coverage for the four HapMap populations in the ten ENCODE regions that have been resequenced for variation. The data for each population are shown in a separate subtrack:

The ENCODE regions targeted in this annotation include:

Display Conventions and Configuration

The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link.

Methods

Each data value represents the number of sequencing traces that covered the nucleotide. See the International HapMap Project website for information about how these data were collected and analyzed.
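Per-base depth of this kind can be computed from trace alignment intervals with a difference array. This is an illustrative sketch only; it assumes each trace alignment has already been reduced to a 0-based, half-open `(start, end)` interval relative to the region, and the function name is hypothetical.

```python
from typing import List, Tuple


def per_base_coverage(region_len: int,
                      alignments: List[Tuple[int, int]]) -> List[int]:
    """Number of traces covering each base of a region.

    Each alignment is a 0-based, half-open (start, end) interval.
    """
    diff = [0] * (region_len + 1)
    for start, end in alignments:
        start, end = max(0, start), min(region_len, end)  # clip to region
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage, depth = [], 0
    for i in range(region_len):
        depth += diff[i]
        coverage.append(depth)
    return coverage


print(per_base_coverage(10, [(0, 5), (3, 8)]))  # [1, 1, 1, 2, 2, 1, 1, 1, 0, 0]
```

The difference array keeps the cost linear in region length plus number of traces, rather than touching every covered base once per trace.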

Credits

These data were obtained from HapMap public release 16c.1. Thanks to the International HapMap Project for making this information available.

\ encodeVariation 0 autoScale Off\ chromosomes chr2,chr4,chr7,chr8,chr9,chr12,chr18\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeVariation\ longLabel ENCODE HapMap (16c.1) Resequencing Coverage\ maxHeightPixels 128:16:16\ maxLimit 16\ minLimit 0\ priority 80.8\ shortLabel HapMap Coverage\ track encodeHapMapCov\ type wig 0.0 100.0\ visibility hide\ encodeRecombRate SNP Recomb Rates bedGraph 4 Oxford Recombination Rates from ENCODE resequencing data 0 80.9 0 0 0 127 127 127 0 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18,

Description

This track shows recombination rates, measured in centimorgans per megabase (cM/Mb), in ten ENCODE regions that have been resequenced:

Observations from sperm studies (Jeffreys et al., 2001) and patterns of genetic variation (McVean et al., 2004; Crawford et al., 2004) show that recombination rates in the human genome vary extensively over kilobase scales and that much recombination occurs in recombination hotspots. This provides an explanation for the apparent block-like structure of linkage disequilibrium (Daly et al., 2001; Gabriel et al., 2002).

Fine-scale recombination rate estimates provide a new route to understanding the molecular mechanisms underlying human recombination. A better understanding of the genomic landscape of human recombination rate variation would facilitate the efficient design and analysis of disease association studies and greatly improve inferences from polymorphism data about selection and human demographic history.

Display Conventions and Configuration

This annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page. For more information, click the Graph configuration help link.

Methods

Fine-scale recombination rates are estimated using the reversible-jump Markov chain Monte Carlo (MCMC) method (McVean et al., 2004). This approach explores the posterior distribution of fine-scale recombination rate profiles, where the state space considered is the distribution of piecewise-constant recombination maps. The Markov chain explores the distribution of both the number and location of change-points, in addition to the rates for each segment. A prior is set on the number of change-points that increases the smoothing effect of trans-dimensional MCMC, which is necessary because of the composite-likelihood scheme employed.

This method is implemented in the package LDhat, which includes full details of installation and implementation.

For the ENCODE regions, a block penalty of 5 was used (calibrated by simulation and comparison to data from sperm-typing studies). Each region was analyzed as a single run with 10,000,000 iterations, sampling every 5000th iteration and discarding the first third of all samples as burn-in. The mean posterior rate for each SNP interval is the value reported. Because of the non-independence of the composite likelihood scheme, the quantiles of the sampling distribution do not reflect true uncertainty and are therefore not given.
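The sampling arithmetic and the reported summary can be sketched as follows: 10,000,000 iterations sampled every 5000th iteration give 2,000 samples, of which the first third is discarded. The sample layout (one list of per-interval rates per sample, in sampling order) and the function names are assumptions for illustration; this is not LDhat code.

```python
from typing import List


def n_samples(iterations: int, thin: int) -> int:
    """Number of samples kept when sampling every `thin`-th iteration."""
    return iterations // thin


def mean_posterior_rates(samples: List[List[float]]) -> List[float]:
    """Mean rate per SNP interval after discarding the first third as burn-in.

    `samples` holds one list of per-interval rates per sample, in order.
    """
    kept = samples[len(samples) // 3:]
    n_intervals = len(kept[0])
    return [sum(s[j] for s in kept) / len(kept) for j in range(n_intervals)]


total = n_samples(10_000_000, 5000)
print(total, total // 3)  # 2000 666
```

Only the posterior mean is computed, matching the note above that sampling quantiles are not reported.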

Estimates were generated separately for each of the four ENCODE resequencing populations, and then combined to give a single figure. Differences between populations are not significant.

Validation

This approach has been validated in three ways: by extensive simulation studies, by comparison with independent large-scale recombination rate estimates from the genetic map, and by comparison with independent fine-scale estimates from sperm analysis. Full details of validation can be found in McVean et al. (2004) and Winckler et al. (2005).

Credits

The data are based on HapMap release 16. The recombination rates were ascertained by Gil McVean from the Mathematical Genetics Group at the University of Oxford.

References

Crawford, D.C., Bhangale, T., Li, N., Hellenthal, G., Rieder, M.J., Nickerson, D.A. and Stephens, M. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat Genet. 36(7), 700-6 (2004).

Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J. and Lander, E.S. High-resolution haplotype structure in the human genome. Nat Genet. 29(2), 229-32 (2001).

Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M. et al. The structure of haplotype blocks in the human genome. Science 296(5576), 2225-9 (2002).

Jeffreys, A.J., Kauppi, L. and Neumann, R. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat Genet. 29(2), 217-22 (2001).

McVean, G.A., Myers, S.R., Hunt, S., Deloukas, P., Bentley, D.R. and Donnelly, P. The fine-scale structure of recombination rate variation in the human genome. Science 304(5670), 581-4 (2004).

Winckler, W., Myers, S.R., Richter, D.J., Onofrio, R.C., McDonald, G.J., Bontrop, R.E., McVean, G.A., Gabriel, S.B., Reich, D., Donnelly, P. et al. Comparison of fine-scale recombination rates in humans and chimpanzees. Science 308(5718), 107-11 (2005).

\ \ encodeVariation 0 autoScale Off\ chromosomes chr2,chr4,chr7,chr8,chr9,chr12,chr18\ dataVersion ENCODE June 2005 Freeze\ group encodeVariation\ longLabel Oxford Recombination Rates from ENCODE resequencing data\ maxHeightPixels 64:32:16\ maxLimit 100\ minLimit 0\ origAssembly hg16\ priority 80.9\ shortLabel SNP Recomb Rates\ track encodeRecombRate\ type bedGraph 4\ viewLimits 0:16\ visibility hide\ encodeRecombHotspot SNP Recomb Hots bed 3 . Oxford Recombination Hotspots from ENCODE resequencing data 0 80.91 0 0 0 127 127 127 0 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18,

Description

\

\ This track shows the location of recombination hotspots detected from\ patterns of genetic variation. It is based on the HapMap ENCODE data,\ in the ten ENCODE regions that have been resequenced:\

\

\

\ Observations from sperm studies (Jeffreys et al., 2001) and\ patterns of genetic variation (McVean et al., 2004; Crawford\ et al., 2004) show that recombination rates in the human\ genome vary extensively over kilobase scales and that much\ recombination occurs in recombination hotspots. This provides an\ explanation for the apparent block-like structure of linkage\ disequilibrium (Daly et al., 2001; Gabriel et al.,\ 2002).\

\

\ Recombination hotspot estimates provide a new route to\ understanding the molecular mechanisms underlying human recombination.\ A better understanding of the genomic landscape of human recombination\ hotspots would facilitate the efficient design and analysis of\ disease association studies and greatly improve inferences from\ polymorphism data about selection and human demographic history.\

\ \

Methods

\

\ Recombination hotspots are identified using the likelihood-ratio test described in McVean et al. (2004) and Winckler et al. (2005), referred to as LDhot. For successive intervals of 200 kb, the maximum likelihood of a model with a constant recombination rate is compared to the maximum likelihood of a model in which the central 2 kb is a recombination hotspot (likelihoods are approximated by the composite likelihood method of Hudson 2001). The observed difference in log composite likelihood is compared against a null distribution obtained by simulation. Simulations are matched for sample size, SNP density, background recombination rate and an approximation to the ascertainment scheme (a panel of 12 individuals with a Poisson number of chromosomes, mean 1, sampled from this panel, using a single-hit ascertainment scheme for dbSNP, and resequencing of 16 individuals for the 10 HapMap ENCODE regions). Evidence for a hotspot was assessed in each analysis panel separately (YRI, CEU and combined CHB+JPT), and the p-values were combined: a hotspot is called only if at least two of the three populations show some evidence of a hotspot (p < 0.05) and at least one population shows stronger evidence (p < 0.01). Within the low p-value intervals, hotspot centers were estimated at locations where distinct peaks in the estimated recombination rate occurred, with at least a factor-of-two separation between peaks.\
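The population-combination rule can be made concrete with a short sketch. It assumes the per-panel p-values are already available. The names `empirical_p_value` and `call_hotspot` are hypothetical, as is the add-one correction used for the simulation-based p-value. LDhot's own implementation may differ.

```python
def empirical_p_value(observed_lr, null_lrs):
    """Empirical p-value of an observed log composite-likelihood difference
    against simulated null values (assumed add-one correction, so p > 0)."""
    exceed = sum(1 for v in null_lrs if v >= observed_lr)
    return (exceed + 1) / (len(null_lrs) + 1)


def call_hotspot(p_values, weak=0.05, strong=0.01):
    """Combination rule from the text: at least two of the three panels
    show p < weak, and at least one panel shows p < strong."""
    n_weak = sum(1 for p in p_values if p < weak)
    n_strong = sum(1 for p in p_values if p < strong)
    return n_weak >= 2 and n_strong >= 1


# Hypothetical p-values for the YRI, CEU and CHB+JPT panels.
print(call_hotspot([0.004, 0.03, 0.40]))  # True
print(call_hotspot([0.02, 0.03, 0.40]))   # False: no panel below 0.01
print(call_hotspot([0.004, 0.20, 0.40]))  # False: only one panel below 0.05
```

Because p < 0.01 implies p < 0.05, the panel with strong evidence always counts toward the two-of-three requirement.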

\ \

Validation

\

\ This approach has been validated in three ways: by extensive\ simulation studies and by comparisons with independent estimates of\ recombination rates, both over large scales from the genetic map and\ over fine scales from sperm analysis. Full details of validation can be \ found in McVean et al. (2004) and Winckler et al. (2005).\

\ \

Credits

\

\ The data are based on HapMap \ release 16a. The recombination hotspots were identified by Simon Myers from the\ Mathematical Genetics Group at the University of Oxford.\

\ \

References

\

\ Crawford, D.C., Bhangale, T., Li, N., Hellenthal, G., Rieder, M.J., \ Nickerson, D.A. and Stephens, M.\ Evidence for substantial fine-scale variation in recombination \ rates across the human genome.\ Nat Genet. 36(7), 700-6 (2004).\

\

\ Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J. and Lander, E.S.\ High-resolution haplotype structure in the human genome.\ Nat Genet. 29(2), 229-32 (2001).\

\

\ Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel, \ B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M. et al.\ The structure of haplotype blocks in the human genome.\ Science 296(5576), 2225-9 (2002).\

\

\ Hudson, R.R.\ Two-locus sampling distributions and their application.\ Genetics 159(4), 1805-17 (2001).\

\

\ Jeffreys, A.J., Kauppi, L. and Neumann, R.\ Intensely punctate meiotic recombination in the class II region \ of the major histocompatibility complex.\ Nat Genet. 29(2), 217-22 (2001).\

\

\ McVean, G.A., Myers, S.R., Hunt, S., Deloukas, P., Bentley, D.R. and Donnelly, \ P.\ The fine-scale structure of recombination rate variation in the \ human genome.\ Science 304(5670), 581-4 (2004).\

\

\ Winckler, W., Myers, S.R., Richter, D.J., Onofrio, R.C., McDonald, G.J., \ Bontrop, R.E., McVean, G.A., Gabriel, S.B., Reich, D., Donnelly, P. \ et al.\ Comparison of fine-scale recombination rates in humans and \ chimpanzees.\ Science 308(5718), 107-11 (2005).\

\ encodeVariation 1 chromosomes chr2,chr4,chr7,chr8,chr9,chr12,chr18\ dataVersion ENCODE June 2005 Freeze\ group encodeVariation\ longLabel Oxford Recombination Hotspots from ENCODE resequencing data\ origAssembly hg16\ priority 80.91\ shortLabel SNP Recomb Hots\ track encodeRecombHotspot\ type bed 3 .\ visibility hide\ encodeAffyChIpHl60PvalTfiibHr32 Affy TFIIB RA 32h wig 0.0 534.54 Affymetrix ChIP/Chip (TFIIB retinoic acid-treated HL-60, 32hrs) P-Value 0 81 0 0 200 127 127 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 0 color 0,0,200\ longLabel Affymetrix ChIP/Chip (TFIIB retinoic acid-treated HL-60, 32hrs) P-Value\ parent encodeAffyChIpHl60Pval\ priority 81\ shortLabel Affy TFIIB RA 32h\ subGroups factor=TFIIB time=32h\ track encodeAffyChIpHl60PvalTfiibHr32\ encodeSangerGenoExprAssociation Sanger Assoc bed5FloatScore Sanger Genotype-Expression Association 0 81 0 0 0 127 127 127 1 0 6 chr2,chr7,chr8,chr9,chr12,chr18,

Description

\

\ This track displays associations between gene expression\ data from the 60 unrelated Centre d'Etude du Polymorphisme Humain (CEPH)\ individuals of the International \ HapMap Project and SNPs genotyped by HapMap. The CEPH population is \ composed of Utah residents with ancestry from northern and western Europe.\ The expression data were generated with the Illumina platform at the \ Wellcome Trust \ Sanger Institute.

\ \

Display Conventions and Configuration

\

\ In the graphical display, an association is displayed as a block \ drawn at the location of the associated SNP. In pack or full mode, \ the name of the associated gene is drawn to the left of the block. \ The shading of the block indicates the strength of the association:\ light gray indicates a -log10(P-value) close to 0 and \ black indicates a -log10(P-value) of 2 or more. \
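A minimal sketch of such a shading rule follows. It assumes a linear gray ramp that saturates at a -log10(P) of 2. The starting light-gray level of 220 and the function name `shade_for_score` are assumptions; the browser's actual ramp may differ.

```python
def shade_for_score(neg_log10_p, max_score=2.0, light=220):
    """Map -log10(P) to a gray level (0 = black, 255 = white).

    Assumed linear ramp: 0 maps to `light`, and scores >= max_score map to black."""
    frac = min(max(neg_log10_p / max_score, 0.0), 1.0)
    return round(light * (1.0 - frac))
```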

\ \

Methods

\

\ An association analysis was performed for each ENCODE RefSeq gene with the \ genotypes of SNPs in the same ENCODE region (cis). Expression values were\ first log2-transformed and then quantile-normalized to ensure homogeneous levels between arrays. \ Analysis of variance (ANOVA) was then performed with 1 or 2 degrees of \ freedom (depending on whether only two or all three genotypes in the population\ were available), using \ the genotype as a categorical variable and the normalized, transformed\ expression values as the response. The values presented here are\ -log10 P-values.
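The normalization and test can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not the Sanger pipeline:

- Ties in quantile normalization are broken by position.
- The F-distribution tail is computed from the regularized incomplete beta function, evaluated by continued fraction.
- All function names are hypothetical.

The genotype term's degrees of freedom equal the number of observed genotype groups minus one, which gives the 1 or 2 degrees of freedom described above.

```python
import math
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize arrays[j][g] (array j, gene g; values already log2).

    Each value is replaced by the mean, across arrays, of the values holding
    the same rank, so all arrays share one distribution. Ties are broken by
    position (a simplification)."""
    n_genes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(s[r] for s in sorted_arrays) for r in range(n_genes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_genes), key=lambda g: a[g])
        out = [0.0] * n_genes
        for rank, g in enumerate(order):
            out[g] = rank_means[rank]
        normalized.append(out)
    return normalized


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) variable."""
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def anova_neg_log10_p(values, genotypes):
    """One-way ANOVA of expression on genotype (categorical).

    Returns (F, df1, df2, -log10 P); df1 is the number of genotype groups - 1."""
    groups = {}
    for v, g in zip(values, genotypes):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in groups.values())
    ss_within = sum((v - mean(vs)) ** 2 for vs in groups.values() for v in vs)
    df1, df2 = k - 1, n - k
    f = (ss_between / df1) / (ss_within / df2)
    return f, df1, df2, -math.log10(f_sf(f, df1, df2))
```

To test one gene, take its normalized value from each array (one array per individual) as `values`, and the individuals' genotypes at the SNP as `genotypes`. The sketch does not guard against zero within-group variance.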

\ \

Verification

\

\ There were six technical replicates for each sample; the average values\ from these were used for the ANOVA.

\ \

Credits

\

\ The following people contributed to this analysis:\ Barbara Stranger, Matthew Forrest, Panos Deloukas, and Manolis Dermitzakis \ from Wellcome Trust Sanger Institute and Simon Tavare from Cambridge \ University.

\ \

References

\

\ Dausset, J., Cann, H., Cohen, D., Lathrop, M., Lalouel, J.M. and White, R.\ Centre d'Etude du Polymorphisme Humain (CEPH): \ collaborative genetic mapping of the human genome.\ Genomics 6(3), 575-7 (1990).

\ \ encodeVariation 1 chromosomes chr2,chr7,chr8,chr9,chr12,chr18\ dataVersion ENCODE June 2005 Freeze\ group encodeVariation\ longLabel Sanger Genotype-Expression Association\ origAssembly hg16\ priority 81\ shortLabel Sanger Assoc\ track encodeSangerGenoExprAssociation\ type bed5FloatScore\ useScore 1\ visibility hide\ encodeAffyChIpHl60SitesTfiibHr32 Affy TFIIB RA 32h bed 3 . Affymetrix ChIP/Chip (TFIIB retinoic acid-treated HL-60, 32hrs) Sites 0 82 0 0 200 127 127 227 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeChip 1 color 0,0,200\ longLabel Affymetrix ChIP/Chip (TFIIB retinoic acid-treated HL-60, 32hrs) Sites\ parent encodeAffyChIpHl60Sites\ priority 82\ shortLabel Affy TFIIB RA 32h\ subGroups factor=TFIIB time=32h\ track encodeAffyChIpHl60SitesTfiibHr32\ encodeHapMapAlleleFreq HapMap SNPs bed 6 + ENCODE HapMap (16c.1) Allele Frequencies 0 82 0 0 0 127 127 127 1 0 7 chr2,chr4,chr7,chr8,chr9,chr12,chr18,

Description

\

\ This track shows allele frequencies for the four HapMap populations in the \ ten ENCODE regions that have been resequenced for variation. The data for\ each population are shown in a separate subtrack:\

\

\ The ENCODE regions targeted in this annotation include: \

\

\ See the Methods section for a discussion of the scoring method used in this \ annotation.

\ \

The data set combines SNPs discovered by the HapMap resequencing project with\ SNPs discovered previously.\ \

Display Conventions and Configuration

\

\ The complete list of subtracks available in this annotation is shown at\ the top of the track description page. To display\ only selected subtracks, uncheck the boxes next to the tracks you wish to\ hide.

\

\ Allele locations are indicated by tick marks using a grayscale coloring \ scheme based on score, where darker shading indicates a higher score. A lower\ score indicates little or no variation; a higher score indicates a more even split \ between the reference and variant observations in the population.\ The details page for an individual allele displays the variant and\ reference sequences, the allele frequencies, the source of the data,\ and the total sample count.

\ \

Methods

\

\ See the International HapMap \ Project website for information about how these data were collected and\ analyzed.

\

\ The score calculation in this annotation is a function of the minor allele \ frequency (maf), which varies from 0.0 to 0.5. The score has been normalized \ to a range of 500 to 1000 using the formula score = 500 + (maf * 1000).\ Thus, a score of 500 indicates no variation; a score of 1000 indicates an even \ split between reference and variant observations in the population.
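The score formula can be written directly. The function names are illustrative, and the minor allele frequency helper works from hypothetical reference and variant allele counts.

```python
def maf_score(maf):
    """Track score from minor allele frequency: 500 + maf * 1000."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    return 500 + maf * 1000


def minor_allele_frequency(ref_count, alt_count):
    """Minor allele frequency from allele counts (hypothetical helper)."""
    return min(ref_count, alt_count) / (ref_count + alt_count)


print(maf_score(minor_allele_frequency(90, 30)))  # 750.0
```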

\ \

Credits

\

\ These data were obtained from HapMap public release 16c.1. Thanks to the\ International HapMap Project for making this information available.

\ encodeVariation 1 chromosomes chr2,chr4,chr7,chr8,chr9,chr12,chr18\ compositeTrack on\ dataVersion ENCODE June 2005 Freeze\ group encodeVariation\ longLabel ENCODE HapMap (16c.1) Allele Frequencies\ origAssembly hg16\ priority 82\ shortLabel HapMap SNPs\ track encodeHapMapAlleleFreq\ type bed 6 +\ useScore 1\ visibility hide\ encodeStanfordNRSF Stanf NRSF Tags bed 6 . Stanford NRSF/REST 0 83 0 0 0 127 127 127 0 0 23 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,chrY,chrM,

Description

\

\ ChIP-seq is a new approach for investigating entire transcription factor-DNA interactomes.\ ChIP-seq consists of chromatin immunoprecipitation (ChIP) followed by single-molecule-based\ sequencing (seq), and so avoids the complications of array hybridization.\

\ This track shows 25mer sequence reads obtained via a ChIP-seq protocol \ from an NRSF/REST-enriched ChIP sample and a companion control sample\ of the same fixed chromatin without immuno-enrichment. These sequence\ reads were later analyzed using a peak locator (data to be available in the future)\ to identify NRSF-positive binding events.\

\ Chromatin immunoprecipitation was applied to NRSF/REST loci identified by a\ PSFM (position specific frequency matrix) computational screen. NRSF/REST\ is a zinc finger repressor that negatively regulates many neuronal genes in\ stem and progenitor cells and in nonneuronal cell types such as the\ Jurkat T cell line used for this track.\

\ NRSF (neuron-restrictive silencer factor), also known as REST (repressor element-1 \ silencing transcription factor), was chosen because prior studies provide\ a large set of target genes. In addition, the DNA motif bound by NRSF \ (known as NRSE/RE1) is long (21 bp) and well-specified, and there is a high-quality \ monoclonal antibody that recognizes NRSF efficiently in ChIP experiments.\ \

Methods

\

\ A Jurkat human T lymphoblast cell line from \ ATCC\ was cultured according to standard protocols.\ Chromatin immunoprecipitation was performed as described in Mortazavi et al. (2006) using a\ custom monoclonal antibody. \ Libraries were prepared from the ChIP DNA using the ligation-mediated PCR Solexa protocol. \ The Solexa library construction protocol was modified to include a PCR preamplification step after \ linker ligation and before gel electrophoresis. Reducing the size and narrowing the\ size range of the DNA fragments collected by gel purification is intended to improve the positional resolution of\ ChIP-seq.\

\ Cistematic\ (Wold lab) was used to perform motif-oriented analysis. The NRSE2 PSFM was used to\ identify canonical NRSE sites, and these sites served as reference points for measuring the distance\ to the peak of the ChIP-seq read-tag distribution at each location.\ Reads that mapped to multiple genomic locations are not included.\ Up to two mismatches were allowed.\ \
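Scanning with a PSFM can be sketched as a log-odds sum over each window. This is an illustrative sketch only, not Cistematic's implementation. The following are all assumptions:

- the uniform 0.25 background;
- the pseudocount;
- scanning the forward strand only;
- the toy three-position matrix in the usage lines (the real NRSE motif is 21 bp long).

```python
import math

BASES = "ACGT"


def log_odds_matrix(psfm, background=0.25, pseudo=0.01):
    """Convert a PSFM (list of {base: frequency} columns) to log2-odds scores.

    A small pseudocount keeps zero frequencies finite."""
    matrix = []
    for col in psfm:
        total = sum(col[b] + pseudo for b in BASES)
        matrix.append({b: math.log2((col[b] + pseudo) / total / background)
                       for b in BASES})
    return matrix


def scan(seq, matrix):
    """Score every window of the forward strand; returns (start, score) pairs.

    Windows containing non-ACGT characters (e.g. N) are skipped."""
    width = len(matrix)
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if all(b in BASES for b in window):
            score = sum(matrix[i][b] for i, b in enumerate(window))
            hits.append((start, score))
    return hits


# Toy 3-position PSFM favoring "ACG" (hypothetical, not NRSE2).
toy = [{"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
       {"A": 0.01, "C": 0.97, "G": 0.01, "T": 0.01},
       {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01}]
best_start, best_score = max(scan("TTACGTT", log_odds_matrix(toy)), key=lambda h: h[1])
print(best_start)  # 2
```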

Verification

\

\ These data represent a pool of 4 replicate immunoprecipitations. \

\ NRSF-binding sites previously identified by qPCR or transfection assays (74 loci) plus a set of known negatives\ (139 loci) were used to estimate a sensitivity of 87% and a specificity of 97% for clusters of 13 tags \ separated by no more than 100 bp.\
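One reading of the cluster criterion is sketched below. It treats a cluster as a run of tags in which consecutive tags lie at most 100 bp apart, and it keeps only clusters with at least 13 tags. Both this interpretation and the helper names are assumptions. Sensitivity and specificity then follow from the known positive and negative loci in the usual way.

```python
def tag_clusters(positions, max_gap=100, min_tags=13):
    """Group tag positions into runs whose consecutive tags are at most
    max_gap bp apart; keep runs with at least min_tags tags.

    Returns (first_tag, last_tag, n_tags) tuples."""
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_tags:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_tags:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity over known positives, specificity over known negatives."""
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)
```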

\ 771 computationally high-scoring NRSE motifs were used to assess the precision of ChIP-seq site location. \ 754 sites were found in the ChIP-seq experiments, and for 94% of these the center of the 21-bp NRSE motif was within 50 base pairs\ of the called ChIP-seq peak.\
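The precision measurement can be sketched as the distance from each motif center to the nearest called peak. The sketch makes three assumptions:

- motif starts are 0-based;
- the center of a 21-bp motif is taken as start + 10;
- each motif is paired with its nearest peak.

The function names are hypothetical.

```python
from bisect import bisect_left


def nearest_peak_distance(center, sorted_peaks):
    """Distance from a position to the closest peak in a sorted list."""
    i = bisect_left(sorted_peaks, center)
    neighbours = sorted_peaks[max(i - 1, 0):i + 1]
    return min(abs(center - p) for p in neighbours)


def fraction_centered(motif_starts, peak_positions, motif_len=21, max_dist=50):
    """Fraction of motifs whose center lies within max_dist bp of a peak."""
    if not motif_starts or not peak_positions:
        return 0.0
    peaks = sorted(peak_positions)
    centers = [start + motif_len // 2 for start in motif_starts]
    close = sum(1 for c in centers if nearest_peak_distance(c, peaks) <= max_dist)
    return close / len(centers)
```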

\ NRSF binding in or near promoters is expected to be correlated with low levels of transcription.\ Labeled cRNA was hybridized to Illumina Sentrix RefSeq8 whole-genome gene expression microarrays.\ Illumina BeadStudio software was used to extract and normalize the data (rank-invariant method).\ A database of transcription start sites from \ SwitchGear Genomics\ was used to obtain high-confidence\ promoter predictions. The 230 transcripts occurring near ChIP-seq peaks had a median expression \ intensity of 6.8, while the full set of 20,589 transcripts had a median expression intensity of 23.6.\ The difference in medians was significant (P = 1 x 10^-11) by the Mann-Whitney test.\ \
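A Mann-Whitney U test can be sketched with the standard normal approximation, which is commonly used for large samples. This is an illustration under assumptions: ties receive average ranks, but the variance is not tie-corrected, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U for x, two-sided p-value) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))
```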

Display conventions

\ \ Sequence reads that align to the forward strand are displayed in green; reads that align to the reverse\ strand are displayed in red.\ \

Credits

\

\ \ Myers Group: David Johnson, Betsy Anton, Loan Nguyen, Cat Medina, Richard Myers.\

\ \ Wold Group: Barbara Wold, Ali Mortazavi, Kenneth McCue.\

\ Solexa/Illumina Sequencing: Gary Schroth.\ \

References

\

\ Fields S.\ \ Site-Seeing by Sequencing.\ Science 2007 June;316:1441-1442.\

\ Johnson DS, Mortazavi A, Myers RM, Wold B.\ \ Genome-Wide Mapping of in Vivo Protein-DNA Interactions.\ Science 2007 June;316:1497-1502.\

\ Mortazavi A, Thompson EC, Garcia ST, Myers RM, Wold B. \ \ Comparative genomics modeling of the NRSF/REST repressor network: \ from single conserved sites to genome-wide repertoire.\ Genome Res. 2006 Oct;16(10):1208-21. \

Schoenherr CJ, Paquette AJ, Anderson DJ.\ \ Identification of potential target genes for the neuron-restrictive silencer factor.\ Proc. Natl. Acad. Sci. 1996;93:9881-9886.\

Related Work

\

\ Mikkelsen TS et al.\ \ Genome-wide maps of chromatin state in pluripotent and lineage-committed cells.\ Nature 5 July 2007; 448(7149).\ encodeChip 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX,chrY,chrM\ compositeTrack on\ dataVersion March 2007\ group encodeChip\ longLabel Stanford NRSF/REST\ origAssembly hg17\ priority 83\ shortLabel Stanf NRSF Tags\ superTrack encodeStanfordChipSuper dense\ track encodeStanfordNRSF\ type bed 6 .\ visibility hide\ rosetta Rosetta bed 15 + Rosetta Experimental Confirmation of Chr22 Exons 0 88 0 0 0 127 127 127 0 0 1 chr22,

Description

\

Expression data from Rosetta Inpharmatics.\ See the paper "Experimental Annotation of the Human Genome Using Microarray Technology"\ (Nature, Feb. 2001, vol. 409, pp. 922-7) for more\ information. Briefly, Rosetta created DNA probes for each exon as\ described by the Sanger Centre for the October 2000 draft of the\ genome and used them to explore expression levels over 69 different\ experiments. As in the original paper, exons are labeled according to\ contig name, relative position in the contig, and whether they were\ predicted (pe) or confirmed as true (te) exons at the time of\ publication. For example, AC000097_256_te is the 256th exon on\ AC000097 predicted by Genscan, which was confirmed\ independently. Hybridization names refer to the sources of the two\ mRNA populations used for the experiment.\ Please note: in the browser window the hybridization names\ are too long to fit and have been abbreviated. Also, the ratios\ were inverted as of Feb 12, 2002 to conform with the standard microarray\ convention of placing the experimental sample in the red (Cy5) channel\ and the reference sample in the green (Cy3) channel.\ \
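The labeling convention can be parsed mechanically. The regular expression below is an assumption based only on the single example given: a contig name, an exon index and a `pe`/`te` suffix, joined by underscores. The names `ExonLabel` and `parse_exon_label` are hypothetical.

```python
import re
from typing import NamedTuple


class ExonLabel(NamedTuple):
    contig: str
    index: int
    confirmed: bool  # True for "te" (confirmed), False for "pe" (predicted)


# Assumed pattern: <contig>_<exon index>_<pe|te>
LABEL_RE = re.compile(r"^(?P<contig>.+)_(?P<index>\d+)_(?P<status>pe|te)$")


def parse_exon_label(label):
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognized exon label: {label!r}")
    return ExonLabel(m["contig"], int(m["index"]), m["status"] == "te")


print(parse_exon_label("AC000097_256_te"))
```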

Display Options

\ The track can be configured with a few different options:\ \
Reference Sample: This option is only valid when the track is displayed in\ full. It determines how the 69 different experiments are displayed. The\ options are:\ \ \ Exons Shown: Probes on the microarrays correspond to gene\ predictions on chromosome 22, some of which were confirmed by known\ genes, while others remain predictions. This option determines whether data are\ shown for probes corresponding to confirmed, predicted, or all exons.\ \
Color Scheme: Data are presented using a two-color false-color\ display. By default the Brown/Botstein colors are used: red for a positive log\ ratio and green for a negative log ratio. Blue can be\ substituted for green for those who are color blind. Gray values\ indicate missing data. Please note that, due to technical limitations,\ the details page can show many more color shades than\ the browser image, so the colors may not match exactly.\ \
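A sketch of the red/green (or blue) scheme follows. The following are assumptions, not the browser's documented values:

- the linear intensity scale;
- saturation at |log ratio| = 2;
- the gray value used for missing data.

```python
def ratio_color(log_ratio, max_abs=2.0, color_blind=False):
    """Return an (r, g, b) tuple for a log ratio.

    Positive -> red; negative -> green, or blue when color_blind is set;
    None (missing data) -> gray. Intensity scales linearly up to max_abs."""
    if log_ratio is None:
        return (128, 128, 128)
    intensity = round(255 * min(abs(log_ratio) / max_abs, 1.0))
    if log_ratio >= 0:
        return (intensity, 0, 0)
    return (0, 0, intensity) if color_blind else (0, intensity, 0)
```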

Details Page

\ On the details page, the probes presented correspond to those contained\ in the window range shown in the Genome Browser, and the selected exon probe is highlighted\ in blue. The values in the details table are averages of many data\ points. To see the full data for each experiment\ graphically, select the check boxes for the experiments of interest\ and click the submit value button.\ regulation 1 chromosomes chr22,\ group regulation\ longLabel Rosetta Experimental Confirmation of Chr22 Exons\ priority 88\ shortLabel Rosetta\ track rosetta\ type bed 15 +\ visibility hide\ cpgIsland CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 0 90 0 100 0 128 228 128 0 0 0

Description

\

\ CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites, and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.\

\ \

Methods

\

\ CpG islands are predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment is then\ evaluated for the following criteria:\

\

\ The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length.
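The scoring scan and the island statistics can be sketched as follows. This is a simplified illustration, not the Miklem/Hillier program. It reports the peak of each positive-scoring run and resets when the running score drops to zero or below. That approximates, but does not exactly reproduce, a full maximal-segment search, and the post-filtering criteria are not applied. The function names are hypothetical.

```python
def cpg_segments(seq, cg_score=17, other_score=-1):
    """Scan one base at a time, scoring each dinucleotide (+17 for CG, -1
    otherwise), and return (start, end, score) for the peak of each
    positive-scoring run; end is exclusive."""
    seq = seq.upper()
    segments = []
    run_start, running, best, best_end = 0, 0, 0, 0
    for i in range(len(seq) - 1):
        running += cg_score if seq[i:i + 2] == "CG" else other_score
        if running > best:
            best, best_end = running, i + 2  # include the G of the CG
        if running <= 0:
            if best > 0:
                segments.append((run_start, best_end, best))
            run_start, running, best = i + 1, 0, 0
    if best > 0:
        segments.append((run_start, best_end, best))
    return segments


def island_stats(island):
    """CpG count, and percentage CpG = 2 * CpG count / length * 100."""
    island = island.upper()
    cpg_count = sum(1 for i in range(len(island) - 1) if island[i:i + 2] == "CG")
    return cpg_count, 100.0 * 2 * cpg_count / len(island)


print(cpg_segments("AAAACGCGCGAAAA"))  # [(4, 10, 49)]
```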

\ \

Credits

\

\ This track was generated using a modification of a program developed by \ G. Miklem and L. Hillier.

\ \ regulation 1 altColor 128,228,128\ color 0,100,0\ group regulation\ longLabel CpG Islands (Islands < 300 Bases are Light Green)\ priority 90\ shortLabel CpG Islands\ track cpgIsland\ type bed 4 +\ visibility hide\ cpgIslandGgfAndyMasked CpG Islands (AL) bed 4 + CpG Islands - Andy Law, masked sequence (Islands < 300 Bases are Light Green) 0 90.001 0 100 0 128 228 128 0 0 0

Description

\

\ CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites, and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.\

\

\ The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length.\

\ \

Methods

\

\ The genome sequence was masked using the output of RepeatMasker and\ the Tandem Repeats Finder (period ≤ 12). A sliding-window search\ was performed on the set of CpG locations in the masked genome\ sequence to find the longest spans that met the criteria given in\ Gardiner-Garden, M. and Frommer, M. (1987) in the References section\ below:\

\ The ratio of observed to expected CpGs is calculated as follows:\
\
\
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G), where N is the length of the sequence\
\
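The observed/expected ratio and a naive sliding-window search can be sketched as below. The thresholds (a 200-bp window, GC content of at least 50%, observed/expected CpG of at least 0.6) are the commonly cited Gardiner-Garden and Frommer criteria and are used here as an assumption. Testing a window at every base and merging overlapping passing windows are simplifications of Andy Law's program, which searches over CpG locations. The function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases that are C or G."""
    seq = seq.upper()
    return (seq.count("C") + seq.count("G")) / len(seq)


def obs_exp_cpg(seq):
    """Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    return cpg * n / (c * g)


def ggf_spans(seq, window=200, min_gc=0.5, min_ratio=0.6):
    """Merge overlapping windows that pass both (assumed) thresholds into
    (start, end) spans; merged spans are not re-checked."""
    spans = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        if gc_fraction(w) >= min_gc and obs_exp_cpg(w) >= min_ratio:
            end = start + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans
```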

\ \

Credits

\

\ This track was generated using a program written by Andy Law (Roslin \ Institute) with minor modifications by Angie Hinrichs (UCSC).

\ \

References

\

\ Gardiner-Garden M, Frommer M.\ CpG islands in vertebrate genomes.\ J. Mol. Biol. 1987 Jul 20;196(2):261-82.

\ \ regulation 1 altColor 128,228,128\ color 0,100,0\ group regulation\ longLabel CpG Islands - Andy Law, masked sequence (Islands < 300 Bases are Light Green)\ priority 90.001\ shortLabel CpG Islands (AL)\ track cpgIslandGgfAndyMasked\ type bed 4 +\ visibility hide\ cpgIslandGgfAndy CpG Islands (AL) bed 4 + CpG Islands - Andy Law (Islands < 300 Bases are Light Green) 0 90.01 0 100 0 128 228 128 0 0 0

Description

\

\ CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites, and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.\

\

\ The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length.\

\ \

Methods

\

\ A sliding-window search was performed on the set of CpG locations in \ the genome to find the longest spans that met the criteria given in \ Gardiner-Garden, M. and Frommer, M. (1987) in the References section below:\

\ The ratio of observed to expected CpGs is calculated as follows:\
\
\
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G), where N is the length of the sequence\
\

\ \

Credits

\

\ This track was generated using a program written by Andy Law (Roslin \ Institute) with minor modifications by Angie Hinrichs (UCSC).

\ \

References

\

\ Gardiner-Garden M, Frommer M.\ CpG islands in vertebrate genomes.\ J. Mol. Biol. 1987 Jul 20;196(2):261-82.

\ regulation 1 altColor 128,228,128\ color 0,100,0\ group regulation\ longLabel CpG Islands - Andy Law (Islands < 300 Bases are Light Green)\ priority 90.01\ shortLabel CpG Islands (AL)\ track cpgIslandGgfAndy\ type bed 4 +\ visibility hide\ softPromoter TSSW Promoters bed 5 + TSSW Promoter Predictions 0 90.2 0 100 0 127 177 127 0 0 0 regulation 1 color 0,100,0\ group regulation\ longLabel TSSW Promoter Predictions\ priority 90.2\ shortLabel TSSW Promoters\ track softPromoter\ type bed 5 +\ visibility hide\ transfacHit Transfac Hits bed 6 . Transfac Transcription Factor Binding Sites Near Transcription Start 0 91 0 0 0 127 127 127 1 0 0 regulation 1 group regulation\ longLabel Transfac Transcription Factor Binding Sites Near Transcription Start\ priority 91\ shortLabel Transfac Hits\ spectrum on\ track transfacHit\ type bed 6 .\ visibility hide\ triangleSelf Golden Triangle bed 6 . Golden Triangle Possible Transcription Factor Binding Sites 0 92 0 0 0 127 127 127 1 0 0 regulation 1 group regulation\ longLabel Golden Triangle Possible Transcription Factor Binding Sites\ priority 92\ shortLabel Golden Triangle\ spectrum on\ track triangleSelf\ type bed 6 .\ visibility hide\ esRegGeneToMotif Reg. Module bed 6 + Eran Segal Regulatory Module 1 93 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows predicted transcription factor binding sites \ based on sequence similarities upstream of coordinately expressed genes.\

\ In dense display mode the gold areas indicate the extent of the area\ searched for binding sites; black boxes indicate the actual\ binding sites. In other modes the gold areas disappear and only\ the binding sites are displayed. Clicking on a particular predicted binding \ site displays a page that shows the sequence motif associated with the \ predicted transcription factor and the sequence at the predicted binding site.\ Where known motifs have been identified by this method, they are named;\ otherwise, they are assigned a motif number.\ \

Methods

\

\ This analysis was performed on various pre-existing microarray datasets, following the method described in \ "Genome-wide discovery of transcriptional modules from DNA \ sequence and gene expression".\ A regulatory module consists of a set of genes predicted to be regulated \ by the same combination of DNA sequence motifs. The predictions are based on \ the co-expression of the set of genes in the module and on the appearance of\ common combinations of motifs in the upstream regions of genes assigned to\ the same module. \ \

Credits

\

\ Thanks to Eran Segal for providing the data analysis that forms the \ basis for this track. The display was programmed by \ Jim Kent.\ regulation 1 exonArrows off\ group regulation\ longLabel Eran Segal Regulatory Module\ noScoreFilter .\ priority 93\ shortLabel Reg. Module\ spectrum on\ track esRegGeneToMotif\ type bed 6 +\ visibility dense\ triangle Golden Extra bed 6 . Golden Triangle Motif Matching Sites Near Transcription Start 0 94 0 0 0 127 127 127 1 0 0 regulation 1 group regulation\ longLabel Golden Triangle Motif Matching Sites Near Transcription Start\ priority 94\ shortLabel Golden Extra\ spectrum on\ track triangle\ type bed 6 .\ visibility hide\ transfac Transfac Hits genePred refPep refMrna Transfac Hits 0 95 12 12 120 133 133 187 0 0 0 regulation 1 color 12,12,120\ group regulation\ longLabel Transfac Hits\ priority 95\ shortLabel Transfac Hits\ track transfac\ type genePred refPep refMrna\ visibility hide\ transfacRatios Transfac Ratios bed 6 . Transfac Likelihood Ratios 0 96 12 12 120 133 133 187 0 0 0 regulation 1 color 12,12,120\ group regulation\ longLabel Transfac Likelihood Ratios\ priority 96\ shortLabel Transfac Ratios\ track transfacRatios\ type bed 6 .\ visibility hide\ psuReg Known Regulatory bed 4 . Functional Regulatory Elements Compiled by Penn State 0 97 30 130 210 142 192 232 0 0 0

Regulatory Elements


\
\ This list of functional regions contains the names and coordinates of regulatory regions relative to the December version of the Human Genome Browser. \
\ Note that these regions have not been trimmed to show the smallest possible functional element with maximum activity. They range in size from 300 to 4000 bp. \
\

\

\ Details on source of Regulatory Region data
\ \

\ \ Please direct comments or questions to Laura Elnitski at \ elnitski@bio.cse.psu.edu.\ \

\ April 16, 2002\

\ Data made available by Laura Elnitski, Webb Miller, Ross Hardison, Scott Schwartz, Emmanouil Dermitzakis, Andrew Clark, William Krivan and Wyeth Wasserman\ regulation 1 color 30,130,210\ group regulation\ longLabel Functional Regulatory Elements Compiled by Penn State\ priority 97\ shortLabel Known Regulatory\ track psuReg\ type bed 4 .\ visibility hide\ snp134Common Common SNPs(134) bed 6 + Simple Nucleotide Polymorphisms (dbSNP 134) Found in >= 1% of Samples 1 99.0916 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\ \

\ This track contains information about a subset of the \ single nucleotide polymorphisms\ and small insertions and deletions (indels) — collectively Simple\ Nucleotide Polymorphisms — from\ dbSNP\ build 134, available from\ ftp.ncbi.nih.gov/snp.\ Only SNPs that have a minor allele frequency of at least 1% and\ are mapped to a single location in the reference genome assembly are\ included in this subset. Frequency data are not available for all SNPs,\ so this subset is incomplete.\

\

\ The selection of SNPs with a minor allele frequency of 1% or greater\ is an attempt to identify variants that appear to be reasonably common\ in the general population. Taken as a set, common variants should be\ less likely to be associated with severe genetic diseases due to the\ effects of natural selection,\ following the view that deleterious variants are not likely to become\ common in the population.\ However, the significance of any particular variant should be interpreted\ only by a trained medical geneticist using all available information.\

\ \ The remainder of this page is identical on the following tracks:\
    \
  • Common SNPs(134)\
  • Flagged SNPs(134)\
  • Mult. SNPs(134)\
  • All SNPs(134)\
\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\
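The width rule can be expressed as a small classifier over a variant's span. This sketch assumes UCSC's 0-based, half-open coordinates, where an insertion point is stored with `chromStart == chromEnd`. The function name `display_kind` is hypothetical.

```python
def display_kind(chrom_start: int, chrom_end: int) -> str:
    """Describe how a variant is drawn at base-level resolution.

    Coordinates are assumed to be 0-based and half-open, so the span
    (chrom_end - chrom_start) is the number of reference bases covered.
    """
    span = chrom_end - chrom_start
    if span < 0:
        raise ValueError("chrom_end must not precede chrom_start")
    if span == 0:
        return "insertion: tick between two bases"
    if span == 1:
        return "single-base block"
    return f"{span}-base block"


print(display_kind(100, 100))  # insertion: tick between two bases
print(display_kind(100, 101))  # single-base block
print(display_kind(100, 103))  # 3-base block
```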

\ \

\ On the track controls page, SNPs can be colored and/or filtered from the \ display according to several attributes:\

\
    \ \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/-\
    • No Variation - the submission reports an invariant region in the surveyed sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - submitted by HapMap project (human only)\
    • By 1000Genomes - submitted by 1000Genomes project (human only)\
    • Unknown - no validation has been reported for this variant\
    \
  • \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation is 3' to and within 500 bases of a\ transcript, or is 5' to and within 2000 bases of a transcript\ (near-gene-3, near-gene-5)\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to the reference assembly (coding-synon)\
    • Coding - Non-Synonymous - change in peptide for allele with \ \ respect to the reference assembly (nonsense, missense, \ frameshift, cds-indel, coding-synonymy-unknown)\
    • Untranslated - variation is in a transcript, but not in a coding \ \ region interval (untranslated-3, untranslated-5)\
    • Intron - variation is in an intron, but not in the first two or\ last two bases of the intron\
    • Splice Site - variation is in the first two or last two bases\ of an intron (splice-3, splice-5)\
    • Unknown - no known functional classification\
    \
  • \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \
  • \ \ Unusual Conditions (UCSC): UCSC checks for several anomalies \ that may indicate a problem with the mapping, and reports them in the \ Annotations section of the SNP details page if found:\
      \
    • AlleleFreqSumNot1 - Allele frequencies do not sum\ to 1.0 (+-0.01). This SNP's allele frequency data are\ \ probably incomplete.
    • \
    • DuplicateObserved,\ MixedObserved - Multiple distinct insertion SNPs have \ \ been mapped to this location, with either the same inserted \ \ sequence (Duplicate) or different inserted sequence (Mixed).
    • \
    • FlankMismatchGenomeEqual,\ \ FlankMismatchGenomeLonger,\ \ FlankMismatchGenomeShorter - NCBI's alignment of\ the flanking sequences had at least one mismatch or gap\ \ near the mapped SNP position.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • MultipleAlignments - This SNP's flanking sequences \ align to more than one location in the reference assembly.
    • \
    • NamedDeletionZeroSpan - A deletion (from the\ genome) was observed but the annotation spans 0 bases.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • NamedInsertionNonzeroSpan - An insertion (into the\ genome) was observed but the annotation spans more than 0\ bases. (UCSC's re-alignment of flanking sequences to the\ genome may be informative.)
    • \
    • NonIntegerChromCount - At least one allele\ frequency corresponds to a non-integer (+-0.01) count of\ chromosomes on which the allele was observed. The reported\ total sample count for this SNP is probably incorrect.
    • \
    • ObservedContainsIupac - At least one observed allele \ from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
    • \
    • ObservedMismatch - UCSC reference allele does not\ match any observed allele from dbSNP. This is tested only\ \ for SNPs whose class is single, in-del, insertion, deletion,\ \ mnp or mixed.
    • \
    • ObservedTooLong - Observed allele not given (length\ too long).
    • \
    • ObservedWrongFormat - Observed allele(s) from dbSNP\ have unexpected format for the given class.
    • \
    • RefAlleleMismatch - The reference allele from dbSNP\ does not match the UCSC reference allele, i.e., the bases in\ \ the mapped position range.
    • \
    • RefAlleleRevComp - The reference allele from dbSNP\ matches the reverse complement of the UCSC reference\ allele.
    • \
    • SingleClassLongerSpan - All observed alleles are\ single-base, but the annotation spans more than 1 base.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • SingleClassZeroSpan - All observed alleles are\ single-base, but the annotation spans 0 bases. (UCSC's\ re-alignment of flanking sequences to the genome may be\ informative.)
    • \
    \ Another condition, which does not necessarily imply any problem,\ is noted:\
      \
    • SingleClassTriAllelic, SingleClassQuadAllelic - \ Class is single and three or four different bases have been\ \ observed (usually there are only two).
    • \
    \
  • \
  • \ \ Miscellaneous Attributes (dbSNP): several properties extracted\ from dbSNP's SNP_bitfield table\ (see dbSNP_BitField_v5.pdf for details)\
      \
    • Clinically Associated - SNP is in OMIM/OMIA and/or at \ \ least one submitter is a Locus-Specific Database. This does\ \ not necessarily imply that the variant causes any disease,\ \ only that it has been observed in clinical studies.
    • \
    • Appears in OMIM/OMIA - SNP is mentioned in \ \ Online Mendelian Inheritance in Man for \ \ human SNPs, or Online Mendelian Inheritance in Animals for \ \ non-human animal SNPs. Some of these SNPs are quite common,\ \ others are known to cause disease; see OMIM/OMIA for more\ \ information.
    • \
    • Has Microattribution/Third-Party Annotation - At least\ \ one of the SNP's submitters studied this SNP in a biomedical\ \ setting, but is not a Locus-Specific Database or OMIM/OMIA.
    • \
    • Submitted by Locus-Specific Database - At least one of\ \ the SNP's submitters is associated with a database of variants\ \ associated with a particular gene. These variants may or may\ \ not be known to be causative.
    • \
    • MAF >= 5% in Some Population - Minor Allele Frequency is \ \ at least 5% in at least one population assayed.
    • \
    • MAF >= 5% in All Populations - Minor Allele Frequency is \ \ at least 5% in all populations assayed.
    • \
    • Genotype Conflict - Quality check: different genotypes \ \ have been submitted for the same individual.
    • \
    • Ref SNP Cluster has Non-overlapping Alleles - Quality\ \ check: this reference SNP was clustered from submitted SNPs\ \ with non-overlapping sets of observed alleles.
    • \
    • Some Assembly's Allele Does Not Match Observed - \ \ Quality check: at least one assembly mapped by dbSNP has an allele\ at the mapped position that is not present in this SNP's observed\ alleles.
    • \
    \
  • \
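Two of the frequency-related conditions above, AlleleFreqSumNot1 and NonIntegerChromCount, reduce to simple arithmetic checks. The sketch below is an approximation and not UCSC's code. It assumes the +-0.01 tolerance applies to the frequency sum and to each implied chromosome count, and it assumes the sample size is given as 2N chromosomes. The function name is hypothetical.

```python
from typing import Dict, List

TOLERANCE = 0.01  # the +-0.01 tolerance quoted for both conditions


def unusual_conditions(freqs: Dict[str, float], sample_chroms: int,
                       tol: float = TOLERANCE) -> List[str]:
    """Return the frequency-related condition names that apply.

    freqs maps each allele to its frequency; sample_chroms is the total
    number of chromosomes sampled (2N).
    """
    found = []
    # AlleleFreqSumNot1: the frequencies should add up to 1.0.
    if abs(sum(freqs.values()) - 1.0) > tol:
        found.append("AlleleFreqSumNot1")
    # NonIntegerChromCount: each frequency times 2N should be close to
    # a whole number of chromosomes.
    for f in freqs.values():
        count = f * sample_chroms
        if abs(count - round(count)) > tol:
            found.append("NonIntegerChromCount")
            break
    return found


print(unusual_conditions({"A": 0.75, "G": 0.25}, 120))  # []
print(unusual_conditions({"A": 0.5, "G": 0.5}, 7))      # ['NonIntegerChromCount']
```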
\ Several other properties do not have coloring options, but do have \ some filtering options:\
    \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Weight = 1 indicates the highest-quality alignments.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 1\ on all tracks except the Mult. SNPs track, which defaults to 3.\ \
    \
  • \
  • \ \ Submitter handles: These are short, single-word identifiers of\ labs or consortia that submitted SNPs that were clustered into this\ reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs\ have been observed by many different submitters, and some by only a\ single submitter (although that single submitter may have tested a\ large number of samples).\
  • \
  • \ \ AlleleFrequencies: Some submissions to dbSNP include \ allele frequencies and the study's sample size \ (i.e., the number of distinct chromosomes, which is two times the\ number of individuals assayed, a.k.a. 2N). dbSNP combines all\ available frequencies and counts from submitted SNPs that are \ clustered together into a reference SNP.\
  • \
\ \
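The 0.5 ceiling on average heterozygosity for bi-allelic sites follows from the standard expected-heterozygosity formula, H = 1 - sum(p_i^2). This sketch uses that textbook formula only. dbSNP's own calculation is described on the linked page and may include details that this sketch omits, such as sample-size corrections or averaging across populations.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Standard expected heterozygosity 1 - sum(p_i^2) for allele frequencies p_i."""
    return 1.0 - sum(p * p for p in freqs)


# With two alleles of frequency p and 1 - p, H = 2p(1 - p), which peaks at p = 0.5.
grid = [i / 100 for i in range(101)]
h_max = max(expected_heterozygosity([p, 1 - p]) for p in grid)
print(h_max)  # 0.5

# With three equally common alleles, H exceeds 0.5, so the bound is bi-allelic only.
print(round(expected_heterozygosity([1 / 3, 1 / 3, 1 / 3]), 4))  # 0.6667
```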

\ You can configure this track such that the details page displays\ the function and coding differences relative to \ particular gene sets. Choose the gene sets from the list on the SNP \ configuration page displayed beneath the heading "On details page,\ show function and coding differences relative to". \ When one or more gene tracks are selected, the SNP details page \ lists all genes that the SNP hits (or is close to), with the same keywords \ used in the function category. The function usually \ agrees with NCBI's function, except when NCBI's functional annotation is \ relative to an XM_* predicted RefSeq (not included in the UCSC Genome \ Browser's RefSeq Genes track) and/or UCSC's functional annotation is \ relative to a transcript that is not in RefSeq.\
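The Locus Region keywords (near-gene-5 and near-gene-3) depend on the transcript's strand. The sketch below applies the 2000-base (5') and 500-base (3') windows listed under Function. It assumes 0-based, half-open transcript coordinates and counts the base adjacent to the transcript as distance 1. Both conventions are assumptions, and the function name is hypothetical.

```python
from typing import Optional

NEAR_GENE_5 = 2000  # window 5' of a transcript, in bases
NEAR_GENE_3 = 500   # window 3' of a transcript, in bases


def locus_region(pos: int, tx_start: int, tx_end: int, strand: str) -> Optional[str]:
    """Classify a 0-based position relative to a transcript [tx_start, tx_end).

    Returns 'near-gene-5', 'near-gene-3', or None when the position is inside
    the transcript (exon/intron rules apply instead) or too far away.
    """
    if tx_start <= pos < tx_end:
        return None
    if pos < tx_start:
        distance = tx_start - pos        # 1 = the base just before tx_start
        on_5prime_side = strand == "+"
    else:
        distance = pos - tx_end + 1      # 1 = the base just after the transcript
        on_5prime_side = strand == "-"
    if on_5prime_side and distance <= NEAR_GENE_5:
        return "near-gene-5"
    if not on_5prime_side and distance <= NEAR_GENE_3:
        return "near-gene-3"
    return None


print(locus_region(9000, 10000, 20000, "+"))   # near-gene-5
print(locus_region(20400, 10000, 20000, "+"))  # near-gene-3
print(locus_region(9000, 10000, 20000, "-"))   # None (3' side, beyond 500 bases)
```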

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. We compare the length of the\ reference allele to the length(s) of observed alleles; if the\ reference allele is shorter than all other observed alleles, we change\ 'in-del' to 'insertion'. Likewise, if the reference allele is longer\ than all other observed alleles, we change 'in-del' to 'deletion'.\
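The length comparison above can be sketched as follows. This sketch assumes that dbSNP writes an empty allele as "-" within slash-separated observed strings such as "-/AT". It ignores strand orientation, and the helper names are hypothetical.

```python
from typing import List


def allele_length(allele: str) -> int:
    # An empty allele is written as "-".
    return 0 if allele == "-" else len(allele)


def refine_indel_class(ref_allele: str, observed: str) -> str:
    """Reclassify an 'in-del' by comparing the reference allele's length
    with the lengths of the other observed alleles (e.g. observed = "-/AT")."""
    ref_len = allele_length(ref_allele)
    others: List[int] = [allele_length(a) for a in observed.split("/")
                         if a != ref_allele]
    if others and all(ref_len < n for n in others):
        return "insertion"   # reference shorter than every other allele
    if others and all(ref_len > n for n in others):
        return "deletion"    # reference longer than every other allele
    return "in-del"


print(refine_indel_class("-", "-/AT"))     # insertion
print(refine_indel_class("AT", "-/AT"))    # deletion
print(refine_indel_class("A", "-/A/ATT"))  # in-del
```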

\ \

UCSC Re-alignment of flanking sequences

\

\ dbSNP determines the genomic locations of SNPs by aligning their flanking \ sequences to the genome.\ UCSC displays SNPs in the locations determined by dbSNP, but does not\ have access to the alignments on which dbSNP based its mappings.\ Instead, UCSC re-aligns the flanking sequences \ to the neighboring genomic sequence for display on SNP details pages. \ While the recomputed alignments may differ from dbSNP's alignments,\ they often are informative when UCSC has annotated an unusual condition.\

\

\ Non-repetitive genomic sequence is shown in upper case like the flanking \ sequence, and a "|" indicates each match between genomic and flanking bases.\ Repetitive genomic sequence (annotated by RepeatMasker and/or the\ Tandem Repeats Finder with period <= 12) is shown in lower case, and matching\ bases are indicated by a "+".\
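The match-marker convention can be illustrated on two strings that are already aligned. This is a simplified sketch rather than UCSC's renderer. It assumes equal-length, gap-free input in which lower case in the genomic string marks repeat-masked bases, and the function name is hypothetical.

```python
def match_line(genomic: str, flank: str) -> str:
    """Build the marker line between aligned genomic and flanking sequence.

    '|' marks a match at a non-repetitive (upper-case) genomic base,
    '+' marks a match at a repetitive (lower-case) genomic base, and
    a space marks a mismatch.
    """
    if len(genomic) != len(flank):
        raise ValueError("sequences must be aligned to equal length")
    marks = []
    for g, f in zip(genomic, flank):
        if g.upper() != f.upper():
            marks.append(" ")
        elif g.islower():
            marks.append("+")
        else:
            marks.append("|")
    return "".join(marks)


print("ACGTacgt")
print(match_line("ACGTacgt", "ACGAACGT"))
print("ACGAACGT")
```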

\ \

Data Sources and Methods

\ \

\ The data that comprise this track were extracted from database dump files \ and headers of fasta files downloaded from NCBI. \ The database dump files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/database/\ (e.g., for Human, organism_tax_id = human_9606).\ The fasta files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/rs_fasta/\

\
    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b134_SNPContigLoc_37_2.bcp.gz and \ b134_ContigInfo_37_2.bcp.gz.
  • \
  • b134_SNPMapInfo_37_2.bcp.gz provided the alignment weights.\
  • Functional classification was obtained from \ b134_SNPContigLocusId_37_2.bcp.gz.
  • \
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • \
  • SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies.
  • \
  • Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and \ SNPSubSNPLink.bcp.gz.
  • \
  • SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP,\ such as clinically associated. See the document \ dbSNP_BitField_v5.pdf for details.
  • \
  • The header lines in the rs_fasta files were used for molecule type,\ class and observed polymorphism.
  • \
\ \

Orthologous Alleles (human assemblies only)

\

\ For the human assembly, we provide a related table that contains\ orthologous alleles in the chimpanzee, orangutan and rhesus macaque\ reference genome assemblies. \ We use our liftOver utility to identify the orthologous alleles. \ The candidate human SNPs are a filtered list that meet the criteria:\

    \
  • class = 'single'
  • \
  • mapped position in the human reference genome is one base long
  • \
  • aligned to only one location in the human reference genome
  • \
  • not aligned to a chrN_random chrom
  • \
  • biallelic (not tri- or quad-allelic)
  • \
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the \ orthologous start and end position to 0 (zero).\ \
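The candidate filter and the '?'/'N' conventions can be sketched in Python. The `HumanSnp` record and both function names are hypothetical. The lift result is modeled as a `(start, end, base)` tuple, or `None` when the lift fails. This sketch does not call the actual liftOver utility.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HumanSnp:
    # Hypothetical record; field names are illustrative, not a UCSC schema.
    snp_class: str
    chrom: str
    chrom_start: int   # 0-based, half-open
    chrom_end: int
    map_count: int     # number of locations in the human reference genome
    observed: str      # e.g. "A/G"


def is_ortho_candidate(s: HumanSnp) -> bool:
    """Apply the five candidate criteria listed above."""
    return (s.snp_class == "single"
            and s.chrom_end - s.chrom_start == 1
            and s.map_count == 1
            and not s.chrom.endswith("_random")
            and len(set(s.observed.split("/"))) == 2)


def ortholog_allele(lifted: Optional[Tuple[int, int, str]]) -> Tuple[str, int, int]:
    """Turn a lift result (start, end, base) into (allele, start, end).

    None means the lift failed, so the allele is '?' and both positions are 0.
    A base that is not A/C/G/T in the other assembly is reported as 'N'.
    """
    if lifted is None:
        return ("?", 0, 0)
    start, end, base = lifted
    allele = base.upper() if base.upper() in {"A", "C", "G", "T"} else "N"
    return (allele, start, end)


snp = HumanSnp("single", "chr1", 1000, 1001, 1, "A/G")
print(is_ortho_candidate(snp))              # True
print(ortholog_allele(None))                # ('?', 0, 0)
print(ortholog_allele((5000, 5001, "g")))   # ('G', 5000, 5001)
```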

Masked FASTA Files (human assemblies only)

\ \ FASTA files that have been modified to use \ IUPAC\ ambiguous nucleotide characters at\ each base covered by a single-base substitution are available for download\ here.\ Note that only single-base substitutions (no insertions or deletions) were used\ to mask the sequence, and these were filtered to exclude problematic SNPs.\ \
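The masking idea can be sketched with the standard IUPAC ambiguity codes. This sketch is not the script used to build the downloadable files. It assumes that each site's observed alleles are combined with the reference base and that sites whose alleles are not plain A/C/G/T, such as indels written with "-", are left unmasked. The function name is hypothetical.

```python
from typing import Dict

# IUPAC ambiguity codes keyed by the set of bases they represent.
IUPAC: Dict[frozenset, str] = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def mask_snvs(seq: str, snvs: Dict[int, str]) -> str:
    """Replace each single-base substitution site with its IUPAC code.

    snvs maps a 0-based position to observed alleles such as "C/T"; the
    reference base at that position is added to the allele set.
    """
    bases = list(seq)
    for pos, observed in snvs.items():
        alleles = {a.upper() for a in observed.split("/")}
        alleles.add(seq[pos].upper())
        code = IUPAC.get(frozenset(alleles))
        if code is not None:  # skip indels and other non-ACGT alleles
            bases[pos] = code
    return "".join(bases)


print(mask_snvs("ACGTACGT", {1: "C/T", 4: "A/G"}))  # AYGTRCGT
```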

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\

\ \ varRep 1 defaultGeneTracks knownGene\ group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP 134) Found in >= 1% of Samples\ maxWindowToDraw 10000000\ priority 99.0916\ shortLabel Common SNPs(134)\ snpExceptionDesc snp134ExceptionDesc\ snpSeq snp134Seq\ track snp134Common\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility dense\ snp134Flagged Flagged SNPs(134) bed 6 + Simple Nucleotide Polymorphisms (dbSNP 134) Flagged by dbSNP as Clinically Assoc 0 99.0917 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\ \

\ This track contains information about a subset of the \ single nucleotide polymorphisms\ and small insertions and deletions (indels) — collectively Simple\ Nucleotide Polymorphisms — from\ dbSNP\ build 134, available from\ ftp.ncbi.nih.gov/snp.\ Only SNPs flagged as clinically associated by dbSNP, \ mapped to a single location in the reference genome assembly, and \ not known to have a minor allele frequency of at \ least 1%, are included in this subset.\ Frequency data are not available for all SNPs, so this subset probably\ includes some SNPs whose true minor allele frequency is 1% or greater.\
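The Flagged subset's selection rule can be written as a single predicate. In this sketch the argument names are hypothetical, and a missing MAF is modeled as `None`. The last example shows why the subset can include SNPs whose true MAF is 1% or greater: a variant without frequency data is never "known" to be common.

```python
from typing import Optional


def in_flagged_subset(clinically_associated: bool, map_count: int,
                      maf: Optional[float]) -> bool:
    """maf is None when no frequency data are available for the variant."""
    known_common = maf is not None and maf >= 0.01
    return clinically_associated and map_count == 1 and not known_common


print(in_flagged_subset(True, 1, 0.20))   # False: known to be common
print(in_flagged_subset(True, 1, 0.005))  # True
print(in_flagged_subset(True, 1, None))   # True: frequency unknown
```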

\

\ The significance of any particular variant in this track should be\ interpreted only by a trained medical geneticist using all available\ information. For example, some variants are included in this track\ because of their inclusion in a Locus-Specific Database (LSDB) or\ mention in OMIM, but are not thought to be disease-causing, so\ inclusion of a variant in this track is not necessarily an indicator\ of risk. Again, all available information must be carefully considered\ by a qualified professional.\

\ \ The remainder of this page is identical on the following tracks:\
    \
  • Common SNPs(134)\
  • Flagged SNPs(134)\
  • Mult. SNPs(134)\
  • All SNPs(134)\
\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\

\ \

\ On the track controls page, SNPs can be colored and/or filtered from the \ display according to several attributes:\

\
    \ \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/-\
    • No Variation - the submission reports an invariant region in the surveyed sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - submitted by HapMap project (human only)\
    • By 1000Genomes - submitted by 1000Genomes project (human only)\
    • Unknown - no validation has been reported for this variant\
    \
  • \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation is 3' to and within 500 bases of a\ transcript, or is 5' to and within 2000 bases of a transcript\ (near-gene-3, near-gene-5)\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to the reference assembly (coding-synon)\
    • Coding - Non-Synonymous - change in peptide for allele with \ \ respect to the reference assembly (nonsense, missense, \ frameshift, cds-indel, coding-synonymy-unknown)\
    • Untranslated - variation is in a transcript, but not in a coding \ \ region interval (untranslated-3, untranslated-5)\
    • Intron - variation is in an intron, but not in the first two or\ last two bases of the intron\
    • Splice Site - variation is in the first two or last two bases\ of an intron (splice-3, splice-5)\
    • Unknown - no known functional classification\
    \
  • \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \
  • \ \ Unusual Conditions (UCSC): UCSC checks for several anomalies \ that may indicate a problem with the mapping, and reports them in the \ Annotations section of the SNP details page if found:\
      \
    • AlleleFreqSumNot1 - Allele frequencies do not sum\ to 1.0 (+-0.01). This SNP's allele frequency data are\ \ probably incomplete.
    • \
    • DuplicateObserved,\ MixedObserved - Multiple distinct insertion SNPs have \ \ been mapped to this location, with either the same inserted \ \ sequence (Duplicate) or different inserted sequence (Mixed).
    • \
    • FlankMismatchGenomeEqual,\ \ FlankMismatchGenomeLonger,\ \ FlankMismatchGenomeShorter - NCBI's alignment of\ the flanking sequences had at least one mismatch or gap\ \ near the mapped SNP position.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • MultipleAlignments - This SNP's flanking sequences \ align to more than one location in the reference assembly.
    • \
    • NamedDeletionZeroSpan - A deletion (from the\ genome) was observed but the annotation spans 0 bases.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • NamedInsertionNonzeroSpan - An insertion (into the\ genome) was observed but the annotation spans more than 0\ bases. (UCSC's re-alignment of flanking sequences to the\ genome may be informative.)
    • \
    • NonIntegerChromCount - At least one allele\ frequency corresponds to a non-integer (+-0.01) count of\ chromosomes on which the allele was observed. The reported\ total sample count for this SNP is probably incorrect.
    • \
    • ObservedContainsIupac - At least one observed allele \ from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
    • \
    • ObservedMismatch - UCSC reference allele does not\ match any observed allele from dbSNP. This is tested only\ \ for SNPs whose class is single, in-del, insertion, deletion,\ \ mnp or mixed.
    • \
    • ObservedTooLong - Observed allele not given (length\ too long).
    • \
    • ObservedWrongFormat - Observed allele(s) from dbSNP\ have unexpected format for the given class.
    • \
    • RefAlleleMismatch - The reference allele from dbSNP\ does not match the UCSC reference allele, i.e., the bases in\ \ the mapped position range.
    • \
    • RefAlleleRevComp - The reference allele from dbSNP\ matches the reverse complement of the UCSC reference\ allele.
    • \
    • SingleClassLongerSpan - All observed alleles are\ single-base, but the annotation spans more than 1 base.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • SingleClassZeroSpan - All observed alleles are\ single-base, but the annotation spans 0 bases. (UCSC's\ re-alignment of flanking sequences to the genome may be\ informative.)
    • \
    \ Another condition, which does not necessarily imply any problem,\ is noted:\
      \
    • SingleClassTriAllelic, SingleClassQuadAllelic - \ Class is single and three or four different bases have been\ \ observed (usually there are only two).
    • \
    \
  • \
  • \ \ Miscellaneous Attributes (dbSNP): several properties extracted\ from dbSNP's SNP_bitfield table\ (see dbSNP_BitField_v5.pdf for details)\
      \
    • Clinically Associated - SNP is in OMIM/OMIA and/or at \ \ least one submitter is a Locus-Specific Database. This does\ \ not necessarily imply that the variant causes any disease,\ \ only that it has been observed in clinical studies.
    • \
    • Appears in OMIM/OMIA - SNP is mentioned in \ \ Online Mendelian Inheritance in Man for \ \ human SNPs, or Online Mendelian Inheritance in Animals for \ \ non-human animal SNPs. Some of these SNPs are quite common,\ \ others are known to cause disease; see OMIM/OMIA for more\ \ information.
    • \
    • Has Microattribution/Third-Party Annotation - At least\ \ one of the SNP's submitters studied this SNP in a biomedical\ \ setting, but is not a Locus-Specific Database or OMIM/OMIA.
    • \
    • Submitted by Locus-Specific Database - At least one of\ \ the SNP's submitters is associated with a database of variants\ \ associated with a particular gene. These variants may or may\ \ not be known to be causative.
    • \
    • MAF >= 5% in Some Population - Minor Allele Frequency is \ \ at least 5% in at least one population assayed.
    • \
    • MAF >= 5% in All Populations - Minor Allele Frequency is \ \ at least 5% in all populations assayed.
    • \
    • Genotype Conflict - Quality check: different genotypes \ \ have been submitted for the same individual.
    • \
    • Ref SNP Cluster has Non-overlapping Alleles - Quality\ \ check: this reference SNP was clustered from submitted SNPs\ \ with non-overlapping sets of observed alleles.
    • \
    • Some Assembly's Allele Does Not Match Observed - \ \ Quality check: at least one assembly mapped by dbSNP has an allele\ at the mapped position that is not present in this SNP's observed\ alleles.
    • \
    \
  • \
\ Several other properties do not have coloring options, but do have \ some filtering options:\
    \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Weight = 1 indicates the highest-quality alignments.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 1\ on all tracks except the Mult. SNPs track, which defaults to 3.\ \
    \
  • \
  • \ \ Submitter handles: These are short, single-word identifiers of\ labs or consortia that submitted SNPs that were clustered into this\ reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs\ have been observed by many different submitters, and some by only a\ single submitter (although that single submitter may have tested a\ large number of samples).\
  • \
  • \ \ AlleleFrequencies: Some submissions to dbSNP include \ allele frequencies and the study's sample size \ (i.e., the number of distinct chromosomes, which is two times the\ number of individuals assayed, a.k.a. 2N). dbSNP combines all\ available frequencies and counts from submitted SNPs that are \ clustered together into a reference SNP.\
  • \
\ \

\ You can configure this track such that the details page displays\ the function and coding differences relative to \ particular gene sets. Choose the gene sets from the list on the SNP \ configuration page displayed beneath the heading "On details page,\ show function and coding differences relative to". \ When one or more gene tracks are selected, the SNP details page \ lists all genes that the SNP hits (or is close to), with the same keywords \ used in the function category. The function usually \ agrees with NCBI's function, except when NCBI's functional annotation is \ relative to an XM_* predicted RefSeq (not included in the UCSC Genome \ Browser's RefSeq Genes track) and/or UCSC's functional annotation is \ relative to a transcript that is not in RefSeq.\

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. We compare the length of the\ reference allele to the length(s) of observed alleles; if the\ reference allele is shorter than all other observed alleles, we change\ 'in-del' to 'insertion'. Likewise, if the reference allele is longer\ than all other observed alleles, we change 'in-del' to 'deletion'.\

\ \

UCSC Re-alignment of flanking sequences

\

\ dbSNP determines the genomic locations of SNPs by aligning their flanking \ sequences to the genome.\ UCSC displays SNPs in the locations determined by dbSNP, but does not\ have access to the alignments on which dbSNP based its mappings.\ Instead, UCSC re-aligns the flanking sequences \ to the neighboring genomic sequence for display on SNP details pages. \ While the recomputed alignments may differ from dbSNP's alignments,\ they often are informative when UCSC has annotated an unusual condition.\

\

\ Non-repetitive genomic sequence is shown in upper case like the flanking \ sequence, and a "|" indicates each match between genomic and flanking bases.\ Repetitive genomic sequence (annotated by RepeatMasker and/or the\ Tandem Repeats Finder with period <= 12) is shown in lower case, and matching\ bases are indicated by a "+".\

\ \

Data Sources and Methods

\ \

\ The data that comprise this track were extracted from database dump files \ and headers of fasta files downloaded from NCBI. \ The database dump files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/database/\ (e.g., for Human, organism_tax_id = human_9606).\ The fasta files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/rs_fasta/\

\
    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b134_SNPContigLoc_37_2.bcp.gz and \ b134_ContigInfo_37_2.bcp.gz.
  • \
  • b134_SNPMapInfo_37_2.bcp.gz provided the alignment weights.\
  • Functional classification was obtained from \ b134_SNPContigLocusId_37_2.bcp.gz.
  • \
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • \
  • SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies.
  • \
  • Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and \ SNPSubSNPLink.bcp.gz.
  • \
  • SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP,\ such as clinically associated. See the document \ dbSNP_BitField_v5.pdf for details.
  • \
  • The header lines in the rs_fasta files were used for molecule type,\ class and observed polymorphism.
  • \
\ \

Orthologous Alleles (human assemblies only)

\

\ For the human assembly, we provide a related table that contains\ orthologous alleles in the chimpanzee, orangutan and rhesus macaque\ reference genome assemblies. \ We use our liftOver utility to identify the orthologous alleles. \ The candidate human SNPs are a filtered list that meet the criteria:\

    \
  • class = 'single'
  • \
  • mapped position in the human reference genome is one base long
  • \
  • aligned to only one location in the human reference genome
  • \
  • not aligned to a chrN_random chrom
  • \
  • biallelic (not tri- or quad-allelic)
  • \
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the \ orthologous start and end position to 0 (zero).\ \

Masked FASTA Files (human assemblies only)

\ \ FASTA files that have been modified to use \ IUPAC\ ambiguous nucleotide characters at\ each base covered by a single-base substitution are available for download\ here.\ Note that only single-base substitutions (no insertions or deletions) were used\ to mask the sequence, and these were filtered to exclude problematic SNPs.\ \

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\

\ \ varRep 1 defaultGeneTracks knownGene\ group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP 134) Flagged by dbSNP as Clinically Assoc\ priority 99.0917\ shortLabel Flagged SNPs(134)\ snpExceptionDesc snp134ExceptionDesc\ snpSeq snp134Seq\ track snp134Flagged\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility hide\ snp134Mult Mult. SNPs(134) bed 6 + Simple Nucleotide Polymorphisms (dbSNP 134) That Map to Multiple Genomic Loci 0 99.0918 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 134, available from ftp.ncbi.nih.gov/snp. Only SNPs that have been mapped to multiple locations in the reference genome assembly are included in this subset. When a SNP's flanking sequences map to multiple locations in the reference genome, it is unclear whether there is true variation at those sites or whether the sequences at those sites are merely highly similar but not identical.

The default maximum weight for this track is 3, unlike the other dbSNP build 134 tracks, which have a maximum weight of 1. This allows these multiply-mapped SNPs to appear in this track's display, whereas by default they are hidden in the All SNPs(134) track by its maximum-weight filter.
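The maximum-weight filter amounts to a per-track threshold on the dbSNP weight. A minimal Python sketch, assuming the track names snp134 and snp134Mult that appear in the trackDb settings of this file; the function and constant names are hypothetical and this is not the browser's own code.

```python
from typing import Optional

# Defaults described on this page: 3 for the Mult. SNPs track, 1 elsewhere.
DEFAULT_MAX_WEIGHT = {"snp134Mult": 3}
EXCLUDED_WEIGHTS = {0, 10}  # excluded from the data set entirely


def passes_weight_filter(weight: int, track: str,
                         max_weight: Optional[int] = None) -> bool:
    """Return True if an item of this weight would be drawn in `track`.

    `max_weight` stands in for a user-changed setting in the track controls;
    when it is None the per-track default is used.
    """
    if weight in EXCLUDED_WEIGHTS:
        return False
    limit = max_weight if max_weight is not None else DEFAULT_MAX_WEIGHT.get(track, 1)
    return weight <= limit
```

With the defaults, a weight-3 SNP is drawn in snp134Mult but hidden in snp134 unless the maximum weight is raised in the track controls.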

The remainder of this page is identical on the following tracks:

  • Common SNPs(134)
  • Flagged SNPs(134)
  • Mult. SNPs(134)
  • All SNPs(134)

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.
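Because the displayed width follows the variant's span in the reference, the shape can be read off an item's coordinates. A minimal sketch assuming UCSC's 0-based, half-open chromStart/chromEnd convention (so an insertion has chromStart equal to chromEnd); the function name and returned labels are illustrative only.

```python
def display_shape(chrom_start: int, chrom_end: int) -> str:
    """Describe how an item with this reference span is drawn at base level."""
    span = chrom_end - chrom_start
    if span < 0:
        raise ValueError("chromEnd must not precede chromStart")
    if span == 0:
        return "insertion tick"  # zero width: drawn between two bases
    if span == 1:
        return "single base"
    return f"{span}-base block"  # any variant spanning 2+ reference bases, e.g. MNPs
```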

On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes:

  • Class: Describes the observed alleles
    • Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles)
    • In-del - insertion/deletion
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'
    • Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats
    • Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/-
    • No Variation - the submission reports an invariant region in the surveyed sequence
    • Mixed - the cluster contains submissions from multiple classes
    • Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1
    • Insertion - the polymorphism is an insertion relative to the reference assembly
    • Deletion - the polymorphism is a deletion relative to the reference assembly
    • Unknown - no classification provided by data contributor
  • Validation: Method used to validate the variant (each variant may be validated by more than one method)
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes
    • By HapMap - submitted by HapMap project (human only)
    • By 1000Genomes - submitted by 1000Genomes project (human only)
    • Unknown - no validation has been reported for this variant
  • Function: Predicted functional role (each variant may have more than one functional role)
    • Locus Region - variation is 3' to and within 500 bases of a transcript, or is 5' to and within 2000 bases of a transcript (near-gene-3, near-gene-5)
    • Coding - Synonymous - no change in peptide for allele with respect to the reference assembly (coding-synon)
    • Coding - Non-Synonymous - change in peptide for allele with respect to the reference assembly (nonsense, missense, frameshift, cds-indel, coding-synonymy-unknown)
    • Untranslated - variation is in a transcript, but not in a coding region interval (untranslated-3, untranslated-5)
    • Intron - variation is in an intron, but not in the first two or last two bases of the intron
    • Splice Site - variation is in the first two or last two bases of an intron (splice-3, splice-5)
    • Unknown - no known functional classification
  • Molecule Type: Sample used to find this variant
    • Genomic - variant discovered using a genomic template
    • cDNA - variant discovered using a cDNA template
    • Unknown - sample type not known
  • Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found:
    • AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (±0.01). This SNP's allele frequency data are probably incomplete.
    • DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed).
    • FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
    • MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly.
    • NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
    • NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
    • NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (±0.01) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect.
    • ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
    • ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed.
    • ObservedTooLong - Observed allele not given (length too long).
    • ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class.
    • RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range.
    • RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele.
    • SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
    • SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
    Another condition, which does not necessarily imply any problem, is noted:
    • SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two).
  • Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details)
    • Clinically Associated - SNP is in OMIM/OMIA and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies.
    • Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, while others are known to cause disease; see OMIM/OMIA for more information.
    • Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA.
    • Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants for a particular gene. These variants may or may not be known to be causative.
    • MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed.
    • MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed.
    • Genotype Conflict - Quality check: different genotypes have been submitted for the same individual.
    • Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles.
    • Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles.
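Several of the unusual conditions depend only on fields described on this page and can be sketched directly from their definitions. The Python below is a hypothetical re-implementation of AlleleFreqSumNot1, NonIntegerChromCount and the SingleClass* conditions using the ±0.01 tolerances quoted above; it is not UCSC's code, and the input layout (a list of allele frequencies plus the total chromosome count 2N) is an assumption.

```python
from typing import List


def unusual_conditions(snp_class: str, chrom_start: int, chrom_end: int,
                       observed: str, freqs: List[float],
                       two_n: int) -> List[str]:
    """Return the condition names (as listed above) that apply to one SNP.

    freqs: allele frequencies; two_n: total chromosome count (2N).
    Only a handful of the documented checks are reproduced here.
    """
    flags: List[str] = []
    if freqs and abs(sum(freqs) - 1.0) > 0.01:
        flags.append("AlleleFreqSumNot1")
    if freqs and two_n > 0:
        counts = [f * two_n for f in freqs]  # implied chromosome counts
        if any(abs(c - round(c)) > 0.01 for c in counts):
            flags.append("NonIntegerChromCount")
    if snp_class == "single":
        span = chrom_end - chrom_start  # 0-based, half-open coordinates assumed
        if span == 0:
            flags.append("SingleClassZeroSpan")
        elif span > 1:
            flags.append("SingleClassLongerSpan")
        n_bases = len({a.upper() for a in observed.split("/")})
        if n_bases == 3:
            flags.append("SingleClassTriAllelic")
        elif n_bases == 4:
            flags.append("SingleClassQuadAllelic")
    return flags
```

For example, frequencies of 0.55 and 0.45 with 2N = 10 imply 5.5 and 4.5 chromosomes, so NonIntegerChromCount is reported even though the frequencies sum to 1.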
Several other properties do not have coloring options, but do have some filtering options:
  • Average heterozygosity: Calculated by dbSNP as described here
    • Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
  • Weight: Alignment quality assigned by dbSNP
    • Weight can be 0, 1, 2, 3 or 10.
    • Weight = 1 indicates the highest-quality alignments.
    • Weight = 0 and weight = 10 are excluded from the data set.
    • A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3.
  • Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples).
  • Allele frequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.
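The relationship between frequencies, sample size and 2N can be made concrete with a small Python sketch. It pools submissions by converting each frequency to a chromosome count (frequency × 2N) and dividing by the summed 2N. This count-weighted pooling is an assumption about how combining could work, not a description of dbSNP's actual procedure, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pooled_frequencies(
    submissions: List[Tuple[int, Dict[str, float]]]
) -> Tuple[int, Dict[str, float]]:
    """Combine per-submission allele frequencies, weighting each by its 2N.

    Each submission is (number of individuals assayed, {allele: frequency}).
    Returns (total 2N, {allele: pooled frequency}).
    """
    allele_chroms: Dict[str, float] = defaultdict(float)
    total_2n = 0
    for n_individuals, freqs in submissions:
        two_n = 2 * n_individuals  # distinct chromosomes in a diploid sample
        total_2n += two_n
        for allele, freq in freqs.items():
            allele_chroms[allele] += freq * two_n  # frequency -> chromosome count
    if total_2n == 0:
        return 0, {}
    return total_2n, {a: c / total_2n for a, c in allele_chroms.items()}
```

Two studies of 50 individuals each (2N = 100 apiece) with A at 0.6 and 0.4 pool to 2N = 200 and an A frequency of 0.5.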

You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq.

Insertions/Deletions

dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of the observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'.
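The reclassification rule can be written as a short Python function. It assumes dbSNP's slash-separated observed string with "-" for the empty allele (as in the (Alu)/- example above) and that the reference allele is given on the same strand as the observed alleles; the function names are hypothetical.

```python
def _allele_length(allele: str) -> int:
    # dbSNP writes an absent allele as "-", which has length 0.
    return 0 if allele == "-" else len(allele)


def refine_indel_class(snp_class: str, ref_allele: str, observed: str) -> str:
    """Turn 'in-del' into 'insertion' or 'deletion' when the lengths allow it."""
    if snp_class != "in-del":
        return snp_class
    ref_len = _allele_length(ref_allele)
    # Compare against every observed allele other than the reference allele.
    other_lens = [_allele_length(a) for a in observed.split("/") if a != ref_allele]
    if not other_lens:
        return snp_class
    if all(ref_len < n for n in other_lens):
        return "insertion"
    if all(ref_len > n for n in other_lens):
        return "deletion"
    return snp_class  # mixed lengths: leave as 'in-del'
```

For observed alleles -/A/AT with reference A, one other allele is shorter and one is longer, so the class stays 'in-del'.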


UCSC Re-alignment of flanking sequences

dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition.

Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period <= 12) is shown in lower case, and matching bases are indicated by a "+".
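The match-line convention can be reproduced for an already-aligned pair of strings. In this Python sketch the alignment itself is assumed to be given (gaps written as "-"), and repeat status is read from the case of the genomic base, as on the details page; it illustrates only the display convention, not UCSC's re-alignment, and the function name is hypothetical.

```python
def alignment_match_line(genomic: str, flank: str) -> str:
    """Build the marker line shown between aligned genomic and flanking rows.

    '|' marks a match at a non-repetitive (upper-case) genomic base,
    '+' marks a match at a repetitive (lower-case) genomic base,
    and a space marks a mismatch or gap.
    """
    if len(genomic) != len(flank):
        raise ValueError("aligned rows must have equal length")
    marks = []
    for g, f in zip(genomic, flank):
        if g == "-" or f == "-" or g.upper() != f.upper():
            marks.append(" ")
        elif g.islower():
            marks.append("+")
        else:
            marks.append("|")
    return "".join(marks)
```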

Data Sources and Methods

The data that comprise this track were extracted from database dump files and headers of FASTA files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/ (e.g., for Human, organism_tax_id = human_9606). The FASTA files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/

  • Coordinates, orientation, location type and dbSNP reference allele data were obtained from b134_SNPContigLoc_37_2.bcp.gz and b134_ContigInfo_37_2.bcp.gz.
  • b134_SNPMapInfo_37_2.bcp.gz provided the alignment weights.
  • Functional classification was obtained from b134_SNPContigLocusId_37_2.bcp.gz.
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies.
  • Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz.
  • SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinical association. See the document dbSNP_BitField_v5.pdf for details.
  • The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.

Orthologous Alleles (human assemblies only)

For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are filtered to those that meet all of the following criteria:

  • class = 'single'
  • mapped position in the human reference genome is one base long
  • aligned to only one location in the human reference genome
  • not aligned to a chrN_random chrom
  • biallelic (not tri- or quad-allelic)

In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end positions to 0 (zero).

Masked FASTA Files (human assemblies only)

FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download here. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.

References

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11.

\ \ varRep 1 defaultGeneTracks knownGene\ defaultMaxWeight 3\ group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP 134) That Map to Multiple Genomic Loci\ maxWindowToDraw 10000000\ priority 99.0918\ shortLabel Mult. SNPs(134)\ snpExceptionDesc snp134ExceptionDesc\ snpSeq snp134Seq\ track snp134Mult\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility hide\ snp134 All SNPs(134) bed 6 + Simple Nucleotide Polymorphisms (dbSNP 134) 0 99.0919 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 134, available from ftp.ncbi.nih.gov/snp.

Three tracks contain subsets of the items in this track:

  • Common SNPs(134): SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly. Frequency data are not available for all SNPs, so this subset is incomplete.
  • Flagged SNPs(134): SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%. Frequency data are not available for all SNPs, so this subset may include some SNPs whose true minor allele frequency is 1% or greater.
  • Mult. SNPs(134): SNPs that have been mapped to multiple locations in the reference genome assembly.
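The three subset definitions can be expressed as a single membership function. A Python sketch under stated assumptions: `maf` is None when no frequency data exist, `map_count` is the number of reference locations a SNP maps to, and `clinically_associated` stands for dbSNP's clinical-association bitfield flag; all names are hypothetical. Note that a SNP without frequency data can never land in Common SNPs(134), which is why that subset is incomplete.

```python
from typing import Optional, Set


def snp134_subsets(maf: Optional[float], map_count: int,
                   clinically_associated: bool) -> Set[str]:
    """Return the names of the dbSNP 134 tracks that would contain this SNP."""
    tracks = {"All SNPs(134)"}  # every item is in the full track
    if map_count > 1:
        tracks.add("Mult. SNPs(134)")
        return tracks
    known_common = maf is not None and maf >= 0.01
    if known_common:
        tracks.add("Common SNPs(134)")
    elif clinically_associated:
        # Includes SNPs whose MAF is simply unknown.
        tracks.add("Flagged SNPs(134)")
    return tracks
```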

The default maximum weight for this track is 1, so unless the setting is changed in the track controls, SNPs that map to multiple genomic locations will be omitted from the display. When a SNP's flanking sequences map to multiple locations in the reference genome, it is unclear whether there is true variation at those sites or whether the sequences at those sites are merely highly similar but not identical.

The remainder of this page is identical on the following tracks:

  • Common SNPs(134)
  • Flagged SNPs(134)
  • Mult. SNPs(134)
  • All SNPs(134)

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.

On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes:

  • Class: Describes the observed alleles
    • Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles)
    • In-del - insertion/deletion
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'
    • Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats
    • Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/-
    • No Variation - the submission reports an invariant region in the surveyed sequence
    • Mixed - the cluster contains submissions from multiple classes
    • Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1
    • Insertion - the polymorphism is an insertion relative to the reference assembly
    • Deletion - the polymorphism is a deletion relative to the reference assembly
    • Unknown - no classification provided by data contributor
  • Validation: Method used to validate the variant (each variant may be validated by more than one method)
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes
    • By HapMap - submitted by HapMap project (human only)
    • By 1000Genomes - submitted by 1000Genomes project (human only)
    • Unknown - no validation has been reported for this variant
  • Function: Predicted functional role (each variant may have more than one functional role)
    • Locus Region - variation is 3' to and within 500 bases of a transcript, or is 5' to and within 2000 bases of a transcript (near-gene-3, near-gene-5)
    • Coding - Synonymous - no change in peptide for allele with respect to the reference assembly (coding-synon)
    • Coding - Non-Synonymous - change in peptide for allele with respect to the reference assembly (nonsense, missense, frameshift, cds-indel, coding-synonymy-unknown)
    • Untranslated - variation is in a transcript, but not in a coding region interval (untranslated-3, untranslated-5)
    • Intron - variation is in an intron, but not in the first two or last two bases of the intron
    • Splice Site - variation is in the first two or last two bases of an intron (splice-3, splice-5)
    • Unknown - no known functional classification
  • Molecule Type: Sample used to find this variant
    • Genomic - variant discovered using a genomic template
    • cDNA - variant discovered using a cDNA template
    • Unknown - sample type not known
  • Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found:
    • AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (±0.01). This SNP's allele frequency data are probably incomplete.
    • DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed).
    • FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
    • MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly.
    • NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
    • NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
    • NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (±0.01) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect.
    • ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
    • ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed.
    • ObservedTooLong - Observed allele not given (length too long).
    • ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class.
    • RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range.
    • RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele.
    • SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
    • SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
    Another condition, which does not necessarily imply any problem, is noted:
    • SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two).
  • Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details)
    • Clinically Associated - SNP is in OMIM/OMIA and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies.
    • Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, while others are known to cause disease; see OMIM/OMIA for more information.
    • Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA.
    • Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants for a particular gene. These variants may or may not be known to be causative.
    • MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed.
    • MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed.
    • Genotype Conflict - Quality check: different genotypes have been submitted for the same individual.
    • Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles.
    • Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles.
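The Locus Region category depends on transcript strand, because 5' and 3' are relative to the direction of transcription. The Python sketch below assumes 0-based, half-open transcript coordinates and counts distance so that the base immediately flanking the transcript is at distance 1; exactly how dbSNP counts the boundary is an assumption here, and the function name is hypothetical.

```python
from typing import Optional


def locus_region(pos: int, tx_start: int, tx_end: int, strand: str) -> Optional[str]:
    """Return 'near-gene-5', 'near-gene-3' or None for a variant at `pos`."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if tx_start <= pos < tx_end:
        return None  # inside the transcript: coding/intron/UTR categories apply
    if pos < tx_start:
        distance, on_left = tx_start - pos, True
    else:
        distance, on_left = pos - tx_end + 1, False
    # On the + strand the left (lower-coordinate) side is 5'; on the - strand it is 3'.
    is_5prime = on_left == (strand == "+")
    if is_5prime and distance <= 2000:
        return "near-gene-5"
    if not is_5prime and distance <= 500:
        return "near-gene-3"
    return None
```

The same position 600 bases past the end of a transcript is outside the 500-base 3' window on the + strand, but inside the 2000-base 5' window on the - strand.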
Several other properties do not have coloring options, but do have some filtering options:
  • Average heterozygosity: Calculated by dbSNP as described here
    • Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
  • Weight: Alignment quality assigned by dbSNP
    • Weight can be 0, 1, 2, 3 or 10.
    • Weight = 1 indicates the highest-quality alignments.
    • Weight = 0 and weight = 10 are excluded from the data set.
    • A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3.
  • Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples).
  • Allele frequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.

You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq.

Insertions/Deletions

dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of the observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'.


UCSC Re-alignment of flanking sequences

dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition.

Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period <= 12) is shown in lower case, and matching bases are indicated by a "+".

Data Sources and Methods

The data that comprise this track were extracted from database dump files and headers of FASTA files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/ (e.g., for Human, organism_tax_id = human_9606). The FASTA files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/

  • Coordinates, orientation, location type and dbSNP reference allele data were obtained from b134_SNPContigLoc_37_2.bcp.gz and b134_ContigInfo_37_2.bcp.gz.
  • b134_SNPMapInfo_37_2.bcp.gz provided the alignment weights.
  • Functional classification was obtained from b134_SNPContigLocusId_37_2.bcp.gz.
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies.
  • Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz.
  • SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinical association. See the document dbSNP_BitField_v5.pdf for details.
  • The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.

Orthologous Alleles (human assemblies only)

\

\ For the human assembly, we provide a related table that contains\ orthologous alleles in the chimpanzee, orangutan and rhesus macaque\ reference genome assemblies. \ We use our liftOver utility to identify the orthologous alleles. \ The candidate human SNPs are a filtered list that meet the criteria:\

    \
  • class = 'single'
  • mapped position in the human reference genome is one base long
  • aligned to only one location in the human reference genome
  • not aligned to a chrN_random chrom
  • biallelic (not tri- or quad-allelic)
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the \ orthologous start and end position to 0 (zero).\ \
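The candidate criteria and the 'N'/'?' conventions above can be summarized in a short Python sketch. It assumes a hypothetical `SnpRecord` whose field names are illustrative rather than the actual table schema, and a hypothetical `lift` callback standing in for liftOver plus a base lookup in the other assembly; it is not UCSC's pipeline code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SnpRecord:
    # Hypothetical record; field names are illustrative, not the table schema.
    chrom: str
    chrom_start: int   # 0-based start
    chrom_end: int     # exclusive end
    snp_class: str     # e.g. "single"
    observed: str      # slash-separated alleles, e.g. "A/G"
    map_count: int     # number of locations in the human reference genome


# Hypothetical stand-in for liftOver plus a base lookup in the other assembly:
# returns (base or None, start, end), or None if the position cannot be lifted.
LiftFn = Callable[[str, int, int], Optional[Tuple[Optional[str], int, int]]]


def is_ortholog_candidate(snp: SnpRecord) -> bool:
    """Apply the candidate criteria listed above."""
    return (snp.snp_class == "single"
            and snp.chrom_end - snp.chrom_start == 1   # one base long
            and snp.map_count == 1                     # one location only
            and not snp.chrom.endswith("_random")      # not on chrN_random
            and len(snp.observed.split("/")) == 2)     # biallelic


def orthologous_allele(snp: SnpRecord, lift: LiftFn) -> Tuple[str, int, int]:
    """Return (allele, start, end) in the other assembly."""
    lifted = lift(snp.chrom, snp.chrom_start, snp.chrom_end)
    if lifted is None:
        return ("?", 0, 0)  # lift not possible: '?' and zero coordinates
    base, start, end = lifted
    # Lift succeeded but the allele is unknown: report 'N'.
    return (base if base is not None else "N", start, end)
```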

Masked FASTA Files (human assemblies only)

\ \ FASTA files that have been modified to use \ IUPAC\ ambiguous nucleotide characters at\ each base covered by a single-base substitution are available for download\ here.\ Note that only single-base substitutions (no insertions or deletions) were used\ to mask the sequence, and these were filtered to exclude problematic SNPs.\ \
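As an illustration of IUPAC masking, the sketch below replaces each base covered by a single-base substitution with the ambiguity code for the set of possible bases. Folding the reference base into that set, preserving lower case, and taking positions as 0-based are assumptions of this sketch, and it does not reproduce the filtering of problematic SNPs mentioned above.

```python
from typing import Iterable, Tuple

# Standard IUPAC ambiguity codes for sets of two or more nucleotides.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def mask_sequence(seq: str, snps: Iterable[Tuple[int, str]]) -> str:
    """Mask single-base substitutions given as (0-based position, "A/G") pairs."""
    bases = list(seq)
    for pos, observed in snps:
        alleles = {a.upper() for a in observed.split("/")}
        if not all(len(a) == 1 and a in "ACGT" for a in alleles):
            continue  # skip insertions, deletions and malformed alleles
        ref = bases[pos].upper()
        if ref in "ACGT":
            alleles.add(ref)  # assumption: include the reference base
        if len(alleles) > 1:
            code = IUPAC[frozenset(alleles)]
            # Keep lower case (e.g. soft-masked repeats) where the input used it.
            bases[pos] = code.lower() if bases[pos].islower() else code
    return "".join(bases)
```

For example, `mask_sequence("ACGTACGT", [(1, "C/T"), (4, "A/G"), (6, "-/G")])` masks the two substitutions and leaves the indel position unchanged.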

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\

\ \ varRep 1 defaultGeneTracks knownGene\ group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP 134)\ maxWindowToDraw 10000000\ priority 99.0919\ shortLabel All SNPs(134)\ track snp134\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility hide\ snp132Common Common SNPs(132) bed 6 + Simple Nucleotide Polymorphisms (dbSNP 132) Found in >= 1% of Samples 1 99.0921 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\ \

\ This track contains information about a subset of the \ single nucleotide polymorphisms\ and small insertions and deletions (indels) — collectively Simple\ Nucleotide Polymorphisms — from\ dbSNP\ build 132, available from\ ftp.ncbi.nih.gov/snp.\ Only SNPs that have a minor allele frequency of at least 1% and\ are mapped to a single location in the reference genome assembly are\ included in this subset. Frequency data are not available for all SNPs,\ so this subset is incomplete.\
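The selection rule for this subset can be sketched as follows. Taking the minor allele frequency as the frequency of the second most common allele is one common convention and an assumption here, as are the function names. A SNP with no frequency data is never selected, which is why the subset is incomplete.

```python
from typing import Optional, Sequence


def minor_allele_frequency(freqs: Sequence[float]) -> Optional[float]:
    """Frequency of the second most common allele; None if no data."""
    if not freqs:
        return None
    ordered = sorted(freqs, reverse=True)
    return ordered[1] if len(ordered) > 1 else 0.0


def is_common_snp(freqs: Sequence[float], map_count: int,
                  min_maf: float = 0.01) -> bool:
    """MAF of at least 1% and mapped to a single location in the assembly."""
    maf = minor_allele_frequency(freqs)
    return map_count == 1 and maf is not None and maf >= min_maf
```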

\

\ The selection of SNPs with a minor allele frequency of 1% or greater\ is an attempt to identify variants that appear to be reasonably common\ in the general population. Taken as a set, common variants should be\ less likely to be associated with severe genetic diseases due to the\ effects of natural selection,\ following the view that deleterious variants are not likely to become\ common in the population.\ However, the significance of any particular variant should be interpreted\ only by a trained medical geneticist using all available information.\

\ \ The remainder of this page is identical on the following tracks:\
    \
  • Common SNPs(132)\
  • Flagged SNPs(132)\
  • Mult. SNPs(132)\
  • All SNPs(132)\
\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\
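The settings for this track declare `type bed 6 +`, so positions are 0-based, half-open BED coordinates and the displayed width is chromEnd minus chromStart. A minimal sketch of the mapping from coordinates to the shapes described above (the function name and return strings are illustrative):

```python
def display_shape(chrom_start: int, chrom_end: int) -> str:
    """Describe how a variant is drawn at base-level zoom, from 0-based,
    half-open BED coordinates (chrom_end - chrom_start = reference span)."""
    span = chrom_end - chrom_start
    if span < 0:
        raise ValueError("chromEnd must not be less than chromStart")
    if span == 0:
        return "tick between bases"  # insertion: zero length in the reference
    if span == 1:
        return "single-base box"     # single nucleotide polymorphism
    return f"block spanning {span} bases"  # multiple nucleotide variant
```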

\ \

\ On the track controls page, SNPs can be colored and/or filtered from the \ display according to several attributes:\

\
    \ \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/-\
    • No Variation - the submission reports an invariant region in the surveyed sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - submitted by HapMap project (human only)\
    • By 1000Genomes - submitted by 1000Genomes project (human only)\
    • Unknown - no validation has been reported for this variant\
    \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation is 3' to and within 500 bases of a\ transcript, or is 5' to and within 2000 bases of a transcript\ (near-gene-3, near-gene-5)\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to the reference assembly (coding-synon)\
    • Coding - Non-Synonymous - change in peptide for allele with \ \ respect to the reference assembly (nonsense, missense, \ frameshift, cds-indel, coding-synonymy-unknown)\
    • Untranslated - variation is in a transcript, but not in a coding \ \ region interval (untranslated-3, untranslated-5)\
    • Intron - variation is in an intron, but not in the first two or\ last two bases of the intron\
    • Splice Site - variation is in the first two or last two bases\ of an intron (splice-3, splice-5)\
    • Unknown - no known functional classification\
    \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \ \ Unusual Conditions (UCSC): UCSC checks for several anomalies \ that may indicate a problem with the mapping, and reports them in the \ Annotations section of the SNP details page if found:\
      \
    • AlleleFreqSumNot1 - Allele frequencies do not sum\ to 1.0 (+-0.01). This SNP's allele frequency data are\ \ probably incomplete.
    • DuplicateObserved,\ MixedObserved - Multiple distinct insertion SNPs have \ \ been mapped to this location, with either the same inserted \ \ sequence (Duplicate) or different inserted sequence (Mixed).
    • FlankMismatchGenomeEqual,\ \ FlankMismatchGenomeLonger,\ \ FlankMismatchGenomeShorter - NCBI's alignment of\ the flanking sequences had at least one mismatch or gap\ \ near the mapped SNP position.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • MultipleAlignments - This SNP's flanking sequences \ align to more than one location in the reference assembly.
    • NamedDeletionZeroSpan - A deletion (from the\ genome) was observed but the annotation spans 0 bases.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • NamedInsertionNonzeroSpan - An insertion (into the\ genome) was observed but the annotation spans more than 0\ bases. (UCSC's re-alignment of flanking sequences to the\ genome may be informative.)
    • NonIntegerChromCount - At least one allele\ frequency corresponds to a non-integer (+-0.010000) count of\ chromosomes on which the allele was observed. The reported\ total sample count for this SNP is probably incorrect.
    • ObservedContainsIupac - At least one observed allele \ from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
    • ObservedMismatch - UCSC reference allele does not\ match any observed allele from dbSNP. This is tested only\ \ for SNPs whose class is single, in-del, insertion, deletion,\ \ mnp or mixed.
    • ObservedTooLong - The observed allele is not given\ because it is too long.
    • ObservedWrongFormat - Observed allele(s) from dbSNP\ have an unexpected format for the given class.
    • RefAlleleMismatch - The reference allele from dbSNP\ does not match the UCSC reference allele, i.e., the bases in\ \ the mapped position range.
    • RefAlleleRevComp - The reference allele from dbSNP\ matches the reverse complement of the UCSC reference\ allele.
    • SingleClassLongerSpan - All observed alleles are\ single-base, but the annotation spans more than 1 base.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • SingleClassZeroSpan - All observed alleles are\ single-base, but the annotation spans 0 bases. (UCSC's\ re-alignment of flanking sequences to the genome may be\ informative.)
    \ Another condition, which does not necessarily imply any problem,\ is noted:\
      \
    • SingleClassTriAllelic, SingleClassQuadAllelic - \ Class is single and three or four different bases have been\ \ observed (usually there are only two).
    \
  • \ \ Miscellaneous Attributes (dbSNP): several properties extracted\ from dbSNP's SNP_bitfield table\ (see dbSNP_BitField_v5.pdf for details)\
      \
    • Clinically Associated - SNP is in OMIM/OMIA and/or at \ \ least one submitter is a Locus-Specific Database. This does\ \ not necessarily imply that the variant causes any disease,\ \ only that it has been observed in clinical studies.
    • Appears in OMIM/OMIA - SNP is mentioned in \ \ Online Mendelian Inheritance in Man for \ \ human SNPs, or Online Mendelian Inheritance in Animals for \ \ non-human animal SNPs. Some of these SNPs are quite common,\ \ while others are known to cause disease; see OMIM/OMIA for more\ \ information.
    • Has Microattribution/Third-Party Annotation - At least\ \ one of the SNP's submitters studied this SNP in a biomedical\ \ setting, but is not a Locus-Specific Database or OMIM/OMIA.
    • Submitted by Locus-Specific Database - At least one of\ \ the SNP's submitters is associated with a database of variants\ \ related to a particular gene. These variants may or may\ \ not be known to be causative.
    • MAF >= 5% in Some Population - Minor Allele Frequency is \ \ at least 5% in at least one population assayed.\ \ Warning: this bit appears to have been set incorrectly for \ \ many SNPs in build 132.
    • MAF >= 5% in All Populations - Minor Allele Frequency is \ \ at least 5% in all populations assayed.\ \ Warning: this bit appears to have been set incorrectly for\ \ some SNPs in build 132.
    • Genotype Conflict - Quality check: different genotypes \ \ have been submitted for the same individual.
    • Ref SNP Cluster has Non-overlapping Alleles - Quality\ \ check: this reference SNP was clustered from submitted SNPs\ \ with non-overlapping sets of observed alleles.
    • Some Assembly's Allele Does Not Match Observed - \ \ Quality check: at least one assembly mapped by dbSNP has an allele\ at the mapped position that is not present in this SNP's observed\ alleles.
    \
\ Several other properties do not have coloring options, but do have \ some filtering options:\
    \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Alignments with weight = 1 are the highest quality.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 1\ on all tracks except the Mult. SNPs track, which defaults to 3.\ \
    \
  • \ \ Submitter handles: These are short, single-word identifiers of\ labs or consortia that submitted SNPs that were clustered into this\ reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs\ have been observed by many different submitters, and some by only a\ single submitter (although that single submitter may have tested a\ large number of samples).\
  • \ \ Allele Frequencies: Some submissions to dbSNP include \ allele frequencies and the study's sample size \ (i.e., the number of distinct chromosomes, which is two times the\ number of individuals assayed, a.k.a. 2N). dbSNP combines all\ available frequencies and counts from submitted SNPs that are \ clustered together into a reference SNP.\
\ \
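Because each allele frequency multiplied by 2N should be a whole number of chromosomes, combined frequencies and sample sizes can be sanity-checked in the way the AlleleFreqSumNot1 and NonIntegerChromCount conditions describe. A sketch using the tolerances quoted for those conditions (function names are hypothetical; this is not UCSC's code):

```python
from typing import Sequence


def allele_freq_sum_not_1(freqs: Sequence[float], tol: float = 0.01) -> bool:
    """Flag frequencies whose sum differs from 1.0 by more than tol."""
    return abs(sum(freqs) - 1.0) > tol


def non_integer_chrom_count(freqs: Sequence[float], two_n: float,
                            tol: float = 0.01) -> bool:
    """Flag if any frequency times 2N (the chromosome count) is not within
    tol of a whole number of chromosomes."""
    return any(abs(f * two_n - round(f * two_n)) > tol for f in freqs)
```

With 2N = 120, frequencies of 0.25 and 0.75 correspond to 30 and 90 chromosomes and pass, whereas 0.333 and 0.667 correspond to 39.96 and 80.04 chromosomes and are flagged.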

\ You can configure this track such that the details page displays\ the function and coding differences relative to \ particular gene sets. Choose the gene sets from the list on the SNP \ configuration page displayed beneath the heading "On details page,\ show function and coding differences relative to". \ When one or more gene tracks are selected, the SNP details page \ lists all genes that the SNP hits (or is close to), with the same keywords \ used in the function category. The function usually \ agrees with NCBI's function, except when NCBI's functional annotation is \ relative to an XM_* predicted RefSeq (not included in the UCSC Genome \ Browser's RefSeq Genes track) and/or UCSC's functional annotation is \ relative to a transcript that is not in RefSeq.\
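For the near-gene keywords, the Locus Region definition (within 2000 bases 5' or within 500 bases 3' of a transcript) depends on the transcript's strand. A minimal sketch relative to a single transcript, assuming 0-based half-open coordinates; the function name is hypothetical, and the exact treatment of the 500/2000-base boundaries is an assumption.

```python
from typing import Optional


def near_gene_keyword(var_start: int, var_end: int,
                      tx_start: int, tx_end: int, strand: str) -> Optional[str]:
    """Return 'near-gene-5', 'near-gene-3' or None for one variant and one
    transcript. Coordinates are 0-based, half-open; strand is '+' or '-'."""
    if var_start < tx_end and var_end > tx_start:
        return None  # overlaps the transcript; other function keywords apply
    if var_end <= tx_start:
        # Variant lies at lower genomic coordinates than the transcript.
        distance = tx_start - var_end
        side = "5" if strand == "+" else "3"
    else:
        # Variant lies at higher genomic coordinates than the transcript.
        distance = var_start - tx_end
        side = "3" if strand == "+" else "5"
    limit = 2000 if side == "5" else 500
    return f"near-gene-{side}" if distance < limit else None
```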

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. We compare the length of the\ reference allele to the length(s) of observed alleles; if the\ reference allele is shorter than all other observed alleles, we change\ 'in-del' to 'insertion'. Likewise, if the reference allele is longer\ than all other observed alleles, we change 'in-del' to 'deletion'.\
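The length comparison above can be sketched in Python. This assumes dbSNP's convention of writing an empty allele as '-' in slash-separated observed strings (e.g. '-/AT'); the function name is hypothetical.

```python
def refine_indel_class(ref_allele: str, observed: str) -> str:
    """Refine dbSNP's 'in-del' class by comparing allele lengths."""
    def length(allele: str) -> int:
        return 0 if allele == "-" else len(allele)

    ref_len = length(ref_allele)
    # Lengths of the observed alleles other than the reference allele.
    others = [length(a) for a in observed.split("/") if a != ref_allele]
    if others and all(ref_len < n for n in others):
        return "insertion"  # reference shorter than every other allele
    if others and all(ref_len > n for n in others):
        return "deletion"   # reference longer than every other allele
    return "in-del"
```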

\ \

UCSC Re-alignment of flanking sequences

\

\ dbSNP determines the genomic locations of SNPs by aligning their flanking \ sequences to the genome.\ UCSC displays SNPs in the locations determined by dbSNP, but does not\ have access to the alignments on which dbSNP based its mappings.\ Instead, UCSC re-aligns the flanking sequences \ to the neighboring genomic sequence for display on SNP details pages. \ While the recomputed alignments may differ from dbSNP's alignments,\ they often are informative when UCSC has annotated an unusual condition.\

\

\ Non-repetitive genomic sequence is shown in upper case like the flanking \ sequence, and a "|" indicates each match between genomic and flanking bases.\ Repetitive genomic sequence (annotated by RepeatMasker and/or the\ Tandem Repeats Finder with period <= 12) is shown in lower case, and matching\ bases are indicated by a "+".\
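The marker line between the genomic and flanking sequences can be sketched as follows. It assumes the two strings are already aligned and padded with '-' for gaps (the alignment step is not shown) and that lower-case genomic bases are the repeat-masked ones. The function name is hypothetical.

```python
def match_line(genomic: str, flank: str) -> str:
    """Return '|' for a match against non-repetitive (upper-case) genomic
    sequence, '+' for a match against repetitive (lower-case) genomic
    sequence, and a space for mismatches and gaps."""
    if len(genomic) != len(flank):
        raise ValueError("aligned sequences must have equal length")
    markers = []
    for g, f in zip(genomic, flank):
        if g != "-" and g.upper() == f.upper():
            markers.append("+" if g.islower() else "|")
        else:
            markers.append(" ")
    return "".join(markers)
```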

\ \

Data Sources and Methods

\ \

\ The data that comprise this track were extracted from database dump files \ and headers of fasta files downloaded from NCBI. \ The database dump files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/database/\ (e.g., for Human, organism_tax_id = human_9606).\ The fasta files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/rs_fasta/\

\
    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b132_SNPContigLoc_37_1.bcp.gz and \ b132_ContigInfo_37_1.bcp.gz.
  • b132_SNPMapInfo_37_1.bcp.gz provided the alignment weights.\
  • Functional classification was obtained from \ b132_SNPContigLocusId_37_1.bcp.gz.
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies.
  • Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and \ SNPSubSNPLink.bcp.gz.
  • SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP,\ such as clinically-associated. See the document \ dbSNP_BitField_v5.pdf for details.
  • The header lines in the rs_fasta files were used for molecule type,\ class and observed polymorphism.
\ \

Orthologous Alleles (human assemblies only)

\

\ For the human assembly, we provide a related table that contains\ orthologous alleles in the chimpanzee, orangutan and rhesus macaque\ reference genome assemblies. \ We use our liftOver utility to identify the orthologous alleles. \ The candidate human SNPs are filtered to those that meet all of the following criteria:\

    \
  • class = 'single'
  • mapped position in the human reference genome is one base long
  • aligned to only one location in the human reference genome
  • not aligned to a chrN_random chrom
  • biallelic (not tri- or quad-allelic)
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the \ orthologous start and end position to 0 (zero).\ \

Masked FASTA Files (human assemblies only)

\ \ FASTA files that have been modified to use \ IUPAC\ ambiguous nucleotide characters at\ each base covered by a single-base substitution are available for download\ here.\ Note that only single-base substitutions (no insertions or deletions) were used\ to mask the sequence, and these were filtered to exclude problematic SNPs.\ \

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\

\ \ varRep 1 defaultGeneTracks knownGene\ group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP 132) Found in >= 1% of Samples\ maxWindowToDraw 10000000\ priority 99.0921\ shortLabel Common SNPs(132)\ snpExceptionDesc snp132ExceptionDesc\ snpSeq snp132Seq\ track snp132Common\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility dense\ snp132Flagged Flagged SNPs(132) bed 6 + Simple Nucleotide Polymorphisms (dbSNP 132) Flagged by dbSNP as Clinically Assoc 0 99.0922 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\ \

\ This track contains information about a subset of the \ single nucleotide polymorphisms\ and small insertions and deletions (indels) — collectively Simple\ Nucleotide Polymorphisms — from\ dbSNP\ build 132, available from\ ftp.ncbi.nih.gov/snp.\ Only SNPs flagged as clinically associated by dbSNP, \ mapped to a single location in the reference genome assembly, and \ not known to have a minor allele frequency of at \ least 1%, are included in this subset.\ Frequency data are not available for all SNPs, so this subset probably\ includes some SNPs whose true minor allele frequency is 1% or greater.\
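The inclusion rule for this subset can be written as a predicate. A missing frequency (None here) does not exclude a SNP, which is why the subset may include some SNPs that are in fact common. Names and argument types are illustrative assumptions.

```python
from typing import Optional


def in_flagged_subset(clinically_associated: bool, map_count: int,
                      maf: Optional[float], min_maf: float = 0.01) -> bool:
    """maf is None when no frequency data are available for the SNP."""
    known_common = maf is not None and maf >= min_maf
    return clinically_associated and map_count == 1 and not known_common
```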

\

\ The significance of any particular variant in this track should be\ interpreted only by a trained medical geneticist using all available\ information. For example, some variants are included in this track\ because of their inclusion in a Locus-Specific Database (LSDB) or\ mention in OMIM, but are not thought to be disease-causing, so\ inclusion of a variant in this track is not necessarily an indicator\ of risk. Again, all available information must be carefully considered\ by a qualified professional.\

\ \ The remainder of this page is identical on the following tracks:\
    \
  • Common SNPs(132)\
  • Flagged SNPs(132)\
  • Mult. SNPs(132)\
  • All SNPs(132)\
\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\

\ \

\ On the track controls page, SNPs can be colored and/or filtered from the \ display according to several attributes:\

\
    \ \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/-\
    • No Variation - the submission reports an invariant region in the surveyed sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - submitted by HapMap project (human only)\
    • By 1000Genomes - submitted by 1000Genomes project (human only)\
    • Unknown - no validation has been reported for this variant\
    \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation is 3' to and within 500 bases of a\ transcript, or is 5' to and within 2000 bases of a transcript\ (near-gene-3, near-gene-5)\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to the reference assembly (coding-synon)\
    • Coding - Non-Synonymous - change in peptide for allele with \ \ respect to the reference assembly (nonsense, missense, \ frameshift, cds-indel, coding-synonymy-unknown)\
    • Untranslated - variation is in a transcript, but not in a coding \ \ region interval (untranslated-3, untranslated-5)\
    • Intron - variation is in an intron, but not in the first two or\ last two bases of the intron\
    • Splice Site - variation is in the first two or last two bases\ of an intron (splice-3, splice-5)\
    • Unknown - no known functional classification\
    \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \ \ Unusual Conditions (UCSC): UCSC checks for several anomalies \ that may indicate a problem with the mapping, and reports them in the \ Annotations section of the SNP details page if found:\
      \
    • AlleleFreqSumNot1 - Allele frequencies do not sum\ to 1.0 (+-0.01). This SNP's allele frequency data are\ \ probably incomplete.
    • DuplicateObserved,\ MixedObserved - Multiple distinct insertion SNPs have \ \ been mapped to this location, with either the same inserted \ \ sequence (Duplicate) or different inserted sequence (Mixed).
    • FlankMismatchGenomeEqual,\ \ FlankMismatchGenomeLonger,\ \ FlankMismatchGenomeShorter - NCBI's alignment of\ the flanking sequences had at least one mismatch or gap\ \ near the mapped SNP position.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • MultipleAlignments - This SNP's flanking sequences \ align to more than one location in the reference assembly.
    • NamedDeletionZeroSpan - A deletion (from the\ genome) was observed but the annotation spans 0 bases.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • NamedInsertionNonzeroSpan - An insertion (into the\ genome) was observed but the annotation spans more than 0\ bases. (UCSC's re-alignment of flanking sequences to the\ genome may be informative.)
    • NonIntegerChromCount - At least one allele\ frequency corresponds to a non-integer (+-0.010000) count of\ chromosomes on which the allele was observed. The reported\ total sample count for this SNP is probably incorrect.
    • ObservedContainsIupac - At least one observed allele \ from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
    • ObservedMismatch - UCSC reference allele does not\ match any observed allele from dbSNP. This is tested only\ \ for SNPs whose class is single, in-del, insertion, deletion,\ \ mnp or mixed.
    • ObservedTooLong - The observed allele is not given\ because it is too long.
    • ObservedWrongFormat - Observed allele(s) from dbSNP\ have an unexpected format for the given class.
    • RefAlleleMismatch - The reference allele from dbSNP\ does not match the UCSC reference allele, i.e., the bases in\ \ the mapped position range.
    • RefAlleleRevComp - The reference allele from dbSNP\ matches the reverse complement of the UCSC reference\ allele.
    • SingleClassLongerSpan - All observed alleles are\ single-base, but the annotation spans more than 1 base.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • SingleClassZeroSpan - All observed alleles are\ single-base, but the annotation spans 0 bases. (UCSC's\ re-alignment of flanking sequences to the genome may be\ informative.)
    \ Another condition, which does not necessarily imply any problem,\ is noted:\
      \
    • SingleClassTriAllelic, SingleClassQuadAllelic - \ Class is single and three or four different bases have been\ \ observed (usually there are only two).
    \
  • \ \ Miscellaneous Attributes (dbSNP): several properties extracted\ from dbSNP's SNP_bitfield table\ (see dbSNP_BitField_v5.pdf for details)\
      \
    • Clinically Associated - SNP is in OMIM/OMIA and/or at \ \ least one submitter is a Locus-Specific Database. This does\ \ not necessarily imply that the variant causes any disease,\ \ only that it has been observed in clinical studies.
    • Appears in OMIM/OMIA - SNP is mentioned in \ \ Online Mendelian Inheritance in Man for \ \ human SNPs, or Online Mendelian Inheritance in Animals for \ \ non-human animal SNPs. Some of these SNPs are quite common,\ \ while others are known to cause disease; see OMIM/OMIA for more\ \ information.
    • Has Microattribution/Third-Party Annotation - At least\ \ one of the SNP's submitters studied this SNP in a biomedical\ \ setting, but is not a Locus-Specific Database or OMIM/OMIA.
    • Submitted by Locus-Specific Database - At least one of\ \ the SNP's submitters is associated with a database of variants\ \ related to a particular gene. These variants may or may\ \ not be known to be causative.
    • MAF >= 5% in Some Population - Minor Allele Frequency is \ \ at least 5% in at least one population assayed.\ \ Warning: this bit appears to have been set incorrectly for \ \ many SNPs in build 132.
    • MAF >= 5% in All Populations - Minor Allele Frequency is \ \ at least 5% in all populations assayed.\ \ Warning: this bit appears to have been set incorrectly for\ \ some SNPs in build 132.
    • Genotype Conflict - Quality check: different genotypes \ \ have been submitted for the same individual.
    • Ref SNP Cluster has Non-overlapping Alleles - Quality\ \ check: this reference SNP was clustered from submitted SNPs\ \ with non-overlapping sets of observed alleles.
    • Some Assembly's Allele Does Not Match Observed - \ \ Quality check: at least one assembly mapped by dbSNP has an allele\ at the mapped position that is not present in this SNP's observed\ alleles.
    \
\ Several other properties do not have coloring options, but do have \ some filtering options:\
    \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Alignments with weight = 1 are the highest quality.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 1\ on all tracks except the Mult. SNPs track, which defaults to 3.\ \
    \
  • \ \ Submitter handles: These are short, single-word identifiers of\ labs or consortia that submitted SNPs that were clustered into this\ reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs\ have been observed by many different submitters, and some by only a\ single submitter (although that single submitter may have tested a\ large number of samples).\
  • \ \ Allele Frequencies: Some submissions to dbSNP include \ allele frequencies and the study's sample size \ (i.e., the number of distinct chromosomes, which is two times the\ number of individuals assayed, a.k.a. 2N). dbSNP combines all\ available frequencies and counts from submitted SNPs that are \ clustered together into a reference SNP.\
\ \
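For a bi-allelic site with allele frequencies p and 1 - p, the standard expected heterozygosity is 1 - p^2 - (1 - p)^2 = 2p(1 - p), which peaks at 0.5 when p = 0.5. This is why values above 0.5 are suspicious for bi-allelic single-base substitutions. The sketch below uses that textbook formula (dbSNP's own calculation may differ) together with the weight filter described above; the function names are hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Textbook expected heterozygosity: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def passes_weight_filter(weight: int, max_weight: int = 1) -> bool:
    """Weights 0 and 10 are excluded from the data set; otherwise keep
    alignments whose weight does not exceed the chosen maximum
    (1 by default, 3 on the Mult. SNPs track)."""
    if weight in (0, 10):
        return False
    return weight <= max_weight
```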

\ You can configure this track such that the details page displays\ the function and coding differences relative to \ particular gene sets. Choose the gene sets from the list on the SNP \ configuration page displayed beneath the heading "On details page,\ show function and coding differences relative to". \ When one or more gene tracks are selected, the SNP details page \ lists all genes that the SNP hits (or is close to), with the same keywords \ used in the function category. The function usually \ agrees with NCBI's function, except when NCBI's functional annotation is \ relative to an XM_* predicted RefSeq (not included in the UCSC Genome \ Browser's RefSeq Genes track) and/or UCSC's functional annotation is \ relative to a transcript that is not in RefSeq.\

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. We compare the length of the\ reference allele to the length(s) of observed alleles; if the\ reference allele is shorter than all other observed alleles, we change\ 'in-del' to 'insertion'. Likewise, if the reference allele is longer\ than all other observed alleles, we change 'in-del' to 'deletion'.\

\ \

UCSC Re-alignment of flanking sequences

\

\ dbSNP determines the genomic locations of SNPs by aligning their flanking \ sequences to the genome.\ UCSC displays SNPs in the locations determined by dbSNP, but does not\ have access to the alignments on which dbSNP based its mappings.\ Instead, UCSC re-aligns the flanking sequences \ to the neighboring genomic sequence for display on SNP details pages. \ While the recomputed alignments may differ from dbSNP's alignments,\ they often are informative when UCSC has annotated an unusual condition.\

\

\ Non-repetitive genomic sequence is shown in upper case like the flanking \ sequence, and a "|" indicates each match between genomic and flanking bases.\ Repetitive genomic sequence (annotated by RepeatMasker and/or the\ Tandem Repeats Finder with period <= 12) is shown in lower case, and matching\ bases are indicated by a "+".\

\ \

Data Sources and Methods

\ \

\ The data that comprise this track were extracted from database dump files \ and headers of fasta files downloaded from NCBI. \ The database dump files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/database/\ (e.g., for Human, organism_tax_id = human_9606).\ The fasta files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/rs_fasta/\

\
    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b132_SNPContigLoc_37_1.bcp.gz and \ b132_ContigInfo_37_1.bcp.gz.
  • \
  • b132_SNPMapInfo_37_1.bcp.gz provided the alignment weights.\
  • Functional classification was obtained from \ b132_SNPContigLocusId_37_1.bcp.gz.
  • \
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • \
  • SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies.
  • \
  • Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and \ SNPSubSNPLink.bcp.gz.
  • \
  • SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinical association. See the document dbSNP_BitField_v5.pdf for details.
  • \
  • The header lines in the rs_fasta files were used for molecule type,\ class and observed polymorphism.
  • \
\ \

Orthologous Alleles (human assemblies only)

\

\ For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. Candidate human SNPs are restricted to those that meet all of the following criteria:\

    \
  • class = 'single'
  • \
  • mapped position in the human reference genome is one base long
  • \
  • aligned to only one location in the human reference genome
  • \
  • not aligned to a chrN_random chrom
  • \
  • biallelic (not tri- or quad-allelic)
  • \
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the \ orthologous start and end position to 0 (zero).\ \
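The candidate filter and the special values above can be expressed as a small Python sketch. The `Snp` record, its field names and the `lift` callback are hypothetical stand-ins. The callback replaces a liftOver lookup and returns the lifted chromosome, start, end and base, or `None` when no lift was possible. Treating any base other than A, C, G or T as unknown ('N') is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Snp:
    chrom: str
    start: int        # 0-based, half-open (assumption)
    end: int
    snp_class: str    # e.g. "single"
    observed: str     # e.g. "A/G"
    map_count: int    # number of locations the SNP aligned to


def is_ortho_candidate(s: Snp) -> bool:
    """Apply the five candidate criteria listed above."""
    return (s.snp_class == "single"
            and s.end - s.start == 1                 # one base long
            and s.map_count == 1                     # unique alignment
            and not s.chrom.endswith("_random")      # not chrN_random
            and len(s.observed.split("/")) == 2)     # biallelic


# Hypothetical liftOver stand-in: (chrom, start, end, base) or None.
LiftFn = Callable[[Snp], Optional[Tuple[str, int, int, str]]]


def ortho_allele(s: Snp, lift: LiftFn) -> Tuple[str, int, int]:
    hit = lift(s)
    if hit is None:
        return ("?", 0, 0)          # lift not possible
    _chrom, start, end, base = hit
    base = base.upper()
    return (base if base in {"A", "C", "G", "T"} else "N", start, end)
```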

Masked FASTA Files (human assemblies only)

\ \ FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download here. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.\ \
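One way to apply this kind of masking is sketched below. The sketch assumes each SNP is given as a 0-based position plus an observed-allele string such as "A/G". It skips anything that is not a single-base substitution. It then replaces the base with the IUPAC code for the union of the observed alleles and the reference base. Preserving lower-case (soft-masked) bases is an assumption, and the function name `mask_sequence` is hypothetical. The extra filtering of problematic SNPs mentioned above is not modeled.

```python
from __future__ import annotations

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def mask_sequence(seq: str, snps: list[tuple[int, str]]) -> str:
    """Replace each single-base-substitution site with its IUPAC code."""
    out = list(seq)
    for pos, observed in snps:
        alleles = observed.upper().split("/")
        # Only single-base substitutions are used; indels are skipped.
        if not all(len(a) == 1 and a in "ACGT" for a in alleles):
            continue
        bases = set(alleles) | {out[pos].upper()}
        code = IUPAC.get(frozenset(bases))
        if code:
            # Keep soft-masking (lower case) if the original base had it.
            out[pos] = code if out[pos].isupper() else code.lower()
    return "".join(out)
```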

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\

\ \ varRep 1 defaultGeneTracks knownGene\ group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP 132) Flagged by dbSNP as Clinically Assoc\ priority 99.0922\ shortLabel Flagged SNPs(132)\ snpExceptionDesc snp132ExceptionDesc\ snpSeq snp132Seq\ track snp132Flagged\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility hide\ snp132Mult Mult. SNPs(132) bed 6 + Simple Nucleotide Polymorphisms (dbSNP 132) That Map to Multiple Genomic Loci 0 99.0923 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\ \

\ This track contains information about a subset of the \ single nucleotide polymorphisms\ and small insertions and deletions (indels) — collectively Simple\ Nucleotide Polymorphisms — from\ dbSNP\ build 132, available from\ ftp.ncbi.nih.gov/snp.\ Only SNPs that have been mapped to multiple locations in the reference\ genome assembly are included in this subset. When a SNP's flanking sequences \ map to multiple locations in the reference genome, it calls into question \ whether there is true variation at those sites, or whether the sequences\ at those sites are merely highly similar but not identical.\

\

\ The default maximum weight for this track is 3, unlike the other dbSNP build 132 tracks, which have a default maximum weight of 1. This allows these multiply-mapped SNPs to appear in this track's display, whereas by default they do not appear in the All SNPs(132) track because of its maximum weight filter.\
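The effect of the maximum weight filter can be illustrated with a short sketch. The dictionary layout of a SNP record is an assumption made for illustration. The default of 3 for `snp132Mult` comes from that track's `defaultMaxWeight` setting, and 1 is the default described for the other build 132 tracks.

```python
# Per-track defaults; tracks not listed fall back to 1.
DEFAULT_MAX_WEIGHT = {"snp132Mult": 3}


def default_max_weight(track: str) -> int:
    return DEFAULT_MAX_WEIGHT.get(track, 1)


def filter_by_weight(snps: list, max_weight: int) -> list:
    """Keep SNPs whose dbSNP alignment weight does not exceed max_weight."""
    return [s for s in snps if s["weight"] <= max_weight]
```

With these defaults, a SNP of weight 3 is drawn in Mult. SNPs(132) but hidden in All SNPs(132) unless the user raises the filter.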

\ \ The remainder of this page is identical on the following tracks:\
    \
  • Common SNPs(132)\
  • Flagged SNPs(132)\
  • Mult. SNPs(132)\
  • All SNPs(132)\
\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\
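Assume the usual BED convention of 0-based, half-open coordinates (these tracks are of type `bed 6 +`). Under that assumption, the three display cases correspond to the span `chromEnd - chromStart`. The helper below is a hypothetical illustration of that mapping, not browser code.

```python
def display_kind(chrom_start: int, chrom_end: int) -> str:
    """Describe how a variant spanning [chrom_start, chrom_end) is drawn."""
    span = chrom_end - chrom_start
    if span < 0:
        raise ValueError("chromEnd must not be less than chromStart")
    if span == 0:
        return "insertion: tick mark between two bases"
    if span == 1:
        return "single base"
    return f"block spanning {span} bases"
```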

\ \

\ On the track controls page, SNPs can be colored and/or filtered from the \ display according to several attributes:\

\
    \ \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/-\
    • No Variation - the submission reports an invariant region in the surveyed sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - submitted by HapMap project (human only)\
    • By 1000Genomes - submitted by 1000Genomes project (human only)\
    • Unknown - no validation has been reported for this variant\
    \
  • \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation is 3' to and within 500 bases of a\ transcript, or is 5' to and within 2000 bases of a transcript\ (near-gene-3, near-gene-5)\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to the reference assembly (coding-synon)\
    • Coding - Non-Synonymous - change in peptide for allele with \ \ respect to the reference assembly (nonsense, missense, \ frameshift, cds-indel, coding-synonymy-unknown)\
    • Untranslated - variation is in a transcript, but not in a coding \ \ region interval (untranslated-3, untranslated-5)\
    • Intron - variation is in an intron, but not in the first two or\ last two bases of the intron\
    • Splice Site - variation is in the first two or last two bases\ of an intron (splice-3, splice-5)\
    • Unknown - no known functional classification\
    \
  • \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \
  • \ \ Unusual Conditions (UCSC): UCSC checks for several anomalies \ that may indicate a problem with the mapping, and reports them in the \ Annotations section of the SNP details page if found:\
      \
    • AlleleFreqSumNot1 - Allele frequencies do not sum\ to 1.0 (+-0.01). This SNP's allele frequency data are\ \ probably incomplete.
    • \
    • DuplicateObserved,\ MixedObserved - Multiple distinct insertion SNPs have \ \ been mapped to this location, with either the same inserted \ \ sequence (Duplicate) or different inserted sequence (Mixed).
    • \
    • FlankMismatchGenomeEqual,\ \ FlankMismatchGenomeLonger,\ \ FlankMismatchGenomeShorter - NCBI's alignment of\ the flanking sequences had at least one mismatch or gap\ \ near the mapped SNP position.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • MultipleAlignments - This SNP's flanking sequences \ align to more than one location in the reference assembly.
    • \
    • NamedDeletionZeroSpan - A deletion (from the\ genome) was observed but the annotation spans 0 bases.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • NamedInsertionNonzeroSpan - An insertion (into the\ genome) was observed but the annotation spans more than 0\ bases. (UCSC's re-alignment of flanking sequences to the\ genome may be informative.)
    • \
    • NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.01) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect.
    • \
    • ObservedContainsIupac - At least one observed allele \ from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
    • \
    • ObservedMismatch - UCSC reference allele does not\ match any observed allele from dbSNP. This is tested only\ \ for SNPs whose class is single, in-del, insertion, deletion,\ \ mnp or mixed.
    • \
    • ObservedTooLong - The observed allele is not given because it is too long.
    • \
    • ObservedWrongFormat - Observed allele(s) from dbSNP\ have unexpected format for the given class.
    • \
    • RefAlleleMismatch - The reference allele from dbSNP\ does not match the UCSC reference allele, i.e., the bases in\ \ the mapped position range.
    • \
    • RefAlleleRevComp - The reference allele from dbSNP\ matches the reverse complement of the UCSC reference\ allele.
    • \
    • SingleClassLongerSpan - All observed alleles are\ single-base, but the annotation spans more than 1 base.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • SingleClassZeroSpan - All observed alleles are\ single-base, but the annotation spans 0 bases. (UCSC's\ re-alignment of flanking sequences to the genome may be\ informative.)
    • \
    \ Another condition, which does not necessarily imply any problem,\ is noted:\
      \
    • SingleClassTriAllelic, SingleClassQuadAllelic - \ Class is single and three or four different bases have been\ \ observed (usually there are only two).
    • \
    \
  • \
  • \ \ Miscellaneous Attributes (dbSNP): several properties extracted\ from dbSNP's SNP_bitfield table\ (see dbSNP_BitField_v5.pdf for details)\
      \
    • Clinically Associated - SNP is in OMIM/OMIA and/or at \ \ least one submitter is a Locus-Specific Database. This does\ \ not necessarily imply that the variant causes any disease,\ \ only that it has been observed in clinical studies.
    • \
    • Appears in OMIM/OMIA - SNP is mentioned in \ \ Online Mendelian Inheritance in Man for \ \ human SNPs, or Online Mendelian Inheritance in Animals for \ \ non-human animal SNPs. Some of these SNPs are quite common,\ \ others are known to cause disease; see OMIM/OMIA for more\ \ information.
    • \
    • Has Microattribution/Third-Party Annotation - At least\ \ one of the SNP's submitters studied this SNP in a biomedical\ \ setting, but is not a Locus-Specific Database or OMIM/OMIA.
    • \
    • Submitted by Locus-Specific Database - At least one of\ \ the SNP's submitters is associated with a database of variants\ \ associated with a particular gene. These variants may or may\ \ not be known to be causative.
    • \
    • MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. Warning: this bit appears to have been set incorrectly for many SNPs in build 132.
    • \
    • MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Warning: this bit appears to have been set incorrectly for some SNPs in build 132.
    • \
    • Genotype Conflict - Quality check: different genotypes \ \ have been submitted for the same individual.
    • \
    • Ref SNP Cluster has Non-overlapping Alleles - Quality\ \ check: this reference SNP was clustered from submitted SNPs\ \ with non-overlapping sets of observed alleles.
    • \
    • Some Assembly's Allele Does Not Match Observed - \ \ Quality check: at least one assembly mapped by dbSNP has an allele\ at the mapped position that is not present in this SNP's observed\ alleles.
    • \
    \
  • \
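The two reference-allele checks in the Unusual Conditions list can be sketched as string comparisons. In this illustration "-" is treated as the empty allele for zero-span annotations. The sketch reports only the more specific RefAlleleRevComp flag when the reverse complement matches. The text above does not say whether UCSC also sets RefAlleleMismatch in that case. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def revcomp(seq: str) -> str:
    """Reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def _norm(allele: str) -> str:
    # "-" (empty allele) compares equal to a zero-length genomic span.
    return "" if allele == "-" else allele.upper()


def ref_allele_flags(dbsnp_ref: str, ucsc_ref: str) -> list:
    """Compare dbSNP's reference allele with the bases in the mapped range."""
    ref, genome = _norm(dbsnp_ref), _norm(ucsc_ref)
    if ref == genome:
        return []
    if revcomp(ref) == genome:
        return ["RefAlleleRevComp"]
    return ["RefAlleleMismatch"]
```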
\ Several other properties do not have coloring options, but do have \ some filtering options:\
    \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Alignments with weight = 1 are the highest quality.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 1\ on all tracks except the Mult. SNPs track, which defaults to 3.\ \
    \
  • \
  • \ \ Submitter handles: These are short, single-word identifiers of\ labs or consortia that submitted SNPs that were clustered into this\ reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs\ have been observed by many different submitters, and some by only a\ single submitter (although that single submitter may have tested a\ large number of samples).\
  • \
  • \ \ Allele Frequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is twice the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.\
  • \
\ \
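The AlleleFreqSumNot1 and NonIntegerChromCount conditions follow directly from the allele frequencies and the 2N sample size. Below is a minimal sketch. It assumes frequencies are given as a dictionary mapping allele to frequency, and it uses the +-0.01 tolerances quoted in the condition descriptions. The function name `freq_flags` is hypothetical.

```python
def freq_flags(freqs: dict, sample_count: int, tol: float = 0.01) -> list:
    """Check allele frequencies against a 2N chromosome sample size."""
    flags = []
    if abs(sum(freqs.values()) - 1.0) > tol:
        flags.append("AlleleFreqSumNot1")
    # Each frequency times 2N should be a whole number of chromosomes.
    counts = (f * sample_count for f in freqs.values())
    if any(abs(c - round(c)) > tol for c in counts):
        flags.append("NonIntegerChromCount")
    return flags
```

For example, a study of 4 individuals has 2N = 8 chromosomes. Frequencies of 0.75 and 0.25 then correspond to whole counts of 6 and 2, so neither condition is flagged.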

\ You can configure this track such that the details page displays\ the function and coding differences relative to \ particular gene sets. Choose the gene sets from the list on the SNP \ configuration page displayed beneath this heading: On details page,\ show function and coding differences relative to. \ When one or more gene tracks are selected, the SNP details page \ lists all genes that the SNP hits (or is close to), with the same keywords \ used in the function category. The function usually \ agrees with NCBI's function, except when NCBI's functional annotation is \ relative to an XM_* predicted RefSeq (not included in the UCSC Genome \ Browser's RefSeq Genes track) and/or UCSC's functional annotation is \ relative to a transcript that is not in RefSeq.\

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. We compare the length of the\ reference allele to the length(s) of observed alleles; if the\ reference allele is shorter than all other observed alleles, we change\ 'in-del' to 'insertion'. Likewise, if the reference allele is longer\ than all other observed alleles, we change 'in-del' to 'deletion'.\

\ \

UCSC Re-alignment of flanking sequences

\

\ dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition.\

\

\ Non-repetitive genomic sequence is shown in upper case like the flanking \ sequence, and a "|" indicates each match between genomic and flanking bases.\ Repetitive genomic sequence (annotated by RepeatMasker and/or the\ Tandem Repeats Finder with period <= 12) is shown in lower case, and matching\ bases are indicated by a "+".\

\ \

Data Sources and Methods

\ \

\ The data that comprise this track were extracted from database dump files \ and headers of fasta files downloaded from NCBI. \ The database dump files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/database/\ (e.g., for Human, organism_tax_id = human_9606).\ The fasta files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/rs_fasta/\

\
    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b132_SNPContigLoc_37_1.bcp.gz and \ b132_ContigInfo_37_1.bcp.gz.
  • \
  • b132_SNPMapInfo_37_1.bcp.gz provided the alignment weights.\
  • Functional classification was obtained from \ b132_SNPContigLocusId_37_1.bcp.gz.
  • \
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • \
  • SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies.
  • \
  • Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and \ SNPSubSNPLink.bcp.gz.
  • \
  • SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinical association. See the document dbSNP_BitField_v5.pdf for details.
  • \
  • The header lines in the rs_fasta files were used for molecule type,\ class and observed polymorphism.
  • \
\ \
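The `.bcp.gz` files listed above are gzip-compressed database dumps. The sketch below assumes one record per line, tab-separated fields and no header row. Column meanings differ per table and are not modeled here. The latin-1 decoding and the function name `read_bcp` are assumptions.

```python
import gzip
from typing import BinaryIO, Iterator, List, Union


def read_bcp(source: Union[str, BinaryIO]) -> Iterator[List[str]]:
    """Yield the fields of each row of a gzipped, tab-delimited dump file.

    `source` may be a path or an open binary file object; gzip.open
    accepts either.
    """
    with gzip.open(source, "rt", encoding="latin-1") as fh:
        for line in fh:
            yield line.rstrip("\r\n").split("\t")
```

A caller would typically index the yielded lists by column position, after checking the table's column order in dbSNP's schema documentation.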

Orthologous Alleles (human assemblies only)

\

\ For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. Candidate human SNPs are restricted to those that meet all of the following criteria:\

    \
  • class = 'single'
  • \
  • mapped position in the human reference genome is one base long
  • \
  • aligned to only one location in the human reference genome
  • \
  • not aligned to a chrN_random chrom
  • \
  • biallelic (not tri- or quad-allelic)
  • \
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the \ orthologous start and end position to 0 (zero).\ \

Masked FASTA Files (human assemblies only)

\ \ FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download here. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.\ \

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\

\ \ varRep 1 defaultGeneTracks knownGene\ defaultMaxWeight 3\ group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP 132) That Map to Multiple Genomic Loci\ maxWindowToDraw 10000000\ priority 99.0923\ shortLabel Mult. SNPs(132)\ snpExceptionDesc snp132ExceptionDesc\ snpSeq snp132Seq\ track snp132Mult\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility hide\ snp132 All SNPs(132) bed 6 + Simple Nucleotide Polymorphisms (dbSNP 132) 0 99.0925 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\ \

\ This track contains information about single nucleotide polymorphisms\ and small insertions and deletions (indels) — collectively Simple\ Nucleotide Polymorphisms — from\ dbSNP\ build 132, available from\ ftp.ncbi.nih.gov/snp.\

\

\ Three tracks contain subsets of the items in this track:\

    \
  • Common SNPs(132): SNPs that have a minor allele frequency\ of at least 1% and are mapped to a single location in the reference\ genome assembly. Frequency data are not available for all SNPs,\ so this subset is incomplete.
  • \
  • Flagged SNPs(132): SNPs flagged as clinically associated by dbSNP, \ mapped to a single location in the reference genome assembly, and \ not known to have a minor allele frequency of at least 1%.\ Frequency data are not available for all SNPs, so this subset may\ include some SNPs whose true minor allele frequency is 1% or greater.
  • \
  • Mult. SNPs(132): SNPs that have been mapped to multiple locations\ in the reference genome assembly.
  • \
\
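The membership rules for the three subsets can be summarized in a short sketch. It describes a SNP by its number of mapped locations, its minor allele frequency (`None` when dbSNP has no frequency data) and dbSNP's clinically-associated flag. Every SNP also belongs to All SNPs(132), which the sketch does not list. The function name is hypothetical.

```python
from typing import List, Optional


def subset_tracks(map_count: int, maf: Optional[float],
                  clinically_associated: bool) -> List[str]:
    """Return the subset tracks (besides All SNPs(132)) a SNP falls into."""
    if map_count > 1:
        return ["Mult. SNPs(132)"]
    if maf is not None and maf >= 0.01:
        return ["Common SNPs(132)"]
    if clinically_associated:
        # Not *known* to have MAF >= 1%; frequency may simply be missing.
        return ["Flagged SNPs(132)"]
    return []
```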

\

\ The default maximum weight for this track is 1, so unless\ the setting is changed in the track controls, SNPs that map to multiple genomic \ locations will be omitted from display. When a SNP's flanking sequences \ map to multiple locations in the reference genome, it calls into question \ whether there is true variation at those sites, or whether the sequences\ at those sites are merely highly similar but not identical.\

\ \ The remainder of this page is identical on the following tracks:\
    \
  • Common SNPs(132)\
  • Flagged SNPs(132)\
  • Mult. SNPs(132)\
  • All SNPs(132)\
\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\

\ \

\ On the track controls page, SNPs can be colored and/or filtered from the \ display according to several attributes:\

\
    \ \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/-\
    • No Variation - the submission reports an invariant region in the surveyed sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - submitted by HapMap project (human only)\
    • By 1000Genomes - submitted by 1000Genomes project (human only)\
    • Unknown - no validation has been reported for this variant\
    \
  • \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation is 3' to and within 500 bases of a\ transcript, or is 5' to and within 2000 bases of a transcript\ (near-gene-3, near-gene-5)\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to the reference assembly (coding-synon)\
    • Coding - Non-Synonymous - change in peptide for allele with \ \ respect to the reference assembly (nonsense, missense, \ frameshift, cds-indel, coding-synonymy-unknown)\
    • Untranslated - variation is in a transcript, but not in a coding \ \ region interval (untranslated-3, untranslated-5)\
    • Intron - variation is in an intron, but not in the first two or\ last two bases of the intron\
    • Splice Site - variation is in the first two or last two bases\ of an intron (splice-3, splice-5)\
    • Unknown - no known functional classification\
    \
  • \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \
  • \ \ Unusual Conditions (UCSC): UCSC checks for several anomalies \ that may indicate a problem with the mapping, and reports them in the \ Annotations section of the SNP details page if found:\
      \
    • AlleleFreqSumNot1 - Allele frequencies do not sum\ to 1.0 (+-0.01). This SNP's allele frequency data are\ \ probably incomplete.
    • \
    • DuplicateObserved,\ MixedObserved - Multiple distinct insertion SNPs have \ \ been mapped to this location, with either the same inserted \ \ sequence (Duplicate) or different inserted sequence (Mixed).
    • \
    • FlankMismatchGenomeEqual,\ \ FlankMismatchGenomeLonger,\ \ FlankMismatchGenomeShorter - NCBI's alignment of\ the flanking sequences had at least one mismatch or gap\ \ near the mapped SNP position.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • MultipleAlignments - This SNP's flanking sequences \ align to more than one location in the reference assembly.
    • \
    • NamedDeletionZeroSpan - A deletion (from the\ genome) was observed but the annotation spans 0 bases.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • NamedInsertionNonzeroSpan - An insertion (into the\ genome) was observed but the annotation spans more than 0\ bases. (UCSC's re-alignment of flanking sequences to the\ genome may be informative.)
    • \
    • NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.01) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect.
    • \
    • ObservedContainsIupac - At least one observed allele \ from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
    • \
    • ObservedMismatch - UCSC reference allele does not\ match any observed allele from dbSNP. This is tested only\ \ for SNPs whose class is single, in-del, insertion, deletion,\ \ mnp or mixed.
    • \
    • ObservedTooLong - The observed allele is not given because it is too long.
    • \
    • ObservedWrongFormat - Observed allele(s) from dbSNP\ have unexpected format for the given class.
    • \
    • RefAlleleMismatch - The reference allele from dbSNP\ does not match the UCSC reference allele, i.e., the bases in\ \ the mapped position range.
    • \
    • RefAlleleRevComp - The reference allele from dbSNP\ matches the reverse complement of the UCSC reference\ allele.
    • \
    • SingleClassLongerSpan - All observed alleles are\ single-base, but the annotation spans more than 1 base.\ (UCSC's re-alignment of flanking sequences to the genome may\ be informative.)
    • \
    • SingleClassZeroSpan - All observed alleles are\ single-base, but the annotation spans 0 bases. (UCSC's\ re-alignment of flanking sequences to the genome may be\ informative.)
    • \
    \ Another condition, which does not necessarily imply any problem,\ is noted:\
      \
    • SingleClassTriAllelic, SingleClassQuadAllelic - \ Class is single and three or four different bases have been\ \ observed (usually there are only two).
    • \
    \
  • \
  • \ \ Miscellaneous Attributes (dbSNP): several properties extracted\ from dbSNP's SNP_bitfield table\ (see dbSNP_BitField_v5.pdf for details)\
      \
    • Clinically Associated - SNP is in OMIM/OMIA and/or at \ \ least one submitter is a Locus-Specific Database. This does\ \ not necessarily imply that the variant causes any disease,\ \ only that it has been observed in clinical studies.
    • \
    • Appears in OMIM/OMIA - SNP is mentioned in \ \ Online Mendelian Inheritance in Man for \ \ human SNPs, or Online Mendelian Inheritance in Animals for \ \ non-human animal SNPs. Some of these SNPs are quite common,\ \ others are known to cause disease; see OMIM/OMIA for more\ \ information.
    • \
    • Has Microattribution/Third-Party Annotation - At least\ \ one of the SNP's submitters studied this SNP in a biomedical\ \ setting, but is not a Locus-Specific Database or OMIM/OMIA.
    • \
    • Submitted by Locus-Specific Database - At least one of\ \ the SNP's submitters is associated with a database of variants\ \ associated with a particular gene. These variants may or may\ \ not be known to be causative.
    • \
    • MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. Warning: this bit appears to have been set incorrectly for many SNPs in build 132.
    • \
    • MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Warning: this bit appears to have been set incorrectly for some SNPs in build 132.
    • \
    • Genotype Conflict - Quality check: different genotypes \ \ have been submitted for the same individual.
    • \
    • Ref SNP Cluster has Non-overlapping Alleles - Quality\ \ check: this reference SNP was clustered from submitted SNPs\ \ with non-overlapping sets of observed alleles.
    • \
    • Some Assembly's Allele Does Not Match Observed - \ \ Quality check: at least one assembly mapped by dbSNP has an allele\ at the mapped position that is not present in this SNP's observed\ alleles.
    • \
    \
  • \
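The Locus Region rule depends on transcript strand, because the 5' end of a minus-strand transcript lies at its higher coordinate. Here is a hypothetical sketch. It assumes 0-based, half-open transcript coordinates and measures distance so that a SNP adjacent to the transcript is 1 base away. UCSC's and NCBI's exact distance conventions may differ.

```python
from typing import Optional


def near_gene(pos: int, tx_start: int, tx_end: int, strand: str) -> Optional[str]:
    """Classify a SNP at `pos` relative to transcript [tx_start, tx_end)."""
    if tx_start <= pos < tx_end:
        return None                   # inside the transcript, not "near"
    if pos < tx_start:
        dist = tx_start - pos         # before the low-coordinate end
        upstream = strand == "+"      # that end is 5' on the plus strand
    else:
        dist = pos - tx_end + 1       # after the high-coordinate end
        upstream = strand == "-"      # that end is 5' on the minus strand
    if upstream and dist <= 2000:
        return "near-gene-5"
    if not upstream and dist <= 500:
        return "near-gene-3"
    return None
```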
\ Several other properties do not have coloring options, but do have \ some filtering options:\
    \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Alignments with weight = 1 are the highest quality.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 1\ on all tracks except the Mult. SNPs track, which defaults to 3.\ \
    \
  • \
  • \ \ Submitter handles: These are short, single-word identifiers of\ labs or consortia that submitted SNPs that were clustered into this\ reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs\ have been observed by many different submitters, and some by only a\ single submitter (although that single submitter may have tested a\ large number of samples).\
  • \
  • \ \ Allele Frequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is twice the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.\
  • \
\ \
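dbSNP's own calculation is described in the document linked above. The sketch below does not reproduce dbSNP's exact method; it uses the standard textbook formula for expected heterozygosity to show why the 0.5 bound holds. For allele frequencies p_i, the formula is 1 - sum(p_i^2). For two alleles this equals 2p(1 - p), which peaks at 0.5 when p = 0.5. With three or more alleles the value can exceed 0.5, which is why the bound applies only to bi-allelic sites.

```python
def expected_heterozygosity(freqs: list) -> float:
    """Textbook expected heterozygosity: 1 - sum of squared allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)
```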

\ You can configure this track such that the details page displays\ the function and coding differences relative to \ particular gene sets. Choose the gene sets from the list on the SNP \ configuration page displayed beneath this heading: On details page,\ show function and coding differences relative to. \ When one or more gene tracks are selected, the SNP details page \ lists all genes that the SNP hits (or is close to), with the same keywords \ used in the function category. The function usually \ agrees with NCBI's function, except when NCBI's functional annotation is \ relative to an XM_* predicted RefSeq (not included in the UCSC Genome \ Browser's RefSeq Genes track) and/or UCSC's functional annotation is \ relative to a transcript that is not in RefSeq.\

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. We compare the length of the\ reference allele to the length(s) of observed alleles; if the\ reference allele is shorter than all other observed alleles, we change\ 'in-del' to 'insertion'. Likewise, if the reference allele is longer\ than all other observed alleles, we change 'in-del' to 'deletion'.\
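The in-del reclassification rule above can be sketched in Python as follows. The function names are hypothetical. The sketch assumes '-' stands for the empty allele, as in observed strings such as "-/AT". Strand handling is omitted.

```python
from typing import List


def allele_length(allele: str) -> int:
    """Length of an allele; '-' denotes the empty (absent) allele."""
    return 0 if allele == "-" else len(allele)


def refine_indel_class(ref_allele: str, observed: List[str]) -> str:
    """Turn dbSNP 'in-del' into 'insertion' or 'deletion' where the rule allows."""
    ref_len = allele_length(ref_allele)
    # Compare the reference allele only against the *other* observed alleles.
    others = [allele_length(a) for a in observed if a != ref_allele]
    if not others:
        return "in-del"
    if all(ref_len < n for n in others):
        return "insertion"   # reference shorter than every other allele
    if all(ref_len > n for n in others):
        return "deletion"    # reference longer than every other allele
    return "in-del"          # mixed lengths: keep the dbSNP class


if __name__ == "__main__":
    print(refine_indel_class("-", "-/AT".split("/")))     # insertion
    print(refine_indel_class("AT", "-/AT".split("/")))    # deletion
    print(refine_indel_class("A", "-/A/AT".split("/")))   # in-del
```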

\ \

UCSC Re-alignment of flanking sequences

\

\ dbSNP determines the genomic locations of SNPs by aligning their flanking \ sequences to the genome.\ UCSC displays SNPs in the locations determined by dbSNP, but does not\ have access to the alignments on which dbSNP based its mappings.\ Instead, UCSC re-aligns the flanking sequences \ to the neighboring genomic sequence for display on SNP details pages. \ While the recomputed alignments may differ from dbSNP's alignments,\ they often are informative when UCSC has annotated an unusual condition.\

\

\ Non-repetitive genomic sequence is shown in upper case like the flanking \ sequence, and a "|" indicates each match between genomic and flanking bases.\ Repetitive genomic sequence (annotated by RepeatMasker and/or the\ Tandem Repeats Finder with period <= 12) is shown in lower case, and matching\ bases are indicated by a "+".\
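The marker convention above ("|" for a match in non-repetitive sequence, "+" for a match in lower-case repetitive sequence) can be illustrated with the small Python sketch below. It only draws the marker line for two strings that are already aligned. It does not perform the alignment. It assumes repeats are soft-masked (lower case) in the genomic string and gaps are written as '-'.

```python
def match_line(flank: str, genomic: str) -> str:
    """Build the marker line shown between an aligned flank/genome pair.

    Both strings must already be aligned column by column, with '-' for gaps.
    Repeats are assumed to be soft-masked (lower case) in `genomic`.
    """
    if len(flank) != len(genomic):
        raise ValueError("aligned strings must have equal length")
    marks = []
    for f, g in zip(flank, genomic):
        if g != "-" and f.upper() == g.upper():
            # '+' marks a match in repetitive (lower-case) genomic sequence.
            marks.append("+" if g.islower() else "|")
        else:
            marks.append(" ")  # mismatch or gap
    return "".join(marks)


if __name__ == "__main__":
    flank, genomic = "ACGTAC", "ACGtaG"
    print(flank)
    print(match_line(flank, genomic))  # "|||++ " (blank for the C/G mismatch)
    print(genomic)
```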

\ \

Data Sources and Methods

\ \

\ The data that comprise this track were extracted from database dump files \ and headers of FASTA files downloaded from NCBI. \ The database dump files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/\ (e.g., for Human, organism_tax_id = human_9606).\ The FASTA files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/\

\
    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b132_SNPContigLoc_37_1.bcp.gz and \ b132_ContigInfo_37_1.bcp.gz.
  • b132_SNPMapInfo_37_1.bcp.gz provided the alignment weights.\
  • Functional classification was obtained from \ b132_SNPContigLocusId_37_1.bcp.gz.
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies.
  • Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and \ SNPSubSNPLink.bcp.gz.
  • SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP,\ such as clinically-associated. See the document \ dbSNP_BitField_v5.pdf for details.
  • The header lines in the rs_fasta files were used for molecule type,\ class and observed polymorphism.
\ \

Orthologous Alleles (human assemblies only)

\

\ For the human assembly, we provide a related table that contains\ orthologous alleles in the chimpanzee, orangutan and rhesus macaque\ reference genome assemblies. \ We use our liftOver utility to identify the orthologous alleles. \ The candidate human SNPs are a filtered list that meet the criteria:\

    \
  • class = 'single'
  • mapped position in the human reference genome is one base long
  • aligned to only one location in the human reference genome
  • not aligned to a chrN_random chrom
  • biallelic (not tri- or quad-allelic)
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the \ orthologous start and end position to 0 (zero).\ \
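The candidate filter and the 'N'/'?' conventions above can be sketched in Python as follows. The sketch makes these assumptions:

- The `Snp` record and its `map_count` field are hypothetical.
- The `lift` callable stands in for a liftOver call. It is not liftOver's real interface.
- For a failed lift, the text does not specify the orthologous chromosome, so the sketch returns None for it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Snp:
    chrom: str
    chromStart: int   # 0-based start, as in the track's BED columns
    chromEnd: int
    snp_class: str    # e.g. "single"
    observed: str     # e.g. "A/G"
    map_count: int    # hypothetical: number of locations the SNP aligned to


def is_ortho_candidate(snp: Snp) -> bool:
    """Apply the candidate criteria listed above."""
    return (
        snp.snp_class == "single"
        and snp.chromEnd == snp.chromStart + 1   # one base long
        and snp.map_count == 1                   # aligned to one location only
        and not snp.chrom.endswith("_random")    # not on a chrN_random chrom
        and len(snp.observed.split("/")) == 2    # bi-allelic
    )


# Hypothetical liftOver stand-in: (chrom, start, end, base or None), or None on failure.
LiftResult = Optional[Tuple[str, int, int, Optional[str]]]


def ortho_allele(snp: Snp, lift: Callable[[Snp], LiftResult]) -> Tuple[Optional[str], int, int, str]:
    """Return (chrom, start, end, allele) for the orthologous position."""
    result = lift(snp)
    if result is None:
        # Lift failed: allele '?' with start and end set to 0.
        return (None, 0, 0, "?")
    chrom, start, end, base = result
    # Lifted, but the orthologous base is unknown: use 'N'.
    return (chrom, start, end, base.upper() if base else "N")
```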

Masked FASTA Files (human assemblies only)

\ \ FASTA files that have been modified to use \ IUPAC\ ambiguous nucleotide characters at\ each base covered by a single-base substitution are available for download\ here.\ Note that only single-base substitutions (no insertions or deletions) were used\ to mask the sequence, and these were filtered to exclude problematic SNPs.\ \
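The masking described above can be sketched in Python by replacing each substituted base with the IUPAC code for the reference base plus its observed alleles. This is an illustration, not the pipeline that produced the downloadable files. It assumes 0-based positions and + strand alleles, and it keeps soft-masking (lower case) on the replaced base. It does not apply any SNP filtering.

```python
from typing import Dict, Iterable, Set, Tuple

# IUPAC codes for every set of two or more nucleotides.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def mask_sequence(seq: str, snps: Iterable[Tuple[int, str]]) -> str:
    """Replace each SNP base with the IUPAC code for reference plus observed alleles.

    `snps` holds (0-based position, observed) pairs such as (3, "A/G").
    Non-ACGT alleles (e.g. '-') are skipped, so only single-base
    substitutions are masked.
    """
    per_pos: Dict[int, Set[str]] = {}
    for pos, observed in snps:
        ref = seq[pos].upper()
        alleles = set(observed.upper().split("/"))
        if ref not in "ACGT" or not alleles <= set("ACGT"):
            continue
        # Merge alleles from several SNPs at the same position.
        per_pos.setdefault(pos, {ref}).update(alleles)
    bases = list(seq)
    for pos, alleles in per_pos.items():
        if len(alleles) > 1:
            code = IUPAC[frozenset(alleles)]
            # Keep soft-masking (lower case) if the reference base had it.
            bases[pos] = code.lower() if seq[pos].islower() else code
    return "".join(bases)


if __name__ == "__main__":
    print(mask_sequence("ACGTACGT", [(1, "C/T"), (4, "A/G"), (6, "-/G")]))  # AYGTRCGT
```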

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\

\ \ varRep 1 defaultGeneTracks knownGene\ group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP 132)\ maxWindowToDraw 10000000\ priority 99.0925\ shortLabel All SNPs(132)\ track snp132\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility hide\ snp131 SNPs (131) bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 131) 0 99.093 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\ \

\ This track contains information about single nucleotide polymorphisms\ and small insertions and deletions (indels) — collectively Simple\ Nucleotide Polymorphisms — from\ dbSNP\ build 131, available from\ ftp.ncbi.nih.gov/snp.\

\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\
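The base-level drawing rule above depends only on the width of the variant in the reference. With UCSC's 0-based, half-open chromStart/chromEnd coordinates, that width is chromEnd - chromStart. The Python sketch below assumes insertions are stored with chromStart equal to chromEnd (zero width). The function name and its output strings are hypothetical.

```python
def display_kind(chrom_start: int, chrom_end: int) -> str:
    """Describe how a variant is drawn at base level from its coordinates.

    Coordinates are 0-based and half-open, so the reference span is end - start.
    """
    width = chrom_end - chrom_start
    if width < 0:
        raise ValueError("chromEnd must not be less than chromStart")
    if width == 0:
        return "tick between two bases (insertion)"
    if width == 1:
        return "one base wide (single nucleotide polymorphism)"
    return f"block spanning {width} bases (multi-base variant)"


if __name__ == "__main__":
    for start, end in [(100, 100), (100, 101), (100, 103)]:
        print(start, end, display_kind(start, end))
```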

\ \

\ The configuration categories reflect the following definitions (not all categories apply\ to this assembly):\

\
    \ \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name\
    • No Variation - no variation asserted for sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G}\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - submitted by HapMap project (human only)\
    • By 1000Genomes - submitted by 1000Genomes project (human only)\
    • Unknown - no validation has been reported for this variant\
    \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation within 2000 bases of gene, but not \ \ in transcript (near-gene-3, near-gene-5)\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to reference assembly (coding-synon)\
    • Coding - Non-Synonymous - change in peptide for allele with \ \ respect to reference assembly (nonsense, missense, \ frameshift, cds-indel, coding-synonymy-unknown)\
    • Untranslated - variation in transcript, but not in coding \ \ region interval (untranslated-3, untranslated-5)\
    • Intron - variation in intron, but not in first two or last two bases of intron\
    • Splice Site - variation in first two or last two bases of \ \ intron (splice-3, splice-5)\
    • Unknown - no known functional classification\
    \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Weight = 1 are the highest quality alignments.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 3.\
    \
\ \
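The weight rules above can be sketched as a simple filter in Python. Weights 0 and 10 never reach the track. Other weights pass when they do not exceed the configured maximum, which defaults to 3 here. The function names and the rs identifiers in the example are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple

EXCLUDED_WEIGHTS = {0, 10}  # excluded from the data set entirely


def passes_weight_filter(weight: int, max_weight: int = 3) -> bool:
    """True if a SNP with this dbSNP weight is shown under `max_weight`."""
    return weight not in EXCLUDED_WEIGHTS and weight <= max_weight


def filter_by_weight(snps: Iterable[Tuple[str, int]], max_weight: int = 3) -> Iterator[Tuple[str, int]]:
    """Yield (name, weight) pairs that survive the filter; 1 is the best weight."""
    for name, weight in snps:
        if passes_weight_filter(weight, max_weight):
            yield name, weight


if __name__ == "__main__":
    data = [("rs1", 1), ("rs2", 2), ("rs3", 3), ("rs4", 10), ("rs5", 0)]
    print(list(filter_by_weight(data)))                # weights 1, 2 and 3
    print(list(filter_by_weight(data, max_weight=1)))  # weight 1 only
```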

\ You can configure this track such that the details page displays\ the function and coding differences relative to \ particular gene sets. Choose the gene sets from the list on the SNP \ configuration page displayed beneath this heading: On details page,\ show function and coding differences relative to. \ When one or more gene tracks are selected, the SNP details page \ lists all genes that the SNP hits (or is close to), with the same keywords \ used in the function category. The function usually \ agrees with NCBI's function, but can sometimes give a bit more detail\ (e.g. more detail about how close a near-gene SNP is to a nearby gene).\

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. We compare the length of the\ reference allele to the length(s) of observed alleles; if the\ reference allele is shorter than all other observed alleles, we change\ 'in-del' to 'insertion'. Likewise, if the reference allele is longer\ than all other observed alleles, we change 'in-del' to 'deletion'.\

\ \

UCSC Annotations

\

\ UCSC checks for several unusual conditions that may indicate a problem \ with the mapping, and reports them in the Annotations section if found:\

\
    \
  • The dbSNP reference allele is not the same as the UCSC reference\ allele, i.e. the bases in the mapped position range.
  • Class is single, in-del, mnp or mixed and the UCSC reference\ allele does not match any observed allele.
  • In NCBI's alignment of flanking sequences to the genome, part\ of the flanking sequence around the SNP does not align to\ the genome.
  • Class is single, but the size of the mapped SNP is not one base.
  • Class is named and indicates an insertion or deletion, but the size\ of the mapped SNP implies otherwise.
  • Class is single and the format of observed alleles is unexpected.
  • The length of the observed allele(s) is not available because it is\ too long.
  • Multiple distinct insertion SNPs have been mapped to this location.
  • At least one observed allele contains an ambiguous \ IUPAC base (e.g. R, Y, N).
\ \ Another condition, which does not necessarily imply any problem, is noted:\
    \
  • Class is single and SNP is tri-allelic or quad-allelic.
\ \
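Several of the conditions above can be checked from a SNP's coordinates, its class, its dbSNP reference allele and the reference sequence. The Python sketch below covers only five of them:

- reference allele mismatch
- UCSC reference matching no observed allele
- single class with a size other than one base
- ambiguous IUPAC bases
- tri- or quad-allelic sites

It is not UCSC's implementation. It assumes + strand alleles, writes a zero-length range as '-', and skips parenthesized named alleles when looking for IUPAC codes. The function name and note texts are hypothetical.

```python
from typing import List

IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def annotate(snp_class: str, chrom_start: int, chrom_end: int,
             dbsnp_ref: str, genome: str, observed: str) -> List[str]:
    """Return notes for a subset of the unusual conditions listed above."""
    notes = []
    # UCSC reference allele: the bases in the mapped range ('-' if empty).
    ucsc_ref = genome[chrom_start:chrom_end].upper() or "-"
    alleles = [a.upper() for a in observed.split("/")]
    if dbsnp_ref.upper() != ucsc_ref:
        notes.append("dbSNP reference allele differs from UCSC reference allele")
    if snp_class in ("single", "in-del", "mnp", "mixed") and ucsc_ref not in alleles:
        notes.append("UCSC reference allele matches no observed allele")
    if snp_class == "single" and chrom_end - chrom_start != 1:
        notes.append("class is single but mapped size is not one base")
    # Named alleles such as "(LARGEDELETION)" are text, not bases: skip them.
    if any(ch in IUPAC_AMBIGUOUS
           for a in alleles if not a.startswith("(")
           for ch in a):
        notes.append("observed allele contains an ambiguous IUPAC base")
    if snp_class == "single" and len(alleles) > 2:
        notes.append("tri-allelic or quad-allelic (not necessarily a problem)")
    return notes


if __name__ == "__main__":
    genome = "ACGTACGT"
    print(annotate("single", 2, 3, "A", genome, "C/T"))
```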

UCSC Re-alignment of flanking sequences

\

\ dbSNP determines the genomic locations of SNPs by aligning their flanking \ sequences to the genome.\ UCSC displays SNPs in the locations determined by dbSNP, but does not\ have access to the alignments on which dbSNP based its mappings.\ Instead, UCSC re-aligns the flanking sequences \ to the neighboring genomic sequence for display on SNP details pages. \ While the recomputed alignments may differ from dbSNP's alignments,\ they often are informative when UCSC has annotated an unusual condition.\

\

\ Non-repetitive genomic sequence is shown in upper case like the flanking \ sequence, and a "|" indicates each match between genomic and flanking bases.\ Repetitive genomic sequence (annotated by RepeatMasker and/or the\ Tandem Repeats Finder with period <= 12) is shown in lower case, and matching\ bases are indicated by a "+".\

\ \

Data Sources

\

\ The data that comprise this track were extracted from database dump files \ and headers of FASTA files downloaded from NCBI. \ The database dump files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/\ (e.g., for Human, organism_tax_id = human_9606).\ The FASTA files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/\

\
    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b131_SNPContigLoc_37_1.bcp.gz and \ b131_ContigInfo_37_1.bcp.gz. \
  • b131_SNPMapInfo_37_1.bcp.gz provided the alignment weights.\
  • Functional classification was obtained from \ b131_SNPContigLocusId_37_1.bcp.gz.\
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.\
  • The header lines in the rs_fasta files were used for molecule type,\ class and observed polymorphism.\
\ \

Orthologous Alleles (human assemblies only)

\

\ Beginning with the March 2006 human assembly, we provide a related table that \ contains orthologous alleles in the chimpanzee and rhesus macaque assemblies.\ Beginning with dbSNP build 129, the orangutan assembly is also included.\ We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are \ a filtered list that meet the criteria:\

    \
  • class = 'single'\
  • chromEnd = chromStart + 1\
  • aligned to only one location\
  • not aligned to a chrN_random chrom\
  • biallelic (not tri- or quad-allelic)\
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the \ orthologous start and end position to 0 (zero).\ \

Masked FASTA Files (human assemblies only)

\ \ FASTA files that have been modified to use \ IUPAC\ ambiguous nucleotide characters at\ each base covered by a single-base substitution are available for download\ here.\ Note that only single-base substitutions (no insertions or deletions) were used\ to mask the sequence, and these were filtered to exclude problematic SNPs.\ \

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\

\  \ varRep 1 defaultGeneTracks knownGene\ group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP build 131)\ maxWindowToDraw 10000000\ priority 99.093\ shortLabel SNPs (131)\ track snp131\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility hide\ snp131Composite SNPs (131) Comp bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 131) -- Composite version 0 99.093 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$ varRep 1 chimpDb panTro2\ chimpOrangMacOrthoTable snp131OrthoPt2Pa2Rm2\ codingAnnoLabel_snp131CodingDbSnp dbSNP\ codingAnnotations snp131CodingDbSnp,\ compositeTrack on\ defaultGeneTracks knownGene\ dimensions dimX=view\ group varRep\ hapmapPhase III\ longLabel Simple Nucleotide Polymorphisms (dbSNP build 131) -- Composite version\ macaqueDb rheMac2\ maxWindowToDraw 10000000\ orangDb ponAbe2\ priority 99.093\ shortLabel SNPs (131) Comp\ snpExceptionDesc snp131ExceptionDesc\ snpExceptions snp131Exceptions\ snpSeq snp131Seq\ subGroup1 view Views common=Common_SNPs misc=Misc_SNPs nonu=Non-Unique_SNPs\ track snp131Composite\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility hide\ snp130 SNPs (130) bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 130) 1 99.094 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\ \

\ This track contains information about single nucleotide polymorphisms\ and small insertions and deletions (indels) — collectively Simple\ Nucleotide Polymorphisms — from\ dbSNP\ build 130, available from\ ftp.ncbi.nih.gov/snp.\

\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\

\ \

\ The configuration categories reflect the following definitions (not all categories apply\ to this assembly):\

\
    \ \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name\
    • No Variation - no variation asserted for sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G}\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - validated by HapMap project\
    • Unknown - no validation has been reported for this variant\
    \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation within 2000 bases of gene, but not \ \ in transcript (near-gene-3, near-gene-5)\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to reference assembly (coding-synon)\
    • Coding - Non-Synonymous - change in peptide for allele with \ \ respect to reference assembly (nonsense, missense, \ frameshift)\
    • Untranslated - variation in transcript, but not in coding \ \ region interval (untranslated-3, untranslated-5)\
    • Intron - variation in intron, but not in first two or last two bases of intron\
    • Splice Site - variation in first two or last two bases of \ \ intron (splice-3, splice-5)\
    • Unknown - no known functional classification\
    \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Weight = 1 are the highest quality alignments.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 3.\
    \
\ \
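The Single and Multiple Nucleotide Polymorphism definitions in the Class list above can be checked mechanically from an observed-allele string such as "A/G" or "AT/GC". The Python sketch below covers only those two classes. It returns a placeholder "other" for everything else; "other" is not a dbSNP class name, and the function name is hypothetical.

```python
def classify_substitution(observed: str) -> str:
    """Return "single", "mnp", or the placeholder "other" for an observed string."""
    alleles = observed.upper().split("/")
    # Both classes require at least two alleles drawn only from {A, C, G, T}.
    if len(alleles) < 2 or not all(a and set(a) <= set("ACGT") for a in alleles):
        return "other"
    lengths = {len(a) for a in alleles}
    if lengths == {1}:
        return "single"   # every allele is one nucleotide (2 to 4 alleles)
    if len(lengths) == 1:
        return "mnp"      # all alleles share one length, and that length is > 1
    return "other"


if __name__ == "__main__":
    for obs in ["A/G", "A/C/G/T", "AT/GC", "-/A", "A/GC"]:
        print(obs, classify_substitution(obs))
```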

\ You can configure this track such that the details page displays\ the function and coding differences relative to \ particular gene sets. Choose the gene sets from the list on the SNP \ configuration page displayed beneath this heading: On details page,\ show function and coding differences relative to. \ When one or more gene tracks are selected, the SNP details page \ lists all genes that the SNP hits (or is close to), with the same keywords \ used in the function category. The function usually \ agrees with NCBI's function, but can sometimes give a bit more detail\ (e.g. more detail about how close a near-gene SNP is to a nearby gene).\

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. We compare the length of the\ reference allele to the length(s) of observed alleles; if the\ reference allele is shorter than all other observed alleles, we change\ 'in-del' to 'insertion'. Likewise, if the reference allele is longer\ than all other observed alleles, we change 'in-del' to 'deletion'.\

\ \

UCSC Annotations

\

\ UCSC checks for several unusual conditions that may indicate a problem \ with the mapping, and reports them in the Annotations section if found:\

\
    \
  • The dbSNP reference allele is not the same as the UCSC reference\ allele, i.e. the bases in the mapped position range.
  • Class is single, in-del, mnp or mixed and the UCSC reference\ allele does not match any observed allele.
  • In NCBI's alignment of flanking sequences to the genome, part\ of the flanking sequence around the SNP does not align to\ the genome.
  • Class is single, but the size of the mapped SNP is not one base.
  • Class is named and indicates an insertion or deletion, but the size\ of the mapped SNP implies otherwise.
  • Class is single and the format of observed alleles is unexpected.
  • The length of the observed allele(s) is not available because it is\ too long.
  • Multiple distinct insertion SNPs have been mapped to this location.
  • At least one observed allele contains an ambiguous \ IUPAC base (e.g. R, Y, N).
\ \ Another condition, which does not necessarily imply any problem, is noted:\
    \
  • Class is single and SNP is tri-allelic or quad-allelic.
\ \

UCSC Re-alignment of flanking sequences

\

\ dbSNP determines the genomic locations of SNPs by aligning their flanking \ sequences to the genome.\ UCSC displays SNPs in the locations determined by dbSNP, but does not\ have access to the alignments on which dbSNP based its mappings.\ Instead, UCSC re-aligns the flanking sequences \ to the neighboring genomic sequence for display on SNP details pages. \ While the recomputed alignments may differ from dbSNP's alignments,\ they often are informative when UCSC has annotated an unusual condition.\

\

\ Non-repetitive genomic sequence is shown in upper case like the flanking \ sequence, and a "|" indicates each match between genomic and flanking bases.\ Repetitive genomic sequence (annotated by RepeatMasker and/or the\ Tandem Repeats Finder with period <= 12) is shown in lower case, and matching\ bases are indicated by a "+".\

\ \

Data Sources

\

\ The data that comprise this track were extracted from database dump files \ and headers of FASTA files downloaded from NCBI. \ The database dump files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/\ (e.g., for Human, organism_tax_id = human_9606).\ The FASTA files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/\

\
    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b130_SNPContigLoc_36_3.bcp.gz and \ b130_SNPContigInfo_36_3.bcp.gz. \
  • b130_SNPMapInfo_36_3.bcp.gz provided the alignment weights.\
  • Functional classification was obtained from \ b130_SNPContigLocusId_36_3.bcp.gz.\
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.\
  • The header lines in the rs_fasta files were used for molecule type,\ class and observed polymorphism.\
\ \

Orthologous Alleles (human assemblies only)

\

\ Beginning with the March 2006 human assembly, we provide a related table that \ contains orthologous alleles in the chimpanzee and rhesus macaque assemblies.\ Beginning with dbSNP build 129, the orangutan assembly is also included.\ We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are \ a filtered list that meet the criteria:\

    \
  • class = 'single'\
  • chromEnd = chromStart + 1\
  • aligned to only one location\
  • not aligned to a chrN_random chrom\
  • biallelic (not tri- or quad-allelic)\
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the \ orthologous start and end position to 0 (zero).\ \

Masked FASTA Files (human assemblies only)

\ \ FASTA files that have been modified to use \ IUPAC\ ambiguous nucleotide characters at\ each base covered by a single-base substitution are available for download\ here.\ Note that only single-base substitutions (no insertions or deletions) were used\ to mask the sequence, and these were filtered to exclude problematic SNPs.\ \

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\

\  \ varRep 1 defaultGeneTracks knownGene\ group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP build 130)\ maxWindowToDraw 10000000\ priority 99.094\ shortLabel SNPs (130)\ track snp130\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility dense\ snp129 SNPs (129) bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 129) 1 99.095 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\ \

\ This track contains information about single nucleotide polymorphisms\ and small insertions and deletions (indels) — collectively Simple\ Nucleotide Polymorphisms — from\ dbSNP\ build 129, available from\ ftp.ncbi.nih.gov/snp.\

\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\

\ \

\ The configuration categories reflect the following definitions (not all categories apply\ to this assembly):\

\
    \ \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name\
    • No Variation - no variation asserted for sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G}\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - validated by HapMap project\
    • Unknown - no validation has been reported for this variant\
    \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation within 2000 bases of gene, but not \ \ in transcript (near-gene-3, near-gene-5)\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to reference assembly (coding-synon)\
    • Coding - Non-Synonymous - change in peptide for allele with \ \ respect to reference assembly (nonsense, missense, \ frameshift)\
    • Untranslated - variation in transcript, but not in coding \ \ region interval (untranslated-3, untranslated-5)\
    • Intron - variation in intron, but not in first two or last two bases of intron\
    • Splice Site - variation in first two or last two bases of \ \ intron (splice-3, splice-5)\
    • Reference (coding) - one of the observed alleles of a SNP\ \ in a coding region matches the reference assembly (cds-reference)\
    • Unknown - no known functional classification\
    \
  • \ \ Molecule Type: Sample used to find this variant
    \ Note: the dbSNP release 129 fasta headers have swapped values:\ "genomic" for "cDNA" SNPs and vice versa. \ UCSC has swapped them back, so the displayed molecule type \ should be correct but might disagree with files downloaded\ from dbSNP.
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Weight = 1 are the highest quality alignments.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 3.\
    \
\ \
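The molecule-type note above means anyone reading the build 129 rs_fasta headers directly must undo the genomic/cDNA swap themselves. The Python sketch below shows one way to do that. The header layout in the example is an assumed illustration of '|'-separated key=value fields, not a verified copy of dbSNP's format. Both function names are hypothetical.

```python
from typing import Dict

SWAP_B129 = {"genomic": "cDNA", "cDNA": "genomic"}


def parse_rs_fasta_header(header: str) -> Dict[str, str]:
    """Collect key=value tokens from a '|'-separated header (assumed layout)."""
    fields: Dict[str, str] = {}
    for part in header.lstrip(">").split("|"):
        for token in part.split():
            if "=" in token:
                key, value = token.split("=", 1)
                fields[key] = value.strip('"')
    return fields


def corrected_mol_type(fasta_mol_type: str, build: int) -> str:
    """Undo the genomic/cDNA swap in build 129; other values pass through."""
    if build == 129:
        return SWAP_B129.get(fasta_mol_type, fasta_mol_type)
    return fasta_mol_type


if __name__ == "__main__":
    # Illustrative header; the real field set may differ.
    fields = parse_rs_fasta_header('>gnl|dbSNP|rs3 rs=3|pos=201|mol="genomic"|alleles="C/T"')
    print(fields["mol"], "->", corrected_mol_type(fields["mol"], 129))  # genomic -> cDNA
```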

\ You can configure this track such that the details page displays\ the function and coding differences relative to\ particular gene sets. Choose the gene sets from the list on the SNP\ configuration page displayed beneath this heading: On details page,\ show function and coding differences relative to.\ When one or more gene tracks are selected, the SNP details page\ lists all genes that the SNP hits (or is close to), with the same keywords\ used in the function category. The function usually\ agrees with NCBI's function, but can sometimes give a bit more detail\ (e.g. more detail about how close a near-gene SNP is to a nearby gene).\

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. We compare the length of the\ reference allele to the length(s) of observed alleles; if the\ reference allele is shorter than all other observed alleles, we change\ 'in-del' to 'insertion'. Likewise, if the reference allele is longer\ than all other observed alleles, we change 'in-del' to 'deletion'.\

\ \

UCSC Annotations

\

\ UCSC checks for several unusual conditions that may indicate a problem \ with the mapping, and reports them in the Annotations section if found:\

\
    \
  • The dbSNP reference allele is not the same as the UCSC reference\ allele, i.e. the bases in the mapped position range.
  • Class is single, in-del, mnp or mixed and the UCSC reference\ allele does not match any observed allele.
  • In NCBI's alignment of flanking sequences to the genome, part\ of the flanking sequence around the SNP does not align to\ the genome.
  • Class is single, but the size of the mapped SNP is not one base.
  • Class is named and indicates an insertion or deletion, but the size\ of the mapped SNP implies otherwise.
  • Class is single and the format of observed alleles is unexpected.
  • The length of the observed allele(s) is not available because it is\ too long.
  • Multiple distinct insertion SNPs have been mapped to this location.
  • At least one observed allele contains an ambiguous \ IUPAC base (e.g. R, Y, N).
\ \ Another condition, which does not necessarily imply any problem, is noted:\
    \
  • Class is single and SNP is tri-allelic or quad-allelic.
\ \
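Several of the conditions above can be tested directly from a SNP's class, mapped range, reference alleles and observed alleles. The sketch below implements a subset of them on a hypothetical, simplified record. The labels it returns are illustrative and are not claimed to match the exception names UCSC uses. It treats an empty (zero-length) allele as '-', which is an assumption about how insertions are written.

```python
from dataclasses import dataclass
from typing import List

# IUPAC ambiguity codes: any of these in an observed allele is flagged.
AMBIGUOUS_BASES = frozenset("RYSWKMBDHVN")


@dataclass
class SnpRecord:
    """Hypothetical, simplified SNP record; field names are illustrative."""
    snp_class: str       # e.g. "single", "in-del", "mnp", "mixed"
    chrom_start: int     # 0-based start of the mapped range
    chrom_end: int       # exclusive end of the mapped range
    dbsnp_ref: str       # reference allele reported by dbSNP
    ucsc_ref: str        # assembly bases in [chrom_start, chrom_end)
    observed: List[str]  # observed alleles, e.g. ["A", "G"]


def norm(allele: str) -> str:
    """Upper-case an allele and write an empty (zero-length) allele as '-'."""
    return allele.upper() or "-"


def annotate(snp: SnpRecord) -> List[str]:
    """Return illustrative labels for some of the conditions listed above."""
    notes = []
    observed = {norm(a) for a in snp.observed}
    if norm(snp.dbsnp_ref) != norm(snp.ucsc_ref):
        notes.append("ref-allele-mismatch")
    if snp.snp_class in {"single", "in-del", "mnp", "mixed"} and norm(snp.ucsc_ref) not in observed:
        notes.append("ref-allele-not-observed")
    if snp.snp_class == "single" and snp.chrom_end - snp.chrom_start != 1:
        notes.append("single-class-wrong-size")
    if any(AMBIGUOUS_BASES & set(a) for a in observed):
        notes.append("observed-contains-iupac")
    # Noted, but not necessarily a problem.
    if snp.snp_class == "single" and len(observed) in (3, 4):
        notes.append("multi-allelic")
    return notes
```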

UCSC Re-alignment of flanking sequences

dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition.

Data Sources

The data that comprise this track were extracted from database dump files and headers of FASTA files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/ (e.g. for human, organism_tax_id = human_9606). The FASTA files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/.

  • Coordinates, orientation, location type and dbSNP reference allele data were obtained from b129_SNPContigLoc_36_3.bcp.gz and b129_SNPContigInfo_36_3.bcp.gz.
  • b129_SNPMapInfo_36_3.bcp.gz provided the alignment weights.
  • Functional classification was obtained from b129_SNPContigLocusId_36_3.bcp.gz.
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.
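The directory layout and file names listed above follow a regular pattern, which the small helper below captures. This is an illustrative sketch. The pattern `b<build>_<table>_<assembly suffix>.bcp.gz` is inferred from the file names on this page, and the NCBI FTP site may since have been reorganized, so the generated URLs are not guaranteed to resolve today.

```python
FTP_ROOT = "ftp://ftp.ncbi.nih.gov/snp/organisms"


def database_dir(organism_tax_id: str) -> str:
    """Directory holding the database dump (.bcp.gz) files."""
    return f"{FTP_ROOT}/{organism_tax_id}/database/"


def rs_fasta_dir(organism_tax_id: str) -> str:
    """Directory holding the rs_fasta files whose headers were parsed."""
    return f"{FTP_ROOT}/{organism_tax_id}/rs_fasta/"


def build_table_file(build: int, table: str, assembly_suffix: str) -> str:
    """Build-specific dump file name, e.g. b129_SNPContigLoc_36_3.bcp.gz."""
    return f"b{build}_{table}_{assembly_suffix}.bcp.gz"


if __name__ == "__main__":
    print(database_dir("human_9606") + build_table_file(129, "SNPContigLoc", "36_3"))
```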

Orthologous Alleles (human assemblies only)

Beginning with the March 2006 human assembly, we provide a related table that contains orthologous alleles in the chimpanzee and rhesus macaque assemblies. Beginning with dbSNP build 129, the orangutan assembly is also included. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list of SNPs that meet the following criteria:

  • class = 'single'
  • chromEnd = chromStart + 1
  • align to just one location
  • are not aligned to a chrN_random chrom
  • are biallelic (not tri- or quad-allelic)

In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end positions to 0 (zero).
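The candidate criteria above can be expressed as a single predicate. The sketch below assumes a hypothetical, simplified SNP row (including a `map_count` field standing in for "align to just one location"). It selects candidates only; it does not perform the liftOver step.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Snp:
    """Hypothetical, simplified row of the SNP table."""
    name: str
    chrom: str
    chrom_start: int
    chrom_end: int
    snp_class: str
    observed: str   # slash-separated alleles, e.g. "A/G"
    map_count: int  # number of genomic locations the SNP aligns to


def is_lift_candidate(snp: Snp) -> bool:
    """Apply the candidate criteria listed above (not the liftOver itself)."""
    alleles = set(snp.observed.split("/"))
    return (
        snp.snp_class == "single"
        and snp.chrom_end == snp.chrom_start + 1
        and snp.map_count == 1
        and not snp.chrom.endswith("_random")
        and len(alleles) == 2  # biallelic, not tri- or quad-allelic
    )


def lift_candidates(snps: Iterable[Snp]) -> List[Snp]:
    """Filter an iterable of SNP rows down to liftOver candidates."""
    return [s for s in snps if is_lift_candidate(s)]
```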

Masked FASTA Files (human assemblies only)

FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download here. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.
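Masking a base with an IUPAC code means replacing it with the single character that stands for the set of observed alleles. The sketch below assumes the observed alleles are already given on the forward strand and that the input is a plain in-memory sequence. Allele sets that are not two or more of A, C, G and T (for example insertions written with '-') are left unmasked, mirroring the restriction to single-base substitutions. Lower-case (soft-masked) bases keep their case.

```python
from typing import Iterable, Tuple

# IUPAC codes for sets of two or more nucleotides.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def mask_sequence(seq: str, snps: Iterable[Tuple[int, str]]) -> str:
    """Replace each SNP position with the IUPAC code for its observed alleles.

    snps: (0-based position, slash-separated observed alleles) pairs.
    """
    bases = list(seq)
    for pos, observed in snps:
        alleles = frozenset(a.upper() for a in observed.split("/"))
        code = IUPAC.get(alleles)
        if code is None:
            continue  # not a clean set of A/C/G/T alleles; leave the base as-is
        bases[pos] = code.lower() if bases[pos].islower() else code
    return "".join(bases)
```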

References

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11.

varRep 1 defaultGeneTracks knownGene\ group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP build 129)\ maxWindowToDraw 10000000\ priority 99.095\ shortLabel SNPs (129)\ track snp129\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility dense\ snp128 SNPs (128) bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 128) 1 99.096 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 128, available from ftp.ncbi.nih.gov/snp.

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.

The configuration categories reflect the following definitions (not all categories apply to this assembly):

  • Class: Describes the observed alleles
    • Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles)
    • In-del - insertion/deletion
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats
    • Named - the observed allele from dbSNP is given as a text name
    • No Variation - no variation asserted for sequence
    • Mixed - the cluster contains submissions from multiple classes
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G}
    • Insertion - the polymorphism is an insertion relative to the reference assembly
    • Deletion - the polymorphism is a deletion relative to the reference assembly
    • Unknown - no classification provided by data contributor
  • Validation: Method used to validate the variant (each variant may be validated by more than one method)
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method
    • By Submitter - at least one submitter SNP in cluster was validated by independent assay
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes
    • By HapMap - validated by HapMap project
    • Unknown - no validation has been reported for this variant
  • Function: Predicted functional role (each variant may have more than one functional role)
    • Locus Region - variation within 2000 bases of gene, but not in transcript (in build 127 and before, the keyword was locus, but in build 128, the more specific terms near-gene-3 and near-gene-5 are used)
    • Coding - Synonymous - no change in peptide for allele with respect to reference assembly (coding-synon)
    • Coding - Non-Synonymous - change in peptide for allele with respect to reference assembly (coding-nonsynon in build 127; nonsense, missense, frameshift in build 128)
    • Untranslated - variation in transcript, but not in coding region interval (untranslated in build 127; untranslated-3, untranslated-5 in build 128)
    • Intron - variation in intron, but not in first two or last two bases of intron
    • Splice Site - variation in first two or last two bases of intron (splice-site in build 127; splice-3, splice-5 in build 128)
    • Reference (coding) - one of the observed alleles of a SNP in a coding region matches the reference assembly (cds-reference)
    • Unknown - no known functional classification
  • Molecule Type: Sample used to find this variant
    • Genomic - variant discovered using a genomic template
    • cDNA - variant discovered using a cDNA template
    • Unknown - sample type not known
  • Average heterozygosity: Calculated by dbSNP as described here
    • Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
  • Weight: Alignment quality assigned by dbSNP
    • Weight can be 0, 1, 2, 3 or 10.
    • Alignments with weight = 1 are the highest quality.
    • Weight = 0 and weight = 10 are excluded from the data set.
    • A filter on maximum weight value is supported, which defaults to 3.
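The positional function categories (locus region, untranslated, coding, intron, splice site) can be illustrated for one base against one transcript model. The sketch below is a simplification under stated assumptions. It uses hypothetical 0-based, half-open intervals, ignores strand (so near-gene-5 vs near-gene-3 and splice-5 vs splice-3 are not distinguished), does not check whether a coding change is synonymous, and returns build-127-style keywords. It is not the dbSNP or UCSC classifier.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)

NEAR_GENE_WINDOW = 2000  # "within 2000 bases of gene"


def classify_position(pos: int, tx: Interval, cds: Interval, introns: List[Interval]) -> str:
    """Assign a build-127-style function keyword to one base of one transcript."""
    tx_start, tx_end = tx
    if tx_start <= pos < tx_end:
        for start, end in introns:
            if start <= pos < end:
                # First two or last two bases of the intron.
                if pos < start + 2 or pos >= end - 2:
                    return "splice-site"
                return "intron"
        cds_start, cds_end = cds
        return "coding" if cds_start <= pos < cds_end else "untranslated"
    if tx_start - NEAR_GENE_WINDOW <= pos < tx_end + NEAR_GENE_WINDOW:
        return "locus"  # near the gene, but not in the transcript
    return "unknown"
```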

You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list displayed beneath the heading "On details page, show function and coding differences relative to" on the SNP configuration page. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), using the same keywords as the function category. The function usually agrees with NCBI's function, but can sometimes be more detailed (e.g. how close a near-gene SNP is to the nearby gene).

Insertions/Deletions

dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of the observed alleles. If the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'; likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'.

UCSC Annotations

UCSC checks for several unusual conditions that may indicate a problem with the mapping, and reports them in the Annotations section if found:

  • The dbSNP reference allele is not the same as the UCSC reference allele, i.e. the bases in the mapped position range.
  • Class is single, in-del, mnp or mixed and the UCSC reference allele does not match any observed allele.
  • In NCBI's alignment of flanking sequences to the genome, part of the flanking sequence around the SNP does not align to the genome.
  • Class is single, but the size of the mapped SNP is not one base.
  • Class is named and indicates an insertion or deletion, but the size of the mapped SNP implies otherwise.
  • Class is single and the format of observed alleles is unexpected.
  • The length of the observed allele(s) is not available because it is too long.
  • Multiple distinct insertion SNPs have been mapped to this location.
  • At least one observed allele contains an ambiguous IUPAC base (e.g. R, Y, N).

Another condition, which does not necessarily imply any problem, is noted:

  • Class is single and SNP is tri-allelic or quad-allelic.

UCSC Re-alignment of flanking sequences

dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition.

Data Sources

The data that comprise this track were extracted from database dump files and headers of FASTA files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/ (e.g. for human, organism_tax_id = human_9606). The FASTA files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/.

  • Coordinates, orientation, location type and dbSNP reference allele data were obtained from b128_SNPContigLoc_36_2.bcp.gz and b128_SNPContigInfo_36_2.bcp.gz.
  • b128_SNPMapInfo_36_2.bcp.gz provided the alignment weights.
  • Functional classification was obtained from b128_SNPContigLocusId_36_2.bcp.gz.
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.

Orthologous Alleles (human only)

Beginning with the March 2006 human assembly, we provide a related table that contains orthologous alleles in the chimpanzee and rhesus macaque assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list of SNPs that meet the following criteria:

  • class = 'single'
  • chromEnd = chromStart + 1
  • align to just one location
  • are not aligned to a chrN_random chrom
  • are biallelic (not tri- or quad-allelic)

In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end positions to 0 (zero).

Masked FASTA Files (human only)

FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download here. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.

References

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11.

varRep 1 group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP build 128)\ maxWindowToDraw 10000000\ priority 99.096\ shortLabel SNPs (128)\ track snp128\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility dense\ snp127 SNPs (127) bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 127) 1 99.097 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

This track contains data from dbSNP build 127, available from ftp.ncbi.nih.gov/snp.

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.

The configuration categories reflect the following definitions (not all categories apply to this assembly):

  • Location Type: Describes the alignment of the flanking sequence
    • Range - the flank alignments leave a gap of 2 or more bases in the reference assembly
    • Exact - the flank alignments leave exactly one base between them
    • Between - the flank alignments are contiguous; the variation is an insertion
    • RangeInsertion - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence is shorter
    • RangeSubstitution - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence and the reference assembly sequence are of equal length
    • RangeDeletion - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence is longer
  • Class: Describes the observed alleles
    • Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles)
    • In-del - insertion/deletion (applies to RangeInsertion, RangeSubstitution, RangeDeletion)
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats
    • Named - the observed allele from dbSNP is given as a text name
    • No Variation - no variation asserted for sequence
    • Mixed - the cluster contains submissions from multiple classes
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G}
    • Insertion - the polymorphism is an insertion relative to the reference assembly
    • Deletion - the polymorphism is a deletion relative to the reference assembly
    • Unknown - no classification provided by data contributor
  • Validation: Method used to validate the variant (each variant may be validated by more than one method)
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method
    • By Submitter - at least one submitter SNP in cluster was validated by independent assay
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes
    • By HapMap - validated by HapMap project
    • Unknown - no validation has been reported for this variant
  • Function: Predicted functional role (each variant may have more than one functional role)
    • Locus Region - variation within 2000 bases of gene, but not in transcript
    • Coding - Synonymous - no change in peptide for allele with respect to reference assembly
    • Coding - Non-Synonymous - change in peptide for allele with respect to reference assembly
    • Untranslated - variation in transcript, but not in coding region interval
    • Intron - variation in intron, but not in first two or last two bases of intron
    • Splice Site - variation in first two or last two bases of intron
    • Reference - allele observed in a coding region of the reference sequence
    • Unknown - no known functional classification
  • Molecule Type: Sample used to find this variant
    • Genomic - variant discovered using a genomic template
    • cDNA - variant discovered using a cDNA template
    • Unknown - sample type not known
  • Average heterozygosity: Calculated by dbSNP as described here
    • Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
  • Weight: Alignment count
    • Weight can be 0, 1, 2, 3 or 10.
    • Weight = 0 and weight = 10 are excluded from the data set.
    • A filter on maximum weight value is supported, which defaults to 3.
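The first three location types depend only on how many reference bases lie between the two flank alignments. The sketch below shows that rule, assuming 0-based, half-open coordinates in which `left_flank_end` is one past the last base aligned by the left flank. The three Range* types also require comparing the submitted sequence with the reference, so they are deliberately left out.

```python
def location_type(left_flank_end: int, right_flank_start: int) -> str:
    """Classify from the number of reference bases between the flank alignments.

    Only 'between', 'exact' and 'range' are covered; the Range* types need
    the submitted sequence as well.
    """
    gap = right_flank_start - left_flank_end
    if gap < 0:
        raise ValueError("flank alignments overlap")
    if gap == 0:
        return "between"  # contiguous flanks: the variation is an insertion
    if gap == 1:
        return "exact"    # exactly one base between the flanks
    return "range"        # a gap of 2 or more bases
```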

You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list displayed beneath the heading "On details page, show function and coding differences relative to" on the SNP configuration page. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), using the same keywords as the function category. The function usually agrees with NCBI's function, but can sometimes be more detailed (e.g. how close a near-gene SNP is to the nearby gene).

Insertions/Deletions

dbSNP uses a class called 'in-del'. This has been split into the 'insertion' and 'deletion' categories, based on location type. The location types 'range' and 'exact' indicate deletions relative to the reference assembly; the location type 'between' indicates insertions relative to the reference assembly. For the newer location types (RangeInsertion, RangeSubstitution and RangeDeletion), the class 'in-del' is preserved.
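For this build the split is a lookup on location type rather than a comparison of allele lengths. A minimal sketch follows. The exact spellings of the location-type values (`range`, `exact`, `between`, `rangeInsertion`, and so on) are an assumption made for illustration.

```python
def split_indel_class(snp_class: str, loc_type: str) -> str:
    """Map an 'in-del' SNP to 'insertion' or 'deletion' by location type."""
    if snp_class != "in-del":
        return snp_class  # other classes are left unchanged
    if loc_type in ("range", "exact"):
        return "deletion"
    if loc_type == "between":
        return "insertion"
    # The newer range* location types (and anything unexpected) keep 'in-del'.
    return "in-del"
```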

UCSC Annotations

In addition to presenting the dbSNP data, the following annotations are provided:

  • The dbSNP reference allele is compared to the UCSC reference allele, and a note is made if the dbSNP reference allele is the reverse complement of the UCSC reference allele.
  • Single-base substitutions where the alignments of the flanking sequences are adjacent or have a gap of more than one base are noted.
  • Observed alleles with an unexpected format are noted.
  • The length of observed alleles is checked for consistency with location types; exceptions are noted.
  • Single-base substitutions are checked to see that one of the observed alleles matches the reference allele; exceptions are noted.
  • Simple deletions are checked to see that the observed allele matches the reference allele; exceptions are noted.
  • Tri-allelic and quad-allelic single-base substitutions are noted.
  • Variants that have multiple mappings are noted.
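The first annotation distinguishes a reference allele that merely sits on the opposite strand from one that genuinely disagrees with the assembly. The sketch below shows that comparison. The function names and return labels are illustrative, not UCSC's.

```python
# Complement table for upper- and lower-case bases (N maps to itself).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def compare_ref_alleles(dbsnp_ref: str, ucsc_ref: str) -> str:
    """Return 'match', 'reverse-complement' or 'mismatch'."""
    d, u = dbsnp_ref.upper(), ucsc_ref.upper()
    if d == u:
        return "match"
    if reverse_complement(d) == u:
        return "reverse-complement"
    return "mismatch"
```

Palindromic alleles such as `AT` equal their own reverse complement, so the sketch reports them as a plain match.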

Data Sources

  • Coordinates, orientation, location type and dbSNP reference allele data were obtained from b127_SNPContigLoc_36_2.bcp.gz.
  • b127_SNPMapInfo_36_2.bcp.gz provided the alignment weights; alignments with weight = 0 or weight = 10 were filtered out.
  • Class and observed polymorphism were obtained from the shared UniVariation.bcp.gz, using the univar_id from SNP.bcp.gz as an index.
  • Functional classification was obtained from b127_SNPContigLocusId_36_2.bcp.gz.
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • The header lines in the rs_fasta files were used for molecule type.
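Looking up class and observed alleles through univar_id is an ordinary key join. The sketch below assumes both dump files have already been parsed into in-memory dictionaries. The column layout of the real .bcp.gz files is not described here, so the dictionary shapes (and the use of readable strings rather than whatever codes the dump stores) are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical parsed inputs:
#   snp_univar:   rs number -> univar_id        (from SNP.bcp.gz)
#   univariation: univar_id -> (class, observed) (from UniVariation.bcp.gz)


def join_univariation(
    snp_univar: Dict[int, int],
    univariation: Dict[int, Tuple[str, str]],
) -> Dict[int, Tuple[str, str]]:
    """Map each rs number to (class, observed) via its univar_id."""
    result = {}
    for snp_id, univar_id in snp_univar.items():
        row = univariation.get(univar_id)
        if row is not None:  # skip SNPs whose univar_id has no entry
            result[snp_id] = row
    return result
```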

Orthologous Alleles (human only)

Beginning with the March 2006 human assembly, we provide a related table that contains orthologous alleles in the chimpanzee and rhesus macaque assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list of SNPs that meet the following criteria:

  • class = 'single'
  • locType = 'exact'
  • chromEnd = chromStart + 1
  • align to just one location
  • are not aligned to a chrN_random chrom
  • are biallelic (not tri- or quad-allelic)

In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end positions to 0 (zero).

References

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11.

varRep 1 group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP build 127)\ maxWindowToDraw 10000000\ priority 99.097\ shortLabel SNPs (127)\ track snp127\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility dense\ snp126 SNPs (126) bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 126) 1 99.098 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

This track contains data from dbSNP build 126, available from ftp.ncbi.nih.gov/snp.

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.

The configuration categories reflect the following definitions (not all categories apply to this assembly):

  • Location Type: Describes the alignment of the flanking sequence
    • Range - the flank alignments leave a gap of 2 or more bases in the reference assembly
    • Exact - the flank alignments leave exactly one base between them
    • Between - the flank alignments are contiguous; the variation is an insertion
    • RangeInsertion - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence is shorter
    • RangeSubstitution - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence and the reference assembly sequence are of equal length
    • RangeDeletion - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence is longer
  • Class: Describes the observed alleles
    • Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles)
    • In-del - insertion/deletion (applies to RangeInsertion, RangeSubstitution, RangeDeletion)
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats
    • Named - the observed allele from dbSNP is given as a text name
    • No Variation - no variation asserted for sequence
    • Mixed - the cluster contains submissions from multiple classes
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G}
    • Insertion - the polymorphism is an insertion relative to the reference assembly
    • Deletion - the polymorphism is a deletion relative to the reference assembly
    • Unknown - no classification provided by data contributor
  • Validation: Method used to validate the variant (each variant may be validated by more than one method)
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method
    • By Submitter - at least one submitter SNP in cluster was validated by independent assay
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes
    • By HapMap - validated by HapMap project
    • Unknown - no validation has been reported for this variant
  • Function: Predicted functional role (each variant may have more than one functional role)
    • Locus Region - variation within 2000 bases of gene, but not in transcript
    • Coding - Synonymous - no change in peptide for allele with respect to reference assembly
    • Coding - Non-Synonymous - change in peptide for allele with respect to reference assembly
    • Untranslated - variation in transcript, but not in coding region interval
    • Intron - variation in intron, but not in first two or last two bases of intron
    • Splice Site - variation in first two or last two bases of intron
    • Reference - allele observed in a coding region of the reference sequence
    • Unknown - no known functional classification
  • Molecule Type: Sample used to find this variant
    • Genomic - variant discovered using a genomic template
    • cDNA - variant discovered using a cDNA template
    • Unknown - sample type not known
  • Average heterozygosity: Calculated by dbSNP as described here
    • Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
  • Weight: Alignment quality assigned by dbSNP
    • Weight can be 0, 1, 2, 3 or 10.
    • Alignments with weight = 1 are the highest quality.
    • Weight = 0 and weight = 10 are excluded from the data set.
    • A filter on maximum weight value is supported, which defaults to 3.
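To see why 0.5 is the ceiling for a bi-allelic substitution, consider the standard expected-heterozygosity formula H = 1 - Σ p_i². With two alleles this is 2p(1 - p), which peaks at 0.5 when p = 0.5. The sketch below illustrates only that bound. It is not necessarily the exact averaging procedure dbSNP uses, which is described on the linked dbSNP page.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """H = 1 - sum(p_i^2) for allele frequencies that sum to 1."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


if __name__ == "__main__":
    # Bi-allelic sites: H never exceeds 0.5.
    for p in (0.1, 0.3, 0.5):
        print(p, expected_heterozygosity([p, 1 - p]))
    # A tri-allelic site can exceed 0.5, which is why the bound is stated
    # only for bi-allelic substitutions.
    print(expected_heterozygosity([1 / 3, 1 / 3, 1 / 3]))
```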

You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list displayed beneath the heading "On details page, show function and coding differences relative to" on the SNP configuration page. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), using the same keywords as the function category. The function usually agrees with NCBI's function, but can sometimes be more detailed (e.g. how close a near-gene SNP is to the nearby gene).

Insertions/Deletions

dbSNP uses a class called 'in-del'. This has been split into the 'insertion' and 'deletion' categories, based on location type. The location types 'range' and 'exact' indicate deletions relative to the reference assembly; the location type 'between' indicates insertions relative to the reference assembly. For the newer location types (RangeInsertion, RangeSubstitution and RangeDeletion), the class 'in-del' is preserved.

UCSC Annotations

In addition to presenting the dbSNP data, the following annotations are provided:

  • The dbSNP reference allele is compared to the UCSC reference allele, and a note is made if the dbSNP reference allele is the reverse complement of the UCSC reference allele.
  • Single-base substitutions where the alignments of the flanking sequences are adjacent or have a gap of more than one base are noted.
  • Observed alleles with an unexpected format are noted.
  • The length of observed alleles is checked for consistency with location types; exceptions are noted.
  • Single-base substitutions are checked to see that one of the observed alleles matches the reference allele; exceptions are noted.
  • Simple deletions are checked to see that the observed allele matches the reference allele; exceptions are noted.
  • Tri-allelic and quad-allelic single-base substitutions are noted.
  • Variants that have multiple mappings are noted.

Data Sources

  • Coordinates, orientation, location type and dbSNP reference allele data were obtained from b126_SNPContigLoc_36_1.bcp.gz.
  • b126_SNPMapInfo_36_1.bcp.gz provided the alignment weights; alignments with weight = 0 or weight = 10 were filtered out.
  • Class and observed polymorphism were obtained from the shared UniVariation.bcp.gz, using the univar_id from SNP.bcp.gz as an index.
  • Functional classification was obtained from b126_SNPContigLocusId_36_1.bcp.gz.
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.
  • The header lines in the rs_fasta files were used for molecule type.

Orthologous Alleles (human only)

Beginning with the March 2006 human assembly, we provide a related table that contains orthologous alleles in the chimpanzee and rhesus macaque assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list of SNPs that meet the following criteria:

  • class = 'single'
  • locType = 'exact'
  • chromEnd = chromStart + 1
  • align to just one location
  • are not aligned to a chrN_random chrom
  • are biallelic (not tri- or quad-allelic)

In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end positions to 0 (zero).

References

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11.

varRep 1 group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP build 126)\ maxWindowToDraw 10000000\ priority 99.098\ shortLabel SNPs (126)\ track snp126\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility dense\ snp125 SNPs bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 125) 1 99.099 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

This track contains data from dbSNP build 125, available from ftp.ncbi.nih.gov/snp.

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.

\

\ The configuration categories reflect the following definitions (not all categories apply\ to this assembly):\

\
    \ \
  • \ \ Location Type: Describes the alignment of the flanking sequence
    \
      \
    • Range - the flank alignments leave a gap of 2 or more bases in the reference assembly\
    • Exact - the flank alignments leave exactly one base between them\
    • Between - the flank alignments are contiguous; the variation is an insertion\
    • RangeInsertion - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence is shorter\
    • RangeSubstitution - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence and the reference assembly sequence are of equal length\
    • RangeDeletion - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence is longer\
    \
  • \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion (applies to RangeInsertion, RangeSubstitution, RangeDeletion)\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name\
    • No Variation - no variation asserted for sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from the set {A,C,G,T}\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ \
  • \ \ Validation: Method used to validate the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitted SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - validated by HapMap project\
    • Unknown - no validation has been reported for this variant\
    \
  • \
  • \ \ Function: Predicted functional role (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation within 2000 bases of gene, but not in transcript\
    • Coding - Synonymous - no change in peptide for allele with respect to reference assembly\
    • Coding - Non-Synonymous - change in peptide for allele with respect to reference assembly\
    • Untranslated - variation in transcript, but not in coding region interval\
    • Intron - variation in intron, but not in first two or last two bases of intron\
    • Splice Site - variation in first two or last two bases of intron\
    • Reference - allele observed in a coding region of the reference sequence\
    • Unknown - no known functional classification\
    \
  • \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \
  • \ \ Average heterozygosity: Calculated by dbSNP as described in the dbSNP documentation\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.\
    \
  • \
  • \ \ Weight: Alignment count
    \
      \
    • Weight can be 1, 2, 3 or 10. \
    • Weight = 10 is excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 3.\
    • Alignments to chrN_random are not included.\
    \
  • \
\ \
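The 0.5 ceiling for bi-allelic substitutions follows from the usual expected-heterozygosity formula H = 1 - sum(p_i^2). For two alleles, this reduces to 2p(1 - p), which peaks at p = 0.5. The sketch below assumes that formula. As a further assumption, it averages populations with equal weight. dbSNP's own calculation (including any sample-size correction) may differ.

```python
def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2) for one population's allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def average_heterozygosity(populations):
    """Unweighted mean of per-population H (equal weighting is an assumption)."""
    values = [expected_heterozygosity(f) for f in populations]
    return sum(values) / len(values)


# Bi-allelic case: H = 2p(1 - p), which is largest (0.5) at p = 0.5.
print(expected_heterozygosity([0.5, 0.5]))  # 0.5
print(average_heterozygosity([[0.5, 0.5], [0.9, 0.1]]))
```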

\ You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath the heading "On details page, show function and coding differences relative to". When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), using the same keywords as the function category. The function usually agrees with NCBI's function, but can sometimes give more detail (e.g., how close a near-gene SNP is to the nearby gene).\

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. This class has been split into the 'insertion' and 'deletion' categories based on location type. The location types 'range' and 'exact' indicate deletions relative to the reference assembly, and the location type 'between' indicates insertions relative to the reference assembly. For the newer location types (RangeInsertion, RangeSubstitution and RangeDeletion), the class 'in-del' is preserved.\ \
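Here is a minimal sketch of the class split described above. It assumes the location types are given as the lowercase strings 'range', 'exact' and 'between'; the exact spelling in the data files is an assumption. Any other location type leaves 'in-del' unchanged.

```python
def split_indel_class(snp_class: str, loc_type: str) -> str:
    """Refine dbSNP's 'in-del' class using the location type."""
    if snp_class != "in-del":
        return snp_class    # only in-del records are re-labelled
    if loc_type in ("range", "exact"):
        return "deletion"   # reference bases lie between the flanks
    if loc_type == "between":
        return "insertion"  # flanks are contiguous in the reference
    return "in-del"         # rangeInsertion/rangeSubstitution/rangeDeletion, etc.


for lt in ("range", "exact", "between", "rangeDeletion"):
    print(lt, split_indel_class("in-del", lt))
```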

UCSC Annotations

\

\ In addition to presenting the dbSNP data, the following annotations are provided:\

\
    \
  • The size of the dbSNP reference allele is checked to see if it matches the coordinate\ span; exceptions are noted.
  • \
  • The dbSNP reference allele is compared to the UCSC reference allele, and a note is made\ if the dbSNP reference allele is the reverse complement of the UCSC reference allele.
  • \
  • Single-base substitutions are noted where the alignments of the\ flanking sequences are adjacent or have a gap of more than one base.
  • \
  • A note is made if the observed alleles are not available from the rs_fasta files.
  • \
  • Observed alleles with an unexpected format are noted.
  • \
  • The length of the observed alleles is checked for consistency with location types;\ exceptions are noted.
  • \
  • Single-base substitutions are checked to see that one of the observed alleles matches\ the reference allele; exceptions are noted.
  • \
  • Simple deletions are checked to see that the observed allele matches the reference allele;\ exceptions are noted.
  • \
  • Tri-allelic and quad-allelic single-base substitutions are noted.
  • \
  • Variants that have multiple mappings are noted.
  • \
\ \
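Two of the checks above can be sketched as follows: reference-allele size versus coordinate span, and reverse-complement detection. The note strings are illustrative labels. Treating a dbSNP reference allele of '-' as empty (for insertions with zero span) is an assumption of this sketch.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def ref_allele_notes(chrom_start: int, chrom_end: int, dbsnp_ref: str, ucsc_ref: str) -> list:
    """Return illustrative exception notes for one SNP."""
    ref = "" if dbsnp_ref == "-" else dbsnp_ref
    notes = []
    if len(ref) != chrom_end - chrom_start:
        notes.append("RefAlleleWrongSize")  # allele size disagrees with coordinate span
    if ref != ucsc_ref and reverse_complement(ref) == ucsc_ref:
        notes.append("RefAlleleRevComp")    # dbSNP allele is on the opposite strand
    return notes


print(ref_allele_notes(100, 103, "ACG", "CGT"))  # ['RefAlleleRevComp']
```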

Data Sources

\
    \
  • Coordinates, orientation, location type and dbSNP reference allele \ data were obtained from b125_SNPContigLoc.bcp.gz. \
  • b125_SNPMapInfo.bcp.gz provided the alignment weights; alignments with \ weight = 10 were filtered out.\
  • Functional classification information was obtained from b125_SNPContigLocusId.bcp.gz.\
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.\
  • The header lines in the rs_fasta files were used for class, \ observed polymorphism and molecule type.\
\ \ \ \ varRep 1 group varRep\ longLabel Simple Nucleotide Polymorphisms (dbSNP build 125)\ maxWindowToDraw 10000000\ priority 99.099\ shortLabel SNPs\ track snp125\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility dense\ affyHumanExon Affy All Exon expRatio Affymetrix All Exon Chips 0 100 0 0 0 127 127 127 0 0 0

Methods

\

\ RNA (from a commercial source) from 11 tissues was hybridized to Affymetrix Human Exon 1.0 ST arrays. For each tissue, 3 replicate experiments were done, for a total of 33 arrays. The raw signal intensities of the arrays were quantile-normalized and then processed with the PLIER algorithm. The normalized data were then converted to log-ratios, which are displayed in green for negative log-ratios (underexpression) and red for positive log-ratios (overexpression).
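The normalization and coloring steps can be illustrated with a small pure-Python sketch. Quantile normalization here uses the common mean-of-sorted-values variant, with ties broken by position. PLIER is not reproduced. Because the text does not say what the log-ratios are relative to, the sketch assumes a log2 ratio against the median across samples.

```python
import math
from statistics import median


def quantile_normalize(arrays):
    """Give every array the same intensity distribution (mean of sorted values)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean intensity at each rank across all arrays.
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def log_ratios(values):
    """log2(value / median); the median baseline is an assumption."""
    baseline = median(values)
    return [math.log2(v / baseline) for v in values]


def display_color(log_ratio):
    """Green for underexpression, red for overexpression."""
    if log_ratio < 0:
        return "green"
    if log_ratio > 0:
        return "red"
    return "black"  # no change; this neutral color is an assumption


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
print([display_color(r) for r in log_ratios([1, 2, 4])])  # ['green', 'black', 'red']
```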

\

The probe sets for this microarray track can be displayed by turning on the Affy HuEx 1.0 track.

\ \

Credits

\

\ The data for this track were provided and analyzed by Chuck Sugnet at Affymetrix.\

\ \

Links

\

\ expression 1 expProbeTable affyHumanExonProbeAnnot\ expScale 4.0\ expStep 0.5\ expTable affyHumanExonExps\ group expression\ groupings affyHumanExonGroups\ longLabel Affymetrix All Exon Chips\ priority 0\ shortLabel Affy All Exon\ track affyHumanExon\ type expRatio\ visibility hide\ affyAllExonSuper Affy Exon Affymetrix All Exon Microarrays 0 100 0 0 0 127 127 127 0 0 0

Overview

\ This super-track combines related tracks of the Affymetrix All Exon chip data.\ There are two member tracks:\
    \
  • Affymetrix Exon Array 1.0: Normal Tissues: This track displays data\ from 11 different sample tissues across three replicate arrays each.\
  • \
  • Affymetrix Exon Array 1.0: Probesets: This track displays probeset\ loci for the array.\
  • \
\ expression 0 group expression\ longLabel Affymetrix All Exon Microarrays\ priority 0\ shortLabel Affy Exon\ superTrack on\ track affyAllExonSuper\ affyGnf1h Affy GNF1H psl . Alignments of Affymetrix Consensus/Exemplars from GNF1H 0 100 0 0 0 127 127 127 0 0 0

Description

This track shows the location of the sequences used for the selection of probes on the Affymetrix GNF1H chips. These sequences represent 11,406 predicted genes that are not covered by the Affymetrix U133A chip.

\ \

Methods

The sequences were mapped to the genome using blat followed by pslReps with the\ parameters:

-minCover=0.3 -minAli=0.95 -nearTop=0.005

\ \

Credits

Thanks to the Genomics Institute of\ the Novartis Research Foundation (GNF) for the data underlying this track.

\ \

References

Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R,\ Hayakawa M, Kreiman G et al. A gene atlas of the mouse and human protein-encoding transcriptomes.\ PNAS. 2004 April 20;101(16):6062-6067.

expression 1 group expression\ longLabel Alignments of Affymetrix Consensus/Exemplars from GNF1H\ priority 0\ shortLabel Affy GNF1H\ track affyGnf1h\ type psl .\ visibility hide\ affyHuEx1 Affy HuEx 1.0 bed 6 . Affymetrix Human Exon 1.0 Probe Sets 0 100 0 0 0 127 127 127 1 0 0 http://www.affymetrix.com/analysis/netaffx/exon/probe_set.affx?pk=1:$$

Description

\

\ The Human Exon 1.0 ST GeneChip contains over 1.4 million probe sets designed to interrogate individual exons, rather than the 3' ends of transcripts as in traditional GeneChips. Exons were derived from a variety of annotations that have been divided into the classes Core, Extended and Full.\

    \
  • Core:\ RefSeq transcripts, full-length GenBank mRNAs
  • \
  • Extended: dbEST alignments, Ensembl annotations, syntenic mRNA from rat and mouse, microRNA annotations, MITOMAP annotations, Vega genes, Vega pseudogenes\
  • Full:\ Geneid genes, Genscan genes, Genscan Subopt, Exoniphy, RNA genes, SGP genes,\ Twinscan genes\
  • \

\ \

\ Probe sets are colored by class with the Core probe sets being\ the darkest and the Full being the lightest color. Additionally, probe\ sets that do not overlap the exons of a transcript cluster, but fall\ inside of its introns, are considered bounded by that transcript\ cluster and are colored slightly lighter. Probe sets that overlap the\ coding portion of the Core class are colored slightly darker.

\

\ The microarray track using this probe set can be displayed by turning\ on the Affy All Exon track.

\ \

Credits and References

\

\ The exons interrogated by the probe sets displayed in this track are from the Affymetrix Human Exon 1.0 GeneChip and were derived from a number of sources. In addition to the millions of cDNA sequences contributed to the GenBank, dbEST and RefSeq databases by individual labs and scientists, the following annotations were used:\

\ Ensembl: Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. The Ensembl genome database project. Nucleic Acids Research. 2002 Jan 1;30(1):38-41.

\

\ Exoniphy: Siepel, A., Haussler, D. \ Computational identification of evolutionarily conserved \ exons.\ Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, \ 177-186 (2004).

\

\ Geneid Genes:\ Parra, G., Blanco, E., Guigo, R. \ Geneid in Drosophila.\ Genome Res. 10(4), 511-515 (2000).

\

\ Genscan Genes:\ Burge, C., Karlin, S. \ Prediction of Complete Gene Structures in Human Genomic DNA.\ J. Mol. Biol. 268(1), 78-94 (1997).

\

\ microRNA:\ Griffiths-Jones, S. \ The microRNA Registry. \ Nucl. Acids Res. 32, D109-D111 (2004).

\

\ MITOMAP:\ Brandon, M. C., Lott, M. T., Nguyen, K. C., Spolim, S., Navathe, S. B., \ Baldi, P. & Wallace, D. C.\ MITOMAP: a human mitochondrial genome database--2004 update\ Nucl. Acids Res. 33(Database Issue):D611-613 (2005).

\

\ RNA Genes:\ Lowe, T. M., Eddy, S. R. \ tRNAscan-SE: A Program for Improved Detection of Transfer RNA \ Genes in Genomic Sequence.\ Nucleic Acids Res., 25(5), 955-964 (1997).

\

\ SGP Genes: \ Wiehe, T., Gebauer-Jung, S., Mitchell-Olds, T., Guigo, R. \ SGP-1: prediction and validation of homologous genes based on \ sequence alignments.\ Genome Res., 11(9), 1574-83 (2001).

\

\ Twinscan Genes:\ Korf, I., Flicek, P., Duan, D., Brent, M.R. \ Integrating genomic homology into gene structure prediction.\ Bioinformatics 17, S140-148 (2001).\

\ Vega Genes \ and Pseudogenes: The HAVANA group, \ Wellcome Trust Sanger \ Institute.

\ expression 1 group expression\ longLabel Affymetrix Human Exon 1.0 Probe Sets\ priority 0\ shortLabel Affy HuEx 1.0\ track affyHuEx1\ type bed 6 .\ url http://www.affymetrix.com/analysis/netaffx/exon/probe_set.affx?pk=1:$$\ urlLabel Netaffx Link:\ useScore 1\ visibility hide\ affyTxnPhase2 Affy Txn Phase2 wig 0 1000 Affymetrix Transcriptome Project Phase 2 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays transcriptome data from tiling GeneChips produced\ by Affymetrix. For the ten chromosomes 6, 7, 13, 14,\ 19, 20, 21, 22, X, and Y, more than 74 million probes were tiled every\ 5 bp in non-repeat-masked areas and hybridized to mRNA from 11\ different cell lines (some cell lines were female and contain no data\ for chrY). For HepG2, some samples were depleted\ of polyA transcripts rather than enriched. For experimental details\ and results, see Cheng et al. in the References section\ below.

\
\
\
\ \

Display Conventions and Configuration

\

\ This annotation follows the display conventions for composite \ tracks. The subtracks within this annotation may be configured in a variety of \ ways to highlight different aspects of the displayed data. The graphical \ configuration options are shown at the top of the track description page, \ followed by a list of subtracks. For more information about the \ graphical configuration options, click the \ Graph configuration \ help link. To display only selected subtracks, uncheck the boxes next to \ the tracks you wish to hide.

\

\ Each subtrack is colored blue in areas that are thought to be transcribed\ at a statistically significant level as described in the accompanying\ transfrags (transcribed fragments) track. Transfrags that have a\ significant blat hit elsewhere in the genome are displayed in a\ lighter shade of blue, and transfrags that overlap putative\ pseudogenes are colored an even lighter shade of blue. All other\ regions of the track are colored brown. While the raw data are based\ on perfect match minus mismatch (PM - MM) probe values and may contain\ negative values, the track has a minimum value of zero for visualization\ purposes.

\ \

Methods

\

\ For each data point, probes within 30 bp on either side were used to\ improve the estimate of expression level for a particular probe. This\ helped to smooth the data and produce a more robust estimate of the\ transcription level at a particular genomic location. The following\ analysis method was used:\

    \
  1. Replicate arrays were quantile-normalized and the median\ intensity (using both PM and MM intensities) of each array was\ scaled to a target value of 44.\
  2. The expression level was estimated for each mapped probe position by \
      \
    • collecting all the probe pairs that fell within a window of ±\ 30 bp\
    • calculating all non-redundant pairwise averages of PM - MM\ values of all probe pairs in the window\
    • taking the median of all resulting pairwise averages\
    \
  3. The resulting signal value is the Hodges-Lehmann estimator\ associated with the Wilcoxon signed-rank statistic of the PM - MM\ values that lie within ± 30 bp of the sliding window centered at\ every genomic coordinate.\
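Steps 2 and 3 can be sketched in a few lines. This assumes probe data are available as (position, PM, MM) tuples. It takes "non-redundant pairwise averages" to be the Walsh averages (x_i + x_j)/2 for i <= j, the usual definition behind the Hodges-Lehmann estimator; if the original pipeline excluded i = j, results would differ slightly. The 30 bp half-window matches the text.

```python
from statistics import median


def hodges_lehmann(values):
    """Median of all Walsh averages (x_i + x_j) / 2 with i <= j."""
    n = len(values)
    walsh = [(values[i] + values[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)


def signal_at(probes, center, half_window=30):
    """Estimate signal at `center` from PM - MM of probe pairs within +/- half_window bp.

    probes: iterable of (position, pm, mm) tuples (hypothetical input format).
    Returns None when no probe pair falls inside the window.
    """
    diffs = [pm - mm for pos, pm, mm in probes if abs(pos - center) <= half_window]
    return hodges_lehmann(diffs) if diffs else None


probes = [(100, 50, 40), (120, 30, 35), (200, 100, 0)]
print(signal_at(probes, 110))  # 2.5
```

The estimate can be negative because it is built from PM - MM differences; as noted above, the display clamps the track at zero.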

\ \

Credits

\

\ Data generation and analysis were performed by the transcriptome group at Affymetrix: Bekiranov, S., Brubaker, S., Cheng, J., Dike, S., Drenkow, J., Ghosh, S., Gingeras, T., Helt, G., Kampa, D., Kapranov, P., Long, J., Madhavan, G., Manak, J., Patel, S., Piccolboni, A., Sementchenko, V. and Tammana, H.

\ \

Questions or comments about this annotation? Email Chuck Sugnet.\ \

References

\

\ Cheng et al. \ Transcriptional Maps of 10 Human Chromosomes at 5-Nucleotide \ Resolution. Science 308(5725), 1149-54 (2005).

\ expression 0 autoScale Off\ canPack off\ centerLabelsDense on\ compositeTrack on smart\ group expression\ longLabel Affymetrix Transcriptome Project Phase 2\ maxHeightPixels 100:30:10\ priority 0\ shortLabel Affy Txn Phase2\ track affyTxnPhase2\ type wig 0 1000\ viewLimits 0:150\ visibility hide\ affyU133 Affy U133 psl . Alignments of Affymetrix Consensus/Exemplars from HG-U133 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the location of the consensus and exemplar sequences used \ for the selection of probes on the Affymetrix HG-U133A and HG-U133B chips.

\ \

Methods

\

\ Consensus and exemplar sequences were downloaded from the Affymetrix Product Support website and mapped to the genome using blat followed by pslReps with the parameters:

   -minCover=0.3 -minAli=0.95 -nearTop=0.005\

\ \

Credits

\

\ Thanks to Affymetrix \ for the data underlying this track.

\ expression 1 group expression\ longLabel Alignments of Affymetrix Consensus/Exemplars from HG-U133\ priority 0\ shortLabel Affy U133\ track affyU133\ type psl .\ visibility hide\ affyU133Plus2 Affy U133Plus2 psl . Alignments of Affymetrix Consensus/Exemplars from HG-U133 Plus 2.0 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the location of the consensus and exemplar sequences used \ for the selection of probes on the Affymetrix HG-U133 Plus 2.0 chip.

\ \

Methods

\

\ Consensus and exemplar sequences were downloaded from the Affymetrix Product Support website and mapped to the genome using blat followed by pslReps with the parameters:

   -minCover=0.3 -minAli=0.95 -nearTop=0.005\

\ \

Credits

\ Thanks to Affymetrix \ for the data underlying this track.

\ expression 1 group expression\ longLabel Alignments of Affymetrix Consensus/Exemplars from HG-U133 Plus 2.0\ priority 0\ shortLabel Affy U133Plus2\ track affyU133Plus2\ type psl .\ visibility hide\ affyU95 Affy U95 psl . Alignments of Affymetrix Consensus/Exemplars from HG-U95 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows the location of the consensus and exemplar sequences used \ for the selection of probes on the Affymetrix HG-U95Av2 chip. For this chip, \ probes are predominantly designed from consensus sequences.

\ \

Methods

\

\ Consensus and exemplar sequences were downloaded from the Affymetrix Product Support website and mapped to the genome using blat followed by pslReps with the parameters:

   -minCover=0.3 -minAli=0.95 -nearTop=0.005\

\ \

Credits

\

\ Thanks to Affymetrix \ for the data underlying this track.

\ expression 1 group expression\ longLabel Alignments of Affymetrix Consensus/Exemplars from HG-U95\ priority 0\ shortLabel Affy U95\ track affyU95\ type psl .\ visibility hide\ burgeRnaSeqGemMapperAlignViewAlignments Alignments bed 12 Burge lab RNA-seq aligned by GEM Mapper 1 100 12 12 120 133 133 187 0 0 0 expression 1 color 12,12,120\ maxWindowToDraw 50000000\ parent burgeRnaSeqGemMapperAlign\ shortLabel Alignments\ track burgeRnaSeqGemMapperAlignViewAlignments\ type bed 12\ view Alignments\ visibility dense\ burgeRnaSeqGemMapperAlignViewRawSignal All Raw Signal bedGraph 4 Burge lab RNA-seq aligned by GEM Mapper 2 100 46 0 184 150 127 219 0 0 0 expression 0 autoScale on\ color 46,0,184\ maxHeightPixels 100:24:16\ parent burgeRnaSeqGemMapperAlign\ shortLabel All Raw Signal\ track burgeRnaSeqGemMapperAlignViewRawSignal\ transformFunc NONE\ type bedGraph 4\ view RawSignal\ viewLimits 0:1000\ visibility full\ windowingFunction maximum\ allenBrainAli Allen Brain psl . Allen Brain Atlas Probes 0 100 50 0 100 152 127 177 0 0 0

Description

\

\ This track provides a link into the Allen Brain Atlas (ABA) images for this probe. The ABA is an extensive database of high-resolution in situ hybridization images of adult male mouse brains covering the majority of genes.

\ \

Methods

\

\ The ABA created a platform for high-throughput in situ hybridization \ (ISH) that allows a highly systematic approach to analyzing gene expression in \ the brain. ISH is a technique that allows the cellular localization of mRNA \ transcripts for specific genes. Labeled antisense probes, specific to a \ particular gene, are hybridized to cellular (sense) transcripts and subsequent \ detection of the bound probe produces specific labeling in those cells \ expressing the particular gene. This method involves tagged nucleotides \ detected by colorimetric methods.

\ \

The platform used for the ABA utilizes this non-isotopic approach, with \ digoxigenin-labeled nucleotides incorporated into a riboprobe produced by in\ vitro transcription. This method produces a label that fills the cell body,\ in contrast to autoradiography that produces scattered silver grains surrounding\ each labeled cell. To enhance the ability to detect low level expression, the \ ABA has incorporated a tyramide signal amplification step into the protocol that\ greatly increases sensitivity. The specific methodology is described in detail \ within the ABA Data Production Processes document.

\ \

Credits

\

\ Thanks to the Allen \ Institute for Brain Science in general, and Susan \ Sunkin in particular, for coordinating with UCSC on this annotation.

\ \ expression 1 color 50,0,100\ group expression\ longLabel Allen Brain Atlas Probes\ priority 0\ shortLabel Allen Brain\ track allenBrainAli\ type psl .\ visibility hide\ yaleBertoneTars Bertone Yale TAR psl . Yale Transcriptionally Active Regions (TARs) (Bertone data) 0 100 50 100 50 152 177 152 0 0 0 http://dart.gersteinlab.org/cgi-bin/ar/lookup.cgi?acc=$$

Description

\

\ This track shows the locations of transcriptionally active regions \ (TARs)/transcribed fragments (transfrags) hybridized to an oligonucleotide \ microarray with a design based on human assembly hg13 (NCBI Build 31)\ (Bertone et al., 2004).

\ \

Methods

\

\ Microarrays were designed using sequence from the human hg13 assembly. The\ genome sequence was screened for repetitive elements and low-complexity DNA\ using RepeatMasker in the sensitive mode. Additional low-complexity filtering\ was performed using the NSEG (segment sequence(s) by local complexity) \ program using a minimum segment length of 21 nucleotides to determine \ low complexity segments of lowest probability. After filtering, 1.5 Gb of \ nonrepetitive DNA remained and microarray probes were chosen using the NASA \ Oligonucleotide Probe Selection Algorithm (NOPSA).

\

\ NOPSA is designed to find the optimal probes for hybridization. A database of the frequency of every 18-mer in the genome was created using a hash algorithm, with chaining used to resolve collisions. Average frequencies of 36-mers in the genome were determined from the frequencies of each 18-mer subsequence in the 36-mer and its reverse complement. 36-mer oligonucleotides with a frequency equal to one were selected as potential probes for the microarray (from the supporting online material for Stolc et al., 2004).\
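The frequency-based selection can be illustrated with a toy sketch. It is not NOPSA itself. A Python `Counter` stands in for the chained hash table. Each k-mer's frequency is taken as its forward-strand count plus the count of its reverse complement, and a probe's frequency is the mean over its k-mer windows. The small `k` and `probe_len` in the example are only for illustration; the text uses 18 and 36.

```python
from collections import Counter

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every k-mer on the forward strand (Counter is a hash table)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def average_frequency(probe: str, counts: Counter, k: int) -> float:
    """Mean frequency of the probe's k-mers, counting both strands."""
    windows = [probe[i:i + k] for i in range(len(probe) - k + 1)]
    freqs = [counts[w] + counts[revcomp(w)] for w in windows]
    return sum(freqs) / len(freqs)


def candidate_probes(genome: str, probe_len: int = 36, k: int = 18):
    """Yield (start, probe) for probes whose average frequency equals one."""
    counts = kmer_counts(genome, k)
    for start in range(len(genome) - probe_len + 1):
        probe = genome[start:start + probe_len]
        if average_frequency(probe, counts, k) == 1:
            yield start, probe


print(list(candidate_probes("AACC", probe_len=3, k=2)))  # [(0, 'AAC'), (1, 'ACC')]
```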

\

\ Probes were selected based on several criteria:\

    \
  • Each selected 36-mer is unique in the genome. \
  • Sequences that could form a loop with a stem of > 7 bp were excluded. \
  • Factors such as sequence length, extent of complementarity and base \ composition were also considered. \

\

A total of 51,874,388 36-mer oligonucleotide probes were selected from both the sense and antisense strands at an average resolution of 46 bp to cover the non-repetitive sequence of the whole genome; on average, adjacent probes were separated by a gap of about 10 nucleotides. The probes were synthesized via maskless photolithography at a feature density of approximately 390,000 probes per slide.
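Here is a quick consistency check on the figures above. At approximately 390,000 probes per slide, 51,874,388 probes require about 134 slides (rounding up). The second calculation assumes that the 46 bp average resolution is the 36-mer probe length plus the roughly 10 nt spacing between probes.

```python
import math

total_probes = 51_874_388
probes_per_slide = 390_000  # "approximately", per the text

slides = math.ceil(total_probes / probes_per_slide)
print(slides)  # 134

probe_length, gap = 36, 10  # assumption: resolution = probe length + gap
print(probe_length + gap)  # 46
```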

\

\ Biological samples that were hybridized to the arrays consisted of \ triple-selected human liver poly(A)+ RNA pooled from several individuals \ (supplied by Ambion). One biological replicate was carried out.

\

\ See this NCBI \ GEO accession for details of experimental protocols.

\

\ The TARs identified for hg13 (NCBI Build 31) were mapped to this assembly using Blat. The program pslCDnaFilter was used to filter alignments using the parameters -minId=0.96, -minCover=0.25, -localNearBest=0.001, -minQSize=20, -minNonRepSize=16, -ignoreNs, -bestOverlap.

\ \

Display Conventions

\

\ TARs are represented by blocks in the graphical display. The numeric part of \ the ID displayed when the track has pack or full visibility is the ID used\ by the Yale Database for Active Regions with Tools \ (DART). A link to \ this database is provided on the details page for each TAR.

\ \

Data Analysis

\

\ Two groups of TARs were identified: Normal and Poly(A)-associated.

\ \

Normal TARs:

\

\ Transcription units were identified as clusters of at least five consecutive probes with fluorescence intensities at or above the 90th intensity percentile and with genomic coordinates within a 250-nt window. After collecting these regions genome-wide, their locations were compared to those of annotated components of genes. As a result, a total of 13,889 transcription units, ranging in size from 209 to 3,438 nucleotides, were identified; under the null hypothesis of zero transcription, only 400 were expected. Of the regions identified, about one-third (4,931) correspond to previously annotated exons, while the other 8,958 are newly identified transcribed sequences, referred to as TARs.
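The Normal TAR rule can be sketched as a sliding window over probes sorted by position. This sketch makes three assumptions. Probes arrive as (position, intensity) pairs. "Top 90th intensity percentile" is read as intensity at or above the 90th percentile. Overlapping qualifying windows are merged into a single TAR. The comparison against annotated exons is omitted.

```python
from statistics import quantiles


def percentile_threshold(intensities, pct=90):
    """Intensity at the given percentile (needs at least two values)."""
    return quantiles(intensities, n=100)[pct - 1]


def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def call_tars(probes, threshold, min_probes=5, max_span=250):
    """probes: (position, intensity) pairs sorted by position.

    Returns merged (first_probe_start, last_probe_start) intervals.
    """
    hits = []
    for i in range(len(probes) - min_probes + 1):
        window = probes[i:i + min_probes]
        bright = all(intensity >= threshold for _, intensity in window)
        compact = window[-1][0] - window[0][0] <= max_span
        if bright and compact:
            hits.append((window[0][0], window[-1][0]))
    return merge_intervals(hits)


probes = [(i * 46, v) for i, v in enumerate([9, 9, 9, 9, 9, 9, 1, 9])]
print(call_tars(probes, threshold=5))  # [(0, 230)]
```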

\ \

Poly(A)-associated TARs:

\

\ Another set of criteria was used to find TARs in which the probe hybridization intensities were correlated with the presence of a polyadenylation signal 3' to the TAR. Here, transcription units consist of five consecutive probes with fluorescence intensities at or above the 80th intensity percentile within a window of 250 nucleotides. The 3' region must also contain or be close to a polyadenylation signal. Transcription units with an associated polyadenylation signal of "AATAAA" were assigned to a type I group, while those with "ATTAAA" were assigned to type II. Under the null hypothesis of zero transcription, only 100 of these would be expected to occur at random in the genome. The majority (1,991) were found to be within annotated exons, and 952 were located more than 10 kb from an annotated gene. A total of 1,371 type I and 674 type II poly(A) sequences were identified within exons of known genes; 1,289 (94%) of the type I and 607 (90%) of the type II instances were found in the 3' exon of the gene.
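The type I/type II assignment reduces to a motif search in the sequence 3' of each TAR. In this sketch, the caller decides how much downstream sequence to pass, since the text says only that the signal must be in or near the 3' region. When both motifs occur, the canonical AATAAA hit is assumed to take precedence. The last lines re-derive the quoted 3'-exon percentages from the counts.

```python
def polya_signal_type(three_prime_seq: str):
    """Return 'type I', 'type II' or None for a TAR's 3' flanking sequence."""
    seq = three_prime_seq.upper()
    if "AATAAA" in seq:  # checked first: assumed precedence
        return "type I"
    if "ATTAAA" in seq:
        return "type II"
    return None


print(polya_signal_type("ggcAATAAAcctt"))  # type I
print(round(100 * 1289 / 1371))  # 94
print(round(100 * 607 / 674))    # 90
```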

\ \

Verification

\

\ The TARs were validated using RT-PCR on human liver poly(A)+ RNA. Forty-eight \ poly(A)-associated and 48 non-poly(A)-associated TARs were investigated. \ In 94% (90/96) of cases, the PCR products were found to be of the expected \ size in a single-pass assay.

\ \

Credits

\

\ These data were generated and analyzed by a collaboration between the labs of \ Michael Snyder, \ Mark Gerstein, \ and Sherman Weissman at Yale University and with \ NASA Ames Research Center (Moffett Field, California) and Eloret Corporation \ (Sunnyvale, California).

\ \

References

\

\ Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S et al.\ Global identification of human transcribed sequences with \ genome tiling arrays. \ Science. 2004 Dec 24;306(5705):2242-6.

\

\ Stolc V, Gauhar Z, Mason C, Halasz G, van Batenburg MF, Rifkin SA, Hua S, Herreman T, Tongprasit W, Barbano PE et al. A gene expression map for the euchromatic genome of Drosophila melanogaster. Science. 2004 Oct 22;306(5696):655-60.

\ expression 1 color 50,100,50\ group expression\ longLabel Yale Transcriptionally Active Regions (TARs) (Bertone data)\ priority 0\ shortLabel Bertone Yale TAR\ track yaleBertoneTars\ type psl .\ url http://dart.gersteinlab.org/cgi-bin/ar/lookup.cgi?acc=$$\ urlLabel Yale DART Link:\ visibility hide\ burgeRnaSeqGemMapperAlign Burge RNA-seq bed 12 Burge lab RNA-seq aligned by GEM Mapper 0 100 0 0 0 127 127 127 0 0 0 expression 1 compositeTrack on\ configurable on\ dimensions dimensionY=tissueType\ dragAndDrop subTracks\ group expression\ longLabel Burge lab RNA-seq aligned by GEM Mapper\ noInherit on\ priority 0\ shortLabel Burge RNA-seq\ sortOrder view=+ tissueType=+\ subGroup1 view Views RawSignal=Raw_Signal Alignments=Alignments\ subGroup2 tissueType Tissue_Type BT474=BT474 HME=HME MB435=MB435 MCF7=MCF7 T47D=T47D adipose=Adipose brain=Brain breast=Breast colon=Colon heart=Heart liver=Liver lymphNode=LymphNode skelMuscle=SkelMuscle testes=Testes\ track burgeRnaSeqGemMapperAlign\ type bed 12\ visibility hide\ cghNci60 CGH NCI60 bed 15 + Comparative Genomic Hybridization Experiments for NCI 60 Cell Lines 0 100 0 0 0 127 127 127 0 0 0 \

Description

\ \

The data are shown in a tabular format in which each column of colored boxes represents the variation in genomic DNA levels, relative to a normal cell line, for a given clone across all of the NCI60 cell lines, and each row represents the measured genomic DNA levels for clones in a single sample. The variation in genomic DNA levels for each clone is represented by a color scale, in which green indicates an increase and red indicates a decrease in genomic DNA levels relative to the reference sample. The saturation of the color corresponds to the magnitude of the variation in genomic DNA levels. A black box indicates an undetectable change in genomic DNA, while a gray box indicates missing data.\ \

Display Options

\ This track has options to customize tissue types presented and\ the color of the display.\ \
Cell Line: This option is only valid when the track is \ displayed in full. It determines how the experiments are displayed. The\ options are:\
    \
  • Tissue Averages: Displays the average of the log ratio scores of all cell lines \ from the different tissue types.
  • \
  • All Cell Lines: Displays the log ratio score for all cell line experiments.\
  • \
  • Specific Tissues: Displays the log ratio score for all cell lines belonging\ to a given tissue type.
  • \
\ Color Scheme: Data are presented using a two-color false-color display. By default, green indicates a positive log ratio and red a negative log ratio. However, blue can be substituted for green for color-blind users.\ \
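Here is a minimal sketch of the color scale described above. Positive log ratios are shown in green (or blue, with the color-blind option) and negative ones in red. Black marks no detectable change and gray marks missing data. Saturation grows with the magnitude of the log ratio. The saturation cap `max_abs` and the specific RGB values are assumptions for illustration.

```python
def cgh_color(log_ratio, max_abs=1.0, color_blind=False):
    """Map one clone's log ratio to an (r, g, b) box color."""
    if log_ratio is None:
        return (128, 128, 128)  # gray: missing data
    level = min(abs(log_ratio) / max_abs, 1.0)  # saturation grows with magnitude
    value = int(round(255 * level))
    if log_ratio > 0:  # gain in genomic DNA
        return (0, 0, value) if color_blind else (0, value, 0)
    if log_ratio < 0:  # loss in genomic DNA
        return (value, 0, 0)
    return (0, 0, 0)  # black: no detectable change


print(cgh_color(-0.5))  # (128, 0, 0)
```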

Details Page

\ On the details page, the probes presented correspond to those in the window range shown in the Genome Browser; the selected exon probe and experiment are highlighted in blue.\ regulation 1 group regulation\ longLabel Comparative Genomic Hybridization Experiments for NCI 60 Cell Lines\ priority 0\ shortLabel CGH NCI60\ track cghNci60\ type bed 15 +\ visibility hide\ hapmapAllelesChimp Chimp Alleles bed 6 + Orthologous Alleles from Chimp (panTro2) 0 100 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Orthologous Alleles from Chimp (panTro2)\ parent hapmapSnps\ priority 100\ shortLabel Chimp Alleles\ track hapmapAllelesChimp\ mzPt1Mm3Rn3Gg2_pHMM Conservation wigMaf 0.0 1.0 Human/Chimp/Mouse/Rat/Chicken Multiz Alignments & PhyloHMM Cons 3 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows a measure of evolutionary conservation in human, chimp, mouse, \ rat, and chicken based on a phylogenetic hidden Markov model (phylo-HMM).\ The following multiz alignments were used to generate the annotation:\

    \
  • human July 2003 (NCBI34/hg16)\
  • chimpanzee Nov. 2003 (panTro1)\
  • mouse Feb. 2003 (mm3)\
  • rat Jun. 2003 (rn3)\
  • chicken Feb. 2004 (galGal2) \

\

\ In "full" visibility mode, this track displays pairwise alignments \ of chimp, mouse, rat, and chicken, each aligned to the human genome. The \ pairwise \ alignments are displayed in the standard UCSC browser "dense" mode \ using a greyscale \ density gradient. The checkboxes in the track configuration section allow\ the exclusion of species from the pairwise display; however, this does not\ remove them from the conservation score display.

\

\ When zoomed-in to the base-display level, the track shows the base \ composition of each alignment. The numbers and symbols on the Gaps\ line indicate the lengths of gaps in the human sequence at those \ alignment positions relative to the longest non-human sequence. \ If there is sufficient space in the display, the size of the gap is shown; \ if not, and if the gap size is a multiple of 3, a "*" is displayed, \ otherwise "+" is shown. \ To view detailed information about the alignments at a specific position,\ zoom in the display to 30,000 or fewer bases, then click on the alignment.
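The labeling rule for the Gaps line can be expressed as a small function. This is a minimal sketch, not the browser's rendering code; `space_available` (the number of character cells free at the gap position) is a hypothetical stand-in for however the browser actually measures available room.

```python
def gap_label(gap_len: int, space_available: int) -> str:
    """Label drawn on the Gaps line for a gap of `gap_len` bases in the human sequence.

    `space_available` is a hypothetical measure of free character cells; the
    real browser layout logic is not reproduced here.
    """
    size_text = str(gap_len)
    if len(size_text) <= space_available:
        return size_text  # enough room: print the gap size itself
    if gap_len % 3 == 0:
        return "*"  # no room, and the gap length is a multiple of 3
    return "+"  # no room, and the gap length is not a multiple of 3


print(gap_label(12, 2), gap_label(12, 1), gap_label(10, 1))  # prints: 12 * +
```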

\

\ This track may be configured in a variety of ways to highlight different aspects\ of the displayed information. Click the \ Graph \ configuration help link for an explanation of the configuration options.

\ \

Methods

\

\ Best-in-genome blastz pairwise alignments of human-mouse and\ human-rat were multiply aligned using a program called humor\ (HUman-MOuse-Rat), which is a special variant of the Multiz\ program. Multiz was used first to align the humor results with\ reciprocal best human-chimp alignments, and then to align the \ human-chimp-mouse-rat multiple alignment with best-in-genome blastz \ human-chicken alignments. The resulting\ human-chimp-mouse-rat-chicken multiple alignments were then assigned \ conservation scores by phylo-HMM.

\

\ A phylo-HMM is a probabilistic model that describes both the process\ of DNA substitution at each site in a genome, and the way this process\ changes from one site to the next (Felsenstein and Churchill 1996,\ Yang 1995, Siepel and Haussler 2003, Siepel and Haussler 2004). \ A phylo-HMM can be thought\ of as a machine that generates a multiple alignment, in the same way\ that an ordinary hidden Markov model (HMM) generates an individual\ sequence. While the states of an ordinary HMM are associated with\ simple multinomial probability distributions, the states of a\ phylo-HMM are associated with more complex distributions defined by\ probabilistic phylogenetic models. These distributions can capture\ differences in the rates and patterns of nucleotide\ substitution observed in different types of genomic regions (e.g., coding\ or noncoding regions, conserved or nonconserved regions).

\

To compute a conservation score, we use a k-state phylo-HMM whose k associated phylogenetic models differ only in overall evolutionary rate (Felsenstein and Churchill 1996, Yang 1995). In the image at right there are three states (k = 3), S1, S2, and S3, but in practice we use k = 10. A phylogenetic model is estimated globally, using the discrete gamma model for rate variation (Yang 1994), and a scaled version of the estimated model is then associated with each state of the phylo-HMM. Each state i has a separate "rate constant", r_i, which multiplies all branch lengths in the globally estimated model. The transition probabilities between states allow for autocorrelation of substitution rates, i.e., for adjacent sites to tend to exhibit similar overall substitution rates. A single parameter, lambda, describes the degree of autocorrelation and defines all transition probabilities. Here, we have estimated the rate constants from the data, similarly to Yang (1995) (Siepel and Haussler 2003), but have treated lambda as a tuning parameter. For the conservation score, we use the posterior probability that each site was "generated" by the state having the smallest rate constant. Because of the way the rate categories are defined, the plotted values can be thought of as approximately representing the posterior probability that each site is among the 10% most conserved sites in the data set (allowing for autocorrelation of substitution rates).

\

\ In this case, the general reversible (REV) substitution model was\ used in parameter estimation, and lambda was set to 0.9. Alignment\ gaps were treated as missing data, which sometimes has the effect of\ producing undesirably high posterior probabilities in gappy regions of\ the alignment. We are looking at several possible ways of improving\ the handling of alignment gaps.
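The posterior described above can be computed with the standard forward-backward algorithm. The sketch below is illustrative only and rests on stated assumptions: the per-site likelihoods under each rate-scaled tree model are already computed (for example by Felsenstein's pruning algorithm, not shown), state 0 carries the smallest rate constant, and transitions take the form a_ij = lambda*[i = j] + (1 - lambda)/k, i.e., each rate category has equal stationary weight 1/k. That transition form is an assumption consistent with the single-parameter description above, not necessarily the exact parameterization used for this track, and the function names are hypothetical.

```python
from typing import List, Sequence


def _propagate(vec: Sequence[float], lam: float) -> List[float]:
    """Multiply a vector by the assumed transition matrix a_ij = lam*[i==j] + (1-lam)/k.

    The matrix is symmetric, so the same routine serves both passes.
    """
    k = len(vec)
    total = sum(vec)
    return [lam * v + (1.0 - lam) * total / k for v in vec]


def slowest_state_posterior(site_lik: Sequence[Sequence[float]], lam: float = 0.9) -> List[float]:
    """Posterior probability that each alignment column was emitted by state 0.

    site_lik[t][i] = P(column t | state i), assumed precomputed; state 0 is
    assumed to be the state with the smallest rate constant.
    """
    n, k = len(site_lik), len(site_lik[0])

    # Forward pass, rescaled at every column to avoid numerical underflow.
    fwd: List[List[float]] = []
    prior = [1.0 / k] * k
    for t in range(n):
        pred = prior if t == 0 else _propagate(fwd[-1], lam)
        alpha = [pred[i] * site_lik[t][i] for i in range(k)]
        s = sum(alpha)
        fwd.append([a / s for a in alpha])

    # Backward pass; each posterior is normalized per column, so rescaling beta is harmless.
    post = [0.0] * n
    beta = [1.0] * k
    for t in range(n - 1, -1, -1):
        joint = [fwd[t][i] * beta[i] for i in range(k)]
        post[t] = joint[0] / sum(joint)
        emitted = [site_lik[t][i] * beta[i] for i in range(k)]
        beta = _propagate(emitted, lam)
        m = sum(beta)
        beta = [b / m for b in beta]
    return post


# Two hypothetical states (slow, fast). Columns 0-2 favor the slow state; column 3 is an
# all-gap column treated as missing data, so it has equal likelihood under every state.
lik = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [1.0, 1.0]]
print([round(p, 3) for p in slowest_state_posterior(lik)])
```

Because a missing-data column contributes the same likelihood under every state, its posterior is driven entirely by its neighbors through the autocorrelation, which is one way gappy regions can inherit high scores from conserved flanks.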

\ \

Credits

\

\ This track was created at UCSC using the following programs:\

    \
  • \ Blastz and multiz from Minmei Hou, Scott Schwartz and Webb Miller of the \ Penn State Bioinformatics \ Group. \
  • \ AxtBest, axtChain, chainNet, netSyntenic, and netClass \ developed by Jim Kent at UCSC. \
  • Tree estimation and phylo-HMM software by Adam Siepel at Cornell University.\
  • "Wiggle track" plotting software by Hiram Clawson at UCSC.\
\

\

The phylogenetic tree is based on Murphy et al. (2001) and general\ consensus in the vertebrate phylogeny community.\

\ \

References

\ \

Phylo-HMMs and phastCons

\

\ Felsenstein, J. and Churchill, G.A.\ A hidden Markov model approach to\ variation among sites in rate of evolution.\ Mol Biol Evol 13, 93-104 (1996).

\

\ Siepel, A. and Haussler, D. Phylogenetic hidden Markov models.\ In R. Nielsen, ed., Statistical Methods in Molecular Evolution,\ pp. 325-351, Springer, New York (2005).

\

\ Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom,\ K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M.,\ Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D.\ Evolutionarily conserved elements in vertebrate, insect, worm,\ and yeast genomes.\ Genome Res. 15, 1034-1050 (2005).

\

\ Yang, Z.\ A space-time process model for the evolution of DNA\ sequences. Genetics, 139, 993-1005 (1995).

\ \

Chain/Net:

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron:\ Duplication, deletion, and rearrangement in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).\ \

Multiz:

\

Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F.A., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., Haussler, D., and Miller, W. Aligning Multiple Genomic Sequences with the Threaded Blockset Aligner. Genome Res. 14(4), 708-15 (2004).

Blastz:

\

\ Chiaromonte, F., Yap, V.B., and Miller, W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \

Phylogenetic Tree:

\

\ Murphy, W.J., et al.\ Resolution of the early placental mammal radiation using Bayesian phylogenetics.\ Science 294(5550), 2348-51 (2001).

\ compGeno 1 autoScale Off\ group compGeno\ longLabel Human/Chimp/Mouse/Rat/Chicken Multiz Alignments & PhyloHMM Cons\ maxHeightPixels 100:40:11\ pairwise hmrg\ priority 100\ shortLabel Conservation\ spanList 1\ speciesOrder panTro1 mm3 rn3 galGal2\ track mzPt1Mm3Rn3Gg2_pHMM\ treeImage phylo/hg16_5way.gif\ type wigMaf 0.0 1.0\ visibility pack\ wiggle mzPt1Mm3Rn3Gg2_pHMM_wig\ yLineOnOff Off\ cpgIslandExt CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 0 100 0 100 0 128 228 128 0 0 0

Description

\

CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are commonly found near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole.

\ \

Methods

\

\ CpG islands were predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment was then\ evaluated for the following criteria:\

    \
  • GC content of 50% or greater \
  • length greater than 200 bp\
  • ratio greater than 0.6 of observed number of CG dinucleotides to the \ expected number on the basis of the number of Gs and Cs in the segment \
\
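The search and filtering steps can be sketched as follows. This is a simplified illustration, not the UCSC program: it assumes a single-pass scan that closes a segment whenever its running score falls to zero or below (one common way to find maximal scoring segments), and reports 0-based, half-open coordinates. The actual modification of the Miklem and Hillier program may differ in detail, and all function names are hypothetical.

```python
from typing import List, Tuple


def candidate_segments(seq: str) -> List[Tuple[int, int]]:
    """Score dinucleotides (+17 for CG, -1 otherwise) and return (start, end) segments.

    A segment is closed when its running score drops to zero or below; its end
    is the position just after its best-scoring dinucleotide.
    """
    seq = seq.upper()
    segments: List[Tuple[int, int]] = []
    running = best = 0
    start, best_end = 0, -1
    for i in range(len(seq) - 1):
        running += 17 if seq[i:i + 2] == "CG" else -1
        if running > best:
            best, best_end = running, i + 2
        if running <= 0:
            if best > 0:
                segments.append((start, best_end))
            running, best, start = 0, 0, i + 1
    if best > 0:
        segments.append((start, best_end))
    return segments


def passes_criteria(island: str) -> bool:
    """GC content >= 50%, length > 200 bp, observed/expected CpG > 0.6."""
    n = len(island)
    c, g = island.count("C"), island.count("G")
    if n <= 200 or c == 0 or g == 0:
        return False
    cpg = island.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return (c + g) / n >= 0.5 and cpg * n / (c * g) > 0.6


def find_cpg_islands(seq: str) -> List[Tuple[int, int]]:
    seq = seq.upper()
    return [(s, e) for s, e in candidate_segments(seq) if passes_criteria(seq[s:e])]


print(find_cpg_islands("AT" * 50 + "CG" * 150 + "AT" * 50))  # [(100, 400)]
print(find_cpg_islands("AT" * 50 + "CG" * 50 + "AT" * 50))   # [] (only 100 bp long)
```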

\ The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length. The ratio of observed to expected \ CpG is calculated according to the formula cited in \ Gardiner-Garden et al. (1987) in the References section below: \

\
    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)\
\ where N = length of sequence.\
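The three reported values can be computed directly from the definitions above. A minimal sketch, assuming a non-empty sequence; returning 0.0 for observed/expected when the sequence has no C or no G is an illustrative convention, and the function name is hypothetical.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[int, float, float]:
    """Return (CpG count, percentage CpG, observed/expected CpG) for one island."""
    seq = seq.upper()
    n = len(seq)
    cpg = seq.count("CG")
    c, g = seq.count("C"), seq.count("G")
    percent_cpg = 100.0 * (2 * cpg) / n  # CpG bases (twice the count) over length
    obs_exp = cpg * n / (c * g) if c and g else 0.0  # Gardiner-Garden formula
    return cpg, percent_cpg, obs_exp


print(cpg_stats("CGCGAATT"))  # (2, 50.0, 4.0)
```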

\ \

Credits

\

\ This track was generated using a\ modification of a program developed by G. Miklem and L. Hillier (unpublished).

\ \

References

\

\ Gardiner-Garden M, Frommer M. \ CpG islands in vertebrate genomes.\ J. Mol. Biol. 1987 Jul 20;196(2):261-82.

\ regulation 1 altColor 128,228,128\ color 0,100,0\ group regulation\ longLabel CpG Islands (Islands < 300 Bases are Light Green)\ priority 0\ shortLabel CpG Islands\ track cpgIslandExt\ type bed 4 +\ visibility hide\ eioJcviNAS EIO/JCVI NAS bed 3 . Eur. Inst. Oncology/J. C. Venter Inst. Nuclease Accessible Sites 0 100 0 0 0 127 127 127 0 0 0

Description

\

Genes in metazoa are controlled by a complex array of cis-regulatory elements that include core and distal promoters, enhancers, insulators, silencers, etc. (Levine and Tjian, 2003). In living cells, functionally active cis-regulatory elements share a unifying feature: a chromatin-based epigenetic signature known as nuclease hypersensitivity (Elgin, 1988; Gross and Garrard, 1988; Wolffe, 1998). This track presents the results of a collaboration between the J. Craig Venter Institute (JCVI, Rockville MD) and the European Institute of Oncology (Milan, Italy) to isolate nuclease accessible sites (NAS) from primary human CD34+ hematopoietic stem and progenitor cells, and from CD34- cells (maturing myeloid cells generated by in vitro differentiation of CD34+ cells) (Gargiulo et al., submitted). This effort used a method, originally developed at Sangamo BioSciences (Richmond, CA), to isolate NAS from living cells using restriction enzymes (RE), leading to minimal, if any, contamination from bulk DNA. High-throughput 454 sequencing was then used to generate NAS libraries from CD34+ and CD34- cells; this technology has been named "NA-Seq" (Gargiulo et al., submitted).

Display Conventions

\

\ The track annotates the location of NAS in the genome of human CD34+ and CD34- \ cells in the form of tags, generated by NA-Seq and obtained by merging NAS \ within 600 bp. Note that the method identifies a specific position in chromatin \ that is sensitive to nucleases, but does not map the boundaries of a \ regulatory element per se. A conservative estimate of element size would be \ the space occupied by one nucleosome, i.e., 180 - 200 bp surrounding the tag, \ although there is precedent in the literature for nuclease hypersensitive \ sites that span more than the length of one nucleosome (Turner, 2001; Wolffe, \ 1998; Boyle, 2008).\
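Merging NAS that lie within 600 bp of each other amounts to an interval-merging pass over sorted positions. A minimal sketch, assuming tag positions on a single chromosome and reading "within 600 bp" as a gap of at most 600 bp between consecutive tags; the helper that widens a tag to a one-nucleosome window uses the 200 bp upper end of the estimate above. Both function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def merge_tags(positions: Iterable[int], max_gap: int = 600) -> List[Tuple[int, int]]:
    """Merge tag positions whose distance to the previous tag is <= max_gap."""
    merged: List[List[int]] = []
    for pos in sorted(positions):
        if merged and pos - merged[-1][1] <= max_gap:
            merged[-1][1] = pos  # close enough: extend the current site
        else:
            merged.append([pos, pos])  # start a new merged site
    return [(start, end) for start, end in merged]


def conservative_element(pos: int, width: int = 200) -> Tuple[int, int]:
    """Roughly one nucleosome (180-200 bp) centred on a tag, clipped at 0."""
    half = width // 2
    return (max(0, pos - half), pos + half)


print(merge_tags([5000, 100, 500, 1200]))  # [(100, 500), (1200, 1200), (5000, 5000)]
print(conservative_element(1000))          # (900, 1100)
```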

Methods

\

CD34+ cells (enriched in hematopoietic stem and progenitor cells) were prepared from healthy donors following guidelines established by the Ethics Committee of the European Institute of Oncology (IEO), Milan. Mobilization of CD34+ cells to the peripheral blood was stimulated by G-CSF treatment according to standard procedures. After mobilization, donors underwent leukapheresis, and <10% of the sample was used in the experiment. CD34+ cells were purified using a magnetic positive selection procedure ("EASYSEP"; Stemcell, Vancouver, Canada). Purity of separation was evaluated by FACS after staining with a FITC-conjugated anti-human CD34 antibody (Stemcell). Upon purification, the cell cycle status of the CD34+ cells was monitored by propidium iodide staining and FACS analysis. G0/G1 cells ranged from approximately 90% to >95% of the total cells. Cells were immediately used for the isolation of NAS using the nuclease hypersensitive site isolation protocol (Gargiulo et al., submitted).

Verification

\

The method was initially validated on human tissue culture cells by examining the colocalization of DNA fragments isolated from cells with experimentally determined nuclease hypersensitive sites in chromatin, as mapped by indirect end-labeling and Southern blotting (Nedospasov and Georgiev, 1980; Wu, 1980). Nineteen out of nineteen randomly chosen clones from those libraries represented bona fide DNase I hypersensitive sites in chromatin (Fyodor Urnov, unpublished results). These data confirmed that the method yields libraries highly enriched in active cis-regulatory DNA elements, supporting its application to human CD34+ cells. In collaboration with scientists at the J. Craig Venter Institute and the European Institute of Oncology, NAS libraries were prepared with this method from CD34+ and CD34- cells and analyzed by high-throughput 454 sequencing; 41 of 51 randomly chosen clones (>80%) coincided with DNase I hypersensitive sites (Gargiulo et al., submitted).

Credits

\

\ The library of Nuclease Accessible sites (NAS) from human CD34+/CD34- cells \ was prepared and validated by Saverio Minucci and colleagues at the European \ Institute of Oncology. Sequencing was performed by Sam Levy and colleagues \ (J. Craig Venter Institute). This method was initially developed and validated \ by Fyodor Urnov, Alan Wolffe, and colleagues at Sangamo BioSciences, Inc. \

References

\

\ Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, \ Crawford GE.\ \ High-resolution mapping and characterization of open chromatin across the \ genome. Cell. 25 Jan 2008;132(2):311-22.

\

\ Elgin SC. \ \ The formation and function of DNase I hypersensitive sites in the process \ of gene activation.\ J Biol Chem. 25 Dec 1988;263(36):19259-62.

\

\ Gargiulo G, Levy S, et al. A Global Analysis of chromatin Accessibility and \ Dynamics during Hematopoietic Differentiation. Submitted.

\

\ Gross DS, Garrard WT. \ \ Nuclease hypersensitive sites in chromatin. \ Ann Rev Biochem. Jul 1988;57:159-97.

\

\ Levine M, Tjian R. \ \ Transcription regulation and animal diversity. \ Nature. 10 Jul 2003;424(6945):147-51.

\

\ Nedospasov SA, Georgiev GP. \ \ Non-random cleavage of SV40 DNA in the compact minichromosome and free in \ solution by micrococcal nuclease. \ Biochem Biophys Res Commun. 29 Jan 1980;92(2):532-9.

\

\ Turner BM. \ \ Chromatin and Gene Regulation: Mechanisms in Epigenetics.\ Blackwell Science Ltd., Oxford. 2001.

\

\ Wolffe AP. Chromatin: Structure and Function. \ Academic Press, San Diego, CA. 1998.

\

\ Wu C. \ \ The 5' ends of Drosophila heat shock genes in chromatin are \ hypersensitive to DNase I. \ Nature. 1980 Aug 28;286(5776):854-60.

\ regulation 1 compositeTrack on\ group regulation\ longLabel Eur. Inst. Oncology/J. C. Venter Inst. Nuclease Accessible Sites\ priority 0\ shortLabel EIO/JCVI NAS\ track eioJcviNAS\ type bed 3 .\ visibility hide\ eponine Eponine TSS bed 4 + Eponine Predicted Transcription Start Sites 0 100 0 100 100 127 177 177 0 0 0

Description

\

\ The Eponine program provides a probabilistic method for detecting \ transcription start sites (TSS) in mammalian genomic sequence, with \ good specificity and excellent positional accuracy.

\ \

Methods

\

\ Eponine models consist of a set of DNA weight matrices recognizing\ specific sequence motifs. Each of these is associated with a position\ distribution relative to the TSS.
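A DNA weight matrix is typically applied by sliding it along the sequence and summing per-position log-odds scores, and a position distribution can then weight each hit by its offset from a candidate TSS. The sketch below only illustrates that general idea: the four-position matrix, the uniform background, and the Gaussian position weight are all hypothetical, and this is not Eponine's trained model or its actual scoring function.

```python
import math
from typing import Dict, List

# Hypothetical 4-position matrix favouring "TATA" (illustrative only, not Eponine's).
TATA_LIKE: List[Dict[str, float]] = [
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
]


def pwm_scores(seq: str, matrix: List[Dict[str, float]], background: float = 0.25) -> List[float]:
    """Log2-odds score of every window of len(matrix) bases against a uniform background."""
    w = len(matrix)
    return [
        sum(math.log2(matrix[j][seq[start + j]] / background) for j in range(w))
        for start in range(len(seq) - w + 1)
    ]


def best_positioned_hit(scores: List[float], tss: int, mean_offset: float, sd: float) -> float:
    """Best window score after subtracting a Gaussian log-penalty (in bits) on its offset from the TSS."""
    best = -math.inf
    for start, score in enumerate(scores):
        offset = start - tss
        penalty = (offset - mean_offset) ** 2 / (2 * sd ** 2 * math.log(2))
        best = max(best, score - penalty)
    return best


seq = "GGGGTATAGGGG" + "C" * 22 + "A"  # hypothetical TSS at index 34
scores = pwm_scores(seq, TATA_LIKE)
print(max(range(len(scores)), key=scores.__getitem__))  # 4 (the "TATA" window)
print(best_positioned_hit(scores, tss=34, mean_offset=-30, sd=5.0))
```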

\ \

Eponine has been tested by comparing its output with annotated mRNAs from human chromosome 22. From this work, we estimate that using the default threshold (0.999) it detects >50% of transcription start sites with approximately 70% specificity. However, it does not always predict the direction of transcription correctly, an effect that seems to be common among computational TSS finders.

\ \

Credits

\

\ Thanks to Thomas Down at the \ Sanger Institute \ for providing the \ Eponine program (version 2, March 6, 2002) which was run \ at UCSC to produce this track.

\ \

References

\

\ Down TA, Hubbard TJP. \ \ Computational detection and location of transcription start sites \ in mammalian genomic DNA. \ Genome Res. 2002 Mar;12(3):458-61.

\ regulation 1 color 0,100,100\ group regulation\ longLabel Eponine Predicted Transcription Start Sites\ priority 0\ shortLabel Eponine TSS\ track eponine\ type bed 4 +\ visibility hide\ firstEF FirstEF bed 6 . FirstEF: First-Exon and Promoter Prediction 0 100 0 0 0 127 127 127 1 0 0 http://rulai.cshl.org/tools/FirstEF/Readme/README.html

Description

\ \

This track shows predictions from\ the FirstEF\ (First Exon Finder) program.

\ \

Three types of predictions are displayed: exon, promoter and CpG window. If two consecutive predictions are separated by less than 1000 bp, FirstEF treats them as one cluster of alternative first exons that may belong to the same gene. The cluster number is displayed in parentheses in each item's name. For example, "exon(405-)" represents the exon prediction in cluster number 405 on the minus strand. The exon, promoter and CpG-window predictions are interconnected by this cluster number. Alternative predictions within the same cluster are denoted by "#N", where "N" is the serial number of an alternative prediction in the cluster.
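The clustering rule can be sketched as one pass over predictions sorted by position. Assumptions: predictions are (start, end) pairs on a single strand, the separation is measured from the end of the cluster built so far to the next start, and cluster numbers start at 1. The function names are hypothetical, and only the "exon(405-)" label form comes from the text above; where the "#N" serial appears in a real label is not reproduced.

```python
from typing import List, Optional, Tuple


def cluster_predictions(preds: List[Tuple[int, int]], max_gap: int = 1000) -> List[Tuple[int, int, int, int]]:
    """Return (start, end, cluster_number, serial_in_cluster) for each prediction."""
    out: List[Tuple[int, int, int, int]] = []
    cluster, serial = 0, 0
    cluster_end: Optional[int] = None
    for start, end in sorted(preds):
        if cluster_end is None or start - cluster_end >= max_gap:
            cluster, serial, cluster_end = cluster + 1, 0, end  # open a new cluster
        else:
            cluster_end = max(cluster_end, end)  # alternative within the same cluster
        serial += 1
        out.append((start, end, cluster, serial))
    return out


def item_label(kind: str, cluster: int, strand: str) -> str:
    return f"{kind}({cluster}{strand})"


print(cluster_predictions([(5000, 5100), (100, 200), (800, 900)]))
# [(100, 200, 1, 1), (800, 900, 1, 2), (5000, 5100, 2, 1)]
print(item_label("exon", 405, "-"))  # exon(405-)
```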

\ \

Each predicted exon is either CpG-related or non-CpG-related, based on\ a score of the frequency of CpG dinucleotides.\ An exon is classified as CpG-related if the CpG score is greater \ than a threshold value, and non-CpG-related if less than the threshold. If an \ exon is CpG-related, \ its associated CpG-window is displayed. The browser displays features with higher\ scores in darker shades of gray/black.

\ \

Method

\ \

FirstEF is a 5' terminal exon and promoter\ prediction program. It consists of different discriminant functions structured\ as a decision tree. The probabilistic models are optimized to find potential\ first donor sites and CpG-related and non-CpG-related promoter regions based on\ discriminant analysis. For every potential first donor site (GT) and an upstream\ promoter region, FirstEF decides whether or not the intermediate region can be\ a potential first exon, based on a set of quadratic discriminant functions.\ FirstEF calculates the a posteriori probabilities of exon, donor, and\ promoter for a given GT and an upstream window of length 570 bp.

\ \

For a description of the FirstEF program and the underlying classification \ models, refer to Davuluri et al., 2001. \ \

Credits

\ \

The predictions for this track are produced by Ramana V.\ Davuluri of Ohio State University and Ivo Grosse and\ Michael Q. Zhang of Cold Spring Harbor Lab.\ \

References

\

\ Davuluri RV, Grosse I, Zhang MQ.\ Computational identification of promoters and first exons in the \ human genome. \ Nat Genet. 2001 Dec;29(4):412-7.

\ regulation 1 group regulation\ longLabel FirstEF: First-Exon and Promoter Prediction\ priority 0\ scoreMax 1000\ scoreMin 500\ shortLabel FirstEF\ spectrum on\ track firstEF\ type bed 6 .\ url http://rulai.cshl.org/tools/FirstEF/Readme/README.html\ visibility hide\ affy GNF bed 15 + GNF Gene Expression Atlas Using Affymetrix GeneChips 0 100 0 0 0 127 127 127 0 0 0

GNF Gene Expression Atlas Experiments using Affymetrix GeneChips

\ \

A series of experiments on different normal tissues using Affymetrix GeneChips, performed by GNF (The Genomics Institute of the Novartis Research Foundation). Alignments displayed on the track correspond to the target sequences used by Affymetrix to choose probes. Color denotes signal intensity on a log base 2 scale, with darker colors corresponding to lower signal and lighter colors corresponding to higher signal. Please note that this track is under construction and will not be official until GNF publishes its results.

Track options include the ability to group results by chip type,\ tissue average, and individual chip identification numbers.\ expression 1 expTable affyExps\ group expression\ longLabel GNF Gene Expression Atlas Using Affymetrix GeneChips\ priority 0\ shortLabel GNF\ track affy\ type bed 15 +\ visibility hide\ gnfAtlas2 GNF Atlas 2 expRatio GNF Expression Atlas 2 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track shows expression data from the GNF Gene Expression Atlas 2, which contains two replicates each of 79 human tissues run on Affymetrix microarrays. By default, averages of related tissues are shown. To display all tissues, select "All Arrays" from the "Combine arrays" menu on the track settings page. As is standard with microarray data, red indicates overexpression in the tissue and green indicates underexpression. You may want to view gene expression with the Gene Sorter as well as the Genome Browser.

\ \

Credits

\ Thanks to the \ Genomics Institute of the Novartis \ Research Foundation (GNF) for the data underlying this track. \ \

References

\ \ Su AI et al. \ A \ gene atlas of the mouse and human protein-encoding transcriptomes.\ PNAS 2004;101(16):6062-7.\ expression 1 expDrawExons on\ expScale 4.0\ expStep 0.5\ expTable gnfHumanAtlas2MedianExps\ group expression\ groupings gnfHumanAtlas2Groups\ longLabel GNF Expression Atlas 2\ priority 0\ shortLabel GNF Atlas 2\ track gnfAtlas2\ type expRatio\ visibility hide\ affyRatio GNF Ratio expRatio GNF Gene Expression Atlas Ratios Using Affymetrix GeneChips 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows expression data from GNF (The Genomics Institute of the Novartis Research \ Foundation) using Affymetrix GeneChips. The chip types, chip IDs or tissue \ averages associated with experiments can be displayed by selecting the \ appropriate option from the Experiment Display menu on the track \ description page. For more information, see the Track Configuration section.\

\ \

Methods

\

\ For detailed information about the experiments, see Su et al. 2002 \ in the References section below. Alignments displayed on the track correspond \ to the target sequences used by Affymetrix to choose probes.

\

\ In dense display mode, the track color denotes the average signal over all\ experiments on a log base 2 scale. Lighter colors correspond to lower \ signals and darker colors correspond to higher signals. In full display\ mode, the color of each item represents the log base 2 ratio of the signal \ of that particular experiment to the median signal of all experiments for \ that probe.
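The full-mode value for one probe can be computed as below. A minimal sketch, assuming `signals` holds the positive signal values of every experiment for that probe; the function name is hypothetical.

```python
import math
import statistics
from typing import List


def full_mode_log2_ratios(signals: List[float]) -> List[float]:
    """log2(signal / median signal of all experiments) for one probe."""
    median = statistics.median(signals)
    return [math.log2(s / median) for s in signals]


print(full_mode_log2_ratios([1.0, 2.0, 4.0]))  # [-1.0, 0.0, 1.0]
```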

\

\ More information about individual probes and probe sets is available on the\ Affymetrix website.

\ \

Track Configuration

\

\ This track may be configured to change the display mode and colors or \ vary the type of experiment information shown. The configuration controls are\ located at the top of the track description page, which is accessed via \ the small button to the left of the track's graphical display or the link \ on the track's control menu. \

    \
  • Display mode: To change the display mode for the track, select \ the desired display setting from the Display Mode pulldown list.\
  • Combine Arrays: All arrays may be displayed with either the chip ID \ or the tissue type as the label. Replicate arrays may also be combined by\ expression medians.\

\

\ When you have finished making changes, click the Submit button to\ commit your changes and return to the Genome Browser tracks display.

\ \

Credits

\

Thanks to GNF for providing these data.

\ \

References

\

\ Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., \ Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A. et al. \ Large-scale analysis of the human and mouse transcriptomes. \ Proc Natl Acad Sci USA 99(7), 4465-70 (2002).

\ expression 1 expDrawExons on\ expScale 3.0\ expStep 0.5\ expTable affyExps\ group expression\ groupings affyRatioGroups\ longLabel GNF Gene Expression Atlas Ratios Using Affymetrix GeneChips\ priority 0\ shortLabel GNF Ratio\ track affyRatio\ type expRatio\ visibility hide\ gladHumES Gstone Arrays expRatio Gladstone Microarray Data 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track shows expression data from the Gladstone Institute. \ In full mode all tissues are\ displayed. In packed or dense mode averages of related tissues are shown.\ As is standard with microarray data, red indicates overexpression in the \ tissue, and green indicates underexpression. The data are also available\ in the Gene Sorter.\ \

Methods

\

For detailed information about the experiments, see Abeyta M.J. et al. (2004), Unique gene expression signatures of independently derived human embryonic stem cell lines, Hum. Mol. Genet. 13(6):601-608.

When calculating expression ratios, the overall expression level in the denominator was calculated by first taking the median of replicates for each tissue, and then taking the median of these medians.
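The denominator calculation can be sketched as a median of per-tissue medians. This assumes replicate values grouped by tissue in a dict; dividing each tissue's median by the denominator is shown as one plausible way to form the ratio, and the names and example values are hypothetical.

```python
import statistics
from typing import Dict, List


def overall_expression(replicates: Dict[str, List[float]]) -> float:
    """Median over tissues of the median of each tissue's replicates."""
    tissue_medians = [statistics.median(values) for values in replicates.values()]
    return statistics.median(tissue_medians)


def tissue_ratios(replicates: Dict[str, List[float]]) -> Dict[str, float]:
    denom = overall_expression(replicates)
    return {tissue: statistics.median(values) / denom for tissue, values in replicates.items()}


data = {"heart": [1.0, 3.0], "liver": [4.0, 4.0, 10.0], "brain": [8.0]}
print(overall_expression(data))  # 4.0
print(tissue_ratios(data))       # {'heart': 0.5, 'liver': 1.0, 'brain': 2.0}
```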

\ \

Credits

The data for this track were kindly provided by the Bruce Conklin lab at the Gladstone Institute at UCSF.\ \ expression 1 expScale 4.0\ expStep 0.5\ expTable gladHumESExps\ group expression\ longLabel Gladstone Microarray Data\ priority 0\ shortLabel Gstone Arrays\ track gladHumES\ type expRatio\ visibility hide\ humanNormal Human Normal expRatio Expression from Normal Human Tissue 0 100 0 0 0 127 127 127 0 0 0 expression 1 expScale 3.0\ expStep 0.5\ expTable humanNormalExps\ group expression\ groupings humanNormalGroups\ longLabel Expression from Normal Human Tissue\ priority 0\ shortLabel Human Normal\ track humanNormal\ type expRatio\ visibility hide\ illuminaProbes Illumina WG-6 bed 12 . Alignments of Illumina WG-6 3.0 Probe set 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays the probes from the Illumina \ WG-6 3.0 BeadChip.\ The WG-6 BeadChip contains probes for the following set of RNA \ transcripts:\

Probe source                                          Number of probes   Number of unique probe sources
RefSeq NM (well-established coding transcript)        27,454             22,435
RefSeq XM (provisional coding transcript)             7,870              7,518
RefSeq NR (well-established non-coding transcript)    446                358
RefSeq XR (provisional non-coding transcript)         196                190
UniGene ESTs                                          12,837             12,837
TOTAL                                                 48,803             43,338
\

\ \

Display

\

\ The track shows the location of the probes on the genome after the RNAs\ they correspond to were all aligned to the genome using BLAT. Alignment\ scores range from 0 to 1000, where 1000 is a perfect score. In the \ display, darker browns are for higher-scoring alignments. \

\

Click on a probe track item to see detailed information about that probe ID.\ View the base-by-base alignment for that probe by clicking the \ "View Alignment" link on the details page.\

\ \

Methods

\

The probe set was collected from the NCBI GEO (Gene Expression Omnibus), and the 43,338 RNA sequences were collected from GenBank using NCBI's EUtils interface to Entrez. These RNAs were aligned to the genome using BLAT, and 43,224 of them aligned well to 46,432 locations on the genome. The single best alignment was used, except in 1,789 cases where the RNA mapped equally well to two or more locations. The probes were then aligned to their respective RNAs using BLAT, and if a good alignment resulted, the probe was mapped through to the genome using the combination of the probe-on-RNA and RNA-on-genome alignments. Of the 48,803 original probes, 40,852 map well through this procedure to 44,163 locations on the genome.
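Mapping a probe "through" its RNA to the genome composes two alignments: each probe base is located on the RNA, and that RNA position is then located on the genome. The sketch below is a simplified illustration, not the pipeline that built this track: it assumes plus-strand, gapless alignment blocks given as (query_start, target_start, size) in 0-based coordinates (similar in spirit to PSL blocks), rejects a probe if any base falls outside an aligned block, and uses hypothetical names and coordinates.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int, int]  # (query_start, target_start, size), 0-based, plus strand


def map_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a query coordinate to the target, or None if it lies outside every block."""
    for q_start, t_start, size in blocks:
        if q_start <= pos < q_start + size:
            return t_start + (pos - q_start)
    return None


def probe_to_genome(probe_on_rna: List[Block], rna_on_genome: List[Block]) -> Optional[List[Tuple[int, int]]]:
    """Compose probe-on-RNA and RNA-on-genome blocks into genome (start, size) blocks."""
    genome_positions: List[int] = []
    for _, rna_start, size in probe_on_rna:
        for k in range(size):
            g = map_position(rna_start + k, rna_on_genome)
            if g is None:
                return None  # treat as "did not map well"
            genome_positions.append(g)
    merged: List[List[int]] = []
    for g in genome_positions:
        if merged and g == merged[-1][0] + merged[-1][1]:
            merged[-1][1] += 1  # contiguous on the genome: extend the block
        else:
            merged.append([g, 1])  # jump (e.g. across an intron): new block
    return [(start, size) for start, size in merged]


rna_exons = [(0, 1000, 50), (50, 2000, 50)]  # hypothetical two-exon RNA
probe = [(0, 30, 50)]                        # 50-base probe starting at RNA position 30
print(probe_to_genome(probe, rna_exons))     # [(1030, 20), (2000, 30)]
```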

\ expression 1 group expression\ longLabel Alignments of Illumina WG-6 3.0 Probe set\ priority 0\ pslTable illuminaProbesAlign\ seqTable illuminaProbesSeq\ shortLabel Illumina WG-6\ track illuminaProbes\ type bed 12 .\ visibility hide\ transcriptomeJurkat Jurkat wig -600 2000 Expression in the Jurkat Cell Line 0 100 0 128 255 255 128 0 0 0 2 chr7,chr11, expression 0 altColor 255,128,0\ autoScale Off\ chromosomes chr7,chr11,\ color 0,128,255\ group expression\ longLabel Expression in the Jurkat Cell Line\ maxHeightPixels 128\ priority 0\ shortLabel Jurkat\ track transcriptomeJurkat\ type wig -600 2000\ visibility hide\ nci60 NCI60 expRatio Microarray Experiments for NCI 60 Cell Lines 0 100 0 0 0 127 127 127 0 0 0 \

Description

\ \

Expression data from "Systematic variation in gene expression patterns in human cancer cell lines", Ross et al., Nature Genetics 2000 Mar;24(3):227-35. cDNA microarrays were used to explore the variation in expression of approximately 8,000 unique genes among the 60 cell lines used in the National Cancer Institute's screen for anti-cancer drugs. The authors have provided a web supplement where more data and experimental descriptions can be obtained. cDNA probes were placed on the draft human genome using GenBank sequences referenced by the IMAGE clone IDs.

The data are shown in a tabular format in which each column of\ colored boxes represents the variation in transcript levels for a\ given cDNA across all of the array experiments, and each row\ represents the measured transcript levels for all genes in a single\ sample. The variation in transcript levels for each gene is\ represented by a color scale, in which red indicates an increase in\ transcript levels, and green indicates a decrease in transcript\ levels, relative to the reference sample. The saturation of the color\ corresponds to the magnitude of transcript variation. A black color\ indicates an undetectable change in expression, while a gray box\ indicates missing data.\ \

Display Options

\ This track has filter options to customize tissue types presented and\ the color of the display.\ \

Combine Arrays: This option is only valid when the track is \ displayed in full. It determines how the experiments are displayed. The\ options are:\

  • Arrays Grouped by Tissue Median (default): Displays the median of the log ratio scores of all cell lines in each tissue type.
  • Arrays Grouped by Tissue Mean: Displays the mean of the log ratio scores of all cell lines in each tissue type.
  • All Arrays (experiments): Displays the log ratio score for every cell line experiment.
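The two grouped options can be sketched as a per-tissue median or mean of the experiments' log ratios. A minimal sketch, assuming a mapping from experiment to log ratio (None for missing data) and a mapping from experiment to tissue; the function name and the experiment labels are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Optional


def combine_by_tissue(log_ratios: Dict[str, Optional[float]], tissue_of: Dict[str, str],
                      how: str = "median") -> Dict[str, float]:
    """Combine per-experiment log ratios into one value per tissue ("median" or "mean")."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for experiment, value in log_ratios.items():
        if value is not None:  # skip missing data (drawn as gray boxes)
            groups[tissue_of[experiment]].append(value)
    combine = statistics.median if how == "median" else statistics.mean
    return {tissue: combine(values) for tissue, values in groups.items()}


ratios = {"BR_1": 1.0, "BR_2": 3.0, "RE_1": -2.0, "RE_2": 0.0, "RE_3": 10.0, "RE_4": None}
tissue = {"BR_1": "breast", "BR_2": "breast", "RE_1": "renal", "RE_2": "renal", "RE_3": "renal", "RE_4": "renal"}
print(combine_by_tissue(ratios, tissue))  # {'breast': 2.0, 'renal': 0.0}
print(combine_by_tissue(ratios, tissue, "mean"))
```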
Color Scheme: Data are presented using a two-color false-color display. By default, the Brown/Botstein colors are used: red indicates a positive log ratio and green a negative log ratio. A yellow/blue option can be selected for users who are colorblind.

Details Page

\ On the details page, the probes presented\ correspond to those contained in the window range displayed on the Genome\ Browser. The exon probe and experiment selected are highlighted in\ blue.\ expression 1 expDrawExons on\ expScale 3.0\ expStep 0.5\ expTable nci60Exps\ group expression\ groupings nci60Groups\ longLabel Microarray Experiments for NCI 60 Cell Lines\ priority 0\ shortLabel NCI60\ track nci60\ type expRatio\ visibility hide\ nhgriDnaseHs NHGRI DNaseI-HS bed 5 . NHGRI DNaseI-Hypersensitive Sites 0 100 0 0 0 127 127 127 1 0 0

Description

\

This track displays DNaseI-hypersensitive sites in CD4+ T cells. DNaseI-hypersensitive sites are associated with gene regulatory regions, particularly for upregulated genes. CD4+ T cells, also known as helper or inducer T cells, are involved in generating an immune response. They are also one of the primary targets of HIV.

\ \

Display Conventions and Configuration

\

Gray and black blocks (which appear as vertical lines when the display is zoomed out) represent probable hypersensitive sites. The darker the block, the more likely the site is to be hypersensitive.

\

The display may be filtered to show only those items with unnormalized scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page.

\ \

Methods

\

DNaseI-hypersensitive sites were cloned from primary human CD4+ T cells and sequenced using massively parallel signature sequencing (MPSS) (Brenner et al., 2000; Crawford et al., 2006). Only clusters of multiple DNaseI library sequences that map within 500 bases of each other are displayed. Each cluster has a unique identifier, visible when the track is displayed in full or packed mode. The last digit of each identifier gives the number of sequences that map within that cluster. The sequence count is also reflected in the score: a cluster of two sequences scores 500, a cluster of three scores 750, and a cluster of four or more scores 1000.

Real-time PCR assays were used to verify DNaseI-hypersensitive sites. Approximately 50% of clusters of two sequences were valid; these clusters are shown in light gray. 80% of clusters of three sequences were valid; these are shown in dark gray. All clusters of four or more sequences were valid; these are shown in black.
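The score and shading rules above fit a simple closed form (250 points per sequence, capped at 1000). The sketch below encodes them together with the minimum-score filter described under Display Conventions and Configuration. The function names are hypothetical, and the formula only restates the listed values; it is not necessarily how the track's scores were computed.

```python
from typing import Dict, Tuple

# Display shade and reported fraction of valid clusters, keyed by
# cluster size (4 stands for "four or more sequences").
SHADE_AND_VALID_FRACTION: Dict[int, Tuple[str, float]] = {
    2: ("light gray", 0.50),
    3: ("dark gray", 0.80),
    4: ("black", 1.00),
}


def cluster_score(num_sequences: int) -> int:
    """Score a cluster: 2 sequences -> 500, 3 -> 750, 4 or more -> 1000."""
    if num_sequences < 2:
        raise ValueError("only clusters of two or more sequences are displayed")
    return min(250 * num_sequences, 1000)


def shade_and_valid_fraction(num_sequences: int) -> Tuple[str, float]:
    """Return the display shade and the reported fraction of valid clusters."""
    if num_sequences < 2:
        raise ValueError("only clusters of two or more sequences are displayed")
    return SHADE_AND_VALID_FRACTION[min(num_sequences, 4)]


def passes_threshold(num_sequences: int, min_score: int) -> bool:
    """Keep a cluster if its score meets or exceeds the minimum score."""
    return cluster_score(num_sequences) >= min_score
```

For example, a minimum score of 750 hides the light-gray two-sequence clusters, the group with the lowest reported validation rate.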

\ \

Credits

\

These data were produced at the Collins Lab at NHGRI. Thanks to Gregory E. Crawford and Francis S. Collins for supplying the information for this track.

\ \

References

\

Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 2000 Jun;18(6):597-8.\

Crawford GE, Holt IE, Mullikin JC, Tai D, Blakesley R, Bouffard G, Young A, Masiello C, Green ED, Wolfsberg TG, et al. Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc. Natl. Acad. Sci. USA. 2004 Jan 27;101(4):992-7.

\

Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, Bernat JA, Ginsburg D, et al. Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 2006 Jan;16(1):123-31. (See also NHGRI's data site for the project.)

\

McArthur M, Gerum S, Stamatoyannopoulos G. Quantification of DNaseI-sensitivity by real-time PCR: quantitative analysis of DNaseI-hypersensitivity of the mouse beta-globin LCR. J. Mol. Biol. 2001 Oct 12;313(1):27-34.

\ regulation 1 group regulation\ longLabel NHGRI DNaseI-Hypersensitive Sites\ origAssembly hg16\ priority 0\ shortLabel NHGRI DNaseI-HS\ track nhgriDnaseHs\ type bed 5 .\ useScore 1\ visibility hide\ nibbImageProbes NIBB Frog Images psl xeno Xenopus Laevis In Situ mRNA Probes from NIBB 0 100 50 0 100 152 127 177 0 0 0 expression 1 color 50,0,100\ group expression\ longLabel Xenopus Laevis In Situ mRNA Probes from NIBB\ priority 0\ shortLabel NIBB Frog Images\ track nibbImageProbes\ type psl xeno\ visibility hide\ oreganno ORegAnno bed 4 + Regulatory elements from ORegAnno 0 100 102 102 0 178 178 127 0 0 0

Description

\

This track displays literature-curated regulatory regions, transcription factor binding sites, and regulatory polymorphisms from ORegAnno (Open Regulatory Annotation). For more detailed information on a particular regulatory element, follow the link to ORegAnno from the details page.\ \

\ \

Display Conventions and Configuration

\ \

The display may be filtered to show only selected region types, such as:

\ \
    \
  • regulatory regions (shown in dark green)
  • regulatory polymorphisms (shown in red)
  • transcription factor binding sites (shown in light green)
\ \

To exclude a region type, uncheck the appropriate box in the list at the top of the Track Settings page.

\ \

Methods

\

An ORegAnno record describes an experimentally proven and published regulatory region (promoter, enhancer, etc.), transcription factor binding site, or regulatory polymorphism. Each annotation must have the following attributes:\

    \
  • A stable ORegAnno identifier.\
  • A valid taxonomy ID from the NCBI taxonomy database.\
  • A valid PubMed reference. \
  • A target gene that is either user-defined or from Entrez Gene or EnsEMBL.\
  • A sequence with at least 40 flanking bases (preferably more) to allow the site to be mapped to any release of an associated genome.\
  • At least one piece of specific experimental evidence, including the biological technique used to discover the regulatory sequence. (Currently only the evidence subtypes are supplied with the UCSC track.)\
  • A positive, neutral, or negative outcome based on the experimental results from the primary reference. (Only records with a positive outcome are currently included in the UCSC track.)\
The following attributes are optional:\
    \
  • A transcription factor that is either user-defined or from Entrez Gene or EnsEMBL.\
  • A specific cell type for each piece of experimental evidence, using the eVOC cell type ontology.\
  • A specific dataset identifier (e.g., the REDfly dataset) that allows external curators to manage particular annotation sets using ORegAnno's curation tools.\
  • A "search space" sequence that specifies the region that was assayed, not just the regulatory sequence.\
  • A dbSNP identifier and type of variant (germline, somatic, or artificial) for regulatory polymorphisms.\
Records are periodically mapped to the coordinates of current genome builds by BLAST sequence alignment. The information provided in this track is an abbreviated summary of each ORegAnno record. For complete details such as evidence descriptions, comments, and validation score history, visit the official ORegAnno entry by clicking the ORegAnno link on the details page of a specific regulatory element.\
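The attribute rules above can be expressed as a structural check. This is a hypothetical sketch, not ORegAnno's schema or curation code: the field names are invented, identifiers are only checked for presence rather than looked up in NCBI or PubMed, and reading "at least 40 flanking bases" as 40 bases on each side of the site is an assumption.

```python
from dataclasses import dataclass
from typing import List

MIN_FLANK = 40  # assumed to apply to each side of the site


@dataclass
class ORegAnnoRecord:
    """Hypothetical model of the required attributes of an ORegAnno record."""
    stable_id: str
    taxonomy_id: int            # NCBI taxonomy ID
    pubmed_id: int              # PubMed reference
    target_gene: str            # user-defined, Entrez Gene, or EnsEMBL ID
    left_flank: str
    site: str
    right_flank: str
    evidence_subtypes: List[str]
    outcome: str                # "positive", "neutral", or "negative"


def has_required_attributes(rec: ORegAnnoRecord) -> bool:
    """Structural check only; IDs are not validated against external databases."""
    return (
        bool(rec.stable_id)
        and rec.taxonomy_id > 0
        and rec.pubmed_id > 0
        and bool(rec.target_gene)
        and len(rec.left_flank) >= MIN_FLANK
        and len(rec.right_flank) >= MIN_FLANK
        and len(rec.evidence_subtypes) >= 1
        and rec.outcome in {"positive", "neutral", "negative"}
    )


def in_ucsc_track(rec: ORegAnnoRecord) -> bool:
    """Only records with a positive outcome are currently included."""
    return has_required_attributes(rec) and rec.outcome == "positive"
```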

\ \

Credits

\

ORegAnno core team and principal contacts: Stephen Montgomery, Obi Griffith, and Steven Jones from Canada's Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada.

\

The ORegAnno community (please see individual citations for various features): ORegAnno Citation.\ \

References

\

Griffith OL, Montgomery SB, Bernier B, Chu B, Kasaian K, Aerts S, Mahony S, Sleumer MC, Bilenky M, Haeussler M, et al. ORegAnno: an open-access community-driven resource for regulatory annotation. Nucleic Acids Res. 2008 Jan;36(Database issue):D107-13.\

\

Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED, Prychyna Y, Zhang X, Jones SJ. ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation. Bioinformatics. 2006 Mar 1;22(5):637-40.\

\ regulation 1 color 102,102,0\ group regulation\ longLabel Regulatory elements from ORegAnno\ priority 0\ shortLabel ORegAnno\ track oreganno\ type bed 4 +\ visibility hide\ transcriptomePC3 PC3 wig -600 2000 Expression in the PC3 Cell Line 0 100 0 128 255 255 128 0 0 0 2 chr7,chr11, expression 0 altColor 255,128,0\ autoScale Off\ chromosomes chr7,chr11,\ color 0,128,255\ group expression\ longLabel Expression in the PC3 Cell Line\ maxHeightPixels 128\ priority 0\ shortLabel PC3\ track transcriptomePC3\ type wig -600 2000\ visibility hide\ regpotent Reg. Potential sample 0 8 Human/Mouse Regulatory Potential Score 0 100 100 50 0 50 150 128 0 0 0 regulation 0 altColor 50,150,128\ color 100,50,0\ group regulation\ longLabel Human/Mouse Regulatory Potential Score\ priority 0\ shortLabel Reg. Potential\ track regpotent\ type sample 0 8\ visibility hide\ sestanBrainAtlas Sestan Brain expRatio Sestan Lab Human Brain Atlas Microarrays 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track displays exon microarray expression data from the late mid-fetal human brain, generated by the Sestan Lab at Yale University. The data represent 13 brain regions, including nine areas of neocortex, from both hemispheres. By default, arrays are grouped by the median for each brain region, including each neocortical area. Alternatively, neocortical areas can be grouped together, arrays can be grouped by mean, or all 95 arrays can be shown individually.\

\

Methods

\

RNA was isolated from 13 brain regions, from both hemispheres, of four late mid-fetal human brains with a postmortem interval (PMI) of less than one hour, and hybridized to Affymetrix Human Exon 1.0 ST arrays. Affymetrix CEL files were imported into Partek GS using Robust Multichip Average (RMA) background correction, quantile normalization, and GC content correction. The normalized data were then converted to log-ratios relative to arrays hybridized with RNA pooled from all regions of the same brain. Signal log-ratios are displayed in green when negative (underexpression) and in red when positive (overexpression).\
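The log-ratio step can be sketched as below, assuming both the regional arrays and the pooled-reference arrays are already RMA-normalized log2 values for the same probe sets in the same order, so that log2(sample / reference) reduces to a difference. The function names are hypothetical, and drawing an exact zero in black is an assumption borrowed from the usual convention for these expression tracks.

```python
from typing import List, Sequence


def signal_log_ratios(sample: Sequence[float],
                      pooled_reference: Sequence[float]) -> List[float]:
    """log2(sample / reference) from log2-scale normalized intensities."""
    if len(sample) != len(pooled_reference):
        raise ValueError("sample and reference must cover the same probe sets")
    return [s - r for s, r in zip(sample, pooled_reference)]


def display_color(log_ratio: float) -> str:
    """Red for overexpression, green for underexpression."""
    if log_ratio > 0:
        return "red"
    if log_ratio < 0:
        return "green"
    return "black"  # assumed: no change


ratios = signal_log_ratios([8.0, 6.0, 7.0], [7.0, 7.0, 7.0])
print(ratios)                              # [1.0, -1.0, 0.0]
print([display_color(r) for r in ratios])  # ['red', 'green', 'black']
```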

The probe sets used in this microarray track can be displayed by turning on the Affy HuEx 1.0 track. Core, extended, and full probe sets are shown. "Bounded" probe sets (exons that lie within an intron of more than one gene) and potentially cross-hybridizing probe sets were filtered from this dataset, leaving approximately 875K probe sets.\

\

Credits

\

The data for this track were generated and analyzed by Matthew B. Johnson, Yuka Imamura Kawasawa, Christopher Mason, and the Yale Neuroscience Microarray Center.\

\

Links

\

The raw microarray data are available via the NCBI Gene Expression Omnibus.\

More information is available at www.humanbrainatlas.org.\

\ expression 1 expScale 3.0\ expStep 0.5\ expTable sestanBrainAtlasExps\ group expression\ groupings sestanBrainAtlasGroups\ longLabel Sestan Lab Human Brain Atlas Microarrays\ priority 0\ shortLabel Sestan Brain\ track sestanBrainAtlas\ type expRatio\ visibility hide\ snpMap SNP Map bed 4 . Simple Nucleotide Polymorphisms (SNPs) 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track consolidates all the Simple Nucleotide Polymorphisms into a single track. It is the union of the Overlap SNPs, Random SNPs, Affymetrix 120K SNP, and Affymetrix 10K SNP tracks that previously existed in the Genome Browser.

Variant Sources\
    \
  • The Random variants were detected by aligning reads from random genomic clones, drawn from a diverse pool of human DNA, against the genome. (These were previously located in the Random SNPs track.)\
  • The Mixed variants were detected both by clone overlaps and by the random genomic clones. (These were previously located in the Random SNPs track.)\
  • The BAC Overlaps variants were detected primarily by examining overlaps between clones that cover the same region of the genome. (These were previously located in the Overlap SNPs track.)\
  • The Other variants were found using alternative methods. (These were previously located in the Overlap SNPs track.)\
  • The Affymetrix Genotyping Array 10K variants are the Single Nucleotide Polymorphisms included on the Affymetrix 10K SNP Genotyping Array. (These were previously located in the Affymetrix 10K SNP track.)\
  • The Affymetrix Genotyping Array 120K variants are the Single Nucleotide Polymorphisms included on the Affymetrix 120K SNP Genotyping Array. (These were previously located in the Affymetrix 120K SNP track.)\
\ \
Variant Types\
    \
  • Single Nucleotide Polymorphisms: variants where the two alleles are each a single base long.\
  • Insertions and Deletions: variants where one allele contains no bases.\
  • Segmental Mutations: variants with multiple nucleotide differences.\
\ \

Filtering

\

The SNPs in this track include all known polymorphisms that can be mapped against the current assembly. These include known point mutations (Single Nucleotide Polymorphisms), insertions, deletions, and segmental mutations from the current build of dbSNP; the build used is listed in the Genome Browser release log.\

\

Three major classes of submissions are not mapped and/or annotated:\

    \
  • Submissions that are completely masked as repetitive elements. These are dropped from any further computations. This set of reference SNPs can be found in chromosome "rs_chMasked" on the dbSNP FTP site.\
  • Submissions that are defined in a cDNA context with extensive splicing. These SNPs are typically annotated on RefSeq mRNAs through a separate annotation process. Effort is being made to reverse-map these variations back to contig coordinates, but this has not yet been implemented. For now, you can find this set of variations in "rs_chNotOn" on the dbSNP FTP site.\
  • Submissions with excessive hits to the genome. Variations with three or more hits to the genome are not included in the tracks, but are available in "rs_chMulti" on the dbSNP FTP site.\
\
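The three exclusion cases can be expressed as a small routing function. This is a hypothetical sketch: the `Submission` fields are invented for illustration, and the order in which the checks are applied is an assumption, since the text does not say which case takes precedence when a submission meets more than one.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    rs_id: str
    fully_masked: bool   # completely masked as repetitive elements
    cdna_only: bool      # defined only in a spliced cDNA context
    genome_hits: int     # number of alignments to the genome


def placement(sub: Submission) -> str:
    """Return 'track' if the variant is mapped, else the dbSNP set that holds it."""
    if sub.fully_masked:
        return "rs_chMasked"
    if sub.cdna_only:
        return "rs_chNotOn"
    if sub.genome_hits >= 3:
        return "rs_chMulti"
    return "track"
```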

\

The heuristics for the non-SNP variations (i.e., named elements and short tandem repeats (STRs)) are quite conservative; therefore, some of these are probably lost. This approach was chosen to avoid false annotation of variation in inappropriate locations.

\ \

Supporting Details

\

Positional information can be found in the annotations section of the Genome Browser downloads page, which is organized by species and assembly. Non-positional information can be found in the shared data section of the same page, where it is split into tables by organism: dbSnpRsHg for Human, dbSnpRsMm for Mouse, and dbSnpRsRn for Rat.\ \

Credits

\

Thanks to the SNP Consortium and NIH for providing the public data, which are available from dbSNP at NCBI.

\

Thanks to Perlegen Sciences, Inc. for providing additional SNPs from their database. Additional information about the Perlegen SNP discovery process can be found in Patil N, et al. (2001) Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21. Science 294:1719-1723.\

\

Thanks to Affymetrix, Inc. for developing the genotyping arrays. For more details on this genotyping assay, please see the supplemental information on the Affymetrix 10K SNP and Affymetrix 120K SNP products. Additional information, including genotyping data, is available from the details pages for the Affymetrix 120K SNP and Affymetrix 10K SNP tracks.

\ \

Terms of Use for the Affymetrix data

\

Please see the Terms and Conditions page on the Affymetrix website for restrictions on the use of their data. \ \ \ \ varRep 1 group varRep\ longLabel Simple Nucleotide Polymorphisms (SNPs)\ priority 100\ shortLabel SNP Map\ track snpMap\ type bed 4 .\ visibility hide\ snp SNPs bed 6 + Simple Nucleotide Polymorphisms (SNPs) 1 100 0 0 0 127 127 127 0 0 0

Description

\

This track consolidates Simple Nucleotide Polymorphisms (SNPs) from dbSNP and commercially available genotyping arrays into a single track.\

\

Please be aware that some mapping inconsistencies are known to exist in the dbSNP data set. If information on the details page for a variant seems incorrect, we advise you to verify the record on the dbSNP website using the provided link. In some known instances, the size of the variant does not match the size of its genomic location; UCSC is working with dbSNP to correct these errors in the data set.\

\

Interpreting and Configuring the Graphical Display

\

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.\

\

When a SNP's details page shows its start coordinate as chromStart = chromEnd+1, this is generally not an error; rather, it indicates that the variant is an insertion at this genomic position. In these instances, the location type is set to "between". Note that insertions are represented as chromStart = chromEnd in the snp table accessible from the Table Browser or downloads server, because the underlying database uses a zero-based, half-open coordinate representation.\
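The coordinate conventions above can be checked with a short sketch. Given a zero-based, half-open `(chromStart, chromEnd)` pair as stored in the snp table, it derives the one-based coordinates shown on the details page and the location type implied by the span width. The function names are hypothetical, and the Unknown location type is not modeled.

```python
from typing import Tuple


def details_page_coords(chrom_start: int, chrom_end: int) -> Tuple[int, int]:
    """Convert zero-based, half-open table coordinates to the one-based,
    fully closed coordinates displayed on the details page."""
    return chrom_start + 1, chrom_end


def location_type(chrom_start: int, chrom_end: int) -> str:
    """Infer the location type from the width of the reference span."""
    width = chrom_end - chrom_start
    if width < 0:
        raise ValueError("chromEnd must not be less than chromStart")
    if width == 0:
        return "between"  # insertion: no reference bases altered
    if width == 1:
        return "exact"    # one reference base altered
    return "range"        # two or more reference bases altered


# An insertion stored as chromStart == chromEnd == 1000 in the table
print(details_page_coords(1000, 1000), location_type(1000, 1000))
# (1001, 1000) between
```

The printed start of 1001 with an end of 1000 is exactly the chromStart = chromEnd+1 pattern described above.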

\

The colors of variants in the display may be changed to highlight their source, molecule type, variant class, validation status, or functional classification. Variants can be excluded from the display based on these same criteria, or if their average heterozygosity falls below a user-specified minimum. The track configuration options are located at the top of the SNPs track description page. By default, variants are colored by functional classification, with SNPs likely to cause a phenotype (non-synonymous and splice site mutations) shown in red.\

\

The configuration categories below reflect the definitions in the document type definition (DTD) that describes the dbSNP XML format.\

\
    \
  • Source: Origin of this data
    \
      \
    • dbSnp - From the current build of dbSnp\
    • Affymetrix Genotyping Array 10K - SNPs on the commercial array\
    • Affymetrix Genotyping Array 10K v2 - SNPs on the commercial array\
    • Affymetrix Genotyping Array 50K HindIII - SNPs on the commercial array\
    • Affymetrix Genotyping Array 50K XbaI - SNPs on the commercial array\
    \
  • Molecule Type: Sample used to find this variant
    \
      \
    • Unknown - sample type not known\
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Mitochondrial - variant discovered using a mitochondrial template\
    • Chloroplast - variant discovered using a chloroplast template\
    \
  • Variant Class: Variant classification
    \
      \
    • Unknown - no classification provided by data contributor\
    • Single Nucleotide Polymorphism - single nucleotide variation: alleles of length 1 from the set {A,T,C,G}\
    • Insertion/deletion - insertion/deletion variation: alleles of different lengths or containing the '-' character\
    • Heterozygous - heterozygous (undetermined) variation: allele contains the string '(heterozygous)'\
    • Microsatellite - microsatellite variation: allele string contains numbers and a '(motif)' pattern\
    • Named - insertion/deletion of named object (length unknown)\
    • No Variation - no variation asserted for sequence\
    • Mixed - mixed class\
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, from the set {A,T,C,G}\
    \
  • Validation Status: Method used to validate the variant (each variant may be validated by more than one method)
    \
      \
    • Unknown - no validation has been reported for this refSNP\
    • Other Population - at least one submitted SNP (ss) in the cluster was validated by an independent assay\
    • By Frequency - at least one submitted SNP in the cluster has frequency data submitted\
    • By Cluster - cluster has two or more submissions, at least one of which was assayed with a non-computational method\
    • By 2 Hit/2 Allele - all alleles have been observed in at least two chromosomes\
    • By HapMap - validated by HapMap project\
    • By Genotype - at least one genotype reported for this refSNP\
    \
  • Function: Predicted functional role (each variant may have more than one functional role)
    \
      \
    • Unknown - no known functional classification\
    • Locus Region - variation in region of gene, but not in transcript\
    • Coding - variation in coding region of gene, assigned if allele-specific class unknown\
    • Coding - Synonymous - no change in peptide for allele with respect to contig sequence\
    • Coding - Non-Synonymous - change in peptide with respect to contig sequence\
    • mRNA/UTR - variation in transcript, but not in coding region interval\
    • Intron - variation in intron, but not in first two or last two bases of intron\
    • Splice Site - variation in first two or last two bases of intron\
    • Reference - allele observed in reference contig sequence\
    • Exception - variation in coding region with exception raised on alignment. This occurs when a protein with a gap in its sequence is aligned back to the contig sequence. Variations on the 3' side of the gap have undefined functional inference.\
    \
  • Location Type: Describes how a segment of the reference assembly must be altered to represent the variant SNP allele
    \
      \
    • Unknown - undefined or error\
    • Range - a range of two or more bases in the reference assembly must be altered. This occurs, for example, when the variant allele is a deletion of two or more bases relative to the allele represented by the reference assembly.\
    • Exact - one base in the reference assembly must be altered. This occurs when the variant allele is a single-base substitution relative to the reference genome or a deletion of a single base.\
    • Between - no bases in the reference assembly need to be altered. This occurs when the variant allele is an insertion of one or more bases relative to the allele represented by the reference assembly.\
    \
\ \ \
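Several of the Variant Class rules above are mechanical enough to sketch. The function below is a hypothetical, partial classifier for a slash-separated observed-allele string such as `A/G`; the string format, the order of the checks, and the returned labels are assumptions, and the Named, No Variation, and Mixed classes are not handled.

```python
import re
from typing import List

BASES = set("ACGT")


def variant_class(observed: str) -> str:
    """Classify an observed-allele string using the Variant Class rules."""
    alleles: List[str] = observed.upper().split("/")
    if any("(HETEROZYGOUS)" in a for a in alleles):
        return "heterozygous"
    # Microsatellite: allele contains a '(motif)' pattern and numbers.
    if any(re.search(r"\([ACGT]+\)", a) and re.search(r"\d", a) for a in alleles):
        return "microsatellite"
    # Insertion/deletion: a '-' allele or alleles of different lengths.
    if "-" in alleles or len({len(a) for a in alleles}) > 1:
        return "in-del"
    if all(a and set(a) <= BASES for a in alleles):
        return "single" if len(alleles[0]) == 1 else "mnp"
    return "unknown"


for obs in ["A/G", "-/AT", "AT/GC", "(CA)12/(CA)14", "A/(heterozygous)"]:
    print(obs, variant_class(obs))
```

The heterozygous and microsatellite checks run first because their allele strings would otherwise trip the length-based insertion/deletion rule.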

Large Scale SNP Annotation at UCSF

\

LS-SNP is a database of functional and structural SNP annotations with links to protein structure models. Annotations are based on a variety of features extracted from protein structure, sequence, and evolution. Currently only coding non-synonymous SNPs are included. See LS-SNP at UCSF.\

\ \

Data Filtering

\

The SNPs in this track include all known polymorphisms available in the current build of dbSNP that can be mapped against the current assembly. The version of dbSNP from which these data were obtained is listed in the SNP track entry in the Genome Browser release log.\

\

There are two reasons that some variants may not be mapped and/or annotated in this track:\

\
    \
  • The submission is completely masked as repetitive elements. These submissions are dropped from any further computations. This set of reference SNPs is found in chromosome "rs_chMasked" on the dbSNP FTP site.\
  • The submission is defined in a cDNA context with extensive splicing. These SNPs are typically annotated on RefSeq mRNAs through a separate annotation process. Effort is being made to reverse-map these variations back to contig coordinates, but this has not yet been implemented. For now, you can find this set of variations in "rs_chNotOn" on the dbSNP FTP site.\
\

The heuristics for the non-SNP variations (i.e., named elements and short tandem repeats (STRs)) are quite conservative; therefore, some of these are probably lost. This approach was chosen to avoid false annotation of variation in inappropriate locations.\

\ \

Credits and Data Use Restrictions

\

Thanks to the SNP Consortium and NIH for providing the public data, which are available from dbSNP at NCBI.\

\

Thanks to Affymetrix, Inc. for developing the genotyping arrays. Please see the Terms and Conditions page on the Affymetrix website for restrictions on the use of their data. For more details on the Affymetrix genotyping assay, see the supplemental information on the Affymetrix 10K SNP and Affymetrix Genotyping Array products. Additional information, including genotyping data, is available on those pages.\

\

Karchin R, Diekhans M, Kelly L, Thomas DJ, Pieper U, Eswar N, Haussler D, Sali A. LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics. 2005 Apr 12;21:2814-2820.\

\ varRep 1 group varRep\ longLabel Simple Nucleotide Polymorphisms (SNPs)\ priority 100\ shortLabel SNPs\ track snp\ type bed 6 +\ visibility dense\ stanfordChip Stanf ChIP bedGraph 4 Stanford ChIP-chip (GMO6990, HeLa, HepG2, Jurkat, K562 cells; GABP, SRF, TAF, NRST/REST ChIP) 0 100 120 0 20 150 0 25 0 0 22 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX,

Description

\

This track represents binding events for the GA-binding protein, as assayed by chromatin immunoprecipitation on Affymetrix whole genome tiling 2.0R arrays in various cell types.\ \

Methods

Chromatin IP was performed as described in Johnson et al. (2007) using the monoclonal GABPa antibody (G1, Santa Cruz Biotechnology). Libraries were prepared from the ChIP DNA using the standard Affymetrix amplification protocol. The amplified ChIP libraries were then fractionated and hybridized to human Affymetrix whole genome 2.0R arrays in biological triplicate.\

The data represent peaks called from the array measurements by the MAT software (Johnson et al., 2006), using the following settings: BandWidth = 300, MaxGap = 300, MinProbe = 10, Trim = 0.1, p-value = 1e-5.\ \ \

Verification

\

These experiments were performed in biological triplicate. The authors performed Western blots to test the specificity of the antibody, gel electrophoresis to confirm proper shearing of chromatin, and qPCR assays of several loci before hybridization to arrays.\ \ \

Credits

\

Myers Group: David Johnson, Betsy Anton, Loan Nguyen, Cat Medina, Richard Myers.\ \

References

\

Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007 Jun;316:1497-1502.\ \

Johnson WE, Li W, Meyer CA, Gottardo R, Carroll JS, Brown M, Liu XS. Model-based analysis of tiling-arrays for ChIP-chip. Proc. Natl. Acad. Sci. USA. 2006;103:12457-62.\ \ regulation 0 altColor 150,0,25\ autoScale off\ chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ color 120,0,20\ compositeTrack on\ dataVersion Mar 2007\ group regulation\ longLabel Stanford ChIP-chip (GMO6990, HeLa, HepG2, Jurkat, K562 cells; GABP, SRF, TAF, NRST/REST ChIP)\ maxHeightPixels 128:16:16\ maxLimit 1000\ minLimit 500\ origAssembly hg17\ priority 0\ shortLabel Stanf ChIP\ track stanfordChip\ type bedGraph 4\ viewLimits 0:10\ visibility hide\ promoterStanfordGene Stanford Gene Model bed 12 . Stanford ENCODE Gene Models 0 100 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, regulation 1 chromosomes chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX\ group regulation\ longLabel Stanford ENCODE Gene Models\ priority 0\ shortLabel Stanford Gene Model\ track promoterStanfordGene\ type bed 12 .\ visibility hide\ promoterStanford Stanford Promoters bed 6 . Stanford Promoters 0 100 0 0 0 127 127 127 0 0 0 regulation 1 group regulation\ longLabel Stanford Promoters\ priority 0\ shortLabel Stanford Promoters\ track promoterStanford\ type bed 6 .\ visibility hide\ switchDbTss SwitchGear TSS bed 6 + SwitchGear Genomics Transcription Start Sites 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track describes the location of transcription start sites (TSS) throughout the human genome, along with a confidence measure for each TSS based on experimental evidence. The TSSs of a gene are important landmarks that help define its promoter regions. These TSSs were determined by SwitchGear Genomics by integrating experimental data using an empirically derived scoring function. Each TSS has a unique identifier that associates it with a gene model (see details below), and each TSS is color-coded to reflect its confidence score.\

\ \

These TSSs are also available in a searchable format at SwitchDB, an open-access online database of human TSSs. Experimental tools for studying the function of the promoter regions associated with these TSSs are available through SwitchGear.

\ \

Methods

\

The predicted TSSs are associated with a genome-wide set of gene models. SwitchGear gene models are defined as clusters of cDNA alignments that have overlapping exons on the same strand. These gene models were created from over 250,000 human cDNA alignments to construct a genome-wide set of ~37,000 gene models. Each gene model is identified by its chromosome number, strand, and unique identifier. For example, ID CHR7_P0362 indicates a cDNA cluster (0362) aligning to the plus strand (P) of chromosome 7 (CHR7). Existing gene annotation is mapped to the gene models through the NCBI annotation associated with RefSeq accession numbers.\
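The ID convention can be parsed with a regular expression. A minimal sketch: the text only shows a plus-strand example, so treating `M` as the minus-strand letter is an assumption, as are the `GeneModelId` type and the `parse_gene_model_id` name.

```python
import re
from typing import NamedTuple

# Assumed pattern: CHR<chromosome>_<strand letter><cluster number>
_GENE_MODEL_ID = re.compile(
    r"^CHR(?P<chrom>[0-9]+|X|Y)_(?P<strand>[PM])(?P<cluster>\d+)$"
)


class GeneModelId(NamedTuple):
    chrom: str     # UCSC-style name, e.g. "chr7"
    strand: str    # "+" or "-"
    cluster: str   # cDNA cluster number, kept as text to preserve leading zeros


def parse_gene_model_id(model_id: str) -> GeneModelId:
    match = _GENE_MODEL_ID.match(model_id)
    if match is None:
        raise ValueError(f"unrecognized gene model ID: {model_id!r}")
    strand = "+" if match["strand"] == "P" else "-"
    return GeneModelId("chr" + match["chrom"], strand, match["cluster"])


print(parse_gene_model_id("CHR7_P0362"))
# GeneModelId(chrom='chr7', strand='+', cluster='0362')
```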

\ \

The SwitchGear TSS prediction algorithm identifies the most likely sites of transcription initiation for each gene model. The algorithm employs a scoring metric to assign a confidence level to each TSS prediction based on existing experimental evidence. In addition to the ~250,000 human cDNAs listed in GenBank, more than 5 million additional 5' human cDNA sequence tags have been generated using a combination of approaches. While these short sequence reads do not reveal gene structure, they provide a significant amount of experimental evidence for identifying transcription start sites. For each gene model, the algorithm counts the number of TSSs (defined as the 5' end of a cDNA) within 200 bp of one another. The TSS score is based on the total number of TSSs identified within this window, with each TSS weighted according to several discriminating features: cDNA library source, relative location within the gene model, and exon structure of the transcript. The TSSs for each gene model are then ranked to identify the one representing the most likely transcription initiation site. Rankings are indicated by a suffix added to the TSS unique identifier (e.g., CHR7_P0362_R1 or CHR7_P0362_R2).\
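The windowing and ranking steps can be sketched as follows. This is an illustration under stated assumptions, not SwitchGear's algorithm: the per-TSS weights are taken as given (the text names the features that set them but not their values), clusters are formed greedily so that every 5' end lies within 200 bp of the cluster's first position, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[int, int, float]  # (first 5' end, last 5' end, summed weight)


def cluster_tss(five_prime_ends: Sequence[Tuple[int, float]],
                window: int = 200) -> List[Cluster]:
    """Group weighted cDNA 5' ends into clusters spanning at most `window` bp."""
    clusters: List[Cluster] = []
    for pos, weight in sorted(five_prime_ends):
        if clusters and pos - clusters[-1][0] <= window:
            start, _, score = clusters[-1]
            clusters[-1] = (start, pos, score + weight)
        else:
            clusters.append((pos, pos, weight))
    return clusters


def rank_tss(gene_model_id: str, clusters: List[Cluster]) -> Dict[str, Cluster]:
    """Label clusters _R1, _R2, ... in order of decreasing score."""
    ordered = sorted(clusters, key=lambda c: c[2], reverse=True)
    return {f"{gene_model_id}_R{i}": c for i, c in enumerate(ordered, start=1)}


ends = [(1000, 1.0), (1050, 0.5), (1150, 1.0), (5000, 2.0), (5100, 1.0)]
print(rank_tss("CHR7_P0362", cluster_tss(ends)))
# {'CHR7_P0362_R1': (5000, 5100, 3.0), 'CHR7_P0362_R2': (1000, 1150, 2.5)}
```

Greedy anchoring on the first 5' end is the simplest way to honor "within 200 bp of one another"; a sliding window that maximizes the summed weight would be a reasonable alternative.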

\ \

Using the Filter

\

This track has a score-based filter that controls which TSS elements the browser displays. The filter is located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu. By default, the track displays only TSSs with a score of 10 or above.\

\ \

By default, the TSSs for predicted pseudogenes are not displayed. To display them, check the box next to the Include TSSs for predicted pseudogenes label.\

\ \

When you have finished configuring the filter, click the Submit button.\
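A hypothetical sketch of what the two filter settings do, using the defaults described above (minimum score 10, pseudogene TSSs hidden). The `TssElement` fields and the `visible` function are invented for illustration and do not reflect how the browser implements the filter.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TssElement:
    name: str
    score: float
    pseudogene: bool  # TSS belongs to a predicted pseudogene


def visible(items: Iterable[TssElement], min_score: float = 10,
            include_pseudogenes: bool = False) -> List[TssElement]:
    """Apply the track filter: score threshold plus optional pseudogene TSSs."""
    return [
        t for t in items
        if t.score >= min_score and (include_pseudogenes or not t.pseudogene)
    ]
```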

\ \

Credits

\

This track was created by Nathan Trinklein and Shelley Force Aldred of SwitchGear Genomics.\

\ regulation 1 group regulation\ longLabel SwitchGear Genomics Transcription Start Sites\ origAssembly hg17\ priority 0\ shortLabel SwitchGear TSS\ track switchDbTss\ type bed 6 +\ visibility hide\ tfbsCons TFBS Conserved bed 6 + HMR Conserved Transcription Factor Binding Sites 0 100 0 0 0 127 127 127 1 0 0 http://www.gene-regulation.com/cgi-bin/pub/databases/transfac/getTF.cgi?AC=$$

Description

\

\ This track contains the location and score of transcription factor\ binding sites conserved in the human/mouse/rat alignment. A binding\ site is considered to be conserved across the alignment if its score\ meets the threshold score for that binding site in all 3 species.\ The score and threshold are computed with the Transfac Matrix Database (v4.0) created by\ Biobase. \ Because the data are purely computational, not all binding sites\ listed here are biologically functional.

\

\ In the graphical display, each box represents one conserved TFBS. The\ darker the box, the better the match to the binding site. Clicking on\ a box brings up detailed information on the binding site: its\ Transfac ID, a link to its Transfac Matrix (free registration with \ Transfac required), its location in the human genome (chromosome, start, end,\ and strand), its length in bases, and its score.

\

\ All binding factors that are known to bind to the particular binding site\ are listed along with their species, SwissProt ID, and a link to that\ factor's page on the UCSC Protein Browser if such an entry exists.

\ \

Methods

\

\ A binding site is considered to be conserved across the alignment if\ its score meets the threshold score for that binding site at exactly the\ same position in the alignment in all 3 species. If there is no orthologous \ sequence in the mouse or the rat, no prediction is made.\ The following is a brief discussion of the scoring and threshold system\ used for these data.

\

\ The Transfac Matrix Database contains position-weight matrices for \ 336 transcription factor binding sites, as characterized through\ experimental results in the scientific literature. A typical (in this\ case fictitious) matrix looks like this:

\
\
\
        A      C      G      T\
01     15     15     15     15      N\
02     20     10     15     15      N\
03      0      0     60      0      G\
04     60      0      0      0      A\
05      0      0      0     60      T\
\
\ The above matrix summarizes the results of 60 experiments (the sum of each row).\ In the experiments, the first position of the binding site\ was A 15 times, C 15 times, G 15 times, and T 15 times (and so on for\ each position). The consensus sequence of the above binding site as\ characterized by the matrix is NNGAT, written in the IUPAC 15-letter\ nucleotide code.

\

\ The score of a segment of DNA is computed in relation to a matrix as \ follows:\

\
\
score = SUM over each position in the matrix of\
matrix[position][nucleotide_in_segment_at_this_position].\
\
\ \ For example, the sequence "CCGAT" would have a score of:\ 15 + 10 + 60 + 60 + 60 = 205 for the above matrix.\ \ A score in relation to a matrix of length n can be computed for every \ DNA segment of length n.
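The scoring rule above can be sketched in Python using the fictitious matrix from the text. This is a minimal illustration and not tfloc itself. It assumes the segment contains only the bases A, C, G and T; any other character raises `KeyError`. `score_segment` and `scan` are hypothetical names.

```python
# Fictitious example matrix from the text: counts of A, C, G, T at each position.
BASES = "ACGT"
ROWS = [(15, 15, 15, 15), (20, 10, 15, 15), (0, 0, 60, 0), (60, 0, 0, 0), (0, 0, 0, 60)]
MATRIX = [dict(zip(BASES, row)) for row in ROWS]


def score_segment(matrix, segment):
    """Sum matrix[position][base] over a segment exactly as long as the matrix."""
    if len(segment) != len(matrix):
        raise ValueError("segment length must equal matrix length")
    return sum(row[base] for row, base in zip(matrix, segment.upper()))


def scan(matrix, sequence):
    """Score every length-n window of sequence; returns (offset, score) pairs."""
    n = len(matrix)
    return [(i, score_segment(matrix, sequence[i:i + n]))
            for i in range(len(sequence) - n + 1)]
```

`score_segment(MATRIX, "CCGAT")` reproduces the 205 computed above. `scan` applies the same rule to every DNA segment of length n, as the text describes.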

\ \

\ The threshold for a binding site is computed from its Transfac Matrix\ Database entry as follows:\ \

\
\
          St = Smin + ((Smax - Smin) * C)\
                                                                               \
where     St is the target threshold score\
          Smin is the minimum possible score\
          Smax is the maximum possible score\
          C is the cutoff value used by the scoring function\
\
\ \ For example, the above matrix has a minimum score of \ 15 + 10 + 0 + 0 + 0 = 25 and a maximum score of 15 + 20 + 60 + 60 + 60 = 215.\ Using a cutoff value of 0.85 (the value used for this track), the threshold \ for the above matrix is:\
\
\
25 + ((215 - 25) * 0.85) = 186.5\
\
\ \ The sequence "CCGAT" from above would therefore be recorded as a hit at a \ cutoff value of 0.85, since its score (205) exceeds the threshold for this \ particular binding site (186.5).\

\

\ The final score reported is the highest cutoff value at which the position \ would still have been recorded as a hit, multiplied by 1000. The final score of the \ above example is therefore:\ \

\
\
((Score - Smin) / (Smax - Smin)) * 1000 = ((205 - 25) / (215 - 25)) * 1000 = 0.947 * 1000 = 947\
\
\ Therefore, the final score for the sequence "CCGAT" would be 947.\ Although the scores of all three species in the alignment must exceed the\ threshold, the only final score that is reported for this track is the \ final score of the binding site in the human sequence.
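The threshold, the final score, and the requirement that all three species pass can be combined into a short sketch. It reuses the fictitious matrix. It assumes the human, mouse and rat segments have already been extracted from the same gap-free alignment columns. It also truncates the scaled score to an integer. That reproduces 947 here but may not match tfloc's exact rounding. All function names are hypothetical.

```python
# Fictitious example matrix from the text: counts of A, C, G, T at each position.
BASES = "ACGT"
ROWS = [(15, 15, 15, 15), (20, 10, 15, 15), (0, 0, 60, 0), (60, 0, 0, 0), (0, 0, 0, 60)]
MATRIX = [dict(zip(BASES, row)) for row in ROWS]


def score_segment(matrix, segment):
    """Sum matrix[position][base] over a segment as long as the matrix."""
    return sum(row[base] for row, base in zip(matrix, segment.upper()))


def score_bounds(matrix):
    """Smin and Smax: the sums of each row's smallest and largest counts."""
    return (sum(min(row.values()) for row in matrix),
            sum(max(row.values()) for row in matrix))


def threshold(matrix, cutoff=0.85):
    """St = Smin + (Smax - Smin) * C."""
    smin, smax = score_bounds(matrix)
    return smin + (smax - smin) * cutoff


def final_score(matrix, segment):
    """Highest cutoff at which segment is still a hit, times 1000 (truncated)."""
    smin, smax = score_bounds(matrix)
    return int((score_segment(matrix, segment) - smin) / (smax - smin) * 1000)


def conserved_hit(matrix, human, mouse, rat, cutoff=0.85):
    """Return the human final score if all three aligned segments meet the threshold."""
    t = threshold(matrix, cutoff)
    if all(score_segment(matrix, s) >= t for s in (human, mouse, rat)):
        return final_score(matrix, human)  # only the human score is reported
    return None
```

The segments "CCGAT", "CCGAT" and "AAGAT" all pass the threshold of 186.5, so `conserved_hit` returns 947. If the rat segment were "CCGAA" (score 145), no prediction would be made.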

\

\ Note that many of these conserved binding\ sites coincide with known exons and other highly conserved regions.\ Such regions are more likely to contain false-positive matches,\ because high sequence identity across the alignment makes it more likely that\ a short motif resembling a binding site is conserved by chance. Conversely,\ matches found in introns and intergenic regions are more likely to be real\ binding sites, since these regions are mostly poorly conserved.\

\

\ These data were obtained by running the program tfloc (Transcription\ Factor binding site LOCater) on multiz humor alignments of the Feb. 2003 mouse\ draft assembly (mm3) and the June 2003 rat assembly (rn3) to the July 2003 human \ genome assembly (hg16). Tfloc was run on the subset of the Transfac Matrix\ Database containing human, mouse, and rat related binding sites (164 total).\ Transcription factor information was culled from the Transfac Factor database.

\ \

\

Credits

\

\ These data were generated using the Transfac Matrix and Factor databases created by\ Biobase.\

\ The tfloc program was developed at The Pennsylvania State University \ by Matt Weirauch.

\

\ This track was created by Matt Weirauch and Brian Raney at The\ University of California at Santa Cruz.

\ regulation 1 group regulation\ longLabel HMR Conserved Transcription Factor Binding Sites\ priority 0\ scoreMax 1000\ scoreMin 830\ shortLabel TFBS Conserved\ spectrum on\ track tfbsCons\ type bed 6 +\ url http://www.gene-regulation.com/cgi-bin/pub/databases/transfac/getTF.cgi?AC=$$\ urlLabel Transfac matrix link:\ visibility hide\ affyTranscription Transcription wig 0 4396.07 Affy. SK-N-AS Transcript Abundance 0 100 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY,

Description

\

\ This track displays transcriptome data from tiling GeneChips produced by \ Affymetrix. For the ten\ chromosomes 6, 7, 13, 14, 19, 20, 21, 22, X, and Y, more than 74 million\ probes were tiled every 5 bp in non-repeat-masked areas \ and hybridized to mRNA from the SK-N-AS cell line. These data\ are a preview of Phase Two of the transcriptome project, which will\ include data from seven additional cell lines when completed. While the\ coverage of the genome is much larger and the probe density greater,\ the general method is similar to the Phase One project carried out on\ chromosomes 21 and 22 (Kapranov, P. et al. \ Large-scale transcriptional activity in chromosomes 21 and \ 22. Science 296(5569), 916-9 (2002)).

\

\ The track is colored blue in areas that are thought to be\ transcribed at a statistically significant level as described in the\ accompanying Transfrags\ (transcribed fragments) track. Transfrags that have a\ significant BLAT hit elsewhere in the genome are displayed in a lighter\ shade of blue, and transfrags that overlap putative pseudogenes are\ colored an even lighter shade of blue. All other regions of the track\ are colored brown. While the raw data are based on perfect match minus\ mismatch probe (PM - MM) values and may contain negative values, the\ track has a minimum value of 0 for visualization purposes.

\ \

Methods

\

\ For each data point, probes within 30 bp on either side were used to\ improve the estimate of expression level for a particular probe. This\ helps to smooth the data and produce a more robust estimate of the\ transcription level at a particular genomic location. The following\ analysis method was used:\

    \
  • Replicate arrays were quantile-normalized and the median\ intensity (using both PM and MM intensities) of each array was\ scaled to a target value of 44.\
  • The expression level was estimated for each mapped probe position by \
      \
    • collecting all the probe pairs that fell within a window of ±\ 30 bp.\ \
    • calculating all non-redundant pairwise averages of PM - MM\ values of all probe pairs in the window.\ \
    • taking the median of all resulting pairwise averages.\
    \
  • The resulting signal value is the Hodges-Lehmann estimator\ associated with the Wilcoxon signed-rank statistic of the PM - MM\ values that lie within ± 30 bp of the sliding window centered at\ every genomic coordinate.\
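The windowed estimate can be sketched as below. It assumes that "non-redundant pairwise averages" means the Walsh averages (x_i + x_j) / 2 over all i <= j, which is the standard one-sample Hodges-Lehmann estimator. If each value was not also paired with itself, the results would differ slightly. Quantile normalization and scaling are assumed to have been done already. `window_signal` is a hypothetical name.

```python
from statistics import median


def hodges_lehmann(values):
    """Median of all Walsh averages (x_i + x_j) / 2 with i <= j."""
    walsh = [(values[i] + values[j]) / 2
             for i in range(len(values))
             for j in range(i, len(values))]
    return median(walsh)


def window_signal(probes, center, half_width=30):
    """Hodges-Lehmann estimate of PM - MM for probes within +/- half_width bp of center.

    probes: list of (position, pm, mm) tuples from one normalized array.
    Returns None when no probe falls inside the window.
    """
    diffs = [pm - mm for pos, pm, mm in probes if abs(pos - center) <= half_width]
    return hodges_lehmann(diffs) if diffs else None
```

Because the estimate is a median of averages rather than a plain mean, a single extreme PM - MM value in the 60 bp window has limited influence. That is the smoothing and robustness the text describes.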

\ \

Credits

\

\ Data generation and analysis: Transcriptome group at Affymetrix -\ Bekiranov, S., Brubaker, S., Cheng, J., Dike, S., Drenkow, J., Ghosh, S., \ Gingeras, T., Helt, G., Kampa, D., Kapranov, P., Long, J., Madhavan, G., \ Manak, J., Patel, S., Piccolboni, A., Sementchenko, V., Tammana, H.

\

\ Data presentation in Genome Browser: Chuck Sugnet.

\ expression 0 altColor 255,128,0\ autoScale Off\ chromosomes chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY\ color 175,150,128\ graphTypeDefault Bar\ gridDefault OFF\ group expression\ longLabel Affy. SK-N-AS Transcript Abundance\ maxHeightPixels 128:36:16\ priority 0\ shortLabel Transcription\ track affyTranscription\ type wig 0 4396.07\ viewLimits 0:150\ visibility hide\ wigColorBy affyTransfrags\ affyTranscriptome Transcriptome sample Affymetrix Experimentally Derived Transcriptome 0 100 100 50 0 0 0 255 0 0 2 chr22,chr21,

Description

\

\ Transcriptome data for chromosomes 21 and 22 from Affymetrix, as described in \ Kapranov P. et al. \ Large-Scale Transcriptional Activity in Chromosomes 21 and 22.\ Science 296(5569):916-9 (2002).\ In general, the data presented\ are perfect match minus mismatch (PM - MM) values. Different experiments were\ normalized by setting the average value to be the same for each\ chip. Replicates for each cell type were averaged together to\ produce the data seen in "full" mode for that cell type. In dense\ mode, or at the top of the track in full mode, "Transcriptome" displays\ the maximum value over all experiments for each probe, so that\ as many transcribed regions as possible are painted. \

\ To present a more\ interpretable display when zoomed out, averages have been precalculated\ over the chromosome at two different resolutions in addition to the\ raw data. For example, when zoomed out, there may appear to be a peak at\ the center of a gene rather than a signal at every exon. Zooming in\ will reveal the "raw" data for that region.\
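One way to picture the precomputed zoom levels is as a set of block averages over the raw per-probe values. The sketch below counts bins in probes. Its default bin sizes of 10 and 100 probes are hypothetical, since the text does not say which two resolutions were used. `build_zoom_levels` and `pick_level` are illustrative names only.

```python
def bin_means(values, bin_size):
    """Average consecutive blocks of bin_size values; the last bin may be shorter."""
    return [sum(values[i:i + bin_size]) / len(values[i:i + bin_size])
            for i in range(0, len(values), bin_size)]


def build_zoom_levels(raw, bin_sizes=(10, 100)):
    """Return the raw data (level 1) plus averages at each hypothetical bin size."""
    levels = {1: list(raw)}
    for size in bin_sizes:
        levels[size] = bin_means(raw, size)
    return levels


def pick_level(levels, probes_per_pixel):
    """Use the coarsest level whose bin size does not exceed probes_per_pixel."""
    usable = [size for size in levels if size <= probes_per_pixel]
    return levels[max(usable)] if usable else levels[1]
```

Averaging over wide bins blurs neighboring exons into a single bump, which is why a zoomed-out view can show a peak near the center of a gene. When fewer probes fall under each pixel, `pick_level` falls back to the raw values.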

\ NOTE: Affymetrix transcriptome annotations appear only on chromosomes 21 and 22.\ \

Credits

\

\ Thanks to Affymetrix for providing these data. Questions/Comments? Email \ sugnet@soe.ucsc.edu. \ \ expression 0 altColor 0,0,255\ chromosomes chr22,chr21\ color 100,50,0\ group expression\ longLabel Affymetrix Experimentally Derived Transcriptome\ priority 0\ shortLabel Transcriptome\ track affyTranscriptome\ type sample\ visibility hide\ affyUcla UCLA GeneChip bed 15 + UCLA Affymetrix U133 GeneChip Data 0 100 0 0 0 127 127 127 0 0 0 expression 0 canPack off\ expTable affyUclaExps\ group expression\ longLabel UCLA Affymetrix U133 GeneChip Data\ priority 0\ shortLabel UCLA GeneChip\ track affyUcla\ type bed 15 +\ visibility hide\ affyUclaNorm UCLA Tissues bed 15 + UCLA Affymetrix U133 GeneChip Normal Tissues 0 100 0 0 0 127 127 127 0 0 0 expression 1 chip U133\ expScale 3.0\ expStep 0.5\ expTable affyUclaNormExps\ group expression\ longLabel UCLA Affymetrix U133 GeneChip Normal Tissues\ priority 0\ shortLabel UCLA Tissues\ track affyUclaNorm\ type bed 15 +\ visibility hide\ vistaEnhancers Vista Enhancers bed 5 + Vista HMR-Conserved Non-coding Human Enhancers from LBNL 0 100 50 70 120 152 162 187 1 0 0

Description and Methods

\

\ Excerpted from the \ Vista Enhancer Handbook and Methods page at the Lawrence Berkeley \ National Laboratory (LBNL) website:

\

\ The VISTA Enhancer Browser \ identifies distant-acting transcriptional enhancers in the human genome by \ coupling the identification of evolutionarily conserved non-coding sequences with \ a moderate-throughput mouse transgenesis enhancer assay.\

\

\ The "Experimental Dataset" of conserved non-coding human sequences has been \ tested for enhancer activity in transgenic mice.\ As of October 2006, approximately 300 elements were found in this portion of \ the database, and this number is expected to grow steadily. These sequences are \ based on UCSC human assembly hg17 (May 2004).\

\

\ Non-coding elements identified as conserved in human, mouse, and rat are amplified\ by PCR, and the PCR fragment is cloned into a reporter vector that has a \ minimal promoter fused to LacZ. This construct is injected into fertilized mouse \ eggs, and the embryos are stained at embryonic day 11.5 to assay LacZ activity. \ Whole-embryo in situ pictures are available on the LBNL website.

\

\ To be defined as a positive enhancer, an element has to show \ reproducible \ expression in the same structure in at least three independent transgenic \ embryos. An element is defined as negative if at least five \ transgenic embryos \ have been obtained, but no reproducible expression was observed in any structure\ in at least three different embryos.
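The positive and negative criteria can be expressed as a small decision function. This sketch assumes that, for each anatomical structure, the number of independent embryos stained there is already known. The "undetermined" label for elements that meet neither rule is our own and is not an LBNL category. `classify_element` is a hypothetical name.

```python
def classify_element(embryos_with_expression_by_structure, total_embryos):
    """Classify a tested element as 'positive', 'negative', or 'undetermined'.

    embryos_with_expression_by_structure maps a structure name to the number of
    independent transgenic embryos showing LacZ staining in that structure.
    """
    best = max(embryos_with_expression_by_structure.values(), default=0)
    if best >= 3:
        return "positive"      # reproducible in the same structure in >= 3 embryos
    if total_embryos >= 5:
        return "negative"      # >= 5 embryos obtained, no structure reproducible in >= 3
    return "undetermined"      # too few embryos to call either way
```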

\

\ Users should keep in mind that the assay captures only a single embryonic \ timepoint. A negative result reported in the experimental dataset does not \ necessarily imply that this conserved element is not a transcriptional \ enhancer, because it might be active at earlier or later timepoints in \ development.

\ \

Display Conventions and Configuration

\

\ Elements that tested positive are assigned a score of 900 and \ display in black\ in the annotation track. Those that tested negative are given the \ score 200 and appear as light grey.

\ \

Credits

\

\ Thanks to Len \ Pennacchio at Lawrence Berkeley National Laboratory for providing the \ enhancer track.

\ \

References

\

\ Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, \ Minovitsky S, Dubchak I, Holt A, Lewis KD et al. \ In vivo enhancer analysis of human conserved non-coding \ sequences. \ Nature. 2006 Nov 23;444(7118):499-502.

\ regulation 1 color 50,70,120\ group regulation\ longLabel Vista HMR-Conserved Non-coding Human Enhancers from LBNL\ priority 0\ shortLabel Vista Enhancers\ track vistaEnhancers\ type bed 5 +\ useScore 1\ visibility hide\ snp131Clinical Clin SNPs (131) bed 6 + Clinically Associated Simple Nucleotide Polymorphisms (dbSNP build 131) 0 100.092 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\ \

\ This track contains information about the clinically associated subset of \ single nucleotide polymorphisms\ and small insertions and deletions (indels) — collectively Simple\ Nucleotide Polymorphisms — from\ dbSNP\ build 131, available from\ ftp.ncbi.nih.gov/snp.\

\

\ dbSNP marks SNPs as clinically associated when they meet certain criteria \ (submitted by a locus-specific database (LSDB); \ listed in Online Mendelian Inheritance in Man (OMIM);\ having third-party clinical annotation (TPA);\ or used as a diagnostic).\ The clinically associated subset of SNPs is very small: fewer than 15,000 have been \ mapped to the human reference assembly, out of over 23,000,000 mapped SNPs.\

\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\

\ \

\ The configuration categories reflect the following definitions (not all categories apply\ to this assembly):\

\
    \ \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name\
    • No Variation - no variation asserted for sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G}\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitter SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - submitted by HapMap project (human only)\
    • By 1000Genomes - submitted by 1000Genomes project (human only)\
    • Unknown - no validation has been reported for this variant\
    \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation within 2000 bases of gene, but not \ \ in transcript (near-gene-3, near-gene-5)\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to reference assembly (coding-synon)\
    • Coding - Non-Synonymous - change in peptide for allele with \ \ respect to reference assembly (nonsense, missense, \ frameshift, cds-indel, coding-synonymy-unknown)\
    • Untranslated - variation in transcript, but not in coding \ \ region interval (untranslated-3, untranslated-5)\
    • Intron - variation in intron, but not in first two or last two bases of intron\
    • Splice Site - variation in first two or last two bases of \ \ intron (splice-3, splice-5)\
    • Unknown - no known functional classification\
    \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Alignments with weight = 1 are the highest quality.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 3.\
    \
\ \
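The 0.5 ceiling on average heterozygosity follows from the usual expected-heterozygosity formula H = 1 - sum(p_i^2). For two alleles this is H = 2p(1 - p), which peaks at p = 0.5. The sketch below assumes that formula and a simple unweighted mean across populations. dbSNP's own procedure may differ, for example by applying sample-size corrections.

```python
def expected_heterozygosity(freqs):
    """1 - sum(p_i^2) for allele frequencies that sum to 1."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def average_heterozygosity(populations):
    """Unweighted mean of per-population expected heterozygosity (an assumption)."""
    values = [expected_heterozygosity(f) for f in populations]
    return sum(values) / len(values)
```

A bi-allelic SNP with frequencies 0.5/0.5 gives exactly 0.5, and 0.9/0.1 gives 0.18. No pair of bi-allelic frequencies can exceed 0.5, so a larger reported value may indicate a data problem.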

\ You can configure this track such that the details page displays\ the function and coding differences relative to \ particular gene sets. Choose the gene sets from the list on the SNP \ configuration page displayed beneath this heading: On details page,\ show function and coding differences relative to. \ When one or more gene tracks are selected, the SNP details page \ lists all genes that the SNP hits (or is close to), with the same keywords \ used in the function category. The function usually \ agrees with NCBI's function, but can sometimes give a bit more detail\ (e.g. more detail about how close a near-gene SNP is to a nearby gene).\

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. We compare the length of the\ reference allele to the length(s) of observed alleles; if the\ reference allele is shorter than all other observed alleles, we change\ 'in-del' to 'insertion'. Likewise, if the reference allele is longer\ than all other observed alleles, we change 'in-del' to 'deletion'.\
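The in-del relabeling rule can be sketched as follows. It assumes dbSNP's convention of writing an empty allele as "-" (as in "-/AG"). It also reads "all other observed alleles" as the observed alleles that differ from the reference. `refine_indel_class` is a hypothetical name.

```python
def refine_indel_class(ref_allele, observed_alleles):
    """Relabel a dbSNP 'in-del' as 'insertion' or 'deletion' by comparing lengths."""

    def length(allele):
        return 0 if allele == "-" else len(allele)  # '-' is the empty allele

    ref_len = length(ref_allele)
    others = [length(a) for a in observed_alleles if a != ref_allele]
    if others and all(ref_len < n for n in others):
        return "insertion"   # reference shorter than every other allele
    if others and all(ref_len > n for n in others):
        return "deletion"    # reference longer than every other allele
    return "in-del"          # mixed lengths: keep dbSNP's class
```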

\ \

UCSC Annotations

\

\ UCSC checks for several unusual conditions that may indicate a problem \ with the mapping, and reports them in the Annotations section if found:\

\
    \
  • The dbSNP reference allele is not the same as the UCSC reference\ allele, i.e. the bases in the mapped position range.
  • Class is single, in-del, mnp or mixed and the UCSC reference\ allele does not match any observed allele.
  • In NCBI's alignment of flanking sequences to the genome, part\ of the flanking sequence around the SNP does not align to\ the genome.
  • Class is single, but the size of the mapped SNP is not one base.
  • Class is named and indicates an insertion or deletion, but the size\ of the mapped SNP implies otherwise.
  • Class is single and the format of observed alleles is unexpected.
  • The length of the observed allele(s) is not available because it is\ too long.
  • Multiple distinct insertion SNPs have been mapped to this location.
  • At least one observed allele contains an ambiguous \ IUPAC base (e.g. R, Y, N).
\ \ Another condition, which does not necessarily imply any problem, is noted:\
    \
  • Class is single and SNP is tri-allelic or quad-allelic.
\ \

UCSC Re-alignment of flanking sequences

\

\ dbSNP determines the genomic locations of SNPs by aligning their flanking \ sequences to the genome.\ UCSC displays SNPs in the locations determined by dbSNP, but does not\ have access to the alignments on which dbSNP based its mappings.\ Instead, UCSC re-aligns the flanking sequences \ to the neighboring genomic sequence for display on SNP details pages. \ While the recomputed alignments may differ from dbSNP's alignments,\ they often are informative when UCSC has annotated an unusual condition.\

\

\ Non-repetitive genomic sequence is shown in upper case like the flanking \ sequence, and a "|" indicates each match between genomic and flanking bases.\ Repetitive genomic sequence (annotated by RepeatMasker and/or the\ Tandem Repeats Finder with period <= 12) is shown in lower case, and matching\ bases are indicated by a "+".\
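The marker line on the details page can be illustrated with a short sketch. It assumes the genomic and flanking strings are already aligned to equal length, with "-" for gaps. It also assumes a per-base repeat flag is available from the RepeatMasker/Tandem Repeats Finder annotation. `match_line` is a hypothetical helper, not UCSC code.

```python
def match_line(genomic, flank, repeat_mask):
    """Build the case-adjusted genomic string and the match-marker line.

    genomic, flank: equal-length aligned strings ('-' marks a gap).
    repeat_mask[i]: True where genomic base i is annotated as repetitive.
    Returns (display_genomic, markers).
    """
    shown, markers = [], []
    for g, f, is_repeat in zip(genomic, flank, repeat_mask):
        shown.append(g.lower() if is_repeat else g.upper())  # repeats in lower case
        if g != "-" and g.upper() == f.upper():
            markers.append("+" if is_repeat else "|")        # match symbol
        else:
            markers.append(" ")                              # mismatch or gap
    return "".join(shown), "".join(markers)
```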

\ \

Data Sources

\

\ The data that comprise this track were extracted from database dump files \ and headers of fasta files downloaded from NCBI. \ The database dump files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/database/\ (e.g. for Human, organism_tax_id = human_9606).\ The fasta files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/rs_fasta/\

\
    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b131_SNPContigLoc_37_1.bcp.gz and \ b131_ContigInfo_37_1.bcp.gz. \
  • b131_SNPMapInfo_37_1.bcp.gz provided the alignment weights.\
  • Functional classification was obtained from \ b131_SNPContigLocusId_37_1.bcp.gz.\
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.\
  • The header lines in the rs_fasta files were used for molecule type,\ class and observed polymorphism.\
  • Clinically associated SNPs were obtained from SNP_bitfield.bcp.gz.\ (See bitfield specification.)\
\ \

Orthologous Alleles (human assemblies only)

\

\ Beginning with the March 2006 human assembly, we provide a related table that \ contains orthologous alleles in the chimpanzee and rhesus macaque assemblies.\ Beginning with dbSNP build 129, the orangutan assembly is also included.\ We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are \ a filtered list that meet the criteria:\

    \
  • class = 'single'\
  • chromEnd = chromStart + 1\
  • align to just one location\
  • are not aligned to a chrN_random chrom\
  • are biallelic (not tri or quad allelic)\
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the \ orthologous start and end position to 0 (zero).\ \
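The candidate filter and the placeholder values for unknown or failed lifts can be sketched as below. The `Snp` record and its `alignment_count` field are hypothetical stand-ins for whatever the real tables store. The lift result is passed in as a plain tuple rather than being obtained by calling liftOver.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    name: str
    chrom: str
    chrom_start: int
    chrom_end: int
    snp_class: str
    observed: str          # e.g. "A/G"
    alignment_count: int   # hypothetical: number of genomic locations the SNP maps to


def is_ortholog_candidate(snp):
    """Apply the five criteria listed above."""
    return (snp.snp_class == "single"
            and snp.chrom_end == snp.chrom_start + 1
            and snp.alignment_count == 1
            and not snp.chrom.endswith("_random")
            and len(snp.observed.split("/")) == 2)   # biallelic only


def ortholog_record(lifted):
    """lifted: (start, end, base) from a liftOver-style mapping, or None if the lift failed."""
    if lifted is None:
        return ("?", 0, 0)            # lift not possible
    start, end, base = lifted
    return (base or "N", start, end)  # unknown orthologous allele becomes 'N'
```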

Masked FASTA Files (human assemblies only)

\ \ FASTA files that have been modified to use \ IUPAC\ ambiguous nucleotide characters at\ each base covered by a single-base substitution are available for download\ here.\ Note that only single-base substitutions (no insertions or deletions) were used\ to mask the sequence, and these were filtered to exclude problematic SNPs.\ \
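Masking a base with an IUPAC code amounts to looking up the set of observed alleles. This sketch assumes 0-based positions and dbSNP-style observed strings such as "A/G". Allele sets that are not all single bases (for example "-/A") are left unmasked, which mirrors the restriction to single-base substitutions. The filtering of problematic SNPs mentioned above is not reproduced, and `mask_sequence` is a hypothetical name.

```python
# IUPAC ambiguity codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def mask_sequence(seq, snps):
    """Replace each base covered by a single-base substitution with its IUPAC code.

    snps: iterable of (0-based position, observed string like "A/G").
    """
    bases = list(seq)
    for pos, observed in snps:
        code = IUPAC.get(frozenset(observed.upper().split("/")))
        if code is not None:   # skip indels and other non-single-base alleles
            bases[pos] = code
    return "".join(bases)
```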

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\

\  \ varRep 1 chimpDb panTro2\ chimpOrangMacOrthoTable snp131OrthoPt2Pa2Rm2\ codingAnnoLabel_snp131CodingDbSnp dbSNP\ codingAnnotations snp131CodingDbSnp,\ defaultGeneTracks knownGene\ group varRep\ hapmapPhase III\ longLabel Clinically Associated Simple Nucleotide Polymorphisms (dbSNP build 131)\ macaqueDb rheMac2\ maxWindowToDraw 10000000\ orangDb ponAbe2\ priority 100.092\ shortLabel Clin SNPs (131)\ snpExceptionDesc snp131ExceptionDesc\ snpExceptions snp131Exceptions\ snpSeq snp131Seq\ track snp131Clinical\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility hide\ snp131NonClinical SNPs (131) bed 6 + Simple Nucleotide Polymorphisms (dbSNP build 131) Not Clinically Associated 0 100.093 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$

Description

\ \

\ This track contains information about \ single nucleotide polymorphisms\ and small insertions and deletions (indels) — collectively Simple\ Nucleotide Polymorphisms — from\ dbSNP\ build 131, available from\ ftp.ncbi.nih.gov/snp,\ that have not been clinically associated.\

\

\ Clinically associated SNPs are available in a separate track, \ Clin SNPs (131).\ dbSNP marks SNPs as clinically associated when they meet certain criteria \ (submitted by a locus-specific database (LSDB); \ listed in Online Mendelian Inheritance in Man (OMIM);\ having third-party clinical annotation (TPA);\ or used as a diagnostic).\ The clinically associated subset of SNPs excluded from this track \ is very small: fewer than 15,000 have been \ mapped to the human reference assembly, out of over 23,000,000 mapped SNPs.\

\ \

Interpreting and Configuring the Graphical Display

\

\ Variants are shown as single tick marks at most zoom levels.\ When viewing the track at or near base-level resolution, the displayed\ width of the SNP corresponds to the width of the variant in the reference\ sequence. Insertions are indicated by a single tick mark displayed between\ two nucleotides, single nucleotide polymorphisms are displayed as the width \ of a single base, and multiple nucleotide variants are represented by a \ block that spans two or more bases.\

\ \

\ The configuration categories reflect the following definitions (not all categories apply\ to this assembly):\

\
    \ \
  • \ \ Class: Describes the observed alleles
    \
      \
    • Single - single nucleotide variation: all observed alleles are single nucleotides\ \ (can have 2, 3 or 4 alleles)\
    • In-del - insertion/deletion\
    • Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'\
    • Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats\
    • Named - the observed allele from dbSNP is given as a text name\
    • No Variation - no variation asserted for sequence\
    • Mixed - the cluster contains submissions from multiple classes\
    • Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G}\
    • Insertion - the polymorphism is an insertion relative to the reference assembly\
    • Deletion - the polymorphism is a deletion relative to the reference assembly\
    • Unknown - no classification provided by data contributor\
    \
  • \ \ Validation: Method used to validate\ \ the variant (each variant may be validated by more than one method)
    \
      \
    • By Frequency - at least one submitted SNP in cluster has frequency data submitted\
    • By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method\
    • By Submitter - at least one submitter SNP in cluster was validated by independent assay\
    • By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes\
    • By HapMap - submitted by HapMap project (human only)\
    • By 1000Genomes - submitted by 1000Genomes project (human only)\
    • Unknown - no validation has been reported for this variant\
    \
  • \ \ Function: Predicted functional role \ \ (each variant may have more than one functional role)
    \
      \
    • Locus Region - variation within 2000 bases of gene, but not \ \ in transcript (near-gene-3, near-gene-5)\
    • Coding - Synonymous - no change in peptide for allele with \ \ respect to reference assembly (coding-synon)\
    • Coding - Non-Synonymous - change in peptide for allele with \ \ respect to reference assembly (nonsense, missense, \ frameshift, cds-indel, coding-synonymy-unknown)\
    • Untranslated - variation in transcript, but not in coding \ \ region interval (untranslated-3, untranslated-5)\
    • Intron - variation in intron, but not in first two or last two bases of intron\
    • Splice Site - variation in first two or last two bases of \ \ intron (splice-3, splice-5)\
    • Unknown - no known functional classification\
    \
  • \ \ Molecule Type: Sample used to find this variant
    \
      \
    • Genomic - variant discovered using a genomic template\
    • cDNA - variant discovered using a cDNA template\
    • Unknown - sample type not known\
    \
  • \ \ Average heterozygosity: Calculated by dbSNP as described \ here\
      \
    • Average heterozygosity should not exceed 0.5 for bi-allelic \ single-base substitutions.\
    \
  • \ \ Weight: Alignment quality assigned by dbSNP
    \
      \
    • Weight can be 0, 1, 2, 3 or 10. \
    • Alignments with weight = 1 are the highest quality.\
    • Weight = 0 and weight = 10 are excluded from the data set.\
    • A filter on maximum weight value is supported, which defaults to 3.\
    \
\ \

\ You can configure this track such that the details page displays\ the function and coding differences relative to \ particular gene sets. Choose the gene sets from the list on the SNP \ configuration page displayed beneath this heading: On details page,\ show function and coding differences relative to. \ When one or more gene tracks are selected, the SNP details page \ lists all genes that the SNP hits (or is close to), with the same keywords \ used in the function category. The function usually \ agrees with NCBI's function, but can sometimes give a bit more detail\ (e.g. more detail about how close a near-gene SNP is to a nearby gene).\

\ \

Insertions/Deletions

\

\ dbSNP uses a class called 'in-del'. We compare the length of the\ reference allele to the length(s) of observed alleles; if the\ reference allele is shorter than all other observed alleles, we change\ 'in-del' to 'insertion'. Likewise, if the reference allele is longer\ than all other observed alleles, we change 'in-del' to 'deletion'.\

\ \

UCSC Annotations

\

\ UCSC checks for several unusual conditions that may indicate a problem \ with the mapping, and reports them in the Annotations section if found:\

\
    \
  • The dbSNP reference allele is not the same as the UCSC reference\ allele, i.e. the bases in the mapped position range.
  • Class is single, in-del, mnp or mixed and the UCSC reference\ allele does not match any observed allele.
  • In NCBI's alignment of flanking sequences to the genome, part\ of the flanking sequence around the SNP does not align to\ the genome.
  • Class is single, but the size of the mapped SNP is not one base.
  • Class is named and indicates an insertion or deletion, but the size\ of the mapped SNP implies otherwise.
  • Class is single and the format of observed alleles is unexpected.
  • The length of the observed allele(s) is not available because it is\ too long.
  • Multiple distinct insertion SNPs have been mapped to this location.
  • At least one observed allele contains an ambiguous \ IUPAC base (e.g. R, Y, N).
\ \ Another condition, which does not necessarily imply any problem, is noted:\
    \
  • Class is single and SNP is tri-allelic or quad-allelic.
\ \
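Several of the conditions listed above reduce to simple checks on a SNP record. The sketch below covers only a subset of them. The record keys, the note strings and the function name `unusual_conditions` are hypothetical, not UCSC's exception names. It also assumes that the UCSC reference bases are already on the SNP's strand.

```python
import re

# Hypothetical sketch of a few of the checks listed above.
AMBIGUOUS_IUPAC = set("RYKMSWBDHVN")
SINGLE_FORMAT = re.compile(r"^[ACGT](/[ACGT])+$")


def unusual_conditions(snp, ucsc_ref):
    """Return short notes for some of the conditions described above.

    snp: dict with hypothetical keys 'cls', 'start', 'end', 'ref' and
    'observed' (an 'A/G'-style string on the SNP's strand).
    ucsc_ref: genomic bases in [start, end) on the SNP's strand, or ''
    for a zero-length (insertion) position.
    """
    notes = []
    alleles = snp["observed"].upper().split("/")
    ref_token = ucsc_ref.upper() or "-"  # assume an empty allele is written '-'
    if snp["ref"].upper() != ref_token:
        notes.append("dbSNP reference allele differs from UCSC reference")
    if snp["cls"] in ("single", "in-del", "mnp", "mixed") and ref_token not in alleles:
        notes.append("UCSC reference matches no observed allele")
    if snp["cls"] == "single":
        if snp["end"] - snp["start"] != 1:
            notes.append("single-class SNP does not span one base")
        if not SINGLE_FORMAT.match(snp["observed"].upper()):
            notes.append("unexpected observed-allele format")
        elif len(set(alleles)) > 2:
            notes.append("tri- or quad-allelic (not necessarily a problem)")
    if any(base in AMBIGUOUS_IUPAC for allele in alleles for base in allele):
        notes.append("ambiguous IUPAC base in observed alleles")
    return notes


snp = {"cls": "single", "start": 100, "end": 101, "ref": "A", "observed": "C/T"}
print(unusual_conditions(snp, "A"))
```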

UCSC Re-alignment of flanking sequences

\

\ dbSNP determines the genomic locations of SNPs by aligning their flanking \ sequences to the genome.\ UCSC displays SNPs in the locations determined by dbSNP, but does not\ have access to the alignments on which dbSNP based its mappings.\ Instead, UCSC re-aligns the flanking sequences \ to the neighboring genomic sequence for display on SNP details pages. \ While the recomputed alignments may differ from dbSNP's alignments,\ they often are informative when UCSC has annotated an unusual condition.\

\

\ Non-repetitive genomic sequence is shown in upper case like the flanking \ sequence, and a "|" indicates each match between genomic and flanking bases.\ Repetitive genomic sequence (annotated by RepeatMasker and/or the\ Tandem Repeats Finder with period <= 12) is shown in lower case, and matching\ bases are indicated by a "+".\
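The display convention above amounts to one symbol per aligned column. This minimal sketch takes two already-aligned strings of equal length as input. Two things are assumptions not stated above: mismatches and gap columns are drawn as a space, and the name `match_line` is hypothetical.

```python
def match_line(genomic, flank):
    """Build the symbol line drawn between aligned genomic and flanking bases.

    Lower-case genomic bases mark repeats: a match there is drawn as '+',
    a match in non-repetitive sequence as '|'. Mismatches (including gap
    characters) are drawn as a space, which is an assumption.
    """
    symbols = []
    for g, f in zip(genomic, flank):
        if g.upper() != f.upper():
            symbols.append(" ")
        elif g.islower():
            symbols.append("+")
        else:
            symbols.append("|")
    return "".join(symbols)


genomic = "ACGTacgtAC"
flank = "ACGAACGTAC"
print(genomic)
print(match_line(genomic, flank))  # ||| ++++||
print(flank)
```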

\ \

Data Sources

\

\ The data that comprise this track were extracted from database dump files \ and headers of fasta files downloaded from NCBI. \ The database dump files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/database/\ (e.g. for Human, organism_tax_id = human_9606).\ The fasta files were downloaded from \ ftp://ftp.ncbi.nih.gov/snp/organisms/\ organism_tax_id/rs_fasta/\
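The two download locations above differ only in their last path component. The helper below fills in the `organism_tax_id` placeholder. It reflects the layout described here when the track was built, and NCBI may have reorganized the FTP site since. The function name is hypothetical.

```python
# Directory layout as described above; NCBI may have changed it since.
DBSNP_FTP_ROOT = "ftp://ftp.ncbi.nih.gov/snp/organisms"


def dbsnp_source_dirs(organism_tax_id):
    """Return (database_dump_dir, rs_fasta_dir) for one organism."""
    root = f"{DBSNP_FTP_ROOT}/{organism_tax_id}"
    return f"{root}/database/", f"{root}/rs_fasta/"


database_dir, fasta_dir = dbsnp_source_dirs("human_9606")
print(database_dir)
print(fasta_dir)
```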

\
    \
  • Coordinates, orientation, location type and dbSNP reference allele data\ were obtained from b131_SNPContigLoc_37_1.bcp.gz and \ b131_ContigInfo_37_1.bcp.gz. \
  • b131_SNPMapInfo_37_1.bcp.gz provided the alignment weights.\
  • Functional classification was obtained from \ b131_SNPContigLocusId_37_1.bcp.gz.\
  • Validation status and heterozygosity were obtained from SNP.bcp.gz.\
  • The header lines in the rs_fasta files were used for molecule type,\ class and observed polymorphism.\
  • Clinically associated SNPs were obtained from SNP_bitfield.bcp.gz.\ (See bitfield specification.)\
\ \

Orthologous Alleles (human assemblies only)

\

\ Beginning with the March 2006 human assembly, we provide a related table that \ contains orthologous alleles in the chimpanzee and rhesus macaque assemblies.\ Beginning with dbSNP build 129, the orangutan assembly is also included.\ We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are \ a filtered list that meet the criteria:\

    \
  • class = 'single'\
  • chromEnd = chromStart + 1\
  • aligned to just one location\
  • not aligned to a chrN_random chromosome\
  • bi-allelic (not tri- or quad-allelic)\
\ \ In some cases the orthologous allele is unknown; these are set to 'N'.\ If a lift was not possible, we set the orthologous allele to '?' and the \ orthologous start and end position to 0 (zero).\ \
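The candidate criteria and the 'N' / '?' conventions above can be sketched as two small functions. In this illustration, the record keys and function names are hypothetical. The liftOver result is modeled as a `(base, start, end)` tuple, or `None` when the lift failed. Treating any non-ACGT base as "unknown" is an assumption.

```python
# Hypothetical sketch of the candidate filter and the liftOver result encoding.
VALID_BASES = ("A", "C", "G", "T")


def is_ortholog_candidate(snp, n_locations):
    """Apply the candidate criteria above to one SNP.

    snp: dict with hypothetical keys 'cls', 'chrom', 'chromStart',
    'chromEnd' and 'observed' (e.g. 'A/G'); n_locations: number of
    genomic locations the SNP aligns to.
    """
    return (
        snp["cls"] == "single"
        and snp["chromEnd"] == snp["chromStart"] + 1
        and n_locations == 1
        and not snp["chrom"].endswith("_random")
        and len(set(snp["observed"].upper().split("/"))) == 2
    )


def ortholog_record(lifted):
    """Encode a liftOver result: (base, start, end), or None if the lift failed."""
    if lifted is None:
        return ("?", 0, 0)
    base, start, end = lifted
    base = base.upper()
    # Assumption: any non-ACGT base (e.g. 'N' in the other assembly) is unknown.
    return (base if base in VALID_BASES else "N", start, end)


snp = {"cls": "single", "chrom": "chr1", "chromStart": 10, "chromEnd": 11, "observed": "A/G"}
print(is_ortholog_candidate(snp, 1), ortholog_record(None))
```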

Masked FASTA Files (human assemblies only)

\ \ FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download here. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.\ \
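The masking described above replaces each substituted base with the IUPAC code for its allele set. The sketch below makes three assumptions: positions are 0-based, observed alleles are already on the forward strand, and the SNPs have already been filtered. The function name `mask_sequence` is hypothetical. Allele sets without an IUPAC code, such as a single base, leave the sequence unchanged.

```python
# Standard IUPAC ambiguity codes for sets of two or more bases.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("AT"): "W", frozenset("CG"): "S",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def mask_sequence(seq, snps):
    """Replace each single-base substitution position with its IUPAC code.

    snps: iterable of (zero_based_position, observed) pairs, with observed
    alleles such as 'A/G' already on the forward strand (an assumption).
    """
    bases = list(seq)
    for pos, observed in snps:
        code = IUPAC.get(frozenset(observed.upper().split("/")))
        if code is not None:
            bases[pos] = code
    return "".join(bases)


print(mask_sequence("ACGTACGT", [(1, "C/T"), (6, "A/G")]))  # AYGTACRT
```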

References

\

\ Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. \ \ dbSNP: the NCBI database of genetic variation.\ Nucleic Acids Res. 2001 Jan 1;29(1):308-11.\

\  \ varRep 1 chimpDb panTro2\ chimpOrangMacOrthoTable snp131OrthoPt2Pa2Rm2\ codingAnnoLabel_snp131CodingDbSnp dbSNP\ codingAnnotations snp131CodingDbSnp,\ defaultGeneTracks knownGene\ group varRep\ hapmapPhase III\ longLabel Simple Nucleotide Polymorphisms (dbSNP build 131) Not Clinically Associated\ macaqueDb rheMac2\ maxWindowToDraw 10000000\ orangDb ponAbe2\ priority 100.093\ shortLabel SNPs (131)\ snpExceptionDesc snp131ExceptionDesc\ snpExceptions snp131Exceptions\ snpSeq snp131Seq\ track snp131NonClinical\ type bed 6 +\ url http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=$$\ urlLabel dbSNP:\ visibility hide\ pHMM_5_WayTop5 Thresholded Cons bed 5 . Top 5 % of Human/Chimp/Mouse/Rat/Chicken PhyloHMM Cons 0 100.1 0 0 0 127 127 127 0 0 0

Description

\

\ This track is a collection of the top five percent of scores\ from the phyloHMM multiz alignments track.

\

\ The multiz alignments of the human Jul. 2003 (hg16),\ chimpanzee Nov. 2003 (panTro1), mouse Feb. 2003 (mm3),\ rat Jun. 2003 (rn3), and chicken Feb. 2004 (galGal2)\ assemblies were used to generate the annotation.
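Selecting the top five percent of scores amounts to a rank cutoff. The sketch below computes that cutoff for a plain list of scores. The text does not say how the threshold was computed, whether it was applied per base or per region, or how ties were handled, so those details are assumptions. Here, ties at the cutoff are all kept.

```python
import math


def top_fraction(scores, fraction=0.05):
    """Return (threshold, kept): scores at or above the top-`fraction` cutoff.

    Ties at the threshold are kept, so slightly more than `fraction`
    of the values may survive.
    """
    if not scores:
        return None, []
    ranked = sorted(scores, reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * fraction))
    threshold = ranked[n_keep - 1]
    return threshold, [s for s in scores if s >= threshold]


threshold, kept = top_fraction(list(range(100)))
print(threshold, kept)  # 95 [95, 96, 97, 98, 99]
```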

\

\

\ View a histogram and cumulative probability\ distribution of this data.\

\

Credits

\

\ This track was generated at UCSC.\ compGeno 1 group compGeno\ longLabel Top 5 % of Human/Chimp/Mouse/Rat/Chicken PhyloHMM Cons\ priority 100.1\ shortLabel Thresholded Cons\ track pHMM_5_WayTop5\ type bed 5 .\ visibility hide\ phastCons phastCons wig 0.0 1.0 phastCons Conservation Score, Human/Chimp/Mouse/Rat/Chicken 0 100.11 0 10 100 127 132 177 0 0 0 compGeno 0 autoScale off\ color 0,10,100\ group compGeno\ longLabel phastCons Conservation Score, Human/Chimp/Mouse/Rat/Chicken\ maxHeightPixels 40\ priority 100.11\ shortLabel phastCons\ spanList 1\ track phastCons\ type wig 0.0 1.0\ visibility hide\ cns CNS bed 4 . Conserved non-coding (Cons elements minus predicted coding) 0 100.12 0 0 0 127 127 127 0 0 0 compGeno 1 group compGeno\ longLabel Conserved non-coding (Cons elements minus predicted coding)\ priority 100.12\ shortLabel CNS\ track cns\ type bed 4 .\ visibility hide\ phastConsElements Most Conserved bed 5 . PhastCons Conserved Elements, Human/Chimp/Mouse/Rat/Chicken 0 100.12 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows predictions of conserved elements produced by the phastCons\ program. PhastCons is part of the \ PHAST (PHylogenetic Analysis with \ Space/Time models) package. The predictions are based on a phylogenetic hidden \ Markov model (phylo-HMM), a type of probabilistic model that describes both \ the process of DNA substitution at each site in a genome and the way this \ process changes from one site to the next.

\ \

Methods

\

\ Best-in-genome pairwise alignments were generated for\ each species using blastz, followed by chaining and netting. A multiple\ alignment was then constructed from these pairwise alignments using multiz.\ Predictions of conserved elements were then obtained by running phastCons\ on the multiple alignments with the --most-conserved option.

\

\ PhastCons constructs a two-state phylo-HMM with a state for conserved\ regions and a state for non-conserved regions. The two states share a\ single phylogenetic model, except that the branch lengths of the tree\ associated with the conserved state are multiplied by a constant scaling\ factor rho (0 <= rho <= 1). The free parameters of the\ phylo-HMM, including the scaling factor rho, are estimated from\ the data by maximum likelihood using an EM algorithm. This procedure is\ subject to certain constraints on the "coverage" of the genome by conserved\ elements and the "smoothness" of the conservation scores. Details can be\ found in Siepel et al. (2005).
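The relationship between the two states' trees can be made concrete. In this sketch, the conserved-state tree is the non-conserved tree with every branch length multiplied by rho. The branch names and lengths are hypothetical. A real tree also has internal branches, which are scaled in the same way.

```python
def scale_branch_lengths(branch_lengths, rho):
    """Conserved-state tree: every non-conserved branch length multiplied by rho."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must satisfy 0 <= rho <= 1")
    return {branch: length * rho for branch, length in branch_lengths.items()}


# Hypothetical branch lengths (substitutions per site), for illustration only.
nonconserved = {"human": 0.10, "chimp": 0.10, "mouse": 0.35}
conserved = scale_branch_lengths(nonconserved, rho=0.3)
print(conserved)
```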

\

\ The predicted conserved elements are segments of the alignment that are\ likely to have been "generated" by the conserved state of the phylo-HMM.\ Each element is assigned a log-odds score equal to its log probability\ under the conserved model minus its log probability under the non-conserved\ model. The "score" field associated with this track contains transformed\ log-odds scores, taking values between 0 and 1000. (The scores are\ transformed using a monotonic function of the form a * log(x) + b.) The\ raw log odds scores are retained in the "name" field and can be seen on the\ details page or in the browser when the track's display mode is set to\ "pack" or "full".
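The scoring above has two steps: a log-odds score per element, then a monotonic transform of the form a * log(x) + b onto 0 to 1000. The text does not give a and b. This sketch assumes they are chosen so that the smallest and largest positive raw scores map to 0 and 1000, with results clamped to that range. The function names are hypothetical.

```python
import math


def log_odds(logp_conserved, logp_nonconserved):
    """Element score: log probability under the conserved model minus non-conserved."""
    return logp_conserved - logp_nonconserved


def make_transform(raw_min, raw_max, lo=0, hi=1000):
    """Build score(x) = a*log(x) + b mapping [raw_min, raw_max] onto [lo, hi].

    Requires 0 < raw_min < raw_max. Anchoring a and b at the extremes
    is an assumption for illustration.
    """
    a = (hi - lo) / (math.log(raw_max) - math.log(raw_min))
    b = lo - a * math.log(raw_min)

    def transform(x):
        return min(hi, max(lo, round(a * math.log(x) + b)))

    return transform


to_score = make_transform(10.0, 1000.0)
print(to_score(10.0), to_score(100.0), to_score(1000.0))  # 0 500 1000
```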

\ \

Credits

\

\ This track was created at UCSC using the following programs:\

    \ \
  • Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. \ \
  • AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. \ \
  • PhastCons by Adam Siepel at Cornell University. \
\

\ \

References

\ \

PhastCons

\

\ Siepel A, Bejerano G, Pedersen JS, Hinrichs A, Hou M, Rosenbloom K, \ Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, \ Wilson RK, Gibbs RA, Kent WJ, Miller W, and Haussler D. \ Evolutionarily conserved elements in vertebrate, insect, worm, \ and yeast genomes.\ Genome Res. 2005 15:1034-1050.

\ \

Chain/Net

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, and Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 2003 100(20): 11484-11489.

\ \

Multiz

\

\ Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, \ Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, \ Haussler D, Miller W. \ Aligning multiple genomic sequences with the threaded blockset\ aligner.\ Genome Res. 2004 14(4):708-715.

\ \

Blastz

\

\ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002 115-126.

\

\ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, \ Haussler D, and Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 13(1):103-107.

\ compGeno 1 exonArrows off\ group compGeno\ longLabel PhastCons Conserved Elements, Human/Chimp/Mouse/Rat/Chicken\ priority 100.12\ shortLabel Most Conserved\ showTopScorers 200\ track phastConsElements\ type bed 5 .\ visibility hide\ dless DLESS bed 4 + Detection of LinEage Specific Selection (DLESS) 0 100.191 0 0 0 127 127 127 0 0 0 compGeno 1 gainColor 0,255,0\ group compGeno\ longLabel Detection of LinEage Specific Selection (DLESS)\ lossColor 255,0,0\ priority 100.191\ shortLabel DLESS\ track dless\ type bed 4 +\ visibility hide\ dlessMD DLESS (MD) bed 4 + Detection of LinEage Specific Selection (DLESS) [missing data version] 0 100.192 0 0 0 127 127 127 0 0 0 compGeno 1 gainColor 0,255,0\ group compGeno\ longLabel Detection of LinEage Specific Selection (DLESS) [missing data version]\ lossColor 255,0,0\ priority 100.192\ shortLabel DLESS (MD)\ track dlessMD\ type bed 4 +\ visibility hide\ pHMM_3_WayTop5 Thresholded 3-Way Cons bed 5 . Top 5 % of Human/Mouse/Rat PhyloHMM Cons 0 100.2 0 0 0 127 127 127 0 0 0 compGeno 1 group compGeno\ longLabel Top 5 % of Human/Mouse/Rat PhyloHMM Cons\ priority 100.2\ shortLabel Thresholded 3-Way Cons\ track pHMM_3_WayTop5\ type bed 5 .\ visibility hide\ multizMm3Rn3GalGal2_phyloHMM Hu/Mouse/Rat/Chick wigMaf 0.0 1.0 Human/Mouse/Rat/Chicken Multiz Alignments & PhyloHMM Cons 0 101 0 10 100 127 132 177 0 0 0

Description

\

\ This track shows a measure of evolutionary conservation in human, mouse, rat, and\ chicken based on a phylogenetic hidden Markov model (phylo-HMM).\ The following multiz alignments were used to generate this annotation:\

    \
  • human July 2003 (NCBI34/hg16)\
  • mouse Feb. 2003 (mm3)\
  • rat Jun. 2003 (rn3)\
  • chicken Feb. 2004 (galGal2) \

\

\ In "full" visibility mode, this track displays pairwise alignments \ of mouse, rat, and chicken, each aligned to the human genome. The pairwise alignments are\ displayed in the standard UCSC browser "dense" mode using a greyscale \ density gradient. The checkboxes in the track configuration section allow\ the exclusion of species from the pairwise display; however, this does not\ remove them from the conservation score display.

\

\ When zoomed in to the base-display level, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown; if not, a "*" is displayed when the gap size is a multiple of 3, and a "+" otherwise. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment.
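The Gaps-line rule above reduces to a short decision. In this sketch, the browser's layout logic is replaced by a hypothetical `width_available` count of free character cells. The function name `gap_label` is also hypothetical.

```python
def gap_label(gap_len, width_available):
    """Return the text drawn on the Gaps line for one gap in the human sequence.

    width_available: free character cells at that position, a hypothetical
    stand-in for the browser's layout logic.
    """
    text = str(gap_len)
    if len(text) <= width_available:
        return text  # enough room: show the gap size itself
    return "*" if gap_len % 3 == 0 else "+"


print(gap_label(12, 2), gap_label(12, 1), gap_label(7, 0))  # 12 * +
```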

\

\ This track may be configured in a variety of ways to highlight different aspects\ of the displayed information. Click the \ Graph \ configuration help link for an explanation of the configuration options.

\ \

Methods

\

\ Best-in-genome blastz pairwise alignments of human-mouse and\ human-rat were multiply aligned using a program called humor\ (HUman-MOuse-Rat), which is a special variant of the Multiz\ program. Multiz was used to align the humor results with\ best-in-genome blastz human-chicken alignments. The resulting\ human-mouse-rat-chicken multiple alignments were then assigned \ conservation scores by phylo-HMM.

\

\ A phylo-HMM is a probabilistic model that describes both the process\ of DNA substitution at each site in a genome, and the way this process\ changes from one site to the next (Felsenstein and Churchill 1996,\ Yang 1995, Siepel and Haussler 2003, Siepel and Haussler 2004). A \ phylo-HMM can be thought\ of as a machine that generates a multiple alignment, in the same way\ that an ordinary hidden Markov model (HMM) generates an individual\ sequence. While the states of an ordinary HMM are associated with\ simple multinomial probability distributions, the states of a\ phylo-HMM are associated with more complex distributions defined by\ probabilistic phylogenetic models. These distributions can capture\ differences in the rates and patterns of nucleotide\ substitution observed in different types of genomic regions (e.g., coding\ or noncoding regions, conserved or nonconserved regions).

\

\ To compute a conservation score, we use a k-state phylo-HMM, whose k associated phylogenetic models differ only in overall evolutionary rate (Felsenstein and Churchill 1996, Yang 1995). The image at right shows k = 3 states (S1, S2, and S3); in practice we use k = 10. A phylogenetic model is estimated globally, using the discrete gamma model for rate variation (Yang 1994), then a scaled version of the estimated model is associated with each state in the phylo-HMM. There is a separate "rate constant", r_i, for each state i, which is multiplied by all branch lengths in the globally estimated model. The transition probabilities between states allow for autocorrelation of substitution rates, i.e., for adjacent sites to tend to exhibit similar overall substitution rates. A single parameter, lambda, describes the degree of autocorrelation and defines all transition probabilities. Here, we have estimated the rate constants from the data, similarly to Yang (1995) (Siepel and Haussler 2003), but have treated lambda as a tuning parameter. For the conservation score, we use the posterior probability that each site was "generated" by the state having the smallest rate constant. Because of the way the rate categories are defined, the plotted values can be thought of as approximately representing the posterior probability that each site is among the 10% most conserved sites in the data set (allowing for autocorrelation of substitution rates).
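The posterior described above is the standard forward-backward quantity for the slowest state. The sketch below makes several assumptions. Per-column emission likelihoods under each state are supplied directly; computing them from scaled phylogenetic models is out of scope. The start distribution is uniform. Transitions follow a common autocorrelation form: a site keeps its rate category with probability lambda and otherwise redraws it uniformly. The source does not confirm this exact parameterization.

```python
def transition_matrix(k, lam):
    """Assumed form: keep the rate category with probability lam, otherwise
    redraw it uniformly from all k categories."""
    return [[lam * (i == j) + (1.0 - lam) / k for j in range(k)] for i in range(k)]


def posterior_slowest_state(emissions, lam):
    """Posterior probability that each column was generated by state 0.

    emissions[t][i] is P(alignment column t | state i); state 0 is assumed
    to have the smallest rate constant. Uses scaled forward-backward.
    """
    n, k = len(emissions), len(emissions[0])
    trans = transition_matrix(k, lam)
    fwd, scale = [], []
    for t in range(n):
        if t == 0:
            row = [emissions[0][j] / k for j in range(k)]  # uniform start
        else:
            prev = fwd[-1]
            row = [emissions[t][j] * sum(prev[i] * trans[i][j] for i in range(k))
                   for j in range(k)]
        total = sum(row)
        fwd.append([x / total for x in row])
        scale.append(total)
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = [sum(trans[i][j] * emissions[t + 1][j] * bwd[t + 1][j] for j in range(k))
                  / scale[t + 1] for i in range(k)]
    posteriors = []
    for t in range(n):
        joint = [fwd[t][i] * bwd[t][i] for i in range(k)]
        posteriors.append(joint[0] / sum(joint))
    return posteriors


# k = 10 states and lambda = 0.9, as in the text; emissions are made up.
flat = [[1.0] * 10 for _ in range(5)]
print(posterior_slowest_state(flat, 0.9))  # uninformative columns: 0.1 each
```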

\

\ In this case, the general reversible (REV) substitution model was\ used in parameter estimation, and lambda was set to 0.9. Alignment\ gaps were treated as missing data, which sometimes has the effect of\ producing undesirably high posterior probabilities in gappy regions of\ the alignment. We are looking at several possible ways of improving\ the handling of alignment gaps.

\ \

Credits

\

\ This track was created at UCSC using the following programs:\

    \
  • \ Blastz and multiz from Minmei Hou, Scott Schwartz and Webb Miller of the \ Penn State Bioinformatics \ Group. \
  • \ AxtBest, axtChain, chainNet, netSyntenic, and netClass \ developed by Jim Kent at UCSC. \
  • Tree estimation and phylo-HMM software by Adam Siepel at Cornell University.\
  • "Wiggle track" plotting software by Hiram Clawson at UCSC.\
\

\ \

References

\ \

Phylo-HMMs and phastCons

\

\ Felsenstein, J. and Churchill, G.A.\ A hidden Markov model approach to\ variation among sites in rate of evolution.\ Mol Biol Evol 13, 93-104 (1996).

\

\ Siepel, A. and Haussler, D. Phylogenetic hidden Markov models.\ In R. Nielsen, ed., Statistical Methods in Molecular Evolution,\ pp. 325-351, Springer, New York (2005).

\

\ Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom,\ K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M.,\ Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D.\ Evolutionarily conserved elements in vertebrate, insect, worm,\ and yeast genomes.\ Genome Res. 15, 1034-1050 (2005).

\

\ Yang, Z.\ A space-time process model for the evolution of DNA\ sequences. Genetics, 139, 993-1005 (1995).

\ \

Chain/Net:

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron:\ Duplication, deletion, and rearrangement in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).\ \

Multiz:

\

\ Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F.A., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., Haussler, D., Miller, W. Aligning Multiple Genomic Sequences with the Threaded Blockset Aligner. Genome Res. 14(4), 708-15 (2004).\ \

Blastz:

\

\ Chiaromonte, F., Yap, V.B., and Miller, W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ compGeno 1 autoScale Off\ color 0, 10, 100\ group compGeno\ longLabel Human/Mouse/Rat/Chicken Multiz Alignments & PhyloHMM Cons\ maxHeightPixels 100:40:11\ pairwise hmrg\ priority 101\ shortLabel Hu/Mouse/Rat/Chick\ spanList 1\ speciesOrder mm3 rn3 galGal2\ track multizMm3Rn3GalGal2_phyloHMM\ type wigMaf 0.0 1.0\ visibility hide\ wiggle multizMm3Rn3GalGal2_phyloHMM_wig\ yLineOnOff Off\ hapmapAllelesMacaque Macaque Alleles bed 6 + Orthologous Alleles from Macaque (rheMac2) 0 101 0 0 0 127 127 127 0 0 0 varRep 1 longLabel Orthologous Alleles from Macaque (rheMac2)\ parent hapmapSnps\ priority 101\ shortLabel Macaque Alleles\ track hapmapAllelesMacaque\ snpArray SNP Arrays bed 4 . SNP Genotyping Arrays 0 101 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays the SNPs used in genotyping platforms.\

Affymetrix Genome-Wide Human SNP Array 6.0 and SV

\ The SNP Array 6.0 includes more than 906,600 single nucleotide polymorphisms (SNPs) \ and more than 946,000 probes for the detection of copy number variation. \ The SNPs include the 482,000 SNPs from the 5.0 Array (unbiased selection).\ In addition, 424,000 new SNPs were chosen in the following areas:\
    \
  • Tag SNPs
  • SNPs from chromosomes X and Y
  • Mitochondrial SNPs
  • New SNPs added to dbSNP
  • SNPs in recombination hotspots
\

\ The structural variation copy number (SV) probes include 202,000 probes \ targeting 5,677 known CNV regions\ from the Toronto Database of Genomic Variants. The additional 744,000 probes \ are evenly spaced throughout the genome.\ \

Affymetrix Genome-Wide Human SNP Array 5.0

\ The SNP Array 5.0 is a single microarray featuring all single nucleotide \ polymorphisms (SNPs) from the original two-chip Mapping 500K Array Set, as \ well as 420,000 additional non-polymorphic probes that can measure other \ genetic differences, such as copy number variation.\

Affymetrix 500K (250K Nsp and 250K Sty)

\ This annotation displays the SNPs available for genotyping with the GeneChip Human Mapping 500K Array Set from Affymetrix. It consists of two arrays, Nsp and Sty, which contain approximately 262,000 and 238,000 SNPs, respectively.\ \

Illumina HumanHap650Y

\ This annotation displays the SNPs available for genotyping with Illumina's\ HumanHap650Y Genotyping BeadChip. The HumanHap650Y contains over 650,000 markers,\ extending the HumanHap550 by adding 100,000 additional Yoruba-specific tag\ SNPs. On average, there is 1 SNP every 5.3 kb, 6.2 kb and 5.4 kb across\ the genome in the CEU, CHB+JPT and YRI populations, respectively.\ The HumanHap650Y was derived from release 21 of the \ \ International HapMap Project.\ \

Illumina HumanHap550

\ This annotation displays the SNPs available for genotyping with Illumina's\ HumanHap550 Genotyping BeadChip. The HumanHap550 contains over 550,000 markers,\ the majority of which are tag SNPs \ derived from release 20 of the \ \ International HapMap Project. In addition,\ approximately 7800 non-synonymous SNPs, a higher density of tag SNPs in\ the MHC region, over 150 mitochondrial SNPs and over 4000\ SNPs from regions with copy number polymorphism were included. \ In the CEU population, an r-squared threshold of 0.8 was used\ for common SNPs in genes, within 10 kb of genes or in evolutionarily\ conserved regions. For all other regions, an r-squared threshold of 0.7 was used.\ On average, there is 1 SNP every 5.5 kb, 6.5 kb and 6.2 kb across the genome in \ the CEU, CHB+JPT and YRI populations, respectively.\ \

Illumina HumanHap300

\ This annotation displays the SNPs available for genotyping with Illumina's\ HumanHap300 Genotyping BeadChip. The HumanHap300 contains over 317,000 tagSNP markers\ derived from Phase I of the \ \ International HapMap Project. In addition,\ approximately 7300 non-synonymous SNPs and a higher density of tag SNPs in\ the MHC region were included. On average, there is 1 SNP every 9 kb across\ the genome and median spacing is 5 kb.\ \

Illumina Human1M-Duo

\ This annotation displays the SNPs available for genotyping with Illumina's Human1M-Duo Genotyping BeadChip. The Human1M-Duo contains more than 1,100,000 tagSNP markers and a set of ~60,000 additional CNV-targeted markers. The median spacing is 1.5 kb (mean 2.4 kb).\ \

Illumina HumanOmni1-Quad v1

\ The HumanOmni1-Quad BeadChip consists of 1,140,419 markers in a 4-sample format. The whole-genome content provides high genomic coverage rates of 93%, 92%, and 76% at r2 > 0.8 for the CEU, CHB+JPT, and YRI populations, respectively. High-density markers with a median spacing of 1.2 kb ensure the highest level of resolution for CNV and breakpoint identification.\ \ The content has been derived from the 1000 Genomes Project, all three HapMap phases, and recently published studies, including new coding variants identified by the 1000 Genomes Project and markers chosen in high-value regions of the genome: ABO blood typing SNPs, cSNPs, disease-associated SNPs, eSNPs, SNPs in mRNA splice sites, ADME genes, AIMs, HLA complexes, indels, introns, MHC regions, miRNA binding sites, mitochondrial DNA, PAR, promoter regions, and Y-chromosome.\ \

Illumina Human660W-Quad v1

\ The Human660W-Quad BeadChip consists of 657,366 markers in a 4-sample\ format. The Human660W-Quad BeadChip provides 87%, 85%, and 56% coverage\ of CEU, CHB+JPT, and YRI populations at r2 > 0.8. For \ CNV and cytogenetic analysis, the dense backbone content is combined\ with an additional ~100,000 markers that target observed common CNVs.\ \

HumanCytoSNP-12 v2.1

\ \ The 301,232 markers on the HumanCytoSNP-12 represent a complete 12-sample panel of genome-wide SNPs for a uniform backbone and additional markers targeting all regions of known cytogenetic importance. Backbone markers provide genome-wide marker spacing of 10 kb. This is supplemented with dense coverage (6 kb average spacing) of ~250 genomic regions commonly studied in cytogenetics labs and targeted coverage in ~400 additional genes, subtelomeric regions, pericentromeric regions, and sex chromosomes. An efficiency-optimized tagging strategy provides a panel for GWAS (70% coverage in CEU at r2 > 0.8) in the highest-throughput and most cost-effective whole-genome DNA Analysis BeadChip.\ \

References

\

\ \ More information on the Affymetrix arrays is available at these sites:\

\ \ More information on the Illumina arrays is available at these sites:\ \ \

Methods

\ Position, strand, and polymorphism data were obtained from Affymetrix and \ supplemented with links to corresponding dbSNP rsIDs based on a positional\ lookup into \ \ dbSNP. The Affy 6.0 Array is based on dbSNP build 127; the Affy 5.0 Array \ is based on dbSNP build 126. The Affy 500K Array is based on dbSNP build 125 \ and was translated from hg17 by UCSC using rsID lookup. \ In fewer than 2% of the cases, a dbSNP rsID was\ not present in dbSNP at the Affymetrix array position. \ Reference allele information was retrieved from the UCSC database based on dbSNP position\ and strand data. \
\
\ Illumina data were supplied as rsIDs and positions based on dbSNP build 126. Strand, polymorphism and reference allele information was retrieved from the UCSC database based on rsID and position. The Illumina arrays consist of probes for four of the six possible single-base substitution classes: A/C, A/G, C/T and G/T. A/T and C/G probes will be available in future arrays.\ \
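There are six unordered base pairs in total, which is why four supported classes leave A/T and C/G out. The four supported classes are also closed under reverse complementation (A/C pairs with G/T, and A/G with C/T), so a plain set lookup works regardless of the strand an allele pair is reported on. The names `SUPPORTED` and `assayable` are hypothetical.

```python
from itertools import combinations

# The four substitution classes probed on these Illumina arrays.
SUPPORTED = {frozenset("AC"), frozenset("AG"), frozenset("CT"), frozenset("GT")}
ALL_PAIRS = [frozenset(p) for p in combinations("ACGT", 2)]


def assayable(observed):
    """True if an 'X/Y' SNP falls into one of the four supported classes."""
    return frozenset(observed.upper().split("/")) in SUPPORTED


print(len(ALL_PAIRS), sum(p in SUPPORTED for p in ALL_PAIRS))  # 6 4
print(assayable("A/G"), assayable("A/T"))  # True False
```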
\
\ For Illumina Human1M-Duo, the position, strand, polymorphism and reference allele information was retrieved from the snp129 table of the UCSC database if the marker ID could be found in dbSNP 129; otherwise, it was retrieved from the data provided by Illumina.\ \
\
\ For Illumina HumanOmni1-Quad, Human660W-Quad, and HumanCytoSNP-12, the position, strand, polymorphism and reference allele information was retrieved from the snp130 table of the UCSC database if the marker ID could be found in dbSNP 130; otherwise, it was retrieved from the data provided by Illumina.\ \
\
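The lookup order above is a lookup with a fallback. This sketch models both sources as plain dicts keyed by marker ID. They stand in for the UCSC snp129/snp130 table and the Illumina-supplied file, and all IDs and records below are hypothetical.

```python
def resolve_marker(marker_id, dbsnp_records, vendor_records):
    """Return (source, record), preferring the dbSNP-derived table.

    Both inputs are plain dicts keyed by marker ID in this sketch; they
    stand in for the UCSC snp129/snp130 table and the Illumina data.
    """
    if marker_id in dbsnp_records:
        return "dbSNP", dbsnp_records[marker_id]
    if marker_id in vendor_records:
        return "Illumina", vendor_records[marker_id]
    return None, None


# Hypothetical records: (chrom, chromStart, strand, observed).
snp130 = {"rs123": ("chr1", 1000, "+", "A/G")}
illumina = {"rs123": ("chr1", 999, "+", "A/G"), "cnvi0001": ("chr2", 500, "+", "A/C")}
print(resolve_marker("rs123", snp130, illumina))
print(resolve_marker("cnvi0001", snp130, illumina))
```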

Credits

\ Thanks to Venu Valmeekam from Affymetrix, Luana Galver and Jennifer L. Stone from Illumina for\ providing these data.\ varRep 1 compositeTrack on\ group varRep\ longLabel SNP Genotyping Arrays\ priority 101\ shortLabel SNP Arrays\ track snpArray\ type bed 4 .\ visibility hide\ covMask1kGPilotLowCov 1000Genomes Cov bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase 0 101.5 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays regions of the reference genome that have anomalies \ in the \ 1000 Genomes Project's \ mapping of high-throughput sequencing reads from the pilot study of ~180 \ genomes, each sequenced at low coverage (~2X), from the \ CEU (CEPH - Northern European from Utah), \ CHB/JPT (Chinese from Beijing and Japanese from Tokyo) and\ YRI (Yoruba from Ibadan, Nigeria) populations \ (see Coriell Institute's description\ of 1000 Genomes samples). \ These regions were excluded from the 1000 Genomes Project's \ SNP-calling process.\

\

\ Regions with abnormal read depth (total depth is greater than twice the \ average depth at HapMap3 sites) are displayed in dark red. \ Regions with low mapping quality (more than 20% of reads from Illumina \ platform have mapping quality 0) are displayed in light red. \ Regions with no coverage (no reads mapped) are shown in light gray.\ There is a separate subtrack per population and type of anomaly.\
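The two filters above can be written as a per-base classifier that returns the mask letters listed in the README excerpt below. This is a sketch only: the per-base counts are assumed inputs, and the function name `mask_code` is hypothetical. For CEU, the README's depth cutoff of >625 implies an average HapMap3-site depth of about 312.5.

```python
def mask_code(ref_base, total_depth, illumina_reads, illumina_mapq0, mean_depth):
    """Classify one base with the 1000 Genomes pilot mask letters.

    'N' reference N, '-' no coverage, 'D' depth > 2x mean_depth,
    'M' > 20% of Illumina reads with MAPQ 0, 'B' both, '0' passes.
    """
    if ref_base.upper() == "N":
        return "N"
    if total_depth == 0:
        return "-"
    depth_fail = total_depth > 2 * mean_depth
    mapq_fail = illumina_reads > 0 and illumina_mapq0 / illumina_reads > 0.20
    if depth_fail and mapq_fail:
        return "B"
    if depth_fail:
        return "D"
    if mapq_fail:
        return "M"
    return "0"


# CEU example: a depth cutoff of 625 corresponds to mean_depth = 312.5.
print(mask_code("A", 700, 100, 5, 312.5))  # D
```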

\ \

Methods

\

\ Pseudo-fasta files included in the July 2010 release of 1000 Genomes pilot data, \ containing a mapping code letter for each base in the reference genome,\ were downloaded from \ ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/release/2010_07/low_coverage/other_data/ and processed by UCSC to extract genomic coordinates of \ annotated regions. \

\

Excerpted from \ ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/release/2010_07/low_coverage/other_data/README.2010_03.low_coverage_masks:\

\
This directory contains information about the coverage of each base in\
the genome in the pilot 1 populations, and whether that base would\
have passed the filters used for SNP calling in the consensus released\
SNP sets.\
\
The .mask.fa files are pseudo-fasta files for the whole genome,\
with instead of a sequence of bases a sequence of the following symbols:\
  N  N in reference\
  -  no coverage\
  M  failed MAPQ0 filter: more than 20% of Illumina reads have mapping quality 0\
  D  failed DEPTH filter: total depth is greater than twice the average depth at \
     HapMap3 sites, i.e. >625 for CEU, >445 for YRI, >330 for CHBJPT\
  B  failed both M and D filters\
  0  passes filters\
\
Note that the filters are relatively conservative, designed to achieve\
a false discovery rate below 5%\
\ Bases with D or B are included in the Abnormal Depth subtracks; \ bases with M or B are included in the Mapping Quality Failure subtracks;\ and bases with - are included in the No Coverage subtracks.\
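Turning a mask string into per-subtrack regions means collecting runs of qualifying letters. The sketch below uses the letter-to-subtrack mapping stated above and emits BED-style 0-based, half-open intervals. It assumes the pseudo-FASTA lines for one chromosome have already been joined into a single string. The function name is hypothetical.

```python
SUBTRACK_CODES = {
    "Abnormal Depth": {"D", "B"},
    "Mapping Quality Failure": {"M", "B"},
    "No Coverage": {"-"},
}


def mask_to_intervals(mask, chrom):
    """Return {subtrack: [(chrom, start, end), ...]} with 0-based, half-open ends."""
    out = {name: [] for name in SUBTRACK_CODES}
    for name, codes in SUBTRACK_CODES.items():
        start = None
        for i, ch in enumerate(mask):
            if ch in codes:
                if start is None:
                    start = i  # a qualifying run begins
            elif start is not None:
                out[name].append((chrom, start, i))
                start = None
        if start is not None:  # run extends to the end of the chromosome
            out[name].append((chrom, start, len(mask)))
    return out


print(mask_to_intervals("00DDB0MM--0", "chr1"))
```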

\ \

Credits

\ Thanks to Richard Durbin, Sendu Bala and the rest of the 1000 Genomes \ Project Consortium for these data.\ \

References

\

\ 1000 Genomes Project, \ http://1000genomes.org/,\ accessed Sep. 2010.\

\ varRep 1 compositeTrack on\ dimensions dimX=view dimY=pop\ dragAndDrop subTracks\ group varRep\ longLabel Coverage Analysis from the 1000 Genomes Project Pilot Phase\ priority 101.5\ shortLabel 1000Genomes Cov\ sortOrder view=+ pop=+\ subGroup1 view Views Depth=Abnormal_Depth MapQ=Mapping_Quality_Failure Uncov=No_Coverage sum=Summary\ subGroup2 pop Population Ceu=CEU ChbJpt=CHB/JPT Yri=YRI all=All\ track covMask1kGPilotLowCov\ type bed 3\ visibility hide\ covMask1kGPilotLowCovDepth Abnormal Depth bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase 0 101.5 180 0 0 217 127 127 0 0 0 varRep 1 color 180,0,0\ parent covMask1kGPilotLowCov\ shortLabel Abnormal Depth\ track covMask1kGPilotLowCovDepth\ view Depth\ visibility hide\ covMask1kGPilotLowCovMapQ Map Qual Failure bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase 0 101.5 224 108 108 239 181 181 0 0 0 varRep 1 color 224,108,108\ parent covMask1kGPilotLowCov\ shortLabel Map Qual Failure\ track covMask1kGPilotLowCovMapQ\ view MapQ\ visibility hide\ covMask1kGPilotLowCovUncov No Coverage bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase 0 101.5 150 150 150 202 202 202 0 0 0 varRep 1 color 150,150,150\ parent covMask1kGPilotLowCov\ shortLabel No Coverage\ track covMask1kGPilotLowCovUncov\ view Uncov\ visibility hide\ covMask1kGPilotLowCovSumView Summary bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase 1 101.5 0 0 0 127 127 127 0 0 0 varRep 1 parent covMask1kGPilotLowCov\ shortLabel Summary\ track covMask1kGPilotLowCovSumView\ view sum\ visibility dense\ hgdpGeo HGDP Allele Freq bed 4 + Human Genome Diversity Project SNP Population Allele Frequencies 0 102 0 0 0 127 127 127 0 0 0 http://hgdp.uchicago.edu/cgi-bin/gbrowse/HGDP/?name=$$

Description

\

\ This track shows the 657,000 SNPs genotyped in 53 populations worldwide by the \ Human \ Genome Diversity Project in collaboration with the\ Centre d'Etude \ du Polymorphisme Humain (HGDP-CEPH).\ This track and several others are available from the \ HGDP Selection Browser.\

\ \

Methods

\

\ Samples collected by the HGDP-CEPH from 1,043 individuals from around the world were genotyped for 657,000 SNPs at Stanford. Ancestral states for all SNPs were estimated using whole-genome human-chimpanzee alignments from the UCSC database. For each SNP in the human genome (NCBI Build 35, UCSC database hg17), the allele at the corresponding position in the chimp genome (Build 2 version 1, UCSC database panTro2) was used as the ancestral allele.\

\

\ Allele frequencies were plotted on a world map using programs included in the\ Generic Mapping Tools.\

\ \
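
The ancestral-state rule above can be sketched in a few lines of Python. This is a minimal illustration, not the HGDP pipeline. The function name and its inputs (human reference and alternate alleles, the alternate-allele frequency, and the aligned chimp base) are hypothetical. A chimp base that matches neither human allele is treated as an unknown ancestral state, which is also an assumption.

```python
def derived_allele_frequency(ref, alt, alt_freq, chimp_base):
    """Polarize a biallelic human SNP using the aligned chimp base as ancestral.

    Returns (derived_allele, derived_allele_frequency), or None when the chimp
    base matches neither human allele (hypothetical handling: ancestral state unknown).
    """
    ref, alt, chimp_base = ref.upper(), alt.upper(), chimp_base.upper()
    if chimp_base == ref:
        return alt, alt_freq          # chimp carries the reference allele, so alt is derived
    if chimp_base == alt:
        return ref, 1.0 - alt_freq    # chimp carries the alternate allele, so ref is derived
    return None
```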

Credits

Thanks to the HGDP-CEPH, the Pritchard lab at the University of Chicago, Joe Pickrell and John Novembre for sharing the data and plotting scripts for this track.

References

Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A et al. A human genome diversity cell line panel. Science. 2002 Apr 12;296(5566):261-2.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008 Feb 22;319(5866):1100-4.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li J, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009 May;19(5):826-37.

Wessel P, Smith WHF. New, improved version of Generic Mapping Tools released. EOS, Trans. Amer. Geophys. U. 1998;79(47):579.

\ \ varRep 1 group varRep\ longLabel Human Genome Diversity Project SNP Population Allele Frequencies\ priority 102\ shortLabel HGDP Allele Freq\ track hgdpGeo\ type bed 4 +\ url http://hgdp.uchicago.edu/cgi-bin/gbrowse/HGDP/?name=$$\ urlLabel HGDP Selection Browser:\ visibility hide\ multizMm3Rn3GalGal2 HMRG maf Human/Mouse(mm3)/Rat(rn3)/Chicken(galGal2) Multiz Alignments 0 102 0 0 0 127 127 127 0 0 0 compGeno 0 group compGeno\ longLabel Human/Mouse(mm3)/Rat(rn3)/Chicken(galGal2) Multiz Alignments\ priority 102\ shortLabel HMRG\ track multizMm3Rn3GalGal2\ type maf\ visibility hide\ hgdpFst HGDP Smoothd FST bedGraph 4 Human Genome Diversity Project Smoothed Relative FST (Fixation Index) 0 102.01 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,

Description

In this track, the value shown for each SNP is -log10 of the fraction of SNPs with a more extreme FST value than that SNP. Relative FST (also known as the fixation index) values were calculated from SNPs genotyped in 53 populations worldwide by the Human Genome Diversity Project in collaboration with the Centre d'Etude du Polymorphisme Humain (HGDP-CEPH). This track and several others are available from the HGDP Selection Browser.

From Wikipedia:

Fixation index (FST) is a measure of population differentiation based on genetic polymorphism data, such as single nucleotide polymorphisms (SNPs) or microsatellites. It is a special case of F-statistics, the concept developed in the 1920s by Sewall Wright. This statistic compares the genetic variability within and between populations and is frequently used in the field of population genetics.

From http://www.uwyo.edu/dbmcd/popecol/Maylects/PopGenGloss.html:

FST is the proportion of the total genetic variance contained in a subpopulation (the S subscript) relative to the total genetic variance (the T subscript). Values can range from 0 to 1. High FST implies a considerable degree of differentiation among populations.
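
The glossary definition can be made concrete with the textbook Nei-style formula FST = (HT - HS) / HT. Here HT is the expected heterozygosity of the pooled population and HS is the mean expected heterozygosity within subpopulations. This is a sketch under stated assumptions: subpopulations are weighted equally, the function name is hypothetical, and this is not necessarily the estimator Pickrell et al. used to build the track.

```python
def fst_from_frequencies(freqs):
    """Nei-style FST for one biallelic SNP from per-subpopulation allele frequencies.

    Subpopulations are weighted equally (an assumption made for illustration).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)                      # total expected heterozygosity
    h_s = sum(2 * p * (1 - p) for p in freqs) / k      # mean within-subpopulation heterozygosity
    if h_t == 0:
        return 0.0                                     # monomorphic overall: nothing to differentiate
    return (h_t - h_s) / h_t
```

Identical frequencies give 0, and fixation of different alleles (frequencies 0 and 1) gives 1, matching the 0-to-1 range quoted above.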

Methods

Samples collected by the HGDP-CEPH from 1,043 individuals from around the world were genotyped for 657,000 SNPs at Stanford. The 53 populations were divided into seven continental groups: Africa, Middle East, Europe, South Asia, East Asia, Oceania and the Americas. FST was computed for all SNPs, and each SNP's position in the empirical FST distribution was then used to derive the score shown in this track: -log10 of the fraction of SNPs with a more extreme FST value than that SNP.
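
The conversion from raw FST values to track scores can be illustrated with a short sketch. It treats a larger FST as "more extreme". The single most extreme SNP would otherwise produce log10(0), so the sketch floors the fraction at 1/n. Both choices, and the function name, are assumptions rather than the published procedure.

```python
import bisect
import math

def empirical_scores(values):
    """-log10 of the fraction of SNPs whose value is strictly greater than each SNP's value.

    The fraction is floored at 1/n so the top SNP gets a finite score (assumption).
    """
    n = len(values)
    ordered = sorted(values)
    scores = []
    for v in values:
        n_greater = n - bisect.bisect_right(ordered, v)   # SNPs with a more extreme (larger) value
        scores.append(-math.log10(max(n_greater, 1) / n))
    return scores
```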

Credits

Thanks to the HGDP-CEPH and Joe Pickrell in the Pritchard lab at the University of Chicago for providing these data.

References

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li J, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009 May;19(5):826-37.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008 Feb 22;319(5866):1100-4.

Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A et al. A human genome diversity cell line panel. Science. 2002 Apr 12;296(5566):261-2.

\ \ varRep 0 autoScale Off\ chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,\ group varRep\ longLabel Human Genome Diversity Project Smoothed Relative FST (Fixation Index)\ maxHeightPixels 100:20:10\ maxLimit 6\ minLimit 0\ priority 102.01\ shortLabel HGDP Smoothd FST\ track hgdpFst\ type bedGraph 4\ viewLimits 0:5\ visibility hide\ hgdpHzy HGDP Hetrzygsty bedGraph 4 Human Genome Diversity Project Smoothed Expected Heterozygosity on 7 Continents 0 102.02 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,

Description

This track shows a 3-SNP moving average of p(1-p), where p is the major allele frequency (i.e., half of the expected heterozygosity 2p(1-p)), on seven continents, from SNPs genotyped in 53 populations worldwide by the Human Genome Diversity Project in collaboration with the Centre d'Etude du Polymorphisme Humain (HGDP-CEPH). This track and several others are available from the HGDP Selection Browser.

Methods

Samples collected by the HGDP-CEPH from 1,043 individuals from around the world were genotyped for 657,000 SNPs at Stanford. The 53 populations were divided into seven continental groups: Africa, Middle East, Europe, South Asia, East Asia, Oceania and the Americas. Allele frequencies were used to calculate p(1-p) for each SNP, and a 3-SNP average was then computed over each SNP and its two neighboring SNPs.

The associated analysis tracks HGDP FST, HGDP iHS, and HGDP XP-EHH (Pickrell et al.) did not use all African populations. Instead they used only the Bantu populations, because a more closely related group was desired for comparison with the other continental groups. For this track, separate subtracks show the expected heterozygosity of all African populations and of only the Bantu populations.
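
The smoothing step can be sketched as follows, assuming the input frequencies are already ordered by chromosome position. The function name is hypothetical. Averaging over only the existing neighbors at the first and last SNP is an assumption, since the description does not say how endpoints were handled.

```python
def smoothed_p1mp(major_freqs):
    """3-SNP moving average of p(1-p), i.e. half the expected heterozygosity 2p(1-p).

    major_freqs must be ordered along the chromosome. Endpoint SNPs, which lack one
    neighbor, are averaged over the SNPs that exist (assumption).
    """
    h = [p * (1 - p) for p in major_freqs]
    out = []
    for i in range(len(h)):
        window = h[max(0, i - 1): i + 2]   # the SNP plus its immediate neighbors
        out.append(sum(window) / len(window))
    return out
```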

Credits

Thanks to the HGDP-CEPH and Joe Pickrell in the Pritchard lab at the University of Chicago for providing these data.

References

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li J, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009 May;19(5):826-37.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008 Feb 22;319(5866):1100-4.

Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A et al. A human genome diversity cell line panel. Science. 2002 Apr 12;296(5566):261-2.

\ \ varRep 0 autoScale Off\ chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,\ compositeTrack on\ group varRep\ longLabel Human Genome Diversity Project Smoothed Expected Heterozygosity on 7 Continents\ maxHeightPixels 100:20:10\ maxLimit 0.25\ minLimit 0\ priority 102.02\ shortLabel HGDP Hetrzygsty\ track hgdpHzy\ type bedGraph 4\ viewLimits 0:0.25\ visibility hide\ hgdpIhs HGDP iHS bedGraph 4 Human Genome Diversity Project Integrated Haplotype Score on 7 Continents 0 102.03 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,

Description

This track shows the per-continent integrated haplotype score (iHS; Voight et al.), a measure of very recent positive selection. Scores were calculated using SNPs genotyped in 53 populations worldwide by the Human Genome Diversity Project in collaboration with the Centre d'Etude du Polymorphisme Humain (HGDP-CEPH). This track and several others are available from the HGDP Selection Browser.

Methods

Samples collected by the HGDP-CEPH from 1,043 individuals from around the world were genotyped for 657,000 SNPs at Stanford. The 53 populations were divided into seven continental groups: Africa (Bantu populations only), Middle East, Europe, South Asia, East Asia, Oceania and the Americas. Bantu populations were used instead of all African populations because a more closely related group was desired for comparison with the other continental groups. iHS was then calculated for each population group using the program ihs (source code available). The resulting unstandardized iHS scores were normalized within derived allele frequency bins as described in Voight et al. Per-SNP iHS scores were smoothed in windows of 31 SNPs centered on each SNP. The final score is -log10 of the proportion of smoothed scores higher than each SNP's smoothed score.
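
The normalization, smoothing and scoring steps can be sketched as follows. Only three details come from the description above: the per-bin standardization, the 31-SNP centered window, and the -log10 tail proportion. Everything else is an assumption made for illustration: 20 equal-width derived allele frequency bins, smoothing by the mean absolute score, truncated windows at the ends of the SNP list, and a 1/n floor in the final step. The function names are hypothetical and are not those of the ihs program.

```python
import math
import statistics

def standardize_by_daf_bin(raw_ihs, daf, n_bins=20):
    """Standardize raw iHS within derived allele frequency (DAF) bins: (x - bin mean) / bin SD."""
    bins = [min(int(f * n_bins), n_bins - 1) for f in daf]   # equal-width bins (assumption)
    groups = {}
    for b, x in zip(bins, raw_ihs):
        groups.setdefault(b, []).append(x)
    moments = {b: (statistics.mean(xs), statistics.pstdev(xs)) for b, xs in groups.items()}
    out = []
    for b, x in zip(bins, raw_ihs):
        mean, sd = moments[b]
        out.append((x - mean) / sd if sd > 0 else 0.0)   # degenerate bin scored 0 (assumption)
    return out

def smooth(scores, window=31):
    """Mean |score| in a window of `window` SNPs centered on each SNP, truncated at the ends."""
    half = window // 2
    return [statistics.mean([abs(v) for v in scores[max(0, i - half): i + half + 1]])
            for i in range(len(scores))]

def tail_score(smoothed):
    """-log10 of the proportion of smoothed scores higher than each SNP's (floored at 1/n)."""
    n = len(smoothed)
    return [-math.log10(max(sum(s > t for s in smoothed), 1) / n) for t in smoothed]
```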

Credits

Thanks to the HGDP-CEPH and Joe Pickrell in the Pritchard lab at the University of Chicago for providing these data.

References

Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol. 2006 Mar;4(3):e72.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li J, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009 May;19(5):826-37.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008 Feb 22;319(5866):1100-4.

Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A et al. A human genome diversity cell line panel. Science. 2002 Apr 12;296(5566):261-2.

\ varRep 0 autoScale Off\ chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX\ compositeTrack on\ group varRep\ longLabel Human Genome Diversity Project Integrated Haplotype Score on 7 Continents\ maxHeightPixels 100:20:10\ maxLimit 6\ minLimit 0\ priority 102.03\ shortLabel HGDP iHS\ track hgdpIhs\ type bedGraph 4\ viewLimits 0:5\ visibility hide\ hgdpXpehh HGDP XP-EHH bedGraph 4 Human Genome Diversity Proj Cross-Pop Ext Haplo Homzgty (XP-EHH) on 7 Continents 0 102.04 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,

Description

This track shows the per-continent Cross Population Extended Haplotype Homozygosity (XP-EHH) score (Sabeti et al.), an estimate of positive selection that highlights SNPs that have approached or achieved fixation in a population but remain polymorphic in the human population as a whole. Scores were calculated using SNPs genotyped in 53 populations worldwide by the Human Genome Diversity Project in collaboration with the Centre d'Etude du Polymorphisme Humain (HGDP-CEPH). This track and several others are available from the HGDP Selection Browser.

Methods

Samples collected by the HGDP-CEPH from 1,043 individuals from around the world were genotyped for 657,000 SNPs at Stanford. The 53 populations were divided into seven continental groups: Africa (Bantu populations only), Middle East, Europe, South Asia, East Asia, Oceania and the Americas. Bantu populations were used instead of all African populations because a more closely related group was desired for comparison with the other continental groups. XP-EHH was then calculated for each population group using the program xpehh (source code available) as described in Sabeti et al.

Credits

Thanks to the HGDP-CEPH and Joe Pickrell in the Pritchard lab at the University of Chicago for providing these data.

References

Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R et al. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007 Oct 18;449(7164):913-8.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li J, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009 May;19(5):826-37.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008 Feb 22;319(5866):1100-4.

Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A et al. A human genome diversity cell line panel. Science. 2002 Apr 12;296(5566):261-2.

\ \ varRep 0 autoScale Off\ chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,\ compositeTrack on\ group varRep\ longLabel Human Genome Diversity Proj Cross-Pop Ext Haplo Homzgty (XP-EHH) on 7 Continents\ maxHeightPixels 100:20:10\ maxLimit 7\ minLimit 0\ priority 102.04\ shortLabel HGDP XP-EHH\ track hgdpXpehh\ type bedGraph 4\ viewLimits 0:5\ visibility hide\ multizMm3Rn3Pt0 HMRP wigMaf 0.0 1.0 Human/Mouse(mm3)/Rat(rn3)/Chimp(pt0) Multiz Alignments 0 103 0 0 0 127 127 127 0 0 0

Description

This track shows a measure of evolutionary conservation in human, chimp, mouse and rat based on a phylogenetic hidden Markov model (phastCons). Multiz alignments of the following assemblies were used to generate this annotation:

  • human July 2003 (NCBI34/hg16)
  • chimp Nov. 2003 (panTro1)
  • mouse Feb. 2003 (mm3) draft assembly
  • rat Jun. 2003 (rn3)

Methods

Multiz is a multiple alignment program that takes blastz "Best-in-Genome" alignments (axtBest or axtNet) as input. For human/rodent alignments, it uses the same scoring matrix as blastz between pairs of sequences:

        A     C     G     T
  A    91  -114   -31  -123
  C  -114   100  -125   -31
  G   -31  -125   100  -114
  T  -123   -31  -114    91

  O = 400, E = 30, K = 3000, L = 3000, M = 50

For the mouse/rat alignments, the following matrix was used:

        A     C     G     T
  A    86  -135   -68  -157
  C  -135   100  -148   -68
  G   -68  -148   100  -135
  T  -157   -68  -135    86

  O = 600, E = 50

For the human/chimpanzee alignments, the following matrix was used:

       A    C    G    T
 A   100 -300 -150 -300
 C  -300  100 -300 -150
 G  -150 -300  100 -300
 T  -300 -150 -300  100

 O = 400, E = 30, K = 4500, L = 4500, M = 50

The overall score is the sum of the scores over all pairs.
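
The scoring described above can be made concrete with a sketch. It scores a pairwise alignment with the human/rodent matrix and then sums the pairwise scores over every pair of rows in a multiple alignment. It assumes that O and E are the gap-open and gap-extension penalties, so a gap of length n costs O + n*E. The K, L and M thresholds govern the blastz search rather than the score itself and are not modeled. Columns gapped in both rows of a pair are skipped. Function names are hypothetical.

```python
from itertools import combinations

BASES = "ACGT"
# Human/rodent substitution matrix from the table above (rows and columns in A, C, G, T order).
_ROWS = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
HUMAN_RODENT = {a: dict(zip(BASES, row)) for a, row in zip(BASES, _ROWS)}

def pair_score(a, b, matrix=HUMAN_RODENT, gap_open=400, gap_extend=30):
    """Score two aligned rows ('-' = gap) with substitution scores and affine gap penalties."""
    score = 0
    open_gap = None  # which row ('a' or 'b') holds the gap currently being extended
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue                      # gapped in both rows: ignored for this pair
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if open_gap != side:
                score -= gap_open         # a new gap starts here
            score -= gap_extend           # every gapped position costs E
            open_gap = side
        else:
            score += matrix[x][y]
            open_gap = None
    return score

def sum_of_pairs(rows, **kwargs):
    """Multiple-alignment score: the sum of pair_score over every pair of rows."""
    return sum(pair_score(r1, r2, **kwargs) for r1, r2 in combinations(rows, 2))
```

For example, `pair_score("A-C", "AGC")` is 91 - (400 + 30) + 100 = -239.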

The resulting human-chimp-mouse-rat multiple alignments were then assigned conservation scores by phastCons. The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Note that, unlike many conservation-scoring programs, phastCons does not rely on a sliding window of fixed size, so short highly conserved regions and long moderately conserved regions can both obtain high scores. More information about phastCons can be found in Siepel et al. (2005).

PhastCons currently treats alignment gaps as missing data, which sometimes has the effect of producing undesirably high conservation scores in gappy regions of the alignment. We are looking at several possible ways of improving the handling of alignment gaps.
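
The posterior probability described above can be computed with the standard forward-backward algorithm. The sketch assumes that the per-column likelihoods under the conserved and non-conserved models have already been computed; in phastCons they come from the phylogenetic models themselves. Parameter names are hypothetical. The sketch illustrates the two-state calculation, not phastCons's own implementation.

```python
def posterior_conserved(lik_cons, lik_noncons, p_stay_cons, p_stay_non, init_cons=0.5):
    """Posterior probability of the conserved state at each alignment column.

    lik_cons[t] and lik_noncons[t] are P(column t | state), assumed precomputed.
    """
    n = len(lik_cons)
    emit = (lik_cons, lik_noncons)                  # state 0 = conserved, 1 = non-conserved
    trans = ((p_stay_cons, 1 - p_stay_cons),
             (1 - p_stay_non, p_stay_non))

    # Forward pass, normalized at every column to avoid numerical underflow.
    fwd = []
    prev = (init_cons, 1 - init_cons)
    for t in range(n):
        if t == 0:
            cur = [prev[s] * emit[s][0] for s in (0, 1)]
        else:
            cur = [sum(prev[r] * trans[r][s] for r in (0, 1)) * emit[s][t] for s in (0, 1)]
        z = sum(cur)
        cur = [c / z for c in cur]
        fwd.append(cur)
        prev = cur

    # Backward pass, normalized the same way.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emit[r][t + 1] * bwd[t + 1][r] for r in (0, 1)) for s in (0, 1)]
        z = sum(b)
        bwd[t] = [x / z for x in b]

    # Per-column scaling constants cancel when normalizing over the two states.
    posts = []
    for f, b in zip(fwd, bwd):
        joint0, joint1 = f[0] * b[0], f[1] * b[1]
        posts.append(joint0 / (joint0 + joint1))
    return posts
```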

Credits

This track was created at UCSC using the following programs and data:

  • Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group.
  • Mouse sequence data are provided by the Mouse Genome Sequencing Consortium.
  • Rat sequence data are provided by the Rat Sequencing Consortium.
  • Chimpanzee sequence data are provided by the Chimpanzee Sequencing Consortium.
  • Jim Kent wrote axtBest and the scripts to run multiz genome-wide and to display the alignments in this browser.
  • PhastCons by Adam Siepel at Cornell University.
  • "Wiggle track" plotting software by Hiram Clawson at UCSC.

References

Phylo-HMMs and phastCons

Felsenstein, J. and Churchill, G.A. A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13, 93-104 (1996).

Siepel, A. and Haussler, D. Phylogenetic hidden Markov models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, pp. 325-351, Springer, New York (2005).

Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M., Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034-1050 (2005).

Yang, Z. A space-time process model for the evolution of DNA sequences. Genetics, 139, 993-1005 (1995).

Chain/Net:

Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

Multiz:

Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F.A., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., Haussler, D., and Miller, W. Aligning Multiple Genomic Sequences with the Threaded Blockset Aligner. Genome Res. 14(4), 708-15 (2004).

Blastz:

Chiaromonte, F., Yap, V.B., and Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002, 115-26 (2002).

Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res. 13(1), 103-7 (2003).

\ compGeno 1 autoScale Off\ group compGeno\ longLabel Human/Mouse(mm3)/Rat(rn3)/Chimp(pt0) Multiz Alignments\ maxHeightPixels 100:40:11\ pairwise hmrg\ priority 103\ shortLabel HMRP\ spanList 1\ speciesOrder pt0 mm3 rn3\ track multizMm3Rn3Pt0\ type wigMaf 0.0 1.0\ visibility hide\ wiggle multizMm3Rn3Pt0PhastCons\ yLineOnOff Off\ humorMm3Rn3 Human/Mouse/Rat maf Human/Mouse(mm3)/Rat(rn3) Humor Alignments 0 104 100 0 0 100 0 0 0 0 0

Description

This track displays multiz alignments of the mouse Feb. 2003 (mm3) draft assembly and the rat Jun. 2003 (rn3) assembly to the human genome.

Methods

Multiz is a multiple alignment program that takes blastz "Best-in-Genome" alignments (axtBest or axtNet) as input. For human/rodent alignments, it uses the same scoring matrix as blastz between pairs of sequences:

        A     C     G     T
  A    91  -114   -31  -123
  C  -114   100  -125   -31
  G   -31  -125   100  -114
  T  -123   -31  -114    91

  O = 400, E = 30, K = 3000, L = 3000, M = 50

For the mouse/rat alignments, the following matrix was used:

        A     C     G     T
  A    86  -135   -68  -157
  C  -135   100  -148   -68
  G   -68  -148   100  -135
  T  -157   -68  -135    86

  O = 600, E = 50

The overall score is the sum of the scores over all pairs.

Credits

This track was created at UCSC using a program called humor (HUman-MOuse-Rat), which is a special variant of the Multiz program created by Minmei Hou and Webb Miller of the Penn State Bioinformatics Group. Jim Kent wrote axtBest and the scripts to run multiz genome-wide and to display the alignments in this browser. Mouse sequence data are provided by the Mouse Genome Sequencing Consortium. Rat sequence data are provided by the Rat Sequencing Consortium.

References

Blanchette, M, Kent, WJ, Riemer, C, Elnitski, L, Smit, AF, Roskin, KM, Baertsch, R, Rosenbloom, K, Clawson, H, Green, ED, Haussler, D, Miller, W (2004). Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14(4):708-15.

Chiaromonte F, Yap VB, Miller W (2002). Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002:115-26.

Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, and Miller W (2003). Human-Mouse Alignments with BLASTZ. Genome Res. 13(1):103-7.

\ compGeno 0 altColor 100,0,0\ color 100,0,0\ group compGeno\ longLabel Human/Mouse(mm3)/Rat(rn3) Humor Alignments\ priority 104\ shortLabel Human/Mouse/Rat\ track humorMm3Rn3\ type maf\ visibility hide\ PhyloHMMcons_CFTR PhyloHMMcons CFTR wig 0.0 1.0 Phylo-HMM-based conservation, CFTR (post. prob. of slowest of 10 rates) 0 105 175 150 128 255 128 0 0 0 0 compGeno 0 altColor 255,128,0\ autoScale Off\ color 175,150,128\ group compGeno\ longLabel Phylo-HMM-based conservation, CFTR (post. prob. of slowest of 10 rates)\ priority 105\ shortLabel PhyloHMMcons CFTR\ spanList 1\ track PhyloHMMcons_CFTR\ type wig 0.0 1.0\ visibility hide\ phastConsElementsPaper Most Cons. (Std) bed 5 . PhastCons Conserved Elements, Standardized Across Species 0 105.1 0 0 0 127 127 127 1 0 0 compGeno 1 exonArrows off\ group compGeno\ longLabel PhastCons Conserved Elements, Standardized Across Species\ priority 105.1\ shortLabel Most Cons. (Std)\ showTopScorers 200\ track phastConsElementsPaper\ type bed 5 .\ useScore 1\ visibility hide\ phyloHMMcons_HMR PhyloHMMcons HMR wig 0.0 1.0 Phylo-HMM-based conservation, human-mouse-rat (post. prob. of slowest of 10 rates) 0 106 175 150 128 255 128 0 0 0 0

Description

This track plots the level of evolutionary conservation along the genome, as estimated from multiple alignments of the human (hg16), mouse (mm3), and rat (rn3) genomes. The conservation score shown here is based on a phylogenetic hidden Markov model (phylo-HMM).

Methods

A phylo-HMM is a probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2003, Siepel and Haussler 2004). A phylo-HMM can be thought of as a machine that generates a multiple alignment, in the same way that an ordinary hidden Markov model (HMM) generates an individual sequence. However, while the states of an ordinary HMM are associated with simple multinomial probability distributions, the states of a phylo-HMM are associated with more complex distributions defined by probabilistic phylogenetic models. These distributions can capture differences in the rates and patterns of nucleotide substitution observed in different types of genomic regions (e.g., coding or noncoding regions, conserved or nonconserved regions).

To compute a conservation score, we use a k-state phylo-HMM whose k associated phylogenetic models differ only in overall evolutionary rate (Felsenstein and Churchill 1996, Yang 1995). (In the picture at right, k = 3, but in practice we use k = 10.) A phylogenetic model is estimated globally, using the discrete gamma model for rate variation (Yang 1994). A scaled version of the estimated model is then associated with each state in the phylo-HMM (see picture). (There is a separate "rate constant," r_i, for each state i, which is multiplied by all branch lengths in the globally estimated model.) The transition probabilities between states allow for autocorrelation of substitution rates, i.e., for adjacent sites to tend to exhibit similar overall substitution rates. A single parameter, lambda, describes the degree of autocorrelation and defines all transition probabilities (see picture). Here, we have estimated the rate constants from the data, similarly to Yang (1995) (see Siepel and Haussler 2003), but have treated lambda as a tuning parameter. For the conservation score, we use the posterior probability that each site was "generated" by the state having the smallest rate constant. Because of the way the rate categories are defined, the plotted values can be thought of as approximately representing the posterior probability that each site is among the 10% most conserved sites in the data set (allowing for autocorrelation of substitution rates).

In this case, the general reversible (REV) substitution model was used in parameter estimation, and lambda was set to 0.9. Alignment gaps were treated as missing data, which sometimes has the effect of producing undesirably high posterior probabilities in gappy regions of the alignment. We are looking at several possible ways of improving the handling of alignment gaps.
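
The two model ingredients described above can be sketched as follows: the state-transition structure controlled by lambda, and the scaling of branch lengths by each state's rate constant r_i. The exact form of the transition matrix is an assumption here. With probability lambda the next site keeps the current rate category; otherwise a category is drawn uniformly from all k, which may again be the current one. The function names and the example branch names are hypothetical. The values k = 10 and lambda = 0.9 match the settings described above.

```python
def rate_category_transitions(k, lam):
    """k x k transition matrix in which a single parameter lam sets the autocorrelation.

    Assumed form: keep the current rate category with probability lam; otherwise draw
    a category uniformly from all k (possibly the same one). Each row sums to 1.
    """
    off = (1 - lam) / k
    return [[lam + off if i == j else off for j in range(k)] for i in range(k)]

def scaled_branch_lengths(branch_lengths, rate_constant):
    """Multiply every branch length of the globally estimated tree by the state's r_i."""
    return {branch: rate_constant * length for branch, length in branch_lengths.items()}

# With the settings above (k = 10, lambda = 0.9) each state keeps its rate with probability 0.91.
transitions = rate_category_transitions(10, 0.9)
```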

Credits

This track was created with tree estimation and phylo-HMM software by Adam Siepel, and plotting software ("wiggle track") by Hiram Clawson.

References

J. Felsenstein and G. A. Churchill. A hidden Markov model approach to variation among sites in rate of evolution. Mol. Biol. Evol. 13:93-104, 1996.

A. Siepel and D. Haussler. Combining phylogenetic and hidden Markov models in biosequence analysis. In Proc. 7th Annual Int'l Conf. on Research in Computational Molecular Biology (RECOMB 2003), pages 277-286, 2003.

A. Siepel and D. Haussler. Phylogenetic hidden Markov models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, Springer, 2005.

Z. Yang. A space-time process model for the evolution of DNA sequences. Genetics, 139:993-1005, 1995.

Z. Yang. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306-314, 1994.

\ compGeno 0 altColor 255,128,0\ autoScale Off\ color 175,150,128\ group compGeno\ longLabel Phylo-HMM-based conservation, human-mouse-rat (post. prob. of slowest of 10 rates)\ priority 106\ shortLabel PhyloHMMcons HMR\ spanList 1\ track phyloHMMcons_HMR\ type wig 0.0 1.0\ visibility hide\ NISC NISC TBA maf TBA Alignments of NISC Regions 0 107 100 0 0 100 0 0 0 0 1 chr7, compGeno 0 altColor 100,0,0\ chromosomes chr7,\ color 100,0,0\ group compGeno\ longLabel TBA Alignments of NISC Regions\ priority 107.0\ shortLabel NISC TBA\ track NISC\ type maf\ visibility hide\ NISC_phyloHMM NISC phyloHMMcons wig 0.0 1.0 Phylo-HMM-based Conservation for NISC Targets 0 107.1 175 150 128 255 128 0 0 0 1 chr7, compGeno 0 altColor 255,128,0\ autoScale Off\ chromosomes chr7,\ color 175,150,128\ group compGeno\ longLabel Phylo-HMM-based Conservation for NISC Targets\ priority 107.1\ shortLabel NISC phyloHMMcons\ track NISC_phyloHMM\ type wig 0.0 1.0\ visibility hide\ celeraHCM Celera HPM maf Human/Chimp/Mouse Celera Alignments (Science issue 302) 0 109 0 0 0 127 127 127 0 0 0 compGeno 0 group compGeno\ longLabel Human/Chimp/Mouse Celera Alignments (Science issue 302)\ priority 109\ shortLabel Celera HPM\ track celeraHCM\ type maf\ visibility hide\ chainDm2 D. mel. Chain chain dm2 D. melanogaster (Apr. 2004 (BDGP R4/dm2)) Chained Alignments 0 109 100 50 0 255 240 200 1 0 0

Description

This track shows D. melanogaster/human genomic alignments using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both D. melanogaster and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The D. melanogaster sequence is from the Apr. 2004 (BDGP R4/dm2) assembly.

The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the D. melanogaster assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where there are multiple chains over a particular portion of the human genome, chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.
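
The single-line versus double-line distinction can be expressed in terms of the coordinates of consecutive aligned blocks in a chain. In this sketch (hypothetical names, half-open coordinates), a gap that skips sequence in only one species is "single", and a gap that skips sequence in both species is "double".

```python
from typing import List, NamedTuple

class Block(NamedTuple):
    t_start: int  # human (target) start, half-open
    t_end: int
    q_start: int  # D. melanogaster (query) start, half-open
    q_end: int

def classify_gaps(blocks: List[Block]) -> List[str]:
    """Label the gap between each pair of consecutive blocks of one chain."""
    kinds = []
    for prev, nxt in zip(blocks, blocks[1:]):
        t_gap = nxt.t_start - prev.t_end   # human sequence skipped by the gap
        q_gap = nxt.q_start - prev.q_end   # D. melanogaster sequence skipped by the gap
        if t_gap > 0 and q_gap > 0:
            kinds.append("double")         # unaligned sequence in both species
        elif t_gap > 0 or q_gap > 0:
            kinds.append("single")         # sequence present in only one species
        else:
            kinds.append("none")           # blocks abut in both species
    return kinds
```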

Display Conventions and Configuration

By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

Methods

Transposons that have been inserted since the D. melanogaster/human split were removed, and the resulting abbreviated genomes were aligned with blastz. The transposons were then put back into the alignments. The resulting alignments were converted into axt format, and the resulting axts were fed into axtChain. AxtChain organizes all the alignments between a single D. melanogaster and a single human chromosome into a group and makes a kd-tree out of all the gapless subsections (blocks) of the alignments. Next, maximally scoring chains of these blocks were found by running a dynamic program over the kd-tree. Chains scoring below a threshold were discarded; the remaining chains are displayed here.
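
The chaining idea can be shown with a minimal sketch. It finds the highest-scoring set of colinear, non-overlapping blocks using a simple quadratic dynamic program rather than the kd-tree search that axtChain uses. It also substitutes a hypothetical linear gap cost for axtChain's own gap scoring. Names and the cost constant are illustrative only.

```python
from typing import List, NamedTuple, Tuple

class Block(NamedTuple):
    t_start: int   # human (target) start, half-open
    t_end: int
    q_start: int   # D. melanogaster (query) start, half-open
    q_end: int
    score: float   # score of the gapless block

def gap_cost(t_gap: int, q_gap: int) -> float:
    """Hypothetical linear gap cost; axtChain uses its own gap scoring."""
    return 0.5 * (t_gap + q_gap)

def best_chain(blocks: List[Block]) -> Tuple[float, List[Block]]:
    """Highest-scoring chain of colinear, non-overlapping blocks (O(n^2) dynamic program)."""
    if not blocks:
        return 0.0, []
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in order]   # best chain score ending at each block
    back = [-1] * len(order)          # predecessor of each block in that chain
    for j, bj in enumerate(order):
        for i in range(j):
            bi = order[i]
            if bi.t_end <= bj.t_start and bi.q_end <= bj.q_start:   # bi can precede bj
                cand = best[i] + bj.score - gap_cost(bj.t_start - bi.t_end, bj.q_start - bi.q_end)
                if cand > best[j]:
                    best[j], back[j] = cand, i
    end = max(range(len(order)), key=lambda k: best[k])
    chain = []
    k = end
    while k != -1:                    # follow predecessors back to the chain's first block
        chain.append(order[k])
        k = back[k]
    return best[end], chain[::-1]
```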

Credits

Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

Lineage-specific repeats were identified by Arian Smit and his program RepeatMasker.

The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

The browser display and database storage of the chains were generated by Robert Baertsch and Jim Kent.

References

Chiaromonte, F., Yap, V.B., Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002, 115-26 (2002).

Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res. 13(1), 103-7 (2003).

\ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Chained Alignments\ otherDb dm2\ priority 109\ shortLabel D. mel. Chain\ spectrum on\ track chainDm2\ type chain dm2\ visibility hide\ netDm2 D. mel. Net netAlign dm2 chainDm2 D. melanogaster (Apr. 2004 (BDGP R4/dm2)) Alignment Net 0 109.1 0 0 0 127 127 127 1 0 0

Description

This track shows the best D. melanogaster/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The D. melanogaster sequence used in this annotation is from the Apr. 2004 (BDGP R4/dm2) (dm2) assembly.

Display Conventions and Configuration

In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.

In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

Individual items in the display are categorized as one of four types (other than gap):

  • Top - the best, longest match. Displayed on level 1.
  • Syn - a line-up on the same chromosome as the gap in the level above it, in the same orientation.
  • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
  • NonSyn - a match to a chromosome different from the gap in the level above.
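The four categories can be expressed as a small decision function. This is a simplified sketch based only on the definitions above: it assumes each placed chain is summarized by the chromosome and strand it maps to in the other species (the `Fill` type is hypothetical), and it is not meant to reproduce netSyntenic's exact rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fill:
    """Hypothetical summary of a chain placed in the net."""
    other_chrom: str  # chromosome it maps to in the aligning organism
    strand: str       # '+' or '-'


def classify(fill: Fill, parent: Optional[Fill]) -> str:
    """Label a net item relative to the chain whose gap it fills."""
    if parent is None:
        return "Top"     # level 1: nothing above it
    if fill.other_chrom != parent.other_chrom:
        return "NonSyn"  # different chromosome from the gap above
    if fill.strand != parent.strand:
        return "Inv"     # same chromosome, opposite orientation
    return "Syn"         # same chromosome, same orientation
```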

Methods

Chains were derived from blastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
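The greedy placement described above can be sketched on a single reference sequence. This toy version assumes chains are given as lists of aligned intervals on the reference, trims each chain to the bases not yet claimed by higher-scoring chains, and places it one level below the deepest already-placed chain whose span encloses it. Unlike chainNet, it does not split a chain that spans several gaps, and it ignores the other genome entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference sequence


@dataclass
class Chain:
    name: str
    score: float
    blocks: List[Interval]  # aligned (gapless) intervals on the reference


@dataclass
class NetItem:
    name: str
    level: int
    span: Interval  # span of the trimmed chain as placed in the net


def build_net(chains: List[Chain], ref_length: int) -> List[NetItem]:
    """Place chains best-first, trimming each to reference bases not yet claimed."""
    claimed = [False] * ref_length
    placed: List[NetItem] = []
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        # Trim: keep only aligned bases that no higher-scoring chain has claimed
        kept = [pos for start, end in chain.blocks
                for pos in range(start, end) if not claimed[pos]]
        if not kept:
            continue  # completely covered by better chains; dropped from the net
        for pos in kept:
            claimed[pos] = True
        span = (min(kept), max(kept) + 1)
        # The chain sits under every placed chain whose span encloses it
        enclosing = [p.level for p in placed
                     if p.span[0] <= span[0] and span[1] <= p.span[1]]
        placed.append(NetItem(chain.name, max(enclosing, default=0) + 1, span))
    return placed
```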

Credits

The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

Lineage-specific repeats were identified by Arian Smit and his program RepeatMasker.

The browser display and database storage of the nets were made by Robert Baertsch and Jim Kent.

References

Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res. 13(1), 103-107 (2003).

\ \ \ compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Alignment Net\ otherDb dm2\ priority 109.1\ shortLabel D. mel. Net\ spectrum on\ track netDm2\ type netAlign dm2 chainDm2\ visibility hide\ phastConsTopPaper phastCons HCE bed 5 . PhastCons Highly Conserved Elements (HCEs) 0 109.25 0 100 0 127 177 127 0 0 0 compGeno 1 color 0,100,0\ group compGeno\ longLabel PhastCons Highly Conserved Elements (HCEs)\ priority 109.25\ shortLabel phastCons HCE\ track phastConsTopPaper\ type bed 5 .\ visibility hide\ uc16 Ultra Conserved bed 4 . Ultraconserved Elements (200 bp 100% ID in rat/mouse/human) 0 109.31 150 0 0 202 127 127 0 0 0 compGeno 1 color 150,0,0\ exonArrows off\ group compGeno\ longLabel Ultraconserved Elements (200 bp 100% ID in rat/mouse/human)\ priority 109.31\ searchMethod exact\ searchType bed\ shortLabel Ultra Conserved\ track uc16\ type bed 4 .\ visibility hide\ ux16 Extended Ultras bed 4 . Ultras Extended Until 5 Bases Below 85% in Conservation Track 0 109.32 100 0 100 177 127 177 0 0 0 compGeno 1 color 100,0,100\ exonArrows off\ group compGeno\ longLabel Ultras Extended Until 5 Bases Below 85% in Conservation Track\ priority 109.32\ shortLabel Extended Ultras\ track ux16\ type bed 4 .\ visibility hide\ HMRConservation HMRConservation sample 0 .466217 Human/Mouse/Rat Evolutionary Conservation Score 0 110 100 50 0 175 150 128 0 0 0 compGeno 0 altColor 175,150,128\ color 100,50,0\ group compGeno\ longLabel Human/Mouse/Rat Evolutionary Conservation Score\ priority 110\ shortLabel HMRConservation\ track HMRConservation\ type sample 0 .466217\ visibility hide\ iscaRetrospectiveComposite ISCA Retro gvf International Standards for Cytogenetic Arrays Consortium - Retrospective variants 0 110 0 0 0 127 127 127 0 0 0

Description

NOTE: While the ISCA data are open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal medical questions.

UCSC presents these data for use by qualified professionals, and even such professionals should use caution in interpreting the significance of information found here. No single data point should be taken at face value, and such data should always be used in conjunction with as much corroborating data as possible. No treatment protocols should be developed or patient advice given on the basis of these data without careful consideration of all possible sources of information.

No attempt to identify individual patients should be undertaken. No one is authorized to attempt to identify patients by any means.

The International Standards for Cytogenomic Arrays (ISCA) Consortium is a group of clinical cytogenetics and molecular genetics laboratories that has been organized to improve the quality of patient care in clinical genetic testing using new molecular cytogenetic technologies. These technologies include array comparative genomic hybridization (aCGH) and quantitative SNP analysis by microarrays or bead chips. Membership in the ISCA Consortium is open to all individuals and laboratories involved in cytogenetic array testing who are committed to free data sharing and to participation in a process to develop evidence-based standards and guidelines to improve patient care.

The ISCA retrospective dataset displays microarray data submitted to dbGaP by cytogenetics labs of the ISCA Consortium, showing deleted and duplicated genomic regions found in patients who were referred for genetic testing for disorders such as mental retardation, developmental delay, autism, and gross congenital malformation.

Some of the deletions and duplications reported in this track have been identified as causative for the phenotype by clinical cytogeneticists at the contributing laboratories, and to the best of their knowledge represent the cause of the reported phenotype. Note that phenotype information is often vague and imprecise and should be used with caution. Although all samples were submitted because of a phenotype in a patient, only 15% of patients had variants determined to be causal.

Many samples have multiple variants, not all of which are causative of the phenotype. The retrospective dataset was obtained from patients who were not asked for consent for release of their genetic information into a public database. To protect their privacy, the chromosome imbalances in these samples have been decoupled so that multiple imbalances cannot be traced back to a single patient. It is therefore not possible to identify individuals via their genotype.

Methods and Color Convention

Samples from patients referred for cytogenetic testing due to clinical phenotypes were analyzed by oligonucleotide array CGH. Samples were analyzed on a variety of microarray platforms, with a resolution of 20-75 kb, using pooled reference samples as controls. Several consecutive probes were required before a region was determined to be either amplified or deleted. Where available, deletion endpoint uncertainty is described on the details page for each sample, but it is not shown graphically in the Browser image.
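The requirement that several consecutive probes agree before a region is called can be illustrated with a simple run-length rule over per-probe log2 ratios. The threshold of 0.3 and the minimum of three probes are hypothetical values chosen for illustration; the contributing laboratories used their own platform-specific calling software and criteria, which this sketch does not reproduce.

```python
from typing import List, Optional, Tuple


def call_segments(log2_ratios: List[float], threshold: float = 0.3,
                  min_probes: int = 3) -> List[Tuple[int, int, str]]:
    """Return (first_probe, end_probe_exclusive, call) for each run of at least
    min_probes consecutive probes beyond +/-threshold in the same direction.
    Hypothetical thresholds; call is 'deletion' or 'duplication'."""

    def state(r: float) -> Optional[str]:
        if r >= threshold:
            return "duplication"
        if r <= -threshold:
            return "deletion"
        return None

    calls = []
    i, n = 0, len(log2_ratios)
    while i < n:
        s = state(log2_ratios[i])
        j = i + 1
        while j < n and state(log2_ratios[j]) == s:
            j += 1
        # Only runs of aberrant probes that are long enough become calls
        if s is not None and j - i >= min_probes:
            calls.append((i, j, s))
        i = j
    return calls
```

In the sketch, two isolated high-ratio probes are ignored, while a run of three or more becomes a call, mirroring the "several consecutive probes" rule.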

Data were submitted to dbGaP at NCBI, then decoupled as described above and deposited in dbVar for unrestricted release.

The entries are colored red for deletions and blue for duplications.

Verification

Data were validated using a variety of methods, including BAC aCGH, FISH, karyotyping, and RT-PCR.

Credits

Thanks to the ISCA Consortium, Christa Martin and Erin Baldwin of Emory University, David Ledbetter of the Geisinger Institute, Eric Thorland of Mayo Clinic, and Swaroop Aradhya of GeneDx for samples and analysis. Thanks to Deanna Church and Justin Paschall of NCBI for consultation and integration into dbGaP and dbVar, and to Angie Hinrichs and Robert Kuhn at UCSC for engineering.

References

Miller DT, Adam MP, Aradhya S, Biesecker LG, Brothman AR, Carter NP, Church DM, Crolla JA, Eichler EE, Epstein CJ et al. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet. 2010 May 14;86(5):749-64.

\ \ varRep 1 compositeTrack on\ group varRep\ longLabel International Standards for Cytogenetic Arrays Consortium - Retrospective variants\ noScoreFilter .\ priority 110\ shortLabel ISCA Retro\ track iscaRetrospectiveComposite\ type gvf\ visibility hide\ tba9MammalCFTR 9 Species CFTR maf CFTR Region TBA Alignments (human,mouse,rat,chimp,baboon,cow,pig,cat,dog) 0 111 0 0 0 127 127 127 0 0 0 compGeno 0 group compGeno\ longLabel CFTR Region TBA Alignments (human,mouse,rat,chimp,baboon,cow,pig,cat,dog)\ priority 111\ shortLabel 9 Species CFTR\ track tba9MammalCFTR\ type maf\ visibility hide\ exoFish Exofish Ecores bed 5 . Exofish Tetraodon/Human Evolutionarily Conserved Regions 1 111 0 60 120 200 220 255 1 0 0

Description

The Exofish track shows regions of homology with the pufferfish Tetraodon nigroviridis. The following paper describes Exofish: "Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence", Nature Genetics 25, 235 (June 2000).

Credits

\ This information \ was provided by Olivier Jaillon and Hugues Roest Crollius at Genoscope. \ For further information and other Exofish tools please visit the \ \ Genoscope Exofish web site, or \ email exofish@genoscope.cns.fr\ \ compGeno 1 altColor 200,220,255\ color 0,60,120\ group compGeno\ longLabel Exofish Tetraodon/Human Evolutionarily Conserved Regions\ priority 111\ shortLabel Exofish Ecores\ spectrum on\ track exoFish\ type bed 5 .\ visibility dense\ CFTR25 25 Species CFTR wigMaf 0.0 1.0 CFTR Region 25 Species TBA Alignments & PhyloHMM Cons 0 112 0 10 100 0 90 10 0 0 0 compGeno 1 altColor 0,90,10\ autoScale Off\ color 0, 10, 100\ group compGeno\ longLabel CFTR Region 25 Species TBA Alignments & PhyloHMM Cons\ maxHeightPixels 100:40:11\ pairwise CFTR 20\ priority 112\ sGroup_mammal opossum dunnart platypus\ sGroup_placental rabbit rn3 mm3 cow pig horse cat dog ajbat cpbat hedgehog\ sGroup_primate chimp orangutan baboon macaque vervet lemur\ sGroup_vertebrate chicken zfish tetra fr1\ shortLabel 25 Species CFTR\ spanList 1\ speciesGroups primate placental mammal vertebrate\ track CFTR25\ treeImage phylo/cftr25.gif\ type wigMaf 0.0 1.0\ visibility hide\ wiggle chr7_phyloHMMcons_CFTR\ yLineOnOff Off\ blatFish Tetraodon Blat psl xeno Tetraodon nigroviridis Translated Blat Alignments 1 112 0 60 120 200 220 255 1 0 0

Description

This track displays translated alignments of 728 million bases of Tetraodon whole-genome shotgun reads against the human genome. Areas highlighted by this track are quite likely to be coding regions.

Methods

The alignments were made with blat in translated protein mode, requiring two nearby 4-mer matches to trigger a detailed alignment. The human genome was masked with RepeatMasker and Tandem Repeat Finder before running blat.
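The two-hit trigger can be sketched for protein sequences. The sketch follows the general BLAT approach of indexing the non-overlapping K-mers (tiles) of the target and testing every offset of the query, but it assumes both sequences are already translated (real translated mode works on translations of DNA in multiple frames), treats "nearby" as two hits on the same diagonal within a hypothetical `max_gap` of 20 residues, and omits everything that happens after a seed is triggered.

```python
from collections import defaultdict
from typing import Dict, List

K = 4  # tile size: 4 amino acids in translated (protein) mode


def tile_index(target: str) -> Dict[str, List[int]]:
    """Index the non-overlapping K-mers (tiles) of the target sequence."""
    index: Dict[str, List[int]] = defaultdict(list)
    for t in range(0, len(target) - K + 1, K):
        index[target[t:t + K]].append(t)
    return index


def triggered_diagonals(query: str, target: str, max_gap: int = 20) -> List[int]:
    """Return diagonals (target_pos - query_pos) holding two tile hits whose query
    positions are at most max_gap apart; each would seed a detailed alignment."""
    index = tile_index(target)
    hits_by_diag: Dict[int, List[int]] = defaultdict(list)
    for q in range(len(query) - K + 1):  # every query offset is tested
        for t in index.get(query[q:q + K], []):
            hits_by_diag[t - q].append(q)
    triggered = []
    for diag, qs in hits_by_diag.items():
        qs.sort()
        # Hits on one diagonal come from distinct target tiles, so they are >= K apart
        if any(b - a <= max_gap for a, b in zip(qs, qs[1:])):
            triggered.append(diag)
    return sorted(triggered)
```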

Credits

Many thanks to Genoscope for providing the Tetraodon sequence.

References

Kent, W.J. BLAT - the BLAST-like alignment tool. Genome Res. 12(4), 656-664 (2002).

\ \ compGeno 1 altColor 200,220,255\ color 0,60,120\ group compGeno\ longLabel Tetraodon nigroviridis Translated Blat Alignments\ priority 112\ shortLabel Tetraodon Blat\ spectrum on\ track blatFish\ type psl xeno\ visibility dense\ phyloHMM_leptin PhyloHMMcons Leptin wig 0.0 1.0 Phylo-HMM-based conservation, Leptin 0 113 175 150 128 255 128 0 0 0 1 chr7, compGeno 0 altColor 255,128,0\ autoScale Off\ chromosomes chr7,\ color 175,150,128\ group compGeno\ longLabel Phylo-HMM-based conservation, Leptin\ priority 113\ shortLabel PhyloHMMcons Leptin\ spanList 1\ track phyloHMM_leptin\ type wig 0.0 1.0\ visibility hide\ leptinHuman90 PhyloHMMcons Leptin 90 bed 3 . Phylo-HMM-based conservation, Leptin, 90 percent 0 113.1 0 0 0 127 127 127 0 0 1 chr7, compGeno 1 chromosomes chr7,\ group compGeno\ longLabel Phylo-HMM-based conservation, Leptin, 90 percent\ priority 113.1\ shortLabel PhyloHMMcons Leptin 90\ track leptinHuman90\ type bed 3 .\ visibility hide\ leptin Leptin TBA maf Leptin Region TBA alignments (human,mouse,rat,chimp,baboon,cow,pig,cat,dog) 0 114 100 50 0 0 50 100 0 0 1 chr7, compGeno 0 altColor 0,50,100\ chromosomes chr7,\ color 100,50,0\ group compGeno\ longLabel Leptin Region TBA alignments (human,mouse,rat,chimp,baboon,cow,pig,cat,dog)\ priority 114\ shortLabel Leptin TBA\ track leptin\ type maf\ visibility hide\ tet_waba Tetraodon Tetraodon nigroviridis Homologies 0 115 50 100 200 85 170 225 0 0 0 compGeno 0 altColor 85,170,225\ color 50,100,200\ group compGeno\ longLabel Tetraodon nigroviridis Homologies\ priority 115\ shortLabel Tetraodon\ track tet_waba\ visibility hide\ chainCi2 C. intestinalis Chain chain ci2 C. intestinalis (Mar. 2005 (JGI 2.1/ci2)) Chained Alignments 0 117 100 50 0 255 240 200 1 0 0

Description

This track shows alignments of C. intestinalis (ci2, Mar. 2005 (JGI 2.1/ci2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both C. intestinalis and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the C. intestinalis assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

Display Conventions and Configuration

By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to: Color track based on chromosome.

To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

Methods

Transposons that were inserted since the C. intestinalis/human split were removed from the assemblies. The abbreviated genomes were aligned with blastz, and the transposons were added back in. The resulting alignments were converted into psl format using the lavToPsl program. The psl alignments were fed into axtChain, which organizes all alignments between a single C. intestinalis chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. Chains scoring below a threshold were discarded; the remaining chains are displayed in this track.
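Each PSL record already lists its gapless blocks, which are the units axtChain links into chains. A minimal reader for those blocks might look like the sketch below; it assumes well-formed, tab-separated PSL lines without a header and is not the parser used by lavToPsl or axtChain. It relies only on the standard 21-column PSL layout, in which columns 18 to 21 are blockCount, blockSizes, qStarts, and tStarts.

```python
from typing import List, Tuple


def psl_blocks(line: str) -> List[Tuple[int, int, int]]:
    """Return (q_start, t_start, size) for each gapless block in one PSL record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 21:
        raise ValueError("expected at least 21 tab-separated PSL columns")

    def ints(s: str) -> List[int]:
        # PSL list columns are comma-terminated, e.g. "40,50,"
        return [int(x) for x in s.rstrip(",").split(",") if x]

    block_count = int(fields[17])
    sizes, q_starts, t_starts = ints(fields[18]), ints(fields[19]), ints(fields[20])
    if not len(sizes) == len(q_starts) == len(t_starts) == block_count:
        raise ValueError("block lists disagree with blockCount")
    return list(zip(q_starts, t_starts, sizes))
```

On the "-" query strand, PSL qStarts are counted on the reverse-complemented query, so a caller needing forward-strand query coordinates would convert each block start to qSize - qStart - size.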

Credits

Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

The browser display and database storage of the chains were generated by Robert Baertsch and Jim Kent.

References

Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Chained Alignments\ otherDb ci2\ priority 117\ shortLabel $o_Organism Chain\ spectrum on\ track chainCi2\ type chain ci2\ visibility hide\ netCi2 C. intestinalis Net netAlign ci2 chainCi2 C. intestinalis (Mar. 2005 (JGI 2.1/ci2)) Alignment Net 0 117.1 0 0 0 127 127 127 1 0 0

Description

This track shows the best C. intestinalis/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The C. intestinalis sequence used in this annotation is from the Mar. 2005 (JGI 2.1/ci2) (ci2) assembly.

Display Conventions and Configuration

In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.

In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

Individual items in the display are categorized as one of four types (other than gap):

  • Top - the best, longest match. Displayed on level 1.
  • Syn - a line-up on the same chromosome as the gap in the level above it, in the same orientation.
  • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
  • NonSyn - a match to a chromosome different from the gap in the level above.

Methods

Chains were derived from blastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
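The gap annotation performed by netClass can be illustrated for one side of a single gap. This sketch assumes the sequence is available as a string in which sequencing gaps are written as N, and that repeats are given as non-overlapping (start, end, is_new) intervals, where is_new marks a transposon inserted after the species split. These inputs and the output keys are hypothetical and do not follow netClass's actual file formats.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def annotate_gap(seq: str, gap: Interval,
                 repeats: List[Tuple[int, int, bool]]) -> Dict[str, int]:
    """Summarize one side of a net gap: size, N bases, and bases covered by
    repeats inserted before (old) or after (new) the species split."""
    region = seq[gap[0]:gap[1]].upper()
    old = sum(overlap(gap, (s, e)) for s, e, is_new in repeats if not is_new)
    new = sum(overlap(gap, (s, e)) for s, e, is_new in repeats if is_new)
    return {"size": gap[1] - gap[0], "N": region.count("N"),
            "old_repeat": old, "new_repeat": new}
```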

Credits

The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

Lineage-specific repeats were identified by Arian Smit and his program RepeatMasker.

The browser display and database storage of the nets were made by Robert Baertsch and Jim Kent.

References

Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

\ \ compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Alignment Net\ otherDb ci2\ priority 117.1\ shortLabel $o_Organism Net\ spectrum on\ track netCi2\ type netAlign ci2 chainCi2\ visibility hide\ humMusL Mouse Cons sample 0 8 Human/Mouse Evolutionary Conservation Score (std units) 0 118 175 150 128 175 150 128 0 0 0

Description

This track displays the conservation between the human and mouse genomes for 50 bp windows in the human genome that have at least 15 bp aligned to mouse. The score for a window reflects the probability that the level of observed conservation in that 50 bp region would occur by chance under neutral evolution. It is given on a logarithmic scale, and thus it is called the "L-score". An L-score of 1 means there is a 1/10 probability that the observed conservation level would occur by chance, an L-score of 2 means a 1/100 probability, an L-score of 3 means a 1/1000 probability, etc. The L-scores are displayed as "mountain ranges". Clicking on a mountain range displays a detail page from which you can access the base-level alignments, both for the whole region and for the individual 50 bp windows.

Methods

Genome-wide alignments between human and mouse were produced by blastz. A set of 50 bp windows in the human genome was determined by scanning the sequence, sliding 5 bases at a time, and only those windows with at least 15 aligned bases were kept. For each window, a conservation score defined by

$$S = \sqrt{\frac{n}{m(1-m)}}\,(p - m)$$

was calculated, where n is the number of aligning bases in the window, p is the percent identity between human and mouse for these aligning bases, and m is the average percent identity for aligned neutrally evolving bases in a larger region surrounding the 50 bp window being scored (p and m are expressed as fractions between 0 and 1). Neutral bases were taken from ancestral repeat sequences, which are relics of transposons that were inserted before the human-mouse split. To transform S into an L-score, the empirical cumulative distribution function CDF(S) = P(x < S) is computed from the scores of all windows genome-wide, and the L-score is defined as

$$L = -\log_{10}\bigl(1 - \mathrm{CDF}(S)\bigr).$$

The L-score provides a frequentist confidence assessment. A Bayesian calculation of the probability that a window is under selection can also be made using a mixture decomposition of the empirical density of the scores for all windows genome-wide into a neutral and a selected component. Details are given in a manuscript in preparation. The results are summarized in the table below.

L-score   Frequentist probability of     Bayesian probability that a
          this L-score or greater,       window with this L-score is
          given neutral evolution        under selection
---------------------------------------------------------------------
   1      0.1                            0.32
   2      0.01                           0.75
   3      0.001                          0.94
   4      0.0001                         0.97
   5      0.00001                        0.98
   6      0.000001                       0.99
   7      0.0000001                      >0.99
   8      0.00000001                     >0.99
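The window scan and the two formulas above can be combined in a short sketch. It assumes per-base boolean arrays marking which human bases align to mouse and which of those are identical, and it takes the neutral identity m as an input rather than estimating it from ancestral repeats. The genome-wide score list passed to `l_score` is likewise assumed to be precomputed and sorted, and to contain the score being converted, so that 1 - CDF(S) is never zero.

```python
import bisect
import math
from typing import Iterator, List, Sequence, Tuple


def windows(aligned: Sequence[bool], identical: Sequence[bool], size: int = 50,
            step: int = 5, min_aligned: int = 15) -> Iterator[Tuple[int, int, float]]:
    """Yield (start, n, p) for each window with at least min_aligned aligned bases.
    identical[i] is assumed True only where aligned[i] is also True."""
    for start in range(0, len(aligned) - size + 1, step):
        n = sum(aligned[start:start + size])
        if n >= min_aligned:
            yield start, n, sum(identical[start:start + size]) / n


def s_score(n: int, p: float, m: float) -> float:
    """S = sqrt(n / (m(1 - m))) * (p - m), with p and m as fractions in (0, 1)."""
    return math.sqrt(n / (m * (1 - m))) * (p - m)


def l_score(s: float, all_scores_sorted: List[float]) -> float:
    """L = -log10(1 - CDF(S)), where CDF(S) = P(x < S) over all genome-wide scores."""
    cdf = bisect.bisect_left(all_scores_sorted, s) / len(all_scores_sorted)
    return -math.log10(1 - cdf)
```

Because 1 - CDF(S) is the fraction of genome-wide windows scoring at least S, an L-score of k corresponds to a fraction of 10^-k, which is the frequentist column of the table.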

Using the Filter

The track filter can be used to configure some of the display characteristics of the track.

  • Interpolation: This attribute determines whether the data samples are displayed as discrete points on the track (the "Only samples" option) or are connected by a line (the "Linear interpolation" option).
  • Fill Blocks: When the on button is selected in this option, the area underneath the sample points or line is filled in with gray.
  • Track Height: Type in a new value to adjust the track height in pixels to best suit your screen display.
  • Vertical Range: Type in a new min or max value to adjust the portion of the track's vertical range that is displayed. Range units are marked by pale blue horizontal lines.
  • Maximum Interval to Interpolate Across: This attribute sets the maximum gap between alignments that will be spanned when the Linear Interpolation attribute is selected. Type in a new value to increase or decrease the interval.

When you have finished configuring the filter, click the Submit button.
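The interaction between Linear interpolation and the Maximum Interval to Interpolate Across setting can be summarized as a rule about which neighboring samples get connected. This is only a sketch of that rule, assuming samples are (position, value) pairs; it is not the Browser's drawing code.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (genomic position, sample value)


def interpolated_segments(samples: List[Point], max_interval: int) -> List[Tuple[Point, Point]]:
    """Pairs of neighboring samples that would be joined by a line in
    'Linear interpolation' mode; gaps wider than max_interval stay unconnected."""
    pts = sorted(samples)
    return [(a, b) for a, b in zip(pts, pts[1:]) if b[0] - a[0] <= max_interval]
```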

Credits

Thanks to Webb Miller and Scott Schwartz for creating the blastz alignments, Jim Kent for post-processing them, and Mark Diekhans for scoring the windows and selecting out the ancestral repeats. Krishna Roskin created S-scores for these windows. Ryan Weber computed the CDF for these S-scores and created the remaining track display functions. Mouse sequence data are provided by the Mouse Genome Sequencing Consortium.

\ \ compGeno 0 altColor 175,150,128\ color 175,150,128\ group compGeno\ longLabel Human/Mouse Evolutionary Conservation Score (std units)\ priority 118\ shortLabel Mouse Cons\ track humMusL\ type sample 0 8\ visibility hide\ chainCi1ProtEx chainCi1ProtEx chain ci1 chainCi1ProtEx 0 125 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel chainCi1ProtEx\ otherDb ci1\ priority 125\ shortLabel chainCi1ProtEx\ spectrum on\ track chainCi1ProtEx\ type chain ci1\ visibility hide\ chainFr1Ex chainFr1Ex chain fr1 chainFr1Ex 0 125 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel chainFr1Ex\ otherDb fr1\ priority 125\ shortLabel chainFr1Ex\ spectrum on\ track chainFr1Ex\ type chain fr1\ visibility hide\ chainFr1MergeEx chainFr1MergeEx chain fr1 chainFr1MergeEx 0 125 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel chainFr1MergeEx\ otherDb fr1\ priority 125\ shortLabel chainFr1MergeEx\ spectrum on\ track chainFr1MergeEx\ type chain fr1\ visibility hide\ chainFr1ProtEx chainFr1ProtEx chain fr1 chainFr1ProtEx 0 125 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel chainFr1ProtEx\ otherDb fr1\ priority 125\ shortLabel chainFr1ProtEx\ spectrum on\ track chainFr1ProtEx\ type chain fr1\ visibility hide\ chainGalGal2Ex chainGalGal2Ex chain galGal2 chainGalGal2Ex 0 125 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel chainGalGal2Ex\ otherDb galGal2\ priority 125\ shortLabel chainGalGal2Ex\ spectrum on\ track chainGalGal2Ex\ type chain galGal2\ visibility hide\ chainGalGal2MergeEx chainGalGal2MergeEx chain galGal2 chainGalGal2MergeEx 0 125 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel chainGalGal2MergeEx\ otherDb galGal2\ priority 125\ shortLabel chainGalGal2MergeEx\ spectrum on\ track chainGalGal2MergeEx\ type chain galGal2\ visibility hide\ 
chainGalGal2ProtEx chainGalGal2ProtEx chain galGal2 chainGalGal2ProtEx 0 125 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel chainGalGal2ProtEx\ otherDb galGal2\ priority 125\ shortLabel chainGalGal2ProtEx\ spectrum on\ track chainGalGal2ProtEx\ type chain galGal2\ visibility hide\ syntenyHuman Human Synteny bed 4 + Human/Mouse Synteny Using Blastz Single Coverage (100k window) 0 127 0 100 0 255 240 200 0 0 0

Description

This track shows syntenic (corresponding) regions between human and mouse chromosomes.

Methods

We passed a non-overlapping 100 kb window over the human genome and, using the blastz best-in-mouse genome alignments, looked for high-scoring regions with at least 40% of the bases aligning with the same region in mouse. These 100 kb segments were joined together if they agreed in direction and were within 500 kb of each other in the human genome and within 4 Mb of each other in the mouse genome. Gaps between syntenic anchors were filled if the bases between two flanking regions agreed with synteny (direction and mouse location). Finally, we extended the syntenic blocks to include those areas.
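The joining rule for 100 kb anchors can be sketched as a single pass over windows sorted by human position, using the 500 kb and 4 Mb limits given above. The sketch assumes each anchor has already passed the 40% alignment test, treats "agreed in direction" as having the same relative strand, adds the assumption that joined anchors must map to the same mouse chromosome, and omits the final gap-filling and extension steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Anchor:
    h_start: int   # human window start
    h_end: int     # human window end
    m_chrom: str   # mouse chromosome the window aligns to
    m_start: int
    m_end: int
    strand: str    # relative orientation, '+' or '-'


def merge_anchors(anchors: List[Anchor], max_h_gap: int = 500_000,
                  max_m_gap: int = 4_000_000) -> List[Anchor]:
    """Join anchors, in human order, into syntenic blocks."""
    blocks: List[Anchor] = []
    for a in sorted(anchors, key=lambda x: x.h_start):
        last = blocks[-1] if blocks else None
        if last is not None:
            # Distance between the two mouse intervals (negative if they overlap)
            m_gap = max(a.m_start - last.m_end, last.m_start - a.m_end)
        if (last is not None and a.strand == last.strand and a.m_chrom == last.m_chrom
                and a.h_start - last.h_end <= max_h_gap and m_gap <= max_m_gap):
            last.h_end = max(last.h_end, a.h_end)
            last.m_start = min(last.m_start, a.m_start)
            last.m_end = max(last.m_end, a.m_end)
        else:
            # Start a new block with a copy so the input anchors are not modified
            blocks.append(Anchor(a.h_start, a.h_end, a.m_chrom,
                                 a.m_start, a.m_end, a.strand))
    return blocks
```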

Credits

\

\ Contact Robert \ Baertsch at UCSC for more information about this track.\ Thanks to the Mouse Genome Sequencing Consortium for providing the mouse sequence data. \ compGeno 1 altColor 255,240,200\ color 0,100,0\ group compGeno\ longLabel Human/Mouse Synteny Using Blastz Single Coverage (100k window)\ priority 127\ shortLabel Human Synteny\ track syntenyHuman\ type bed 4 +\ visibility hide\ BRout BRout psl xeno BRout 0 128 0 0 0 127 127 127 1 0 0 x 1 group x\ longLabel BRout\ priority 128\ shortLabel BRout\ spectrum on\ track BRout\ type psl xeno\ visibility hide\ tblastHg16 tblastHg16 psl xeno tblastHg16 0 128 0 0 0 127 127 127 1 0 0 x 1 group x\ longLabel tblastHg16\ priority 128\ shortLabel tblastHg16\ spectrum on\ track tblastHg16\ type psl xeno\ visibility hide\ webbNonExonic NonExonic bed 6 . Putative Non-Exonic Regions Conserved with Chicken 0 130 0 60 120 255 220 100 1 0 0

Description

\ This track contains putative non-exonic conserved regions from Webb Miller.\ The name of the item shows the corresponding coordinates in chicken \ (Feb. 2004 - galGal2).\ \ x 1 altColor 255,220,100\ color 0,60,120\ group x\ longLabel Putative Non-Exonic Regions Conserved with Chicken\ priority 130\ shortLabel NonExonic\ spectrum on\ track webbNonExonic\ type bed 6 .\ visibility hide\ blatCi1 Squirt Blat psl xeno Ciona intestinalis (Dec. 2002/ci1) Translated Blat Alignments 0 130 0 60 120 200 220 255 1 0 0

Description

The Ciona genome shotgun assembly was constructed with JAZZ, the DOE Joint Genome Institute (JGI) assembler, from paired-end sequencing reads produced at the JGI at 8.2X coverage. The assembly contains 116.7 million base pairs of nonrepetitive sequence in 2,501 scaffolds greater than 3 kb. Half of this (60 Mbp) is assembled into 117 scaffolds longer than 190 kb; 85% of the assembly (104.1 Mbp) is found in 905 scaffolds longer than 20 kb. Gene modeling and analysis were performed at the JGI.

Methods

\

\ The alignments were made with blat in translated protein mode requiring \ two nearby 4-mer matches to trigger a detailed alignment.
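A minimal Python sketch of the "two nearby 4-mer matches" trigger follows. It assumes both sequences have already been translated into amino acids and interprets "nearby" as two non-overlapping hits on the same diagonal whose starts are at most max_gap residues apart; max_gap=20 is a hypothetical value, and BLAT's actual indexing and hit-clustering rules differ in detail (see Kent 2002).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in seq to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def two_hit_triggers(query: str, target: str, k: int = 4,
                     max_gap: int = 20) -> List[Tuple[int, int]]:
    """Return (query_pos, target_pos) of each second hit that would trigger a detailed alignment.

    A trigger needs two k-mer hits on the same diagonal (target_pos - query_pos)
    that do not overlap and start at most max_gap residues apart.
    """
    index = kmer_index(target, k)
    last_hit: Dict[int, int] = {}  # diagonal -> query position of the previous counted hit
    triggers: List[Tuple[int, int]] = []
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], []):
            diag = t - q
            prev = last_hit.get(diag)
            if prev is not None and k <= q - prev <= max_gap:
                triggers.append((q, t))
            if prev is None or q - prev >= k:
                last_hit[diag] = q  # overlapping hits do not reset the anchor
    return triggers
```

Requiring a second hit on the same diagonal filters out most isolated chance matches of a single short k-mer.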

\ \

Credits

\

\ These data were freely provided by the \ JGI\ for use in this publication/correspondence only.

\

\ The 1.0 draft from \ http://genome.jgi-psf.org/ciona4/ciona4.info.htm was used \ in these alignments.

\ \

References

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ compGeno 1 altColor 200,220,255\ color 0,60,120\ group compGeno\ longLabel Ciona intestinalis (Dec. 2002/ci1) Translated Blat Alignments\ priority 130\ shortLabel Squirt Blat\ spectrum on\ track blatCi1\ type psl xeno\ visibility hide\ pseudoUcsc UCSC Pseudo bed 5 . Processed Pseudogene Locus Based on Blastz Chains UCSC 0 130 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel Processed Pseudogene Locus Based on Blastz Chains UCSC\ priority 130\ shortLabel UCSC Pseudo\ spectrum on\ track pseudoUcsc\ type bed 5 .\ visibility hide\ mrnaBad Bad mRNAs psl . Human mRNAs from GenBank with Potential Genomic Priming 0 132 0 0 0 127 127 127 1 0 0 x 1 group x\ longLabel $Organism mRNAs from GenBank with Potential Genomic Priming\ priority 132\ shortLabel Bad mRNAs\ spectrum on\ track mrnaBad\ type psl .\ visibility hide\ chainDm3 (dm3) D. mel. Chain chain dm3 D. melanogaster (Apr. 2006 (BDGP R5/dm3)) Chained Alignments 0 133.71 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows alignments of D. melanogaster (dm3, Apr. 2006 (BDGP R5/dm3)) to the\ human genome using a gap scoring system that allows longer gaps than \ traditional affine gap scoring systems. It can also tolerate gaps in both \ D. melanogaster and human simultaneously. These "double-sided"\ gaps can be caused by local inversions and overlapping deletions\ in both species.

\

\ The chain track displays boxes joined together by either single or \ double lines. The boxes represent aligning regions. \ Single lines indicate gaps that are largely due to a deletion in the \ D. melanogaster assembly or an insertion in the human assembly.\ Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one \ species. In cases where multiple chains align over a particular region of \ the human genome, the chains with single-lined gaps are often due to \ processed pseudogenes, while chains with double-lined gaps are more often \ due to paralogs and unprocessed pseudogenes.

\

\ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and \ location (in thousands) of the match for each matching alignment.

\ \ \

Display Conventions and Configuration

\

By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

\

\ To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

\ \

Methods

\

\ The genomes of D. melanogaster and human were aligned with blastz.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all \ alignments between a single D. melanogaster chromosome and a single\ human chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program \ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. The following matrix was used:

\

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

\ Chains scoring below a threshold were discarded; the remaining \ chains are displayed in this track.
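To show how the substitution matrix applies to a gapless block, the sketch below scores two equal-length aligned sequences by summing the matrix entry for each column. It covers only the substitution part of the scoring: the gap costs used by axtChain are not given on this page and are left out, and N or other ambiguity codes are rejected rather than scored.

```python
# Rows/columns in the order A, C, G, T, copied from the matrix in this section.
BASES = "ACGT"
MATRIX = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]

def pair_score(a: str, b: str) -> int:
    """Substitution score for one aligned base pair (case-insensitive).

    Raises ValueError for characters outside A, C, G, T (e.g. N).
    """
    return MATRIX[BASES.index(a.upper())][BASES.index(b.upper())]

def block_score(target: str, query: str) -> int:
    """Score a gapless block by summing per-column substitution scores."""
    if len(target) != len(query):
        raise ValueError("gapless blocks must have equal length")
    return sum(pair_score(a, b) for a, b in zip(target, query))
```

For example, a perfectly matching ACGT block scores 91 + 100 + 100 + 91 = 382, while the transitions A/G and G/A each cost only -25, much less than a transversion.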

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his\ RepeatMasker\ program.

\

\ The axtChain program was developed at the University of California\ at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.\

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Chained Alignments\ matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ otherDb dm3\ priority 133.71\ shortLabel ($o_db) D. mel. Chain\ spectrum on\ track chainDm3\ type chain dm3\ visibility hide\ netDm3 (dm3) D. mel. Net netAlign dm3 chainDm3 D. melanogaster (Apr. 2006 (BDGP R5/dm3)) Alignment Net 0 133.72 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best D. melanogaster/human chain for \ every part of the human genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The D. melanogaster sequence used in this annotation is from\ the Apr. 2006 (BDGP R5/dm3) (dm3) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\
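The four categories above can be expressed as a small classification rule. The sketch below is illustrative only: the Fill record and its fields are hypothetical, and it uses just the chromosome and orientation criteria stated in the list; the actual netSyntenic program may apply additional rules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fill:
    q_chrom: str  # chromosome matched in the other species
    strand: str   # orientation of the match, "+" or "-"

def classify(child: Fill, parent: Optional[Fill]) -> str:
    """Label a net item as Top, Syn, Inv, or NonSyn relative to the chain above it."""
    if parent is None:
        return "Top"     # level 1: nothing above it
    if child.q_chrom != parent.q_chrom:
        return "NonSyn"  # different chromosome from the gap above
    if child.strand != parent.strand:
        return "Inv"     # same chromosome, opposite orientation
    return "Syn"         # same chromosome, same orientation
```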

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
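The placement step performed by chainNet can be approximated on the reference coordinates alone. In this simplified Python sketch (an assumption, not chainNet's implementation), each chain is a score plus a list of aligned blocks on the reference, and chains are placed best-first with each block trimmed to bases not yet covered; the other species' coordinates, the level hierarchy, and the netSyntenic/netClass annotations are not modeled.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference genome

def subtract(piece: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of piece not overlapped by any interval in covered."""
    pieces = [piece]
    for c_start, c_end in covered:
        nxt: List[Interval] = []
        for s, e in pieces:
            if c_end <= s or c_start >= e:
                nxt.append((s, e))        # no overlap: keep as is
                continue
            if s < c_start:
                nxt.append((s, c_start))  # left remainder
            if c_end < e:
                nxt.append((c_end, e))    # right remainder
        pieces = nxt
    return pieces

def greedy_net(chains: Dict[str, Tuple[float, List[Interval]]]) -> Dict[str, List[Interval]]:
    """Place chains highest score first, trimming each block to still-uncovered bases.

    chains maps a chain id to (score, aligned blocks on the reference).
    """
    covered: List[Interval] = []
    placed: Dict[str, List[Interval]] = {}
    for cid, (_, blocks) in sorted(chains.items(), key=lambda kv: kv[1][0], reverse=True):
        kept = [p for b in blocks for p in subtract(b, covered)]
        if kept:
            placed[cid] = kept
            covered.extend(kept)
    return placed
```

In this sketch a lower-scoring chain only contributes the bases that fall into gaps left by higher-scoring chains, which is the behavior that gives rise to the level 2, level 3, ... fills described above.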

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ \ compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Alignment Net\ otherDb dm3\ priority 133.72\ shortLabel ($o_db) D. mel. Net\ spectrum on\ track netDm3\ type netAlign dm3 chainDm3\ visibility hide\ chainDm1 (dm1) D. mel. Chain chain dm1 D. melanogaster (Jan. 2003 (BDGP R3/dm1)) Chained Alignments 0 133.91 100 50 0 255 240 200 1 0 0

Description

\

\ This track shows D. melanogaster/human genomic alignments using\ a gap scoring system that allows longer gaps than traditional\ affine gap scoring systems. It can also tolerate gaps in both D. melanogaster \ and human simultaneously. These "double-sided"\ gaps can be caused by local inversions and overlapping deletions\ in both species. The D. melanogaster sequence is from the Jan. 2003 (BDGP R3/dm1) (dm1)\ assembly.

\

\ The chain track displays boxes joined together by either single or \ double lines. The boxes represent aligning regions. \ Single lines indicate gaps that are largely due to a deletion in the \ D. melanogaster assembly or an insertion in the human assembly.\ Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one \ species. In cases where there are multiple \ chains over a particular portion of the human genome, chains with \ single-lined gaps are often due to processed pseudogenes, while chains \ with double-lined gaps are more often due to paralogs and unprocessed \ pseudogenes. In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and \ location (in thousands) of the match for each matching alignment.

\ \ \

Display Conventions and Configuration

\

By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

\

\ To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

\ \

Methods

\

\ Transposons that have been inserted since the D. melanogaster/human\ split were removed, and the resulting abbreviated genomes were\ aligned with blastz. The transposons were then put back into the\ alignments. The resulting alignments were converted into axt format\ and the resulting axts fed into axtChain. AxtChain organizes all the \ alignments between a single D. melanogaster and a single human chromosome\ into a group and makes a kd-tree out of all the gapless subsections\ (blocks) of the alignments. Next, maximally scoring chains of these\ blocks were found by running a dynamic program over the kd-tree. Chains\ scoring below a threshold were discarded; the remaining chains are\ displayed here.
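The chaining step can be illustrated with a plain dynamic program. This sketch swaps the kd-tree search used by axtChain for a simple O(n^2) comparison of all block pairs; the kd-tree only speeds up finding compatible predecessors, so the plain comparison is intended to find the best chain under the same scoring, just more slowly. The Block record and the linear gap_cost function are hypothetical placeholders, since axtChain's actual gap costs are not described here.

```python
from typing import List, NamedTuple, Tuple

class Block(NamedTuple):
    t_start: int  # start on the reference (e.g. human) chromosome
    t_end: int
    q_start: int  # start on the other species' chromosome
    q_end: int
    score: float  # e.g. the substitution-matrix score of the gapless block

def gap_cost(dt: int, dq: int) -> float:
    """Hypothetical gap penalty; axtChain's real gap costs are not given here."""
    return 0.5 * (dt + dq)

def best_chain(blocks: List[Block]) -> Tuple[float, List[Block]]:
    """Maximum-scoring chain of blocks increasing in both coordinates (O(n^2) DP)."""
    if not blocks:
        return 0.0, []
    bs = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in bs]  # best chain score ending at block i
    prev = [-1] * len(bs)         # back-pointers for reconstruction
    for i, bi in enumerate(bs):
        for j in range(i):
            bj = bs[j]
            if bj.t_end <= bi.t_start and bj.q_end <= bi.q_start:
                cand = best[j] + bi.score - gap_cost(bi.t_start - bj.t_end,
                                                     bi.q_start - bj.q_end)
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    i = max(range(len(bs)), key=best.__getitem__)
    total, chain = best[i], []
    while i != -1:
        chain.append(bs[i])
        i = prev[i]
    return total, chain[::-1]
```

A block that overlaps an earlier block on either genome cannot follow it in the chain, so overlapping candidates end up in separate chains.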

\ \

Credits

\

\ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his\ program RepeatMasker.

\

\ The axtChain program was developed at the University of California\ at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.\

\

\ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

\ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Chained Alignments\ otherDb dm1\ priority 133.91\ shortLabel (dm1) D. mel. Chain\ spectrum on\ track chainDm1\ type chain dm1\ visibility hide\ netDm1 (dm1) D. mel. Net netAlign dm1 chainDm1 D. melanogaster (Jan. 2003 (BDGP R3/dm1)) Alignment Net 1 133.92 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows the best D. melanogaster/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The D. melanogaster sequence used in this annotation is from the Jan. 2003 (BDGP R3/dm1) (dm1) assembly.

\ \

Display Conventions and Configuration

\

\ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

\

\ In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

\

\ Individual items in the display are categorized as one of four types\ (other than gap):

\

    \
  • Top - the best, longest match. Displayed on level 1.\
  • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
  • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
  • NonSyn - a match to a chromosome different from the gap in the \ level above.\

\ \

Methods

\

\ Chains were derived from blastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

\ \

Credits

\

\ The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

\

\ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

\

\ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

\

\ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

\ \

References

\

\ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

\

\ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

\ \ compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Alignment Net\ otherDb dm1\ priority 133.92\ shortLabel (dm1) D. mel. Net\ spectrum on\ track netDm1\ type netAlign dm1 chainDm1\ visibility dense\ blatTetra Tetra Blat psl xeno Tetraodon nigroviridis Translated Blat Alignments 0 140 0 60 120 200 220 255 1 0 0

Description

\

\ This track displays translated alignments of 728 million bases of Tetraodon whole-genome shotgun reads vs. the draft human genome. Areas highlighted by this track are quite likely to be coding regions.

\ \

Methods

\

\ The alignments were made with blat in translated protein mode requiring two \ nearby 4-mer matches to trigger a detailed alignment. The human\ genome was masked with RepeatMasker and \ Tandem \ Repeats Finder before running blat.

\ \

Credits

\

\ Many thanks to Genoscope for providing the Tetraodon sequence.

\ \

References

\

\ Kent, W.J.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 12(4), 656-664 (2002).

\ \ compGeno 1 altColor 200,220,255\ color 0,60,120\ group compGeno\ longLabel Tetraodon nigroviridis Translated Blat Alignments\ priority 140\ shortLabel Tetra Blat\ spectrum on\ track blatTetra\ type psl xeno\ visibility hide\ blatChimpWashu Chimp Blat - WashU psl xeno Chimp Blat Alignments - WashU 0 141 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel Chimp Blat Alignments - WashU\ priority 141\ shortLabel Chimp Blat - WashU\ spectrum on\ track blatChimpWashu\ type psl xeno\ visibility hide\ chimp Chimp sample Chimp Sample Track 0 142 100 50 0 0 0 255 0 0 1 chr7, compGeno 0 altColor 0,0,255\ chromosomes chr7,\ color 100,50,0\ group compGeno\ longLabel Chimp Sample Track\ priority 142\ shortLabel Chimp\ track chimp\ type sample\ visibility hide\ blatHg16KG Human knownGene BLAT psl protein Human knownGene BLAT 0 142 0 0 0 127 127 127 0 0 0 compGeno 1 colorChromDefault off\ group compGeno\ longLabel Human knownGene BLAT\ priority 142\ shortLabel Human knownGene BLAT\ track blatHg16KG\ type psl protein\ visibility hide\ lineageMutations LineageMutations sample Lineage Specific Mutations 0 143 0 0 0 0 160 0 0 0 0 varRep 0 altColor 0,160,0\ group varRep\ longLabel Lineage Specific Mutations\ priority 143\ shortLabel LineageMutations\ track lineageMutations\ type sample\ visibility hide\ hapmapSnps HapMap SNPs bed 6 + HapMap SNPs 0 144.48 0 0 0 127 127 127 0 0 0

Description

\

\ The HapMap Project identified a set of approximately four million common SNPs and genotyped these SNPs in four populations in Phase II of the project. In Phase III, it genotyped approximately 1.4 to 1.5 million SNPs in eleven populations. This track shows the combined data from Phases II and III. These data are intended to serve as a reference for future studies of human disease. This track displays the genotype counts and allele frequencies of those SNPs and, when available, shows orthologous alleles from the chimp and macaque reference genome assemblies.\

\

\ The four million HapMap Phase II SNPs were genotyped on individuals \ from these four human populations:\

\ Phase III expanded to eleven populations: the four above, plus the following:\ \ Each of the populations is displayed in a separate subtrack.\

\

\ The HapMap assays provide biallelic results. Over 99.8% of HapMap SNPs are described as biallelic in dbSNP build 129; approximately 6,800 are described as more complex types (in-del, mixed, etc.). Of the HapMap SNPs, 70% are transitions: 35% are A/G and 35% are C/T.\

\

\ The orthologous alleles in chimp (panTro2) and macaque (rheMac2) \ were derived using \ liftOver.\

\

\ No two HapMap SNPs occupy the same position. Aside from 430 SNPs from the \ pseudoautosomal region of chrX and chrY, no SNP is mapped to more than one \ location in the reference genome. \ No HapMap SNPs occur on "random" chromosomes (concatenations\ of unordered and unoriented contigs).\

\ \

Display Conventions and Configuration

\

\ Note: calculation of heterozygosity has changed since the Phase II (rel22)\ version of this track. \ Observed heterozygosity is calculated as follows: each population's\ heterozygosity is computed as the proportion of heterozygous individuals in \ the population. The population heterozygosities are averaged to determine the \ overall observed heterozygosity. \ [For Phase II genotypes, expected heterozygosity was calculated \ as follows: the allele counts from all populations were summed \ (not normalized for population size)\ and used to determine overall major and minor allele frequencies. \ Assuming Hardy-Weinberg equilibrium, overall expected heterozygosity\ was calculated as two times the product of major and minor allele\ frequencies \ (see Modern Genetic Analysis, section 17-2).]\
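The two heterozygosity calculations described above can be written out as follows. This is a sketch assuming each population's genotypes for a biallelic SNP are summarized as (homozygous, heterozygous, homozygous) counts; the function names are illustrative. Because 2pq is symmetric in p and q, it does not matter which allele is counted first.

```python
from typing import Dict, Tuple

# Genotype counts for one population: (homozygous allele 1, heterozygous, homozygous allele 2)
Counts = Tuple[int, int, int]

def observed_het(pops: Dict[str, Counts]) -> float:
    """Current method: average over populations of the fraction of heterozygous individuals."""
    fracs = [het / (aa + het + bb) for aa, het, bb in pops.values() if aa + het + bb > 0]
    return sum(fracs) / len(fracs)

def expected_het_phase2(pops: Dict[str, Counts]) -> float:
    """Phase II method: pool allele counts (not normalized by population size), then 2pq."""
    allele1 = sum(2 * aa + het for aa, het, bb in pops.values())
    allele2 = sum(2 * bb + het for aa, het, bb in pops.values())
    p = allele1 / (allele1 + allele2)
    return 2 * p * (1 - p)  # Hardy-Weinberg expectation
```

For instance, with CEU counts (50, 40, 10) and YRI counts (20, 60, 20), the observed heterozygosity is the mean of 0.4 and 0.6, i.e. 0.5, while the pooled allele frequency is 240/400 = 0.6, giving an expected heterozygosity of 2 x 0.6 x 0.4 = 0.48.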

\

\ The human SNPs are displayed in gray using a color gradient based on minor allele\ frequency. The higher the minor allele frequency, the darker the display. \ By definition, the maximum minor allele frequency is 50%. \ When zoomed to base level, the major allele is displayed for each population. \
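Minor allele frequency can be computed directly from biallelic genotype counts, which also shows why it cannot exceed 50%. The gray_level function is purely hypothetical: the browser's actual shading scale is not documented here, so a linear ramp from light gray (MAF 0) to black (MAF 0.5) is assumed.

```python
def minor_allele_freq(hom1: int, het: int, hom2: int) -> float:
    """Minor allele frequency from biallelic genotype counts; never exceeds 0.5."""
    n_alleles = 2 * (hom1 + het + hom2)
    freq1 = (2 * hom1 + het) / n_alleles
    return min(freq1, 1 - freq1)  # the rarer allele is, by definition, the minor one

def gray_level(maf: float, lightest: int = 200, darkest: int = 0) -> int:
    """Hypothetical linear shade: higher MAF gives a darker (lower) gray value."""
    frac = max(0.0, min(maf, 0.5)) / 0.5
    return round(lightest + (darkest - lightest) * frac)
```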

\

\ The orthologous alleles from chimp and macaque are displayed in brown using a color \ gradient based on quality score.\ Quality scores range from 0 to 100 representing low to high quality. For \ orthologous alleles, the higher the quality, the darker the display. Quality \ scores are not available for chimp chromosomes chr21 and chrY; these were set to\ 98, consistent with the panTro2 browser quality track.\

\

\ Filters are provided for the data attributes described above. Additionally, a filter is provided for observed heterozygosity (the average of all populations' observed heterozygosities). Filters are applied to all subtracks, even if a subtrack is not displayed.\

\ Notes on orthologous allele filters:\

    \
  • If a SNP's major allele differs between populations, no overall human major allele is determined; in that case, the "matches major human allele" and "matches minor human allele" filters for orthologous alleles do not apply.
  • If a SNP is monomorphic in all populations, the minor allele is not\ verified in the HapMap dataset. In these cases, the filter to match\ orthologous alleles to the minor human allele will yield no results.
\

\ \

Credits

\

\ This track is based on \ International HapMap Project release 27 data, provided by the HapMap Data Coordination\ Center. \

\ \

References

\ \

HapMap Project

\

\ The International HapMap Consortium. \ A second generation human haplotype map of over 3.1 million SNPs.\ Nature. 2007 Oct 18;449(7164):851-61.

\

\ The International HapMap Consortium.\ A haplotype map of the human genome.\ Nature. 2005 Oct 27;437(7063):1299-320.

\

\ The International HapMap Consortium.\ The International HapMap Project.\ Nature. 2003 Dec 18;426(6968):789-96.

\ \

HapMap Data Coordination Center

\

\ Thorisson GA, Smith AV, Krishnan L, Stein LD.\ The International HapMap Project Web site.\ Genome Res. 2005 Nov;15(11):1592-3.

\ \

A Sampling of HapMap Literature

\

\ Gibson J, Morton NE, Collins A.\ Extended tracts of homozygosity in outbred human populations.\ Hum Mol Genet. 2006 Mar 1; 15(5):789-95.\

\

\ Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero\ MH, Carson AR, Chen W et al.\ Global variation in copy number in the human genome.\ Nature. 2006 Nov 23;444(7118):444-454.\

\ Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG.\ Common genetic variants account for differences in gene\ expression among ethnic groups. Nature Genet. 2007\ Feb;39(2):226-31.\

\

\ Tenesa A, Navarro P, Hayes BJ, Duffy DL, Clarke GM, Goddard ME, Visscher PM.\ Recent human effective population size estimated from linkage\ disequilibrium. Genome Res. 2007 Apr;17(4):520-6.\

\ Voight BF, Kudaravalli S, Wen X, Pritchard JK.\ A Map of Recent Positive Selection in the Human Genome.\ PLoS Biol. 2006 Mar;4(3):e72.

\

\ Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG.\ \ Measures of human population structure show heterogeneity among genomic\ regions. Genome Res. 2005 Nov;15(11):1468-76.\

\ \

Data Source

\

\ The genotypes_chr*_*_r27_nr.b36_fwd.txt.gz files from\
\ http://ftp.hapmap.org/genotypes/2009-02_phaseII+III/forward/\ were processed to make this track.\

\ varRep 1 compositeTrack on\ exonArrows off\ group varRep\ longLabel HapMap SNPs\ priority 144.48\ shortLabel HapMap SNPs\ track hapmapSnps\ type bed 6 +\ visibility hide\ hapmapLd HapMap LD bed 4 + HapMap Linkage Disequilibrium - Phase II 0 144.49 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,

Description

\

\ Linkage disequilibrium (LD) is the non-random association of alleles at different loci on a chromosome. It is measured as the difference between the observed frequency of a two-locus haplotype and its expected frequency, which is the product of the two single-locus allele frequencies. When LD is low, the two loci tend to be inherited in a nearly random manner.

\

\ This track shows three different measures of linkage disequilibrium\ — D', r2, and LOD (log odds) — between\ pairs of SNPs as genotyped by the HapMap consortium. LD is useful for\ understanding the associations between genetic variants throughout the\ genome, and can be helpful in selecting SNPs for genotyping.
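The three measures can be computed from the frequencies of the four two-locus haplotypes (AB, Ab, aB, ab). The sketch below assumes those haplotype counts are already known, as with phased genotypes. It uses the standard definitions of D, D' (D scaled by its maximum possible value given the allele frequencies) and r2, and writes the LOD as the usual log10 likelihood ratio of the observed haplotype frequencies against linkage equilibrium; Haploview's own implementation may differ in details.

```python
import math
from typing import Tuple

def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> Tuple[float, float, float]:
    """Return (D', r2, LOD) from counts of the four two-locus haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB, p_Ab, p_aB, p_ab = (x / n for x in (n_AB, n_Ab, n_aB, n_ab))
    p_A, p_B = p_AB + p_Ab, p_AB + p_aB  # single-locus allele frequencies
    D = p_AB - p_A * p_B                  # observed minus expected haplotype frequency
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    # LOD: log10 likelihood of the observed haplotype frequencies vs. linkage equilibrium.
    expected = (p_A * p_B, p_A * (1 - p_B), (1 - p_A) * p_B, (1 - p_A) * (1 - p_B))
    lod = sum(c * math.log10(obs / exp)
              for c, obs, exp in zip((n_AB, n_Ab, n_aB, n_ab),
                                     (p_AB, p_Ab, p_aB, p_ab), expected)
              if c > 0)
    return d_prime, r2, lod
```

With 50 AB and 50 ab haplotypes, D' = 1, r2 = 1 and the LOD is 100 x log10(2), about 30.1; with all four haplotypes equally common, all three measures are 0.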

\

\ By default, LOD values are displayed in full mode. Each diagonal\ represents a different SNP with each diamond representing a pairwise\ comparison between two SNPs. Shades are used to indicate linkage\ disequilibrium between the pair of SNPs, with darker shades indicating\ stronger LD. For the LOD values, additional colors are used in some\ cases:\

    \
  • White diamonds indicate pairwise D' values less than 1\ with no statistically significant evidence of LD (LOD < 2).
  • Light blue diamonds indicate high D' values (>0.99) with\ low statistical significance (LOD < 2).
  • Light pink diamonds are drawn when the statistical\ significance is high (LOD >= 2) but the D' value is low (less\ than 0.5).
\
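The color rules above can be summarized as a small decision function. As written, the white and light-blue rules overlap for 0.99 < D' < 1; the sketch resolves this by checking the light-blue rule first, which is an assumption, and returns "shaded" as a hypothetical label for all remaining cases, whose darkness follows LD strength.

```python
def lod_color(d_prime: float, lod: float) -> str:
    """Pick a diamond color following the LOD-mode rules listed above."""
    if lod < 2:
        # Low significance: light blue for high D' (checked first), otherwise white.
        return "light blue" if d_prime > 0.99 else "white"
    if d_prime < 0.5:
        return "light pink"  # significant but weak D'
    return "shaded"          # hypothetical label: darker shade for stronger LD
```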

\ \

Methods

\

\ Genotypes from HapMap Phase II release 19 were used with Haploview to infer phasing and calculate LD values for all SNP pairs within 250 kb. Because the children in the trios are not independent samples, Haploview uses only the parents from those populations. The YRI and CEU tracks each use 60 unrelated individuals (the parents from the trios); the CHB and JPT tracks use 45 unrelated individuals each.\
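Restricting LD calculations to SNP pairs within 250 kb can be done with a sorted list of positions and a binary search. This Python sketch assumes positions are sorted base coordinates on one chromosome and treats the 250 kb limit as inclusive; it only enumerates pairs and leaves out the selection of unrelated individuals.

```python
from bisect import bisect_right
from typing import Iterator, List, Tuple

MAX_DIST = 250_000  # LD is computed only for SNP pairs within 250 kb

def snp_pairs(positions: List[int], max_dist: int = MAX_DIST) -> Iterator[Tuple[int, int]]:
    """Yield index pairs (i, j), i < j, of sorted SNP positions at most max_dist apart."""
    for i, pos in enumerate(positions):
        j_end = bisect_right(positions, pos + max_dist)  # first SNP beyond the window
        for j in range(i + 1, j_end):
            yield i, j
```

Because the window is fixed, the number of pairs grows roughly linearly with the number of SNPs rather than quadratically.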

\

\ Haploview uses a two-marker EM algorithm (ignoring missing data) to estimate the maximum-likelihood values of the four gamete frequencies, from which the D', LOD, and r2 calculations derive. Haplotype phase is inferred using a standard EM algorithm with a partition-ligation approach for blocks with more than 10 markers.
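A two-marker EM of the kind described above can be sketched as follows. Only double heterozygotes have ambiguous phase (AB/ab versus Ab/aB), so the E-step splits them between the two phases in proportion to the current haplotype frequencies, and the M-step re-estimates the four frequencies. The genotype encoding (number of copies of allele A and of allele B, with None for missing), the uniform starting values, and the convergence tolerance are assumptions; Haploview's code may differ, and the partition-ligation approach for larger blocks is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Tuple[Optional[int], Optional[int]]  # copies of allele A and of allele B (0, 1, 2)

def two_marker_em(genotypes: Sequence[Genotype], max_iter: int = 1000,
                  tol: float = 1e-10) -> List[float]:
    """Estimate haplotype frequencies [AB, Ab, aB, ab] from unphased two-SNP genotypes."""
    known = [0.0, 0.0, 0.0, 0.0]  # haplotype counts from unambiguous genotypes
    n_double = 0                   # double heterozygotes (phase unknown)
    n_people = 0
    for g1, g2 in genotypes:
        if g1 is None or g2 is None:
            continue               # ignore missing data
        n_people += 1
        if g1 == 1 and g2 == 1:
            n_double += 1
            continue
        # Allele codes: 0 = A or B, 1 = a or b; haplotype index = 2 * locus1 + locus2.
        locus1 = [0] * g1 + [1] * (2 - g1)
        locus2 = [0] * g2 + [1] * (2 - g2)
        for x, y in zip(locus1, locus2):  # one locus is homozygous, so pairing is unique
            known[2 * x + y] += 1
    if n_people == 0:
        raise ValueError("no complete genotypes")
    total = 2 * n_people  # two haplotypes per individual
    freqs = [0.25] * 4
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes in the AB/ab (cis) phase.
        cis, trans = freqs[0] * freqs[3], freqs[1] * freqs[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = [known[0] + n_double * w, known[1] + n_double * (1 - w),
                  known[2] + n_double * (1 - w), known[3] + n_double * w]
        # M-step: frequencies are expected counts over the total number of haplotypes.
        new = [c / total for c in counts]
        converged = max(abs(a - b) for a, b in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs
```

When the unambiguous individuals carry only AB and ab haplotypes, the EM quickly assigns the double heterozygotes to the AB/ab phase as well.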

\ \

Display Conventions and Configuration

\

\

  • Display Mode
    • Full mode shows the pairwise LD values in a Haploview-style mountain plot.
    • Dense mode shows the pairwise LD values in a single line for each population, where the intensity at each position is the average of all of the LD values between the SNP at that position and all other SNPs within 250 kb.
  • LD Values: measures of linkage disequilibrium
    • r2 displays the raw r2 value, or the square of the correlation coefficient for a given marker pair. SNPs that have not been separated by recombination and have the same allele frequencies have r2 = 1; in this case, the two markers are said to be redundant for genotyping, but may have different functional effects. Lower r2 values show a lower degree of LD, indicating that some recombination has occurred in this population. See Hill and Robertson (1966) for details.
    • D' displays the raw D' value, which is the normalized covariance for a given marker pair. A D' value of 1 (complete LD) indicates that two SNPs have not been separated by recombination, while lower values indicate evidence of recombination in the history of the sample. Only D' values near 1 are a reliable measure of LD; lower values are difficult to interpret because the magnitude of D' depends strongly on sample size. See Lewontin (1988) for more details.
    • LOD displays the log odds score for linkage disequilibrium between a given marker pair and is shown by default.
  • Track Geometry
    • Trim to triangle shows the standard mountain plot (default); turning this option off shows LD values with SNPs outside the window.
    • Inverting makes it easier to visually compare two adjacent populations.
  • Colors
    • LD Values can be drawn in a variety of colors, with red as the default. The intensity of the color is proportional to the strength of the LD measure chosen above.
    • Outlines can be drawn in contrasting colors or turned off. Outlines are automatically suppressed when the window is larger than 100,000 bp.
  • Population Selection
    The HapMap populations can be individually displayed or hidden.
    • YRI: Yoruba people in Ibadan, Nigeria (30 parent-and-adult-child trios)
    • CEU: European samples from the Centre d'Etude du Polymorphisme Humain (CEPH) (30 trios)
    • CHB: Han Chinese in Beijing (45 unrelated individuals)
    • JPT: Japanese in Tokyo (45 unrelated individuals)
\
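The dense-mode summary, in which each SNP gets the average of its LD values with all other SNPs within 250 kb, can be sketched as below. The input format (a mapping from SNP index pairs to one LD value, already limited to pairs within 250 kb) is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

def dense_mode_values(pair_ld: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """Average LD value for each SNP over all pairs it takes part in.

    pair_ld maps (snp_i, snp_j) to an LD value (D', r2, or LOD).
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for (i, j), value in pair_ld.items():
        for snp in (i, j):  # each pair contributes to both of its SNPs
            sums[snp] += value
            counts[snp] += 1
    return {snp: sums[snp] / counts[snp] for snp in sums}
```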

\ \

Credits

\

\ This track was created by \ Daryl Thomas \ at UCSC using \ data \ from the \ International HapMap Project, \ following the display style from \ Haploview.\

\ \

References

\ \

HapMap Project

\

\ The International HapMap Consortium.\ A haplotype map of the human genome.\ Nature 437, 1299-1320 (2005).

\

\ The International HapMap Consortium.\ The International HapMap Project.\ Nature 426, 789-96 (2003).

\ \

HapMap Data Coordination Center

\

\ Thorisson, G.A., Smith, A.V., Krishnan, L. and Stein, L.D. The International HapMap Project Web site. Genome Res 15, 1592-3 (2005).

\ \

Haploview

\

\ Barrett, J.C., Fry, B., Maller, J. and Daly, M.J. \ Haploview: analysis and visualization of LD and haplotype \ maps. \ Bioinformatics 21(2), 263-5 (2005).

\ \

General references on Linkage Disequilibrium

\

\ Lewontin, R.C.\ On measures of gametic disequilibrium.\ Genetics 120, 849-52 (1988).

\

\ Hill, W.G. and Robertson, A. The effect of linkage on limits to artificial selection. Genet. Res. 8, 269-294 (1966).\

\ varRep 0 canPack off\ chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX\ compositeTrack on\ dataVersion HapMap release 19\ group varRep\ longLabel HapMap Linkage Disequilibrium - Phase II\ priority 144.49\ shortLabel HapMap LD\ track hapmapLd\ type bed 4 +\ visibility hide\ hapmapLdPh HapMap LD Phased ld2 HapMap Linkage Disequilibrium - Phase II - from phased genotypes 0 144.491 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,

Description

\

\ Linkage disequilibrium (LD) is the non-random association of alleles at different loci on a chromosome. It is measured as the difference between the observed frequency of a two-locus haplotype and its expected frequency, which is the product of the two single-locus allele frequencies. When LD is low, the two loci tend to be inherited in a nearly random manner.

\

This track shows three different measures of linkage disequilibrium between pairs of SNPs genotyped by the HapMap consortium: D', r2, and LOD (log of the odds). LD is useful for understanding the associations between genetic variants throughout the genome, and can be helpful in selecting SNPs for genotyping.

\

\ By default, the display in full mode shows LOD values. Each diagonal\ represents a different SNP with each diamond representing a pairwise\ comparison between two SNPs. Shades are used to indicate linkage\ disequilibrium between the pair of SNPs, with darker shades indicating\ stronger LD. For the LOD values, additional colors are used in some\ cases:\

    \
  • White diamonds indicate pairwise D' values less than 1\ with no statistically significant evidence of LD (LOD < 2).
  • \
  • Light blue diamonds indicate high D' values (>0.99) with\ low statistical significance (LOD < 2).
  • \
  • Light pink diamonds are drawn when the statistical\ significance is high (LOD >= 2) but the D' value is low (less\ than 0.5).
  • \
\
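The LOD color rules above can be written as a small decision function. This is a sketch with a hypothetical name (`lod_diamond_color`). The list does not say which rule wins when D' falls between 0.99 and 1, so checking for light blue first is an assumption. The "shaded" result stands in for the ordinary color whose intensity tracks LD strength.

```python
def lod_diamond_color(d_prime: float, lod: float) -> str:
    """Classify one pairwise diamond in LOD display mode (sketch, not browser code)."""
    if lod < 2:
        # No statistically significant evidence of LD.
        # Assumption: the light-blue rule (D' > 0.99) is checked before white.
        return "light blue" if d_prime > 0.99 else "white"
    if d_prime < 0.5:
        return "light pink"  # significant LOD but low D'
    return "shaded"  # ordinary shade; darker means stronger LD
```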

\ \

Methods

\

Phased genotypes from HapMap Phase II release 22 were used with Haploview to calculate LD values for all SNP pairs within 250 kb. The YRI and CEU tracks each use 30 parent-child trios (90 individuals), and the combined JPT+CHB track uses 90 unrelated individuals.

\

Haploview uses a two-marker expectation-maximization (EM) algorithm (ignoring missing data) to estimate the maximum-likelihood values of the four gamete frequencies, from which D', LOD, and r2 are calculated.
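A two-marker EM of this kind can be sketched as follows. This is an illustration under our own assumptions, not Haploview's source code:

- each genotype is coded as the number of copies of allele A at the first SNP and of allele B at the second (0, 1, or 2);
- `None` marks a missing genotype;
- the name `em_haplotype_freqs` is hypothetical.

Only double heterozygotes (1, 1) have ambiguous phase. The E-step therefore splits them between the AB/ab and Ab/aB configurations in proportion to the current haplotype frequencies.

```python
from typing import Dict, Iterable, Optional, Tuple

HAPLOTYPES = ("AB", "Ab", "aB", "ab")


def em_haplotype_freqs(
    genotypes: Iterable[Tuple[Optional[int], Optional[int]]],
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Estimate two-locus haplotype frequencies from unphased genotypes (sketch)."""
    known = dict.fromkeys(HAPLOTYPES, 0.0)  # haplotype counts with unambiguous phase
    n_double_het = 0
    n_chrom = 0
    for ga, gb in genotypes:
        if ga is None or gb is None:
            continue  # ignore missing data
        n_chrom += 2
        if ga == 1 and gb == 1:
            n_double_het += 1  # phase ambiguous: AB/ab or Ab/aB
            continue
        # With at least one homozygous locus, pairing alleles in order is unambiguous.
        alleles1 = ["A"] * ga + ["a"] * (2 - ga)
        alleles2 = ["B"] * gb + ["b"] * (2 - gb)
        for x, y in zip(alleles1, alleles2):
            known[x + y] += 1
    if n_chrom == 0:
        raise ValueError("no non-missing genotypes")

    freqs = dict.fromkeys(HAPLOTYPES, 0.25)
    for _ in range(max_iter):
        # E-step: expected fraction of double heterozygotes that are AB/ab.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        f_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: frequencies from known counts plus expected counts.
        new = {
            "AB": (known["AB"] + n_double_het * f_cis) / n_chrom,
            "ab": (known["ab"] + n_double_het * f_cis) / n_chrom,
            "Ab": (known["Ab"] + n_double_het * (1 - f_cis)) / n_chrom,
            "aB": (known["aB"] + n_double_het * (1 - f_cis)) / n_chrom,
        }
        delta = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new
        if delta < tol:
            break
    return freqs
```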

\ \

Display Conventions and Configuration

\

\

    \
  • \ Display Mode
    \
      \
    • Full mode shows the pairwise LD values in a\ Haploview-style mountain plot.
    • \
    • Dense mode shows the pairwise LD values in a single line\ for each population, where the intensity at each position is\ the average of all of the LD values between the SNP at that\ position and all other SNPs within 250 kb.
    • \
    \
  • \
  • \ LD Values: measures of linkage disequilibrium
    \
      \
    • r2 displays the raw r2 value, the square of the correlation coefficient for a given marker pair. SNPs that have not been separated by recombination and that have matching allele frequencies (so that only two of the four possible haplotypes are observed) have r2 = 1. In this case, the two markers are said to be redundant for genotyping, but may have different functional effects. Lower r2 values show a lower degree of LD, which can indicate that some recombination has occurred in this population. See Hill and Robertson (1966) for details.
    • \
    • D' displays the raw D' value: the covariance D for a given marker pair, normalized by the largest value D could take given the two SNPs' allele frequencies. A D' value of 1 (complete LD) indicates that two SNPs have not been separated by recombination, while lower values indicate evidence of recombination in the history of the sample. Only D' values near 1 are a reliable measure of LD; lower values are difficult to interpret, because the magnitude of D' depends strongly on sample size. See Lewontin (1988) for more details.
    • \
    • LOD displays the log odds score for linkage\ disequilibrium between a given marker pair, and is shown by\ default.
    • \
    \
  • \
  • \ Track Geometry
    \
      \
    • Trim to triangle shows the standard mountain plot\ (default); turning this option off will show LD values with\ SNPs outside the window.
    • \
    • Inverting makes it easier to visually compare two\ adjacent populations.
    • \
    \
  • \
  • Colors
    \
      \
    • LD Values can be drawn in a variety of colors, with red\ as default. The intensity of the color is proportional to the\ strength of the LD measure chosen above.
    • \
    • Outlines can be drawn in contrasting colors or turned\ off. Outlines are automatically suppressed when the window is\ larger than 100,000 bp.
    • \
    \
  • \
  • Population Selection
    \ The HapMap populations can be individually displayed or hidden. \
      \
    • YRI: Yoruba people in Ibadan, Nigeria (30 parent-and-adult-child trios)
    • \
    • CEU: European samples from the Centre d'Etude du Polymorphisme Humain (CEPH) (30 trios)
    • \
    • JPT+CHB: Combination of Japanese in Tokyo (45 unrelated individuals) and Han Chinese in Beijing (45 unrelated individuals)
    • \
    \
  • \
\
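The three LD values can be sketched from phased two-locus haplotype counts. The sketch makes these assumptions:

- SNPs are biallelic, with alleles A/a and B/b.
- D' is reported as a magnitude between 0 and 1, using the standard normalization D / Dmax.
- r2 = D² / (pA pa pB pb).
- LOD is the log10 likelihood ratio of the observed haplotype frequencies against the frequencies expected with no LD.

The function name `ld_stats` is hypothetical, and this is not Haploview's code.

```python
import math
from typing import Dict, Tuple


def ld_stats(counts: Dict[str, int]) -> Tuple[float, float, float, float]:
    """Return (D, |D'|, r2, LOD) from phased haplotype counts (sketch).

    counts maps "AB", "Ab", "aB", "ab" to the number of chromosomes observed.
    """
    n = sum(counts.values())
    p = {h: counts.get(h, 0) / n for h in ("AB", "Ab", "aB", "ab")}
    p_A = p["AB"] + p["Ab"]
    p_B = p["AB"] + p["aB"]
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    d = p["AB"] - p_A * p_B  # observed minus expected AB frequency
    # Dmax: the largest |D| possible given the allele frequencies.
    d_max = min(p_A * p_b, p_a * p_B) if d >= 0 else min(p_A * p_B, p_a * p_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * p_a * p_B * p_b
    r2 = d * d / denom if denom > 0 else 0.0
    # LOD: log10 likelihood of observed haplotype frequencies vs. the no-LD model.
    allele = {"A": p_A, "a": p_a, "B": p_B, "b": p_b}
    lod = 0.0
    for h, c in counts.items():
        if c > 0:
            lod += c * math.log10(p[h] / (allele[h[0]] * allele[h[1]]))
    return d, d_prime, r2, lod
```

Consider counts of AB = 50, Ab = 25, and ab = 25. The function gives D' = 1 but r2 = 1/3. Only three haplotypes are present, so the pair is in complete LD by D', yet the two SNPs are not interchangeable for genotyping.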

\ \

Credits

\

\ This track was created at UCSC using \ data \ from the \ International HapMap Project\ and LD scores were computed using the \ Haploview \ program.\ The genome browser track display was created by \ Daryl Thomas following the display style from \ Haploview.\

\ \

References

\ \

HapMap Project

\

\ The International HapMap Consortium.\ A second generation human haplotype map of over 3.1 million SNPs.\ Nature. 2007 Oct 18;449(7164):851-61.

\

\ The International HapMap Consortium.\ A haplotype map of the human genome.\ Nature. 2005 Oct 27;437(7063):1299-320.

\

\ The International HapMap Consortium.\ The International HapMap Project.\ Nature. 2003 Dec 18;426(6968):789-96.

\ \

HapMap Data Coordination Center

\

\ Thorisson GA, Smith AV, Krishnan L, Stein LD. \ The International HapMap Project Web site.\ Genome Res. 2005 Nov;15(11):1592-3.

\ \

Haploview

\

\ Barrett JC, Fry B, Maller J, Daly MJ. \ Haploview: analysis and visualization of LD and haplotype \ maps. \ Bioinformatics. 2005 Jan 15;21(2):263-5. Epub 2004 Aug 5.

\ \

General references on Linkage Disequilibrium

\

\ Lewontin, RC.\ On measures of gametic disequilibrium.\ Genetics. 1988 Nov;120(3):849-52.

\

\ Hill WG, Robertson A.\ The effect of linkage on limits to artificial selection.\ Genet Res. 1966 Dec;8(3):269-94.

\ varRep 0 canPack off\ chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22\ compositeTrack on\ dataVersion HapMap release 22\ group varRep\ longLabel HapMap Linkage Disequilibrium - Phase II - from phased genotypes\ priority 144.491\ shortLabel HapMap LD Phased\ track hapmapLdPh\ type ld2\ visibility hide\ rgdSslp RGD SSLP bed 4 . Rat Genome Database Simple Sequence Length Polymorphisms 0 144.5 12 12 120 133 133 187 0 0 0 http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=

Description

\

Simple sequence-length polymorphisms (SSLPs), also known as microsatellite DNA, consist of tandemly repeated sequence motifs of 1-6 nucleotides that are highly polymorphic in repeat length among strains. They are often used as genetic markers for genotyping.

\ \

Methods

\

\ The annotation data file, \ RGD_SSLP.gff, was downloaded from the Rat Genome Database\ (RGD) website and processed to create this track.

\ \

Credits

\

\ Thanks to the RGD for \ providing this annotation. RGD is funded by grant HL64541 entitled "Rat \ Genome Database", awarded to Dr. Howard J Jacob, Medical College of \ Wisconsin, from the National Heart Lung and Blood Institute \ (NHLBI) of the National \ Institutes of Health (NIH).\

\ \ varRep 1 color 12,12,120\ group varRep\ longLabel Rat Genome Database Simple Sequence Length Polymorphisms\ priority 144.5\ shortLabel RGD SSLP\ track rgdSslp\ type bed 4 .\ url http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=\ visibility hide\ hapmapLdHotspot HapMap LD Hotspots bedGraph 4 Hotspots of Linkage Disequilibrium in the HapMap 0 144.55 0 0 0 127 127 127 0 0 23 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX, varRep 0 autoScale Off\ chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX\ compositeTrack on\ group varRep\ longLabel Hotspots of Linkage Disequilibrium in the HapMap\ maxHeightPixels 64:32:16\ priority 144.55\ shortLabel HapMap LD Hotspots\ track hapmapLdHotspot\ type bedGraph 4\ viewLimits 0:16\ visibility hide\ tajdSnp Tajima's D SNPs bed 4 . Tajima's D SNPs 0 144.6 0 0 0 127 127 127 0 0 0

Description

\

This track shows the SNPs that were used in the calculation of Tajima's D (Tajima, 1989), a measure of nucleotide diversity, estimated from the Perlegen data set (Hinds et al., 2005). Tajima's D is a statistic used to compare an observed nucleotide diversity against the expected diversity under the assumption that all polymorphisms are selectively neutral and that the population size is constant.

\ \

Methods

\

\ See the Tajima's D track or Carlson et al. for more details on the \ use of this track.

\ \

Credits

\

\ This track was created at the University of Washington using gfetch\ from the Nickerson Laboratory and the R statistical software package.

\ \

References

\

\ Tajima, F. \ Statistical method for testing the neutral mutation hypothesis \ by DNA polymorphism. \ Genetics 123, 585-595 (1989).

\

Carlson, C.S., Thomas, D.J., Eberle, M., Livingston, R., Rieder, M. and Nickerson, D.A. Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res 15, 1553-65 (2005).

\ varRep 1 compositeTrack on\ group varRep\ longLabel Tajima's D SNPs\ priority 144.6\ shortLabel Tajima's D SNPs\ track tajdSnp\ type bed 4 .\ visibility hide\ tajD Tajima's D bedGraph 4 Tajima's D 0 144.65 0 0 0 127 127 127 0 0 0

Description

\

This track shows Tajima's D (Tajima, 1989), a measure of nucleotide diversity, estimated from the Perlegen data set (Hinds et al., 2005). Tajima's D is a statistic used to compare an observed nucleotide diversity against the expected diversity under the assumption that all polymorphisms are selectively neutral and that the population size is constant.
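Tajima's D compares two estimates of diversity:

- the mean number of pairwise differences between sequences (pi);
- Watterson's estimate, derived from the number of segregating sites S.

The sketch below uses the standard normalizing constants from Tajima (1989). It assumes biallelic sites and n sampled chromosomes, and describes each site by the count of one of its alleles. The name `tajimas_d` is hypothetical. This is not the gfetch/R pipeline used to build the track.

```python
import math
from typing import Sequence


def tajimas_d(n: int, allele_counts: Sequence[int]) -> float:
    """Tajima's D for n sampled chromosomes (sketch).

    allele_counts holds, for each segregating site, the number of chromosomes
    carrying one of the two alleles (between 1 and n - 1).
    """
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sampled chromosomes")
    s = len(allele_counts)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    pairs = n * (n - 1) / 2
    pi = 0.0
    for k in allele_counts:
        if not 0 < k < n:
            raise ValueError("site is not polymorphic in this sample")
        pi += k * (n - k) / pairs  # share of sequence pairs differing at this site
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference of the two estimators, scaled by its expected standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

The sign follows the site-frequency spectrum. Sites where the allele is carried by only one chromosome (singletons) pull D negative; sites at intermediate frequency push it positive.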

Methods

\ \ Tajima's D was estimated in 100 kbp sliding windows across the\ autosomal genome, reporting the Tajima's D measure at the central 10\ kbp of the window and stepping by 10 kbp. Thus, the Tajima's D for\ the window chr1:100,001-200,000 is reported at coordinates\ chr1:145,001-155,000, the Tajima's D for the window\ chr1:110,001-210,000 is reported at coordinates chr1:155,001-165,000,\ and so forth.\
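The window-to-report coordinate arithmetic above can be checked with a short generator. The fixed parameters come from the text: 1-based, fully closed coordinates (as in the chr1:100,001-200,000 example), a 100 kbp window, a 10 kbp step, and reporting of the central 10 kbp. Two things are not stated and are assumptions here: where the first window is placed (the `first_start` parameter), and stopping at the last full window. The name `tajima_windows` is hypothetical.

```python
from typing import Iterator, Tuple

Interval = Tuple[int, int]  # 1-based, fully closed (start, end)


def tajima_windows(
    chrom_length: int,
    first_start: int = 1,
    window: int = 100_000,
    step: int = 10_000,
    report: int = 10_000,
) -> Iterator[Tuple[Interval, Interval]]:
    """Yield (analysis window, reported interval) pairs along one chromosome."""
    offset = (window - report) // 2  # 45,000 bp from window start to reported block
    start = first_start
    while start + window - 1 <= chrom_length:  # only full windows (assumption)
        end = start + window - 1
        yield (start, end), (start + offset, start + offset + report - 1)
        start += step
```

Because the step equals the reported width, consecutive reported blocks tile the chromosome without gaps or overlaps.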

The theoretical distribution of Tajima's D (95% c.i. between -2 and +2) assumes that polymorphism ascertainment is independent of allele frequency. High values of Tajima's D suggest an excess of common variation in a region, which can be consistent with balancing selection or population contraction. Negative values of Tajima's D, on the other hand, indicate an excess of rare variation, consistent with population growth or positive selection. In theory, population admixture can lead to either high or low Tajima's D values. Demographic parameters would be expected to affect the genome more evenly than selective pressures. Previous analyses have therefore suggested that using the empirical distribution of Tajima's D from a collection of regions across the genome provides advantages in assessing whether selection or demography might explain an observed deviation from expectation. Because of the ascertainment bias toward common polymorphism in the Perlegen data set, positive Tajima's D values are difficult to interpret, and modeling ascertainment is difficult. However, given that the ascertainment bias raises the mean of the distribution, extreme negative values in extended regions can be useful in qualitatively identifying interesting regions for full resequencing and more rigorous theoretical analysis of nucleotide diversity. For further discussion, see Carlson et al. (2005).

In full display mode, this track shows the nucleotide diversity across three human populations: 23 individuals of African-American descent (AD), 24 individuals of European descent (ED) and 24 individuals of Chinese descent (XD). It also shows the polymorphic sites within each population that were used to estimate nucleotide diversity. Only SNPs observed to be polymorphic within each subpopulation were used in the Tajima's D calculation. Nucleotide diversity is shown in dense display mode using a grayscale density gradient, with light colors indicating low diversity.

Credits

\

\ This track was created at the University of Washington using gfetch\ from the Nickerson Laboratory and the \ R statistical software package.

\ \

References

\

\ Tajima, F. \ Statistical method for testing the neutral mutation hypothesis \ by DNA polymorphism. \ Genetics 123, 585-595 (1989).

\

Carlson, C.S., Thomas, D.J., Eberle, M., Livingston, R., Rieder, M. and Nickerson, D.A. Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res 15, 1553-65 (2005).

\ varRep 0 autoScale Off\ compositeTrack on\ group varRep\ longLabel Tajima's D\ maxHeightPixels 128:64:11\ maxLimit 5\ minLimit -4\ priority 144.65\ shortLabel Tajima's D\ track tajD\ type bedGraph 4\ viewLimits -2.5:3\ visibility hide\ gvPos Locus Variants bed 4 + Compilation of Human Variants from LSDBs 0 144.7 150 0 150 202 127 202 0 0 0
\

Disclaimer

\

\ PhenCode is intended for research purposes only. Although the data are freely\ available to all, users should treat the reported mutations with extreme\ caution in clinical settings or for any diagnostic or population screening\ purpose. This information requires expertise to interpret properly; clinical\ diagnosis and/or treatment recommendations should be made only by medical \ professionals. \

\ \ Terms and Conditions of Use

\

\ \

Description

\

This track is a result of the PhenCode project. It consolidates variants from many curated locus-specific databases and provides rich genotype and phenotype information. This version includes entries from the databases listed in the Methods section below, covering approximately one third of the loci in the HGVS database listing. The coverage of the genome can be viewed using Genome Graphs. Work is in progress to add more locus-specific databases.

\ \

Display Conventions and Configuration

\

\ This track is color-coded by the mutation type.

\ The colors for mutation type are:\

    \
  • \ substitution = purple\
  • \ insertion = green\
  • \ duplication = orange\
  • \ deletion = blue\
  • \ complex = brown\
  • \ unknown = black\
\

\

\ Items are labeled with the Human Genome Variation Society (HGVS) \ name. \

\ \

Methods

\

\ The data shown in this track were obtained from the following variant sources:\

\

The HGVS-style name\ was pulled directly or indirectly from the source\ data. The information in this name, in combination with alignments of the\ reference sequence against\ the genome sequence, was used to position the variants. The source may have\ additional variants that could not be mapped to the genome. Additional \ attributes displayed for each variant depend on the information available\ from the different source databases. If the source has web-accessible entries\ for the variants, links are provided back to the source.\

\ \

Credits

\ \

PhenCode developers

\
\
Belinda Giardine, Ross Hardison, Webb Miller, Cathy Riemer\
Center for Comparative Genomics and Bioinformatics, Penn State University,\ University Park, Pennsylvania\
Fan Hsu, Jim Kent, Andrew Kern, Robert Kuhn, Heather Trumbower\
Center for Biomolecular Science and Engineering, University of California,\ Santa Cruz, California\
Richard Gibbons, Doug Higgs, Jim Hughes\
Weatherall Institute of Molecular Medicine, Oxford, United Kingdom\
Garry Cutting, Andrew P. Feinberg\
Johns Hopkins University School of Medicine, Baltimore, Maryland\
\

Cooperating databases

\
\
AD&FTD Mutation Database:\
Marc Cruts,\ Flanders Institute for Biotechnology and\ University of Antwerp, Belgium\

\
ALPL:\
Etienne Mornet,\ Centre Hospitalier de Versailles, Le Chesnay, France\

\
ARdb:\
Bruce Gottlieb,\ Lady Davis Institute for Medical Research, Sir Mortimer B. Davis Jewish\ General Hospital, Montreal, Quebec, Canada\ \

\
BGMUT:\
Olga O. Blumenfeld and Santosh K. Patnaik,\ Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New\ York\

\
CA2base:\
Mauno Vihinen,\ Institute of Medical Technology, University of Tampere, Tampere, Finland\

\
CASRdb:\ \
Geoffrey Hendy and David Cole,\ McGill University, Montreal, Quebec, Canada\

\
CBS:\
Jan P. Kraus and Miroslav Janosik, \ University of Colorado School of Medicine, Aurora, Colorado
\ Viktor Kozich,\ Institute of Inherited Metabolic Diseases,\ Charles University in Prague, First Faculty of Medicine,\ Czech Republic\

\
\ CFMDB:\ \
Julian Zielenski and Richard Sang,\ The Hospital for Sick Children, Genetics and Genomic Biology, Toronto,\ Ontario, Canada\

\
CLCN7base:\
Mauno Vihinen,\ Institute of Medical Technology, University of Tampere, Tampere, Finland\

\
\ dbPEX:\ \
Nancy Braverman, Institute of Genetic Medicine and Dept. of Pediatrics,\ Johns Hopkins Medical Center, Baltimore, MD
Steven Steinberg, Dept. of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD

\
dbRIP:\ \
Ping Liang, Roswell Park Cancer Institute, Buffalo, New York
\ Mark Batzer, Louisiana State University, Baton Rouge, LA\

\
F12base:\
Mauno Vihinen,\ Institute of Medical Technology, University of Tampere, Tampere, Finland\

\
Fanconi:\
Arleen D. Auerbach, Francis Lach, \ The Rockefeller University, New York, NY\

\
FHC Mutation Database, version 1.1:\ \
This project is a collaborative effort among the Department of Molecular and\ Clinical Genetics and Department of Cardiology at the Royal Prince Alfred\ Hospital, and the Australian National Genomic Information Services. The\ curators are:
\ Dr. Bing Yu,\ Department of Molecular Genetics,\ C39 - Royal Prince Alfred Hospital,\ The University of Sydney,\ NSW 2006 Australia
\ Professor R. Trent,\ Department of Molecular Genetics,\ K25 - Medical Foundation Building,\ The University of Sydney,\ NSW 2006 Australia\

\
HbVar:\ \
George Patrinos, Erasmus University, Rotterdam, Netherlands
\ Henri Wajcman, Hospital Henri Mondor, Creteil, France
\ David H.K. Chui, Boston University, Boston, Massachusetts
\ Nicholas Anagnou, University of Athens and IIBEAA, Athens, Greece
\ Georgi D. Efremov, Macedonian Academy of Sciences and Arts, RCGEB, Skopje,\ Macedonia\

\
HIFD:\ \
Lim Yun Ping, Centre for Molecular Medicine and the Bioinformatics Institute, Singapore\

\
IARC TP53:\
Magali Olivier,\ Pierre Hainaut,\ Group of Molecular Carcinogenesis and Biomarkers,\ France\

\
IDbases:\ \
Anne Durandy,\ Michael Hershfield,\ Laszlo Marodi,\ Luigi D. Notarangelo,\ Claudio Pignata,\ Jose R. Regueiro,\ Dirk Roos,\ C.I.Edvard Smith,\ Jouni Valiaho,\ Mauno Vihinen,\ Anna Villa,\ Institute of Medical Technology, University of Tampere, Tampere, Finland\ \

\
IPNMDB:\ \
Eva Nelis, VIB - Department of Molecular Genetics, University of Antwerp, Antwerpen, Belgium\ \

\
ISTH SSC VWF:\ \
Dan Hampshire, Anne Goodeve, Nick Beauchamp, David Lillicrap,\ Ross MacLachlan,\ The University of Sheffield, United Kingdom\ \

\
KinMutBase:\
C. Ortutay,\ Jouni Valiaho,\ K. Stenberg,\ Mauno Vihinen,\ Institute of Medical Technology, University of Tampere, Tampere, Finland\ \

\
LMDp:\
Johan T. den Dunnen, Leiden University Medical Center, Leiden, The Netherlands
\ See also the database tool LOVD\ \

\
LQTSdb:\ \
Michael Christiansen, Lars Allan Larsen, and Paal Skytt Andersen,\ Molecular Cardiology Group, Statens Serum Institut, Denmark\ \

\
MMR:
Michael O. Woods,\ Faculty of Medicine, HSC,\ Memorial University of Newfoundland,\ St. John's, NL, Canada\ \

\
OSTM1base:\
Mauno Vihinen,\ Institute of Medical Technology, University of Tampere, Tampere, Finland\ \

\
PAHdb:\ \
Charles R. Scriver, McGill University, Montreal Children's Hospital\ Research Institute, Montreal, Quebec, Canada\

\
RettBASE: \
John Christodoulou,\ Gladys Ho,\ Andrew Grimm (previous database coordinator),\ Children's Hospital at Westmead, Sydney and\ University of Sydney, Australia\

\
RISN: \
Markus Preising, \ Molecular Genetics Laboratory,\ University of Regensburg, Bavaria, Germany \

\
RPGR:\
Xinhua Shu and Alan Wright\

\
SRD5A2:\
Juergen Reichardt, University of Sydney, Sydney, Australia
\ Nick Makridakis, University of Southern California, Los Angeles, California\

\
X-ALD: \
Stephan Kemp,\ Academic Medical Center/Emma Children's Hospital,\ Amsterdam, The Netherlands\

\
\ varRep 1 color 150,0,150\ group varRep\ longLabel Compilation of $Organism Variants from LSDBs\ priority 144.7\ shortLabel Locus Variants\ track gvPos\ type bed 4 +\ visibility hide\ dgv DGV Struct Var bed 9 + Database of Genomic Variants: Structural Variation (CNV, Inversion, In/del) 0 144.8 0 100 0 127 177 127 0 0 0 http://projects.tcag.ca/cgi-bin/variation/xview?source=$D&view=variation&id=Variation_$$

Description

\

\ This track displays copy number variants (CNVs), insertions/deletions (InDels),\ inversions and inversion breakpoints annotated by the \ Database of Genomic Variants (DGV), which\ contains genomic variations observed in healthy individuals. \ DGV focuses on structural variation, defined as \ genomic alterations that involve segments of DNA that are larger than\ 1000 bp. Insertions/deletions of 100 bp or larger are also included.\

\ \

Display Conventions

\

\ Color is used to indicate the type of variation. \
Note that the color scheme changed in March 2011.

    \
  • inversions and\ inversion breakpoints are purple. \
  • \ \
  • CNVs and InDels are blue if there is a \ gain in size relative to the reference.\
  • \ \
  • CNVs and InDels are red if there is a \ loss in size relative to the reference.\
  • \ \
  • CNVs and InDels are brown if there are reports of\ both a loss and a gain in size \ relative to the reference.\
  • \
\ \

\ \

Methods

\

\ DGV collects these variants by ongoing manual curation of the literature. \ A brief description of the method and sample used for a particular \ variant is included on the details page, along with a link to the \ PubMed abstract for the study from which the variants were collected. \

\

For data sets where the variation calls are reported at a sample-by-sample level, DGV merges calls with similar boundaries across the sample set. Only variants of the same type (CNVs, InDels, or inversions) are merged, and gains and losses are merged separately. In addition, if several different platforms/approaches are used within the same study, these datasets are merged separately. Sample-level calls that overlap by ≥ 70% are merged in this process.
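The merging rule can be sketched as follows. The description leaves two details open, so this sketch assumes them:

- the 70% overlap must hold for both calls (reciprocal overlap);
- clustering is single-linkage, so any chain of qualifying overlaps ends up in one merged variant.

Each merged variant is reported as the union of its calls' extents. Calls are grouped by chromosome, variant type, gain/loss direction, and platform, mirroring the separations listed above. The input is assumed to come from one study. The `Call` type and the function names are hypothetical, and this is not DGV's pipeline.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Call:
    chrom: str
    start: int  # 0-based, half-open; end > start assumed
    end: int
    kind: str  # e.g. "CNV", "InDel", "inversion"
    direction: Optional[str]  # "gain", "loss", or None
    platform: str


def overlaps_enough(a: Call, b: Call, min_frac: float = 0.7) -> bool:
    """Reciprocal overlap test (assumption: both calls must meet min_frac)."""
    ov = min(a.end, b.end) - max(a.start, b.start)
    if ov <= 0:
        return False
    return ov / (a.end - a.start) >= min_frac and ov / (b.end - b.start) >= min_frac


def merge_calls(calls: List[Call], min_frac: float = 0.7) -> List[Tuple]:
    groups: Dict[tuple, List[Call]] = {}
    for c in calls:  # never merge across type, direction, or platform
        groups.setdefault((c.chrom, c.kind, c.direction, c.platform), []).append(c)
    merged = []
    for (chrom, kind, direction, platform), members in groups.items():
        parent = list(range(len(members)))  # union-find over calls in the group

        def find(i: int) -> int:
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i, j in combinations(range(len(members)), 2):
            if overlaps_enough(members[i], members[j], min_frac):
                parent[find(i)] = find(j)
        clusters: Dict[int, List[Call]] = {}
        for i, c in enumerate(members):
            clusters.setdefault(find(i), []).append(c)
        for cl in clusters.values():
            merged.append((chrom, min(c.start for c in cl), max(c.end for c in cl),
                           kind, direction, platform))
    return sorted(merged, key=lambda m: (m[0], m[1], m[2], m[3], str(m[4])))
```

Single linkage can grow a merged region well beyond any one call. A stricter scheme would require every pair of calls in a cluster to overlap by 70%.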

\ \

Credits

\

Thanks to the Database of Genomic Variants for providing these data. In citing the Database of Genomic Variants, please refer to Iafrate et al. (2004).

\ \

References

\

\ Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y,\ Scherer SW, Lee C.\ Detection of large-scale variation in the human genome.\ Nat Genet. 2004 Sep;36(9):949-51.

\

Zhang J, Feuk L, Duggan GE, Khaja R, Scherer SW. Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenet Genome Res. 2006;115(3-4):205-14.

\ \ varRep 1 color 0,100,0\ dataVersion v10\ exonArrows off\ group varRep\ itemRgb on\ longLabel Database of Genomic Variants: Structural Variation (CNV, Inversion, In/del)\ noScoreFilter .\ priority 144.8\ shortLabel DGV Struct Var\ track dgv\ type bed 9 +\ url http://projects.tcag.ca/cgi-bin/variation/xview?source=$D&view=variation&id=Variation_$$\ urlLabel DGV Browser and Report:\ visibility hide\ protVarPos UniProt Variants bed 4 + UniProt Variants 0 144.8 0 200 0 127 227 127 0 0 0 \

Description

\

\ This track displays variants from the UniProt database.\

\ \

Methods

\

\ The data shown in this track were obtained from\
\ UniProt\ (Swiss-Prot/TrEMBL).\
\ Swiss Institute of Bioinformatics, Geneva, Switzerland\ \

\ \ varRep 1 color 0,200,0\ group varRep\ longLabel UniProt Variants\ priority 144.8\ shortLabel UniProt Variants\ track protVarPos\ type bed 4 +\ visibility hide\ kiddEichlerValid HGSV Validated bed 9 HGSV Validated Sites of Structural Variation 0 144.84 0 0 0 127 127 127 0 0 0

Description

\

Data from the Human Genome Structural Variation Project. This track shows validated regions of structural variation in nine individuals from Kidd, et al. Deletions, insertions and inversions are included. For inversions, sites corresponding to both breakpoints may be depicted. Clones corresponding to only a single breakpoint were selected to validate the site. Coordinates correspond to the variant region predicted by end-sequence pairs (ESPs), not to sequence-derived breakpoints.

\

\ Each site was validated by at least one of these methods:\

    \
  • Agi: Agilent CGH
  • \
  • FISH: Inversion FISH assay
  • \
  • MCD: Clone fingerprint
  • \
  • NIL: Overlap with "novel" insertion locus
  • \
  • Nim: NimbleGen CGH
  • \
  • Seq: Clone sequencing
  • \ \

    \ Each individual's validated sites are in a different \ subtrack. The nine individuals' labels used in Kidd, et al., \ populations of origin, and \ Coriell Cell Repository catalog IDs are shown here:\

    Individual  Population  Coriell ID
    ABC14       CEPH        NA12156
    ABC13       Yoruba      NA19129
    ABC12       CEPH        NA12878
    ABC11       China       NA18555
    ABC10       Yoruba      NA19240
    ABC9        Japan       NA18956
    ABC8        Yoruba      NA18507
    ABC7        Yoruba      NA18517
    G248        Unknown     NA15510

    Methods

    \

    \ Excerpted from Kidd, et al.:

    \ \ \
       \ We selected eight individuals as part of the first phase of the Human\ Genome Structural Variation Project. This included four individuals of\ Yoruba Nigerian ethnicity and four individuals of non-African\ ethnicity. For each individual we constructed a whole genomic library\ of about 1 million clones by using a fosmid subcloning strategy.\ Each library was arrayed and both ends of each clone insert were\ sequenced to generate a pair of high-quality end sequences (termed an\ end-sequence pair (ESP)). \ The overall approach generated a physical clone map for each\ individual human genome, flagging regions discrepant by size or\ orientation on the basis of the placement of end sequences against the\ reference assembly.\ Across all eight libraries, we mapped 6.1 million clones to distinct\ locations against the reference sequence\ (http://hgsv.washington.edu). \ Of these, 76,767 were discordant by length and/or orientation,\ indicating potential sites of structural variation. About 0.4%\ (23,742) of the ESPs mapped with only one end to the reference\ assembly despite the presence of high-quality sequence at the other\ end (termed one-end anchored (OEA) clones).\

    \ \ Fosmid clones discordant by size (n = 3,371 fosmid clones) were\ subjected to fingerprint analysis using four multiple complete\ restriction enzyme digests (MCD analysis) to confirm insert size and\ eliminate rearranged clones. Two high-density customized\ oligonucleotide microarrays (Agilent and NimbleGen) were designed to\ confirm sites of deletion and insertion (GEO accessions GSE10008 and\ GSE10037). We developed a new, expectation maximization-based\ clustering approach to genotype deletions with the use of data from\ the Illumina Human1M BeadChip collected for 125 HapMap DNA samples.\ We found that more than 98% of the children's genotypes were\ consistent with mendelian transmission on the basis of an analysis of\ 28 parent-child trios.\
    \ \

    References

    \

    \ Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, \ Teague B, Alkan C, Antonacci F, et al. \ \ Mapping and sequencing of structural variation from eight \ human genomes.\ Nature. 2008 May 1;453(7191):56-64.

    \ \ varRep 1 compositeTrack on\ group varRep\ itemRgb on\ longLabel HGSV Validated Sites of Structural Variation\ noScoreFilter .\ priority 144.84\ shortLabel HGSV Validated\ track kiddEichlerValid\ type bed 9\ visibility hide\ kiddEichlerDisc HGSV Discordant bed 12 HGSV Discordant Clone End Alignments 0 144.85 0 0 0 127 127 127 0 0 0 http://mrhgsv.gs.washington.edu/cgi-bin/hgc?i=$$&c=$S&l=$[&r=$]&db=$D&position=$S:$[-$]

    Description

    \

    This track shows data from the Human Genome Structural Variation Project. Clone ends from nine individuals from Kidd, et al. were mapped to the reference human genome. This track shows clones whose end mappings were discordant with the reference genome in one of the following ways:

      \
    • deletion: Clone mapping too large relative to reference
    • \
    • insertion: Clone mapping too small relative to reference
    • \
    • inversion: Clone ends map in inconsistent orientation, so the clone mapping spans a potential inversion breakpoint
    • \
    • OEA: One End Anchored clones (only one end could be mapped\ to reference)
    • \
    • transchrm: Clone ends map to different chromosomes\ (name indicates identity of other chromosome after the underscore).\
    • \ \
    \
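The discordance categories can be sketched as a classifier over the two mapped ends of one clone. This is an illustration, not the Kidd et al. pipeline. Its assumptions are:

- concordant fosmid ends map to opposite strands, with the + strand end leftmost;
- the `min_span` and `max_span` size bounds are hypothetical placeholders, not the thresholds used for this track;
- the `transchrm_<chrom>` label format, with the other end's chromosome after the underscore, is illustrative only;
- pairs whose ends point away from each other fall outside the listed categories and are returned as "other".

```python
from typing import Optional, Tuple

End = Tuple[str, int, str]  # (chromosome, mapped position, strand "+" or "-")


def classify_clone(end1: Optional[End], end2: Optional[End],
                   min_span: int = 32_000, max_span: int = 48_000) -> str:
    """Label one clone-end pair; None means that end did not map (sketch)."""
    if end1 is None and end2 is None:
        return "unmapped"
    if end1 is None or end2 is None:
        return "OEA"  # one end anchored
    chrom1, pos1, strand1 = end1
    chrom2, pos2, strand2 = end2
    if chrom1 != chrom2:
        return f"transchrm_{chrom2}"  # hypothetical label format
    if strand1 == strand2:
        return "inversion"  # both ends on the same strand
    (left_pos, left_strand), (right_pos, _) = sorted([(pos1, strand1), (pos2, strand2)])
    if left_strand != "+":
        return "other"  # ends point away from each other
    span = right_pos - left_pos
    if span > max_span:
        return "deletion"  # ends too far apart: sequence missing from the sample
    if span < min_span:
        return "insertion"  # ends too close: extra sequence in the sample
    return "concordant"
```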

    \

    \ Each individual's discordant clone end mappings are in a different \ subtrack. The nine individuals' labels used in Kidd, et al., \ populations of origin, and \ Coriell Cell Repository catalog IDs are shown here:\

    Individual  Population  Coriell ID
    ABC14       CEPH        NA12156
    ABC13       Yoruba      NA19129
    ABC12       CEPH        NA12878
    ABC11       China       NA18555
    ABC10       Yoruba      NA19240
    ABC9        Japan       NA18956
    ABC8        Yoruba      NA18507
    ABC7        Yoruba      NA18517
    G248        Unknown     NA15510

    Methods

    \

    \ Excerpted from Kidd, et al.:

    \
    \ We selected eight individuals as part of the first phase of the Human\ Genome Structural Variation Project. This included four individuals of\ Yoruba Nigerian ethnicity and four individuals of non-African\ ethnicity. For each individual we constructed a whole genomic library\ of about 1 million clones, using a fosmid subcloning strategy.\ Each library was arrayed and both ends of each clone insert were\ sequenced to generate a pair of high-quality end sequences (termed an\ end-sequence pair (ESP)). \ The overall approach generated a physical clone map for each\ individual human genome, flagging regions discrepant by size or\ orientation on the basis of the placement of end sequences against the\ reference assembly.\ \ Across all eight libraries, we mapped 6.1 million clones to distinct\ locations against the reference sequence\ (http://hgsv.washington.edu). \ Of these, 76,767 were discordant by length and/or orientation,\ indicating potential sites of structural variation. About 0.4%\ (23,742) of the ESPs mapped with only one end to the reference\ assembly despite the presence of high-quality sequence at the other\ end (termed one-end anchored (OEA) clones).\
    \

    \ Note: This track contains many more than the 76,767 + 23,742 items\ mentioned above because it also includes clones whose ends map to\ different chromosomes (transchrm).\

    \ \

    References

    \

    \ Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, \ Teague B, Alkan C, Antonacci F, et al.\ \ Mapping and sequencing of structural variation from eight \ human genomes.\ Nature. 2008 May 1;453(7191):56-64.

    \ \ varRep 1 compositeTrack on\ group varRep\ itemRgb on\ longLabel HGSV Discordant Clone End Alignments\ ncbiAccXref kiddEichlerToNcbi\ pairedEndUrlFormat http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?&cmd=retrieve&val=CENTER_NAME%%3D'ABC'%%20and%%20LIBRARY_ID%%3D'%s'%%20and%%20TRACE_NAME%%3D'%s'&retrieve=Submit\ priority 144.85\ shortLabel HGSV Discordant\ track kiddEichlerDisc\ type bed 12\ url http://mrhgsv.gs.washington.edu/cgi-bin/hgc?i=$$&c=$S&l=$[&r=$]&db=$D&position=$S:$[-$]\ urlLabel Clone Summary (Eichler Lab Browser):\ visibility hide\ perlegen Perlegen Haplotypes bed 12 . Perlegen Common High-Resolution Haplotype Blocks 0 145.21 0 0 0 127 127 127 1 0 1 chr21,

    Description

    \

    \ Haplotype blocks derived from common single nucleotide polymorphisms (SNPs) \ on chromosome 21 by\ Perlegen Sciences, as \ described in Patil, N. et al. \ Blocks of limited haplotype diversity revealed by \ high-resolution scanning.\ Science 294, 1719-1723 (2001).

    \

    The location of each haplotype block is represented by a blue horizontal line with tall vertical blue bars at the first and last SNPs of the block. Blocks are displayed as starting at the first SNP and ending at the last SNP of the block. This is slightly different from the representation on the Perlegen web site, in which blocks are stretched until they abut each other. The shade of blue indicates the minimum number of SNPs required to discriminate between haplotype patterns that account for at least 80% of genotyped chromosomes. Darker colors indicate that fewer SNPs are necessary. Individual SNPs are denoted by smaller black vertical bars. At multi-megabase resolution in dense display mode, clusters of tall blue bars may indicate hotspots for recombination.
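The shading statistic is the minimum number of SNPs needed to tell apart the haplotype patterns that together cover at least 80% of chromosomes. It can be illustrated with a brute-force search. This is a sketch under our own assumptions, not Perlegen's algorithm:

- haplotypes are equal-length strings of alleles for the SNPs in one block;
- the most frequent patterns are taken in order until 80% coverage is reached;
- the exhaustive search over SNP subsets is only practical for small blocks.

The function name `min_tag_snps` is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List


def min_tag_snps(haplotypes: List[str], coverage: float = 0.8) -> int:
    """Smallest number of SNP positions that distinguishes the common haplotypes.

    haplotypes: one string per chromosome, one character per SNP in the block.
    """
    total = len(haplotypes)
    common: List[str] = []
    covered = 0
    for hap, count in Counter(haplotypes).most_common():
        if covered / total >= coverage:
            break  # enough chromosomes are already accounted for
        common.append(hap)
        covered += count
    if len(common) < 2:
        return 0  # a single pattern needs no SNPs to discriminate
    n_snps = len(common[0])
    for k in range(1, n_snps + 1):
        for cols in combinations(range(n_snps), k):
            patterns = {tuple(h[i] for i in cols) for h in common}
            if len(patterns) == len(common):
                return k
    return n_snps  # not reached: distinct full-length patterns always differ
```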

    \

    \ For more information on a particular block, click "Outside Link" \ on the item's details page. General information on the\ blocks is available from Perlegen's\ Chromosome 21 Haplotype Browser.

    \

    \ NOTE: Perlegen annotations appear only on chromosome 21.

    \ \

    Credits

    \

    \ Thanks to Perlegen Sciences for making these data available.

    \ varRep 1 altColor 0,0,0\ chromosomes chr21,\ color 0,0,0\ group varRep\ longLabel Perlegen Common High-Resolution Haplotype Blocks\ priority 145.21\ shortLabel Perlegen Haplotypes\ spectrum on\ track perlegen\ type bed 12 .\ visibility hide\ haplotype Haplotype Blocks bed 12 . Common Haplotype Blocks 0 145.22 0 0 0 127 127 127 1 0 1 chr22,

    Description

    \

    \ Haplotype blocks on chromosome 22 from \ The University \ of Oxford and \ The Wellcome Trust Sanger \ Institute, as described in Dawson, E. et al. \ A first-generation linkage disequilibrium map of human \ chromosome 22. Nature 418, 544-8 (2002).

    \

    \ The location of each haplotype block is represented by\ a blue horizontal line with tall vertical blue bars at the first and\ last SNPs of the block. Blocks are displayed as starting at the first\ SNP and ending at the last SNP of the block. Individual SNPs are denoted by\ smaller black vertical bars. At multi-megabase resolution in dense\ display mode, clusters of tall blue bars may indicate hotspots for\ recombination.

    \

    \ NOTE: Haplotype block annotations appear only on chromosome 22.

    \ \

    Credits

    \

    \ Thanks to The University of Oxford and the Sanger Institute for providing these data.\ varRep 1 altColor 0,0,0\ chromosomes chr22,\ color 0,0,0\ group varRep\ longLabel Common Haplotype Blocks\ priority 145.22\ shortLabel Haplotype Blocks\ spectrum on\ track haplotype\ type bed 12 .\ visibility hide\ snpRecombRate SNP Recomb Rates bedGraph 4 Recombination Rates from SNP Genotyping 0 145.5 0 0 0 127 127 127 0 0 22 chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr20,chr21,chr22,chrX,

    Description

    \

    \ This track shows recombination rates measured in centimorgans per\ megabase (cM/Mb). \ It is based on the HapMap Phase I data, release 16a, and Perlegen data (Hinds et al., 2005).\
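Because each value is a rate in cM/Mb for the interval between two adjacent SNPs, summing rate × interval length along a chromosome turns the track into a cumulative genetic map. The following minimal sketch assumes one rate per SNP interval, held constant across that interval; the function name and example numbers are hypothetical.

```python
def genetic_map(snp_positions, rates_cm_per_mb):
    """Cumulative genetic position (cM) at each SNP.

    The rate in each interval between adjacent SNPs is treated as constant,
    so an interval contributes rate (cM/Mb) * length (Mb) centimorgans.
    """
    if len(rates_cm_per_mb) != len(snp_positions) - 1:
        raise ValueError("expected one rate per interval between adjacent SNPs")
    cm = [0.0]
    intervals = zip(snp_positions, snp_positions[1:])
    for (left, right), rate in zip(intervals, rates_cm_per_mb):
        cm.append(cm[-1] + rate * (right - left) / 1e6)
    return cm


# Two 500 kb intervals at 1 and 2 cM/Mb, then a 10 kb interval at 50 cM/Mb.
print(genetic_map([0, 500_000, 1_000_000, 1_010_000], [1.0, 2.0, 50.0]))
# [0.0, 0.5, 1.5, 2.0]
```

The 10 kb interval at 50 cM/Mb adds as much genetic distance as the preceding 500 kb at 1 cM/Mb, which is the kind of kilobase-scale concentration of recombination described below.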

    \

    \ Observations from sperm studies (Jeffreys et al., 2001) and\ patterns of genetic variation (McVean et al., 2004; Crawford\ et al., 2004) show that recombination rates in the human\ genome vary extensively over kilobase scales and that much\ recombination occurs in recombination hotspots. This provides an\ explanation for the apparent block-like structure of linkage\ disequilibrium (Daly et al., 2001; Gabriel et al.,\ 2002).\

    \

    \ Fine-scale recombination rate estimates provide a new route to\ understanding the molecular mechanisms underlying human recombination.\ A better understanding of the genomic landscape of human recombination\ rate variation would facilitate the efficient design and analysis of\ disease association studies and greatly improve inferences from\ polymorphism data about selection and human demographic history.\

    \ \

    Display Conventions and Configuration

    \

    \ This annotation track may be configured in a variety of ways to highlight \ different aspects of the displayed data. The graphical configuration options \ are shown at the top of the track description page. \ For more information, click the \ Graph\ configuration help link.\

    \ \

    Methods

    \

    \ Fine-scale recombination rates are estimated using the reversible-jump\ Markov chain Monte Carlo (MCMC) method (McVean et al., 2004). This\ approach explores the posterior distribution of fine-scale recombination\ rate profiles, where the state-space considered is the distribution of\ piece-wise constant recombination maps. The Markov chain explores the\ distribution of both the number and location of change-points, in addition\ to the rates for each segment. A prior is set on the number of\ change-points that increases the smoothing effect of trans-dimensional\ MCMC, which is necessary because of the composite-likelihood scheme\ employed.\

    \

    \ This method is implemented in the package \ LDhat, \ which includes full details of installation and implementation.\

    \

    \ A block-penalty of five was used (calibrated by simulation \ and comparison to data from sperm-typing studies). Each region was\ analyzed as a single run with 10,000,000 iterations, sampling every 5000th\ iteration and discarding the first third of all samples as burn-in. The\ mean posterior rate for each SNP interval is the value reported. Because of \ the non-independence of the composite likelihood scheme,\ the quantiles of the sampling distribution do not reflect true uncertainty\ and are therefore not given.\
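The run settings above imply a concrete amount of retained output: 10,000,000 iterations sampled every 5000th iteration give 2000 stored samples, and discarding the first third as burn-in leaves roughly 1334. The sketch below shows only that summary step (thinning, burn-in, posterior mean per SNP interval), not LDhat's sampler. Rounding the burn-in count down with integer division is an assumption, and the function names are hypothetical.

```python
def retained_iterations(n_iterations=10_000_000, sample_every=5_000):
    """Iterations kept after thinning and discarding the first third as burn-in."""
    sampled = list(range(sample_every, n_iterations + 1, sample_every))
    burn_in = len(sampled) // 3  # assumption: burn-in count rounded down
    return sampled[burn_in:]


def posterior_mean_rates(samples):
    """Mean rate per SNP interval over the post-burn-in samples.

    `samples` holds one stored (thinned) sample per row, each a list with
    one recombination rate per SNP interval.
    """
    kept = samples[len(samples) // 3:]
    n_intervals = len(kept[0])
    return [sum(row[i] for row in kept) / len(kept) for i in range(n_intervals)]


kept = retained_iterations()
print(len(kept), kept[0])  # 1334 3335000
```

Only the posterior mean is reported, consistent with the note above that quantiles are not meaningful under the composite likelihood.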

    \

    \ Estimates were generated separately from each of the four HapMap \ populations, and then combined to give a single figure. Differences between \ populations are not significant.\

    \ \

    Validation

    \

    \ This approach has been validated in three ways: by extensive\ simulation studies and by comparisons with independent estimates of\ recombination rates, both over large scales from the genetic map and\ over fine scales from sperm analysis. Full details of validation can be \ found in McVean et al. (2004) and Winckler et al. (2005).\

    \ \

    Credits

    \

    \ The HapMap data are based on HapMap \ release 16a; the Perlegen data are from Hinds et al. (2005). \ The recombination rates were ascertained by Simon Myers from the\ Mathematical Genetics Group at the University of Oxford.\

    \ \

    References

    \

    \ Crawford, D.C., Bhangale, T., Li, N., Hellenthal, G., Rieder, M.J., \ Nickerson, D.A. and Stephens, M.\ Evidence for substantial fine-scale variation in recombination \ rates across the human genome.\ Nat Genet. 36(7), 700-6 (2004).\

    \

    \ Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J. and Lander, E.S.\ High-resolution haplotype structure in the human genome.\ Nat Genet. 29(2), 229-32 (2001).\

    \

    \ Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel, \ B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M. et al.\ The structure of haplotype blocks in the human genome.\ Science 296(5576), 2225-9 (2002).\

    \

    \ Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G., Frazer, K.A., Cox, D.R.\ Whole-Genome Patterns of Common DNA Variation in Three Human Populations.\ Science 307(5712), 1072-1079 (2005).\

    \

    \ Jeffreys, A.J., Kauppi, L. and Neumann, R.\ Intensely punctate meiotic recombination in the class II region \ of the major histocompatibility complex.\ Nat Genet. 29(2), 217-22 (2001).\

    \

    \ McVean, G.A., Myers, S.R., Hunt, S., Deloukas, P., Bentley, D.R. and Donnelly, \ P.\ The fine-scale structure of recombination rate variation in the \ human genome.\ Science 304(5670), 581-4 (2004).\

    \

    \ Winckler, W., Myers, S.R., Richter, D.J., Onofrio, R.C., McDonald, G.J., \ Bontrop, R.E., McVean, G.A., Gabriel, S.B., Reich, D., Donnelly, P. \ et al.\ Comparison of fine-scale recombination rates in humans and \ chimpanzees.\ Science 308(5718), 107-11 (2005).\

    \ \ varRep 0 autoScale Off\ chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr20,chr21,chr22,chrX\ compositeTrack on\ group varRep\ longLabel Recombination Rates from SNP Genotyping\ maxHeightPixels 64:32:16\ maxLimit 100\ minLimit 0\ origAssembly hg16\ priority 145.5\ shortLabel SNP Recomb Rates\ track snpRecombRate\ type bedGraph 4\ viewLimits 0:16\ visibility hide\ snpRecombHotspot SNP Recomb Hots bed 3 . Recombination Hotspots from SNP Genotyping 0 145.51 0 0 0 127 127 127 0 0 0

    Description

    \

    \ This track shows the location of recombination hotspots detected from\ patterns of genetic variation. \ It is based on the HapMap Phase I data, release 16a, and Perlegen data (Hinds et al., 2005).\

    \

    \ Observations from sperm studies (Jeffreys et al., 2001) and\ patterns of genetic variation (McVean et al., 2004; Crawford\ et al., 2004) show that recombination rates in the human\ genome vary extensively over kilobase scales and that much\ recombination occurs in recombination hotspots. This provides an\ explanation for the apparent block-like structure of linkage\ disequilibrium (Daly et al., 2001; Gabriel et al.,\ 2002).\

    \

    \ Recombination hotspot estimates provide a new route to\ understanding the molecular mechanisms underlying human recombination.\ A better understanding of the genomic landscape of human recombination\ hotspots would facilitate the efficient design and analysis of\ disease association studies and greatly improve inferences from\ polymorphism data about selection and human demographic history.\

    \ \

    Methods

    \

    \ Recombination hotspots are identified using the likelihood-ratio test\ described in McVean et al. (2004) and Winckler et al. (2005), \ referred to as LDhot. For successive intervals of 200 kb, the maximum\ likelihood of a model with a constant recombination rate is compared\ to the maximum likelihood of a model in which the central 2 kb is a\ recombination hotspot (likelihoods are approximated by the composite\ likelihood method of Hudson (2001)). The observed difference in log\ composite likelihood is compared against the null distribution, which\ is obtained by simulation. Simulations are matched for sample size,\ SNP density, background recombination rate and an approximation to the\ ascertainment scheme (a panel of 12 individuals with a Poisson number\ of chromosomes, mean 1, sampled from this panel, using a single hit\ ascertainment scheme for dbSNP and resequencing of 16 individuals for\ the ten HapMap ENCODE regions). Evidence for a hotspot was assessed in\ each analysis panel separately (YRI, CEU and combined CHB+JPT), and\ p-values were combined such that a hotspot requires that two of the\ three populations show some evidence of a hotspot (p < 0.05) and at\ least one population shows stronger evidence for a hotspot\ (p < 0.01). Hotspot centers were estimated at those locations where\ distinct recombination rate estimate peaks occurred with at least a factor \ of two separation between peaks, within the low p-value intervals.\
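Two parts of this procedure are simple enough to state as code: the location of the candidate hotspot (the central 2 kb of each 200 kb interval) and the rule for combining p-values across the three analysis panels. The sketch below encodes only those two rules. The likelihood-ratio test and simulated null distribution that produce the p-values are not shown, and the function names are hypothetical.

```python
def hotspot_core(window_start, window=200_000, core=2_000):
    """Central `core` bp of a `window`-bp analysis interval (0-based, half-open)."""
    mid = window_start + window // 2
    return mid - core // 2, mid + core // 2


def is_hotspot(p_values, weak=0.05, strong=0.01):
    """Combine per-panel LDhot p-values, e.g. {"YRI": ..., "CEU": ..., "CHB+JPT": ...}.

    Requires some evidence (p < weak) in at least two panels and stronger
    evidence (p < strong) in at least one.
    """
    n_weak = sum(p < weak for p in p_values.values())
    n_strong = sum(p < strong for p in p_values.values())
    return n_weak >= 2 and n_strong >= 1


print(hotspot_core(0))                                            # (99000, 101000)
print(is_hotspot({"YRI": 0.003, "CEU": 0.04, "CHB+JPT": 0.5}))    # True
print(is_hotspot({"YRI": 0.02, "CEU": 0.04, "CHB+JPT": 0.5}))     # False
```

A single very small p-value is not enough on its own: one panel at p = 0.001 with the other two above 0.05 fails the two-panel requirement.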

    \ \

    Validation

    \

    \ This approach has been validated in three ways: by extensive simulation studies\ and by comparisons with independent estimates of recombination rates, both over\ large scales from the genetic map and over fine scales from sperm analysis.\ Full details of validation can be found in McVean et al. (2004) and Winckler et al. (2005).\

    \ \

    Credits

    \

    \ The HapMap data are based on HapMap \ release 16a; the Perlegen data are from Hinds et al. (2005). \ The recombination hotspots were ascertained by Simon Myers from the\ Mathematical Genetics Group at the University of Oxford.\

    \ \

    References

    \

    \ Crawford, D.C., Bhangale, T., Li, N., Hellenthal, G., Rieder, M.J., \ Nickerson, D.A. and Stephens, M.\ Evidence for substantial fine-scale variation in recombination \ rates across the human genome.\ Nat Genet. 36(7), 700-6 (2004).\

    \

    \ Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J. and Lander, E.S.\ High-resolution haplotype structure in the human genome.\ Nat Genet. 29(2), 229-32 (2001).\

    \

    \ Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel, \ B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M. et al.\ The structure of haplotype blocks in the human genome.\ Science 296(5576), 2225-9 (2002).\

    \

    \ Hudson, R.R.\ Two-locus sampling distributions and their application.\ Genetics 159(4), 1805-17 (2001).\

    \

    \ Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G., Frazer, K.A., Cox, D.R.\ Whole-Genome Patterns of Common DNA Variation in Three Human Populations.\ Science 307(5712), 1072-1079 (2005).\

    \

    \ Jeffreys, A.J., Kauppi, L. and Neumann, R.\ Intensely punctate meiotic recombination in the class II region \ of the major histocompatibility complex.\ Nat Genet. 29(2), 217-22 (2001).\

    \

    \ McVean, G.A., Myers, S.R., Hunt, S., Deloukas, P., Bentley, D.R. and Donnelly, \ P.\ The fine-scale structure of recombination rate variation in the \ human genome.\ Science 304(5670), 581-4 (2004).\

    \

    \ Winckler, W., Myers, S.R., Richter, D.J., Onofrio, R.C., McDonald, G.J., \ Bontrop, R.E., McVean, G.A., Gabriel, S.B., Reich, D., Donnelly, P. \ et al.\ Comparison of fine-scale recombination rates in humans and \ chimpanzees.\ Science 308(5718), 107-11 (2005).\

    \ varRep 1 compositeTrack on\ group varRep\ longLabel Recombination Hotspots from SNP Genotyping\ origAssembly hg16\ priority 145.51\ shortLabel SNP Recomb Hots\ track snpRecombHotspot\ type bed 3 .\ visibility hide\ affy500K Affy 500K bed 6 + Affy GeneChip Mapping 500K Array 0 145.9 0 0 0 127 127 127 0 0 0 https://www.affymetrix.com/LinkServlet?probeset=$$

    Description

    \ \

    This track shows locations of the Single Nucleotide Polymorphisms\ on the Affymetrix 500K Mapping Array. The SNPs in this track include\ all of the markers from the array that have been successfully mapped\ against the current assembly.

    \ \

    The SNPs were selected from public domain databases. Genotypes for\ these markers were generated at Affymetrix using GeneChip® DNA\ analysis technology and are available at their site.

    \ \

    Credits

    \ \

    Thanks to dbSNP at the NCBI for providing the public SNPs.\ \

    Thanks to Affymetrix, Inc. for developing the genotyping\ array. For more details on this genotyping assay, please see the\ supplemental information on the Affymetrix 500K SNP product.

    \ \

    Terms of Use

    \

    Please see the Terms and Conditions for the use of this data at the\ Affymetrix, Inc.\ site.

    \ \ \ varRep 1 group varRep\ longLabel Affy GeneChip Mapping 500K Array\ priority 145.9\ shortLabel Affy 500K\ track affy500K\ type bed 6 +\ url https://www.affymetrix.com/LinkServlet?probeset=$$\ visibility hide\ omim OMIM bed 4 . NIH's Online Mendelian Inheritance in Man 0 146 0 0 0 127 127 127 0 0 0 http://www.ncbi.nlm.nih.gov/omim/$$ varRep 1 group varRep\ longLabel NIH's Online Mendelian Inheritance in Man\ priority 146\ shortLabel OMIM\ track omim\ type bed 4 .\ url http://www.ncbi.nlm.nih.gov/omim/$$\ visibility hide\ genomicSuperDups Segmental Dups bed 6 . Duplications of >1000 Bases of Non-RepeatMasked Sequence 0 146 0 0 0 127 127 127 0 0 0

    Description

    \

    \ This track shows regions detected as putative genomic duplications within the\ golden path. The following display conventions are used to distinguish\ levels of similarity:\

      \
    • \ Light to dark gray: 90 - 98% similarity\
    • \ Light to dark yellow: 98 - 99% similarity\
    • \ Light to dark orange: greater than 99% similarity \
    • \ Red: duplications of greater than 98% similarity that lack sufficient \ Segmental Duplication Database evidence (most likely missed overlaps) \
    \ For a region to be included in the track, at least 1 kb of the total \ sequence (containing at least 500 bp of non-RepeatMasked sequence) had to \ align with a sequence identity of at least 90%.
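The inclusion criteria and color classes above can be read as a small decision function. The sketch below assumes identity is given as a fraction, that a value of exactly 98% or 99% falls in the yellow class (the text's ranges overlap at those boundaries), and that Segmental Duplication Database support is supplied as a boolean; the function name and parameters are hypothetical.

```python
def segdup_color(identity, aligned_bp, non_repeat_bp, sdd_supported=True):
    """Color class for a putative duplication, or None if it is excluded.

    identity is a fraction (0.95 means 95%). Boundary handling at exactly
    98% and 99% is an assumption.
    """
    # Inclusion: >= 1 kb aligned, >= 500 bp non-RepeatMasked, >= 90% identity.
    if aligned_bp < 1000 or non_repeat_bp < 500 or identity < 0.90:
        return None
    # >98% identity without sufficient database evidence is flagged red.
    if identity > 0.98 and not sdd_supported:
        return "red"
    if identity > 0.99:
        return "orange"
    if identity >= 0.98:
        return "yellow"
    return "gray"


print(segdup_color(0.995, 5_000, 2_000))                       # orange
print(segdup_color(0.995, 5_000, 2_000, sdd_supported=False))  # red
```

Within each class the track further varies the shade from light to dark; that gradient is not modeled here.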

    \ \

    Methods

    \

    \ Segmental duplications play an important role in both genomic disease \ and gene evolution. This track displays an analysis of the global \ organization of these long-range segments of identity in genomic sequence.\

    \ \

    Large recent duplications (>= 1 kb and >= 90% identity) were detected\ by identifying high-copy repeats, removing these repeats from the genomic \ sequence ("fuguization") and searching all sequence for similarity. The \ repeats were then reinserted into the pairwise alignments, the ends of \ alignments trimmed, and global alignments were generated.\ For a full description of the "fuguization" detection method, see Bailey \ et al. (2001) in the References section below. This method has become \ known as WGAC (whole-genome assembly comparison); for example, see Bailey \ et al. (2002).\ \

    Credits

    \

    \ These data were provided by \ Ginger Cheng, \ Xinwei She, \ Tin Louie and\ Evan Eichler \ at the University of Washington.

    \ \

    References

    \

    \ Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, \ Myers EW, Li PW, Eichler EE.\ Recent segmental duplications in the human genome.\ Science. 2002 Aug 9;297(5583):1003-7.

    \

    \ Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE.\ Segmental duplications: organization and impact within the \ current human genome project assembly.\ Genome Res. 2001 Jun;11(6):1005-17.

    \ varRep 1 group varRep\ longLabel Duplications of >1000 Bases of Non-RepeatMasked Sequence\ noScoreFilter .\ priority 146\ shortLabel Segmental Dups\ track genomicSuperDups\ type bed 6 .\ visibility hide\ cnp Structural Var bed 4 + Structural Variation 0 146 0 0 0 127 127 127 0 0 0

    Description

    \

    \ This annotation shows regions detected as putative copy number polymorphisms\ (CNP) and sites of detected intermediate-sized structural variation (ISV). \ The CNPs and ISVs were determined by various methods, displayed in \ individual subtracks within the annotation:

    \
      \
    • \ BAC microarray analysis (Sharp): 160 putative CNP regions detected by BAC\ microarray analysis in a population of 47 individuals consisting of 8 \ Chinese, 4 Japanese, 10 Czech, 2 Druze, 7 Biaka, 9 Mbuti, and 7 Amerindians. \
    • \ BAC microarray analysis (Iafrate): 255 putative CNP regions detected by\ BAC microarray analysis in a population of 55 individuals, 16 of which had\ previously-characterized chromosomal abnormalities. The group consisted of 10\ Caucasians, 4 Amerindians, 2 Chinese, 2 Indo-Pakistani, 2 Sub-Saharan\ African, and 35 of unknown ethnic origin.\
    • \ Representational oligonucleotide microarray analysis (ROMA) (Sebat): 81 putative\ CNP regions detected by ROMA in a population of 20 normal individuals consisting\ of 1 Biaka, 1 Mbuti, 1 Druze, 1 Melanesian, 4 French, 1 Venezuelan, 1 Cambodian,\ 1 Mayan and 9 of unknown ethnicity.\
    • \ Fosmid mapping (Tuzun): 285 ISV sites detected by mapping paired-end sequences \ from a human fosmid DNA library.\
    • \ Deletions from genotype analysis (McCarroll): 541 deletions detected\ by analysis of SNP genotypes, using the HapMap Phase I data, release 16a.\
    • \ Deletions from genotype analysis (Conrad): 935 deletions detected\ by analysis of SNP genotypes, using the HapMap Phase I data, release 16c.1, \ CEU and YRI samples.\
    • \ Deletions from haploid hybridization analysis (Hinds): 100 deletions \ from haploid hybridization analysis in 24 unrelated individuals from the \ Polymorphism Discovery Resource, selected for SNP LD study.\

    \ \

    Display Conventions and Configuration

    \

    \ CNP and ISV regions are indicated by solid blocks that are color-coded to \ indicate the type of variation detected:\

      \
    • \ Green: gain (duplications)\
    • \ Red: loss (deletions)\
    • \ Blue: gain and loss (both deletion and duplication)\
    • \ Black: inversion\

    \ \

    Sharp subtrack

    \

    \ On the details pages for elements in this subtrack, \ the table shows value/threshold data for each individual in the population.\ "Value" is defined as the log2 ratio of fluorescence intensity of\ test versus reference DNA. "Threshold" is defined as 2 standard \ deviations from the mean log2 ratio of all autosomal clones per \ hybridization. \ The "Disease Percent" value reflects the percent of the BAC that lies \ within a "rearrangement hotspot", as defined in Sharp et al. \ (2005) (the rationale used to choose BACs for the array construction). A \ rearrangement hotspot is defined by the presence of flanking intrachromosomal \ duplications >10 kb in length with >95% similarity and separated by \ 50 kb - 10 Mb of intervening sequence.
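The "Value" and "Threshold" columns follow from two definitions: a log2 test/reference fluorescence ratio, and a cutoff two standard deviations from the mean log2 ratio of autosomal clones in the same hybridization. The sketch below applies those definitions to one clone. Using the sample standard deviation, and the function name itself, are assumptions rather than details from Sharp et al.

```python
import math
from statistics import mean, stdev


def sharp_call(test_intensity, ref_intensity, autosomal_log2_ratios, n_sd=2.0):
    """Return (value, threshold, call) for one BAC in one hybridization.

    value     = log2(test / reference) fluorescence ratio
    threshold = n_sd standard deviations of the autosomal log2 ratios
    call      = "gain", "loss" or "normal" relative to mean +/- threshold
    """
    value = math.log2(test_intensity / ref_intensity)
    center = mean(autosomal_log2_ratios)
    threshold = n_sd * stdev(autosomal_log2_ratios)  # assumption: sample SD
    if value > center + threshold:
        call = "gain"
    elif value < center - threshold:
        call = "loss"
    else:
        call = "normal"
    return value, threshold, call


autosomal = [0.0, 0.1, -0.1, 0.05, -0.05]
print(sharp_call(2.0, 1.0, autosomal)[2])  # gain
```

Because hybridizations were done in duplicate with a dye reversal, a real analysis would compare calls from both orientations; that step is omitted here.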

    \ \

    Tuzun subtrack

    \

    \ Items are labeled using the following naming convention:\

      \
    • First letter: rearrangement type (D=deletion, I=insertion, \ V=inversion).\
    • Second letter: association with repeat or duplication\ (R=human-specific repeat, D=duplication, N=neither \ (unique)).\
    • Third letter: second haplotype support (N=variant site lacking\ support from the human genome reference, S=variant site with support \ from the human genome reference). \
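The three-letter naming convention above can be decoded mechanically. This sketch assumes the code occupies the first three characters of the item name and rejects any letter not listed; the lookup-table and function names are hypothetical.

```python
REARRANGEMENT = {"D": "deletion", "I": "insertion", "V": "inversion"}
REPEAT_CONTEXT = {"R": "human-specific repeat", "D": "duplication", "N": "neither (unique)"}
SECOND_HAPLOTYPE = {
    "N": "lacks support from the reference",
    "S": "supported by the reference",
}


def decode_tuzun_label(label):
    """Decode the first three letters of a Tuzun subtrack item name."""
    code = label[:3].upper()
    if len(code) < 3:
        raise ValueError(f"label too short: {label!r}")
    try:
        return (REARRANGEMENT[code[0]], REPEAT_CONTEXT[code[1]], SECOND_HAPLOTYPE[code[2]])
    except KeyError as err:
        raise ValueError(f"unrecognized code letter in {label!r}") from err


print(decode_tuzun_label("VNS"))
# ('inversion', 'neither (unique)', 'supported by the reference')
```

Note that "D" means deletion in the first position but duplication in the second, so the letters must be interpreted by position.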

    \ \

    Conrad subtrack

    \

    \ The method used to identify these deletions approximates the breakpoints of each\ event; therefore, a set of minimal and maximal endpoints is associated with each\ deletion. Thick lines delineate\ the minimally deleted region; thin lines delineate the maximally deleted region.\ \

    Methods

    \ \

    Sharp BAC microarray analysis

    \

    \ All hybridizations were performed in duplicate incorporating a dye-reversal \ using a custom array consisting of 2,194 end-sequence or FISH-confirmed BACs, \ targeted to regions of the genome flanked by segmental duplications. \ The false positive rate was estimated at ~3 clones per 4,000 tested.

    \ \

    Iafrate BAC microarray analysis

    \

    \ All hybridizations were performed in duplicate incorporating a dye-reversal \ using proprietary 1 Mb GenomeChip V1.2 Human BAC Arrays consisting of 2,632 BAC \ clones (Spectral Genomics, Houston, TX). The false positive rate was estimated \ at ~1 clone per 5,264 tested.

    \

    \ Further information is available from the \ Database of Genome \ Variants website.

    \ \

    Sebat ROMA

    \

    \ Following digestion with BglII or HindIII, genomic DNA was hybridized to a \ custom array consisting of 85,000 oligonucleotide probes. The probes were \ selected to be free of common repeats and have unique homology within the human \ genome. The average resolution of the array was ~35 kb; however, only intervals \ in which three consecutive probes showed concordant signals were scored as \ CNPs. All hybridizations were performed in duplicate incorporating a \ dye-reversal, with the false positive rate estimated to be ~6%.
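The scoring rule that only intervals with three consecutive concordant probes count as CNPs amounts to a run-length scan over per-probe calls in genome order. The sketch assumes each probe has already been called +1 (gain), -1 (loss) or 0 (no change); that encoding and the function name are hypothetical, not part of the published ROMA pipeline.

```python
def concordant_runs(calls, min_run=3):
    """Runs of >= min_run consecutive identical non-zero probe calls.

    calls: per-probe calls in genome order, +1 (gain), -1 (loss) or 0.
    Returns (first_index, last_index, sign) tuples, indices inclusive.
    """
    runs = []
    i = 0
    while i < len(calls):
        if calls[i] == 0:
            i += 1
            continue
        j = i
        while j + 1 < len(calls) and calls[j + 1] == calls[i]:
            j += 1
        if j - i + 1 >= min_run:
            runs.append((i, j, calls[i]))
        i = j + 1
    return runs


print(concordant_runs([0, 1, 1, 1, 0, -1, -1, 0, -1, -1, -1, -1]))
# [(1, 3, 1), (8, 11, -1)]
```

The two-probe loss at indices 5 and 6 is not reported, which is how the requirement trades sensitivity for a lower false positive rate.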

    \

    \ Note that CNP intervals, as detailed by Sebat et al. (2004), were \ converted from the April 2003 human genome assembly (NCBI Build 33) to the \ July 2003 assembly (NCBI Build 34) using the UCSC liftOver tool.

    \ \

    Tuzun fosmid mapping

    \

    \ Paired-end sequences from a human fosmid DNA library were mapped to the assembly. \ The technique had an average resolution of ~8 kb and detected 56 inversion sites \ not detectable by the array-based approaches. However, because of the physical \ constraints of fosmid insert size, it could not detect insertions larger than \ 40 kb.
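The logic of paired-end mapping is that each fosmid carries roughly 40 kb of sequence, so the distance and orientation of its two mapped ends reveal differences from the reference. The sketch below classifies one end pair. The mean, standard deviation and 3-SD cutoff are illustrative values, not the thresholds used by Tuzun et al., and the function name is hypothetical.

```python
def classify_fosmid_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                         mean_insert=40_000, sd_insert=3_000, n_sd=3):
    """Classify one mapped fosmid end pair against the reference.

    mean_insert, sd_insert and n_sd are illustrative assumptions.
    """
    if chrom1 != chrom2:
        return "unpaired"  # ends on different chromosomes; not handled here
    if strand1 == strand2:
        return "inversion"  # orientation inconsistent with the reference
    span = abs(pos2 - pos1)
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"  # reference has extra sequence between the ends
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"  # clone carries sequence absent from the reference
    return "concordant"


print(classify_fosmid_pair("chr1", 1_000, "+", "chr1", 71_000, "-"))  # deletion
```

Since the mapped span cannot shrink below zero, an insertion longer than the insert itself leaves no pair with both ends in flanking reference sequence, which is why insertions over 40 kb escape detection.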

    \ \

    McCarroll genotype analysis

    \

    \ A segregating deletion can leave "footprints" in SNP genotype data, including\ apparent deviations from Mendelian inheritance, apparent deviations from\ Hardy-Weinberg equilibrium and null genotypes. Using these clues to discover\ true variants is challenging, however, because the vast majority of such observations\ represent technical artifacts and genotyping errors.\

    \

    \ To determine whether a subset of "failed" SNP genotyping assays in the HapMap data\ might reflect structural variation, the authors examined whether such failures\ were physically clustered in a manner that is specific to individuals. Consistent\ with this hypothesis, the rate of Mendelian-inconsistent genotypes was elevated\ near other Mendelian-inconsistent genotypes in the same individual but was unrelated to\ Mendelian inconsistencies in other individuals.\

    \

    \ The authors systematically looked for regions of the genome in which the\ same failure profile appeared repeatedly at nearby markers in a manner that\ was statistically unexpected based on chance. A set of statistical thresholds was \ tailored to each mode of failure, genotyping center and genotyping platform used in the\ project. The same procedure could readily apply to dense SNP data from any\ platform or study.

    \ \ \

    Conrad genotype analysis

    \

    \ SNPs in regions that are hemizygous for a deletion are generally miscalled as homozygous \ for the allele that is present. Hence, when a deletion is transmitted from parent to child, \ the genotypes at SNPs within the deletion region will often appear to violate the rules of Mendelian \ transmission. The authors developed a simple algorithm for scanning trio data for unusual runs of \ consecutive SNPs that, in a single family, have genotype configurations consistent with the presence of a deletion. \ \
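The key observation is that a transmitted deletion makes both the transmitting parent (hemizygous) and the child look homozygous, and sometimes produces an apparent Mendelian violation. The sketch below is a deliberately simplified heuristic built on that observation, not the authors' algorithm: it considers one parent-child pair at a time, uses genotypes written as two-letter strings such as "AB", and reports runs of consecutive deletion-consistent SNPs that contain at least a hypothetical minimum number of violations. All names are hypothetical.

```python
def is_homozygous(genotype):
    return genotype[0] == genotype[1]


def deletion_consistent(parent, child):
    """A transmitted deletion makes both parent (hemizygous) and child look homozygous."""
    return is_homozygous(parent) and is_homozygous(child)


def mendel_violation(parent, child):
    """Child homozygous for an allele this parent does not appear to carry."""
    return is_homozygous(child) and child[0] not in parent


def candidate_deletions(parent_gts, child_gts, min_violations=2):
    """Runs of consecutive deletion-consistent SNPs with >= min_violations violations.

    Returns (first_index, last_index, n_violations) tuples.
    """
    runs, start = [], None
    n = len(child_gts)
    for i in range(n + 1):
        ok = i < n and deletion_consistent(parent_gts[i], child_gts[i])
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            v = sum(mendel_violation(parent_gts[k], child_gts[k]) for k in range(start, i))
            if v >= min_violations:
                runs.append((start, i - 1, v))
            start = None
    return runs


parent = ["AB", "AA", "BB", "AA", "BB", "AB"]
child = ["AB", "AA", "AA", "BB", "AA", "AB"]
print(candidate_deletions(parent, child))  # [(1, 4, 3)]
```

Any heterozygous call in either individual breaks a run, because a true hemizygous region cannot produce one; real data would also need tolerance for genotyping error.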

    Hinds haploid hybridization analysis

    \

    \ Approximately 600 Mb of genomic DNA from 24 unrelated individuals\ was obtained from the Polymorphism Discovery Resource.\ Haploid hybridization was used to identify genomic intervals\ showing a reduced hybridization signal in comparison to the reference\ assembly. PCR amplification was performed on 215 candidate deletions,\ and the 100 deletions that were unambiguously confirmed were selected.\

    Validation

    \

    McCarroll genotype analysis

    \

    \ Four methods of validation were used: \ fluorescent in situ hybridization (FISH), \ two-color fluorescence intensity measurements, PCR amplification and quantitative PCR.\

    \

    \ The authors performed fluorescent in situ hybridization (FISH) for five\ candidate deletions large enough to span available FISH probes. In all five cases,\ FISH assays confirmed the deletions in the predicted individuals.\

    \

    \ The authors examined two-color allele-specific fluorescence data from SNP genotyping\ assays from a data subset available at the Broad Institute, looking for a \ reduction in fluorescence intensity in individuals predicted to carry a \ deletion. At most SNPs\ in the genome, fluorescence intensity measurements cluster into two or three\ discrete groups corresponding to homozygous and heterozygous genotypes.\ At 15 of 17 candidate deletion loci, fluorescence intensity data for one or more\ SNPs clustered into additional groups that corresponded to the predicted deletion\ genotypes.\

    \

    \ The authors used PCR amplification to query 60 loci for which the pattern of genotypes\ suggested multiple individuals with homozygous deletions. Variants were considered\ confirmed if the pattern of amplification success and failure matched the prediction\ across a set of 12-24 individuals. The authors confirmed 51 of 60 candidate\ variants by this criterion.\

    \

    \ The authors performed quantitative PCR in all 269 HapMap DNA samples for 11 candidate\ deletions that overlapped the coding exons of genes and that were discovered in\ many individuals. At 10/11 loci, the authors observed three discrete clusters, identifying \ individuals with zero, one and two gene copies.\ All 60 trios displayed Mendelian inheritance for the ten deletions, as well as\ Hardy-Weinberg equilibrium in all four populations surveyed, and transmission rates\ close to 50%. This suggests that the deletions behave as a stable, heritable\ genetic polymorphism.\
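Once individuals are sorted into zero-, one- and two-copy clusters, Hardy-Weinberg equilibrium can be checked by estimating the deletion allele frequency and comparing observed to expected genotype counts. The sketch below uses the standard chi-square goodness-of-fit statistic; whether the authors used this exact test is not stated in the text, and the function name is hypothetical.

```python
def hwe_chi_square(n0, n1, n2):
    """Chi-square statistic for Hardy-Weinberg equilibrium at a deletion.

    n0, n1, n2: individuals with zero, one and two gene copies.
    The deletion allele frequency is q = (2*n0 + n1) / (2*N).
    """
    n = n0 + n1 + n2
    q = (2 * n0 + n1) / (2 * n)
    p = 1 - q
    expected = (n * q * q, 2 * n * p * q, n * p * p)
    observed = (n0, n1, n2)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(hwe_chi_square(25, 50, 25))  # 0.0, exactly the HWE expectation
print(hwe_chi_square(40, 20, 40))  # 36.0, far too few one-copy individuals
```

With one degree of freedom, a statistic above about 3.84 corresponds to p < 0.05; the second example would indicate a deviation, for instance from miscalled copy numbers.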

    \ \

    Conrad genotype analysis

    \

    \ The authors first tested 12 predicted deletions using quantitative PCR. \ For all 12 deletions they observed DNA concentrations consistent with transmission of a deletion \ from parent to child. \

    To provide more extensive validation by comparative genomic hybridization (CGH), the authors designed a \ custom oligonucleotide microarray consisting of 380,000 probes that tile across all 134 candidate deletions \ identified in nine HapMap offspring (8 YRI and 1 CEU). \ The results of this CGH analysis indicate that the majority (about 85%) of candidate deletions detected \ by the method are real.\ \

    References

    \

    \ Conrad, D., Andrews, T.D., Carter, N.P., Hurles, M.E., Pritchard, J.K.\ A high-resolution survey of deletion polymorphism in the human genome.\ Nature Genet 38(1), 75-81 (2006).

    \

    \ Hinds, D., Kloek, A.P., Jen, M., Chen, X., Frazer, K.A.\ Common deletions and SNPs are in linkage disequilibrium in the human genome.\ Nature Genet 38(1), 82-85 (2006).

    \

    \ Iafrate, A.J., Feuk, L., Rivera, M.N., Listewnik, M.L., Donahoe, P.K., Qi, Y., \ Scherer, S.W. and Lee, C. \ Detection of large-scale variation in the human genome.\ Nature Genet 36(9), 949-51 (2004).

    \

    \ McCarroll, S.A., Hadnott, T.N., Perry, G.H., Sabeti, P.C., \ Zody, M.C., Barrett, J.C., Dallaire, S., Gabriel, S., Lee, C., Daly, M.J., \ Altshuler, D.M.\ Common deletion polymorphisms in the human genome.\ Nature Genet 38(1), 86-92 (2006).

    \

    \ Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., \ Maner, S., Massa, H., Walker, M., Chi, M. et al.\ Large-scale copy number polymorphism in the human genome.\ Science 305(5683), 525-8 (2004).

    \

    \ Sharp, A.J., Locke, D.P., McGrath, S.D., Cheng, Z., Bailey, J.A., Samonte, R.V.,\ Pertz, L.M., Clark, R.A., Schwartz, S., Segraves, R. et al.\ Segmental duplications and copy number variation in the human \ genome. \ Am J Hum Genet 77(1), 78-88 (2005).

    \

    \ Tuzun, E., Sharp, A.J., Bailey, J.A., Kaul, R., Morrison, V.A., Pertz, L.M., \ Haugen, E., Hayden, H., Albertson, D., Pinkel, D. et al.\ Fine-scale structural variation of the human genome. \ Nature Genet 37(7), 727-32 (2005).

    \ varRep 1 compositeTrack on\ group varRep\ longLabel Structural Variation\ priority 146\ shortLabel Structural Var\ track cnp\ type bed 4 +\ visibility hide\ rmsk RepeatMasker rmsk Repeating Elements by RepeatMasker 1 147 0 0 0 127 127 127 1 0 0

    Description

    \

    \ This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences \ for interspersed repeats and low complexity DNA sequences. The program\ outputs a detailed annotation of the repeats that are present in the \ query sequence (represented by this track), as well as a modified version\ of the query sequence in which all the annotated repeats have been masked\ (generally available on the\ Downloads page). RepeatMasker uses the\ RepBase library of repeats from the \ Genetic \ Information Research Institute (GIRI). \ RepBase is described in Jurka, J. (2000) in the References section below.

    \ \

    Display Conventions and Configuration

    \

    \ In full display mode, this track displays up to ten different classes of repeats:\

      \
    • Short interspersed nuclear elements (SINE), which include Alus\
    • Long interspersed nuclear elements (LINE)\
    • Long terminal repeat elements (LTR), which include retroposons\
    • DNA repeat elements (DNA)\
    • Simple repeats (micro-satellites)\
    • Low complexity repeats\
    • Satellite repeats\
    • RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA)\
    • Other repeats, which include class RC (Rolling Circle)\
    • Unknown\

    \

    \ The level of color shading in the graphical display reflects the amount of \ base mismatch, base deletion, and base insertion associated with a repeat \ element. The higher the combined number of these, the lighter the shading.

    \ \

    Methods

    \

    \ UCSC has used the most current versions of the RepeatMasker software \ and repeat libraries available to generate these data. Note that these \ versions may be newer than those that are publicly available on the Internet. \

    \

    \ Data are generated using the RepeatMasker -s flag. Additional flags\ may be used for certain organisms. Repeats are soft-masked. Alignments may \ extend through repeats, but are not permitted to initiate in them. \ See the \ FAQ for \ more information.
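Soft-masking means the repeat bases stay in the sequence but are written in lowercase, so downstream tools can find them without losing the underlying bases. The sketch below extracts the masked intervals and illustrates the seeding rule in spirit: alignment seeds are taken only from k-mers with no lowercase base, while a later extension step (not shown) may run into masked sequence. This is an assumption-labeled illustration, not the code of any particular UCSC aligner, and the function names are hypothetical.

```python
import re


def soft_masked_intervals(seq):
    """0-based, half-open intervals of lowercase (soft-masked) bases."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]


def seed_starts(seq, k):
    """Start positions of k-mers containing no soft-masked base.

    Illustrates "may extend through repeats, but not initiate in them":
    seeds come only from unmasked sequence.
    """
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k].isupper()]


seq = "ACGTacgtacGTAC"
print(soft_masked_intervals(seq))  # [(4, 10)]
print(seed_starts(seq, 3))         # [0, 1, 10, 11]
```

An alignment seeded at position 1 could still extend rightward through the lowercase block at 4-10, which is the asymmetry the masking convention is meant to allow.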

    \ \

    Credits

    \

    \ Thanks to Arian Smit, Robert Hubley and GIRI\ for providing the tools and repeat libraries used to generate this track.

    \ \

    References

    \

    \ Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0.\ http://www.repeatmasker.org. 1996-2010.\

    \

    \ RepBase is described in \ Jurka J. \ Repbase update: a database and an electronic journal of \ repetitive elements. \ Trends Genet. 2000 Sep;16(9):418-420.

    \

    \ For a discussion of repeats in mammalian genomes, see: \

    Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63.

    \

    \ Smit AF. The origin of interspersed repeats in the human genome. \ Curr Opin Genet Dev. 1996 Dec;6(6):743-8.\

    \ varRep 0 canPack off\ group varRep\ longLabel Repeating Elements by RepeatMasker\ priority 147\ shortLabel RepeatMasker\ spectrum on\ track rmsk\ type rmsk\ visibility dense\ reconRepeat Recon Repeats bed 4 + Repeats Determined with Recon 0 147.5 0 0 0 127 127 127 0 0 0 varRep 1 group varRep\ longLabel Repeats Determined with Recon\ priority 147.5\ shortLabel Recon Repeats\ track reconRepeat\ type bed 4 +\ visibility hide\ vntr Microsatellites bed 4 + Perfect Microsatellites - VNTR 0 148 0 0 0 127 127 127 0 0 0

    Description

    \

    \ This track contains all perfect 'microsatellite' repeats with between\ 2 and 10 bp repeat units and 10 or more perfect copies. Over 90% of\ the items will be multi-allelic polymorphisms. Click on an individual\ repeat element within the track for more information about that item.\

    \ \

    Methods

    \

    \ This track was created by using three programs: Tandyman, display_VNTR and \ Primeleftright.\

      \
    • Tandyman is a program for identifying perfectly identical tandem repeat sequences, written by Robert Leach. It has been shown that the number of continuous perfect repeats in a microsatellite is perhaps the primary factor in generating polymorphism at that locus.
    • display_VNTR is a wrapper for Tandyman which, among other things, creates a FASTA-delimited file suitable for automated primer design. This is available from Gerome Breen.
    • Primeleftright is a simple program which uses a strict set of thermodynamic parameters to select primers giving the smallest possible PCR product. This is available from Leo Schalkwyk.
    \

    \

    \ These programs were used (via linking Perl scripts to reformat output and \ input files) to find all perfect 'microsatellite' repeats with between 2 \ and 10 bp repeat units and 10 or more perfect copies.\
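The search criteria (perfect repeats, 2 to 10 bp units, at least 10 copies, no mono-nucleotide runs) can be sketched as a brute-force scan. This is not Tandyman or display_VNTR; it is a minimal stand-in under those criteria. For each unit length it extends exact copies of a unit and skips units that are themselves repeats of a shorter unit, so "ATAT" is reported only as an AT repeat and "AA" is never reported. The function names are hypothetical.

```python
def smallest_period(s: str) -> int:
    """Length of the shortest unit whose repetition spells out s exactly."""
    n = len(s)
    for p in range(1, n + 1):
        if n % p == 0 and s[:p] * (n // p) == s:
            return p
    return n


def perfect_microsatellites(seq: str, min_unit: int = 2, max_unit: int = 10,
                            min_copies: int = 10):
    """Return (start, end, unit, copies) for perfect tandem repeats meeting the criteria."""
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + unit_len <= len(seq):
            unit = seq[i:i + unit_len]
            # Skip units that are really a shorter repeat (e.g. "ATAT", or "AA" = mono-nucleotide).
            if smallest_period(unit) != unit_len:
                i += 1
                continue
            copies = 1
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, i + copies * unit_len, unit, copies))
                i += copies * unit_len  # jump past the reported repeat
            else:
                i += 1
    return hits
```

For example, `perfect_microsatellites("GG" + "CA" * 12 + "T")` reports a single CA repeat of 12 copies starting at offset 2.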

    \

    \ Particular features include:\ \

    The high probability (>90%) that elements of this track are polymorphic \ and may have multiple alleles.\

    \ \

    The exclusion of mono-nucleotide repeats. These are particularly\ common but very difficult to genotype.\

    \ \

    The presence of a "distance to next repeat" score in bp allowing users to \ filter overlapping repeats.\

    \

    The primer designs are a first-pass design that will improve in future versions of this data. The main problem with the design is the tendency of primers to end up in repeat regions near the repeat of interest. Users may want to carry out their own QC on the designs; we expect a good proportion of them to be usable.

    \ \

    Credits

    \

    We'd like to thank Gerome Breen, Nik Ammar, and the SGDP Centre at the Institute of Psychiatry for providing the data used to generate this track. If you wish to cite these data, please cite:
    Breen et al. "Distributions of Polymorphic Microsatellites in Mammalian and Other Genomes." (in preparation).

    \ \ varRep 1 group varRep\ longLabel Perfect Microsatellites - VNTR\ priority 148\ shortLabel Microsatellites\ track vntr\ type bed 4 +\ visibility hide\ simpleRepeat Simple Repeats bed 4 + Simple Tandem Repeats by TRF 0 148 0 0 0 127 127 127 0 0 0

    Description

    \

    This track displays simple tandem repeats (possibly imperfect repeats) located by Tandem Repeats Finder (TRF), which is specialized for this purpose. These repeats can occur within coding regions of genes and may be quite polymorphic. Repeat expansions are sometimes associated with specific diseases.

    \ \

    Methods

    \

    \ For more information about the TRF program, see Benson (1999).\

    \ \

    Credits

    \

    \ TRF was written by \ Gary Benson.

    \ \

    References

    \

    \ Benson G. \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.

    \ varRep 1 group varRep\ longLabel Simple Tandem Repeats by TRF\ priority 148\ shortLabel Simple Repeats\ track simpleRepeat\ type bed 4 +\ visibility hide\ olly25 Cross-hyb3 25 sample 0 5 0.199 Cross-hybridization Counts for Off-by-3 25-mers 0 148.5 0 0 0 127 127 127 0 0 0

    Description

    \

    This track shows the number of 25-mers in the genome that are the same as the 25-mer centered at the current position, with up to three mismatches allowed. The current position is included. This track is empty over areas masked by RepeatMasker or by TRF at period 12 or less. To avoid cross-hybridization, it is best to design microarray probes and PCR primers where the count in this track is only one.
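The count can be sketched with a brute-force scan: take the k-mer centered at a position, then count every k-mer in the sequence (including the one at that position) that differs from it in at most `max_mm` bases. This is not the olly program. It assumes forward-strand matching only, ignores masking, and is far too slow for a whole genome, but it shows what a value of one means.

```python
def within_mismatches(a: str, b: str, max_mm: int) -> bool:
    """True if equal-length strings a and b differ in at most max_mm positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mm:
                return False
    return True


def cross_hyb_count(genome: str, pos: int, k: int = 25, max_mm: int = 3):
    """Count k-mers within max_mm mismatches of the k-mer centered at `pos`.

    The k-mer at `pos` itself is included. Returns None when the centered
    k-mer would run off either end of the sequence.
    """
    genome = genome.upper()
    start = pos - k // 2
    if start < 0 or start + k > len(genome):
        return None
    probe = genome[start:start + k]
    return sum(
        within_mismatches(probe, genome[i:i + k], max_mm)
        for i in range(len(genome) - k + 1)
    )
```

A count of 1 means only the probe's own location matches, which is the condition recommended for probe and primer design.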

    \ \

    For best results view this track with Interpolation set to Only Samples.

    \ \

    Credits

    \ This track was computed with the program 'olly' at the default settings.\ Olly was created by Jim Kent.\ x 0 group x\ longLabel Cross-hybridization Counts for Off-by-3 25-mers\ priority 148.5\ shortLabel Cross-hyb3 25\ track olly25\ type sample 0 5 0.199\ visibility hide\ olly2 Cross-hyb2 25 sample 0 5 0.199 Cross-hybridization for Off-by-2 25-mers 0 148.6 0 0 0 127 127 127 0 0 0

    Description

    \

    This track shows the number of 25-mers in the genome that are the same as the 25-mer centered at the current position, with up to two mismatches allowed. The current position is included. This track is empty over areas masked by RepeatMasker or by TRF at period 12 or less. To avoid cross-hybridization, it is best to design microarray probes and PCR primers where the count in this track is only one.

    \ \

    For best results view this track with Interpolation set to Only Samples.

    \ \

    Credits

    \ This track was computed with the program 'olly' at the default settings.\ Olly was created by Jim Kent.\ x 0 group x\ longLabel Cross-hybridization for Off-by-2 25-mers\ priority 148.6\ shortLabel Cross-hyb2 25\ track olly2\ type sample 0 5 0.199\ visibility hide\ gpcr Gpcr genePred Gpcr from Softberry and Rachel Karchin's HMM 0 149 0 0 0 127 127 127 0 0 0 x 1 group x\ longLabel Gpcr from Softberry and Rachel Karchin's HMM\ priority 149\ shortLabel Gpcr\ track gpcr\ type genePred\ visibility hide\ rmskRM327 RepMask 3.2.7 rmsk Repeating Elements by RepeatMasker version 3.2.7 0 149.105 0 0 0 127 127 127 1 0 0

    Description

    \

    \ This track was created by using a more recent version (3.2.7, Jan. 2009) \ of Arian Smit's RepeatMasker program, which screens DNA sequences \ for interspersed repeats and low complexity DNA sequences. The program\ outputs a detailed annotation of the repeats that are present in the \ query sequence, as well as a modified version of the query sequence \ in which all the annotated repeats have been masked. RepeatMasker uses \ the RepBase library of repeats from the \ Genetic \ Information Research Institute (GIRI). \ RepBase is described in Jurka, J. (2000) in the References section below.

    \

    \ Results from the original RepeatMasker run have been kept in the\ RepeatMasker track in order to avoid disrupting any analyses performed\ on the original run's results.

    \ \

    Display Conventions and Configuration

    \

    \ In full display mode, this track displays up to ten different classes of repeats:\

      \
    • Short interspersed nuclear elements (SINE), which include ALUs\
    • Long interspersed nuclear elements (LINE)\
    • Long terminal repeat elements (LTR), which include retroposons\
    • DNA repeat elements (DNA)\
    • Simple repeats (micro-satellites)\
    • Low complexity repeats\
    • Satellite repeats\
    • RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA)\
    • Other repeats, which include class RC (Rolling Circle)
    • Unknown\

    \

    \ The level of color shading in the graphical display reflects the amount of \ base mismatch, base deletion, and base insertion associated with a repeat \ element. The higher the combined number of these, the lighter the shading.

    \ \

    Methods

    \

    \ UCSC has used the most current versions of the RepeatMasker software \ and repeat libraries available to generate these data. Note that these \ versions may be newer than those that are publicly available on the Internet. \

    \

    \ Data are generated using the RepeatMasker -s flag. Additional flags\ may be used for certain organisms. Repeats are soft-masked. Alignments may \ extend through repeats, but are not permitted to initiate in them. \ See the \ FAQ for \ more information.

    \ \

    Credits

    \

    \ Thanks to Arian Smit and GIRI\ for providing the tools and repeat libraries used to generate this track.

    \ \

    References

    \

    Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2007.

    \

    \ RepBase is described in \ Jurka J. \ Repbase update: a database and an electronic journal of \ repetitive elements. \ Trends Genet. 2000 Sep;16(9):418-420.

    \

    \ For a discussion of repeats in mammalian genomes, see: \

    Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63.

    \

    \ Smit AF. The origin of interspersed repeats in the human genome. \ Curr Opin Genet Dev. 1996 Dec;6(6):743-8.\

    \ varRep 0 canPack off\ group varRep\ longLabel Repeating Elements by RepeatMasker version 3.2.7\ priority 149.105\ shortLabel RepMask 3.2.7\ spectrum on\ track rmskRM327\ type rmsk\ visibility hide\ nestedRepeats Interrupted Rpts bed 12 + Fragments of Interrupted Repeats Joined by RepeatMasker ID 0 149.11 0 0 0 127 127 127 1 0 0

    Description

    \

    \ This track shows joined fragments of interrupted repeats \ extracted from the output of the \ RepeatMasker program which screens DNA sequences \ for interspersed repeats and low complexity DNA sequences using \ the RepBase library of repeats from the \ Genetic \ Information Research Institute (GIRI). \ RepBase is described in Jurka, J. (2000) in the References section below.\

    \

    \ The detailed annotations from RepeatMasker are in the RepeatMasker\ track. This track shows fragments of original repeat insertions \ which have been\ interrupted by insertions of younger repeats or through local\ rearrangements. The fragments are joined using the ID column of\ RepeatMasker output.\
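Joining fragments by the RepeatMasker ID can be sketched as a group-by on that column. The `Fragment` fields are hypothetical names for the relevant RepeatMasker output columns, and IDs are assumed to be unique within a chromosome. The sketch also applies the strand rule from the display conventions: a single strand is reported only when every fragment agrees.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fragment:
    chrom: str
    start: int      # 0-based start
    end: int        # exclusive end
    strand: str     # '+' or '-'
    rep_name: str
    rm_id: int      # the ID column of RepeatMasker output


def join_fragments(fragments):
    """Group fragments sharing a RepeatMasker ID into one joined item per group."""
    groups = defaultdict(list)
    for f in fragments:
        groups[(f.chrom, f.rm_id)].append(f)
    joined = []
    for (chrom, rm_id), parts in sorted(groups.items(), key=lambda kv: kv[0]):
        parts.sort(key=lambda f: f.start)
        strands = {f.strand for f in parts}
        joined.append({
            "chrom": chrom,
            "id": rm_id,
            "name": parts[0].rep_name,
            "start": parts[0].start,
            "end": max(f.end for f in parts),
            # A strand (and therefore arrows) only when all fragments agree.
            "strand": strands.pop() if len(strands) == 1 else ".",
            "blocks": [(f.start, f.end) for f in parts],
        })
    return joined
```

Each joined item corresponds to one original insertion: its blocks are the surviving fragments, and the gaps between them are where younger repeats were inserted.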

    \ \

    Display Conventions and Configuration

    \

    In pack or full mode, each interrupted repeat is displayed as boxes (fragments) joined by horizontal lines, labeled with the repeat name. If all fragments are on the same strand, arrows are added to the horizontal line to indicate the strand. In dense or squish mode, labels and arrows are omitted, and in dense mode, all items are collapsed to fit on a single row.

    \

    Items are shaded according to the average identity score of their fragments. Usually, the shade of an item is similar to the shades of its fragments, unless some fragments are much more diverged than others. The score displayed above is the average identity score, clipped to a range of 50% - 100%, and then mapped to the range 0 - 1000 for shading in the browser.
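The score computation can be written out as follows, assuming the mapping from the clipped 50% - 100% range onto 0 - 1000 is linear (the text does not state the form of the mapping). The function name is hypothetical.

```python
def nested_repeat_score(fragment_identities):
    """Average fragment identity (percent), clipped to 50-100, mapped linearly to 0-1000."""
    avg = sum(fragment_identities) / len(fragment_identities)
    clipped = min(max(avg, 50.0), 100.0)
    return round((clipped - 50.0) / 50.0 * 1000)
```

Under this assumption, an item whose fragments average 75% identity gets a score of 500, and anything at or below 50% gets 0.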

    \ \

    Methods

    \

    \ UCSC has used the most current versions of the RepeatMasker software \ and repeat libraries available to generate these data. Note that these \ versions may be newer than those that are publicly available on the Internet. \

    \

    \ Data are generated using the RepeatMasker -s flag. Additional flags\ may be used for certain organisms. See the \ FAQ for \ more information.

    \ \

    Credits

    \

    \ Thanks to Arian Smit, Robert Hubley and GIRI\ for providing the tools and repeat libraries used to generate this track.

    \ \

    References

    \

    \ Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0.\ http://www.repeatmasker.org. 1996-2010.\

    \

    \ RepBase is described in \ Jurka J. \ Repbase update: a database and an electronic journal of \ repetitive elements. \ Trends Genet. 2000 Sep;16(9):418-420.

    \

    \ For a discussion of repeats in mammalian genomes, see: \

    Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63.

    \

    \ Smit AF. The origin of interspersed repeats in the human genome. \ Curr Opin Genet Dev. 1996 Dec;6(6):743-8.\

    \ varRep 1 group varRep\ longLabel Fragments of Interrupted Repeats Joined by RepeatMasker ID\ priority 149.11\ shortLabel Interrupted Rpts\ track nestedRepeats\ type bed 12 +\ useScore 1\ visibility hide\ nestedRepeatsRM327 Intr Rpts 3.2.7 bed 12 + Fragments of Interrupted Repeats Joined by RepeatMasker ID (RM version 3.2.7) 0 149.115 0 0 0 127 127 127 1 0 0

    Description

    \

    \ This track shows joined fragments of interrupted repeats extracted from \ the output of a more recent version (3.2.7, Jan. 2009) of the\ RepeatMasker program, which screens DNA sequences \ for interspersed repeats and low complexity DNA sequences using \ the RepBase library of repeats from the \ Genetic \ Information Research Institute (GIRI). \ RepBase is described in Jurka, J. (2000) in the References section below.\

    \

    \ The detailed annotations from RepeatMasker are in the RepMask 3.2.7\ track. This track shows fragments of original repeat insertions \ which have been\ interrupted by insertions of younger repeats or through local\ rearrangements. The fragments are joined using the ID column of\ RepeatMasker output.\

    \

    \ Interrupted repeats from the original RepeatMasker run have been kept in the\ Interrupted Rpts track in order to avoid disrupting any analyses performed\ on the original run's results.

    \ \

    Display Conventions and Configuration

    \

    \ In pack or full mode, each interrupted repeat is displayed as boxes\ (fragments) joined by horizontal lines, labeled with the repeat name.\ If all fragments are on the same strand, then arrows are added to the\ horizontal line to indicate strand. In dense or squish mode, labels \ and arrows are omitted, and in dense mode, all items are collapsed to \ fit on a single row.\

    \

    \ Items are shaded according to the average identity score of their\ fragments. Usually, the shade of an item is similar to the shades of\ its fragments, unless some fragments are much more diverged than\ others. The score displayed above is the average identity score,\ clipped to a range of 50% - 100%, and then mapped to the range\ 0 - 1000 for shading in the browser.\

    \ \

    Methods

    \

    \ UCSC has used the most current versions of the RepeatMasker software \ and repeat libraries available to generate these data. Note that these \ versions may be newer than those that are publicly available on the Internet. \

    \

    \ Data are generated using the RepeatMasker -s flag. Additional flags\ may be used for certain organisms. See the \ FAQ for \ more information.

    \ \

    Credits

    \

    \ Thanks to Arian Smit, Robert Hubley and GIRI\ for providing the tools and repeat libraries used to generate this track.

    \ \

    References

    \

    Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2007.

    \

    \ RepBase is described in \ Jurka J. \ Repbase update: a database and an electronic journal of \ repetitive elements. \ Trends Genet. 2000 Sep;16(9):418-420.

    \

    \ For a discussion of repeats in mammalian genomes, see: \

    Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63.

    \

    \ Smit AF. The origin of interspersed repeats in the human genome. \ Curr Opin Genet Dev. 1996 Dec;6(6):743-8.\

    \ varRep 1 group varRep\ longLabel Fragments of Interrupted Repeats Joined by RepeatMasker ID (RM version 3.2.7)\ priority 149.115\ shortLabel Intr Rpts 3.2.7\ track nestedRepeatsRM327\ type bed 12 +\ useScore 1\ visibility hide\ rmskCensor CENSOR Repeats rmsk Repeating Elements by CENSOR and RepBase 11.6 (Giri Institute) 0 149.2 0 0 0 127 127 127 1 0 0 varRep 0 canPack off\ group varRep\ longLabel Repeating Elements by CENSOR and RepBase 11.6 (Giri Institute)\ priority 149.2\ shortLabel CENSOR Repeats\ spectrum on\ track rmskCensor\ type rmsk\ visibility hide\ windowmasker WindowMasker bed 3 Genomic Intervals Masked by WindowMasker 0 149.25 0 0 0 127 127 127 0 0 0 varRep 1 group varRep\ longLabel Genomic Intervals Masked by WindowMasker\ priority 149.25\ shortLabel WindowMasker\ track windowmasker\ type bed 3\ visibility hide\ windowmaskerSdust WM + SDust bed 3 Genomic Intervals Masked by WindowMasker + SDust 0 149.26 0 0 0 127 127 127 0 0 0

    Description

    This track depicts masked sequence as determined by WindowMasker. The WindowMasker tool is included in the NCBI C++ toolkit, whose source code is available from NCBI.

    Methods

    \ To create this track, WindowMasker was run with the following parameters:\

    windowmasker -mk_counts true -input hg16.fa -output wm_counts
    windowmasker -ustat wm_counts -sdust true -input hg16.fa -output repeats.bed
    
    \ The repeats.bed (BED3) file was loaded into the "windowmaskerSdust" table for\ this track.\ \

    References

    \

    \ Morgulis A, Gertz EM, Schäffer AA, Agarwala R.\ WindowMasker: window-based masker for sequenced genomes. \ Bioinformatics. 2006 Jan 15;22(2):134-41.

    \ varRep 1 group varRep\ longLabel Genomic Intervals Masked by WindowMasker + SDust\ priority 149.26\ shortLabel WM + SDust\ track windowmaskerSdust\ type bed 3\ visibility hide\ microsat Microsatellite bed 4 Microsatellites - Di-nucleotide and Tri-nucleotide Repeats 0 149.4 0 0 0 127 127 127 0 0 0

    Description

    \

    \ This track displays regions that are likely to be useful as microsatellite\ markers. These are sequences of at least 15 perfect di-nucleotide and \ tri-nucleotide repeats and tend to be highly polymorphic in the\ population.\

    \ \

    Methods

    \

    \ The data shown in this track are a subset of the Simple Repeats track, \ selecting only those \ repeats of period 2 and 3, with 100% identity and no indels and with\ at least 15 copies of the repeat. The Simple Repeats track is\ created using the \ Tandem Repeats Finder. For more information about this \ program, see Benson (1999).
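The selection rule translates directly into a row filter. The field names follow the columns of UCSC's simpleRepeat table (`period`, `copyNum`, `perMatch`, `perIndel`), which we assume here; only the columns the filter needs are modeled, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SimpleRepeat:
    """Subset of the (assumed) simpleRepeat table columns used by the filter."""
    chrom: str
    chromStart: int
    chromEnd: int
    period: int       # repeat unit length in bp
    copyNum: float    # copies of the unit (TRF can report fractional copies)
    perMatch: int     # percent match between adjacent copies
    perIndel: int     # percent indels between adjacent copies


def is_microsatellite(r: SimpleRepeat) -> bool:
    """Period 2 or 3, 100% identity, no indels, and at least 15 copies."""
    return r.period in (2, 3) and r.perMatch == 100 and r.perIndel == 0 and r.copyNum >= 15


def select_microsatellites(rows):
    return [r for r in rows if is_microsatellite(r)]
```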

    \ \

    Credits

    \

    \ Tandem Repeats Finder was written by \ Gary Benson.

    \ \

    References

    \

    \ Benson G. \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.

    \ varRep 1 group varRep\ longLabel Microsatellites - Di-nucleotide and Tri-nucleotide Repeats\ priority 149.4\ shortLabel Microsatellite\ track microsat\ type bed 4\ visibility hide\ blatFugu Fugu Blat psl xeno Takifugu rubripes Translated Blat Alignments 0 150 0 60 120 200 220 255 1 0 0

    Description

    \

    The Fugu v.3.0 whole genome shotgun assembly was provided by the US DOE Joint Genome Institute (JGI). The assembly was constructed with the JGI assembler, JAZZ, from paired-end sequencing reads produced at JGI, Myriad Genetics, and Celera Genomics, resulting in a sequence coverage of 5.7X. All reads are plasmid, cosmid, or BAC end-sequences, with the predominant coverage derived from 2-kb insert plasmids. This assembly contains 20,379 scaffolds totaling 319 million base pairs. The largest 679 scaffolds total 160 million base pairs.

    \

    \ The strand information (+/-) for this track is in two parts. The\ first + or - indicates the orientation of the query sequence whose\ translated protein produced the match. The second + or - indicates the\ orientation of the matching translated genomic sequence. Because the two\ orientations of a DNA sequence give different predicted protein sequences,\ there are four combinations. ++ is not the same as --; nor is +- the same\ as -+.
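To see why ++ and -- differ, translate a query in both orientations: the reverse complement encodes an unrelated peptide. The sketch below uses the standard genetic code and lists the four strand combinations; it is illustrative only and not part of blat.

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate reading frame 1; codons containing non-ACGT characters become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


# The four (query, target) orientation pairs.
STRAND_PAIRS = ["".join(pair) for pair in product("+-", repeat=2)]
```

For example, `translate("ATGGCC")` gives "MA", while its reverse complement "GGCCAT" translates to "GH", so a match found from one orientation says nothing about the other.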

    \ \

    Methods

    \

    The alignments were made with blat in translated protein mode, requiring two nearby 4-mer matches to trigger a detailed alignment. The human genome was masked with RepeatMasker and Tandem Repeats Finder before running blat.
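The two-hit trigger can be sketched as follows: index the target's 4-mers, and record a trigger when two non-overlapping 4-mer hits fall on the same diagonal (target offset minus query offset) within a maximum gap. This is a simplified illustration, not blat's code: `max_gap` is a hypothetical parameter, and real blat also handles translation in multiple frames, indexing, and scoring that this sketch leaves out.

```python
from collections import defaultdict


def kmer_index(seq: str, k: int) -> dict:
    """Map every k-mer of `seq` to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def two_hit_triggers(query: str, target: str, k: int = 4, max_gap: int = 20):
    """Return (first_q, second_q, diagonal) for each pair of nearby hits on one diagonal.

    Hits count as a pair only if they do not overlap (q2 - q1 >= k) and are at
    most `max_gap` apart in the query. `max_gap` is a hypothetical parameter.
    """
    index = kmer_index(target, k)
    last_hit = {}  # diagonal (t - q) -> query position of the last counted hit
    triggers = []
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], ()):
            diag = t - q
            prev = last_hit.get(diag)
            if prev is None or q - prev >= k:
                if prev is not None and q - prev <= max_gap:
                    triggers.append((prev, q, diag))
                last_hit[diag] = q
    return triggers
```

A single isolated 4-mer match produces no trigger; a longer shared peptide produces one as soon as a second, non-overlapping 4-mer hit lands on the same diagonal.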

    \ \

    Credits

    \

    \ The 3.0 draft from the\ \ JGI Fugu rubripes website was used in the\ UCSC Genome Browser Fugu blat alignments. These data were freely provided \ by the JGI for use in this publication only.

    \ \

    References

    \

    Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64.

    \ \ compGeno 1 altColor 200,220,255\ color 0,60,120\ group compGeno\ longLabel Takifugu rubripes Translated Blat Alignments\ priority 150\ shortLabel Fugu Blat\ spectrum on\ track blatFugu\ type psl xeno\ visibility hide\ blatFr1 Fugu Blat psl xeno Takifugu rubripes (Aug. 2002/fr1) Translated Blat Alignments 1 150 0 60 120 200 220 255 1 0 0

    Description

    \

    \ This track shows blat translated protein alignments of the Fugu \ (Aug. 2002 (JGI 3.0/fr1)) genome assembly to the human genome. The \ v3.0 Fugu whole genome shotgun assembly was provided by the\ US \ DOE Joint Genome Institute (JGI). \

    \

    \ The strand information (+/-) for this track is in two parts. The\ first + or - indicates the orientation of the query sequence whose\ translated protein produced the match. The second + or - indicates the\ orientation of the matching translated genomic sequence. Because the two\ orientations of a DNA sequence give different predicted protein sequences,\ there are four combinations. ++ is not the same as --; nor is +- the same\ as -+.

    \ \

    Methods

    \

    The alignments were made with blat in translated protein mode, requiring two nearby 4-mer matches to trigger a detailed alignment. The human genome was masked with RepeatMasker and Tandem Repeats Finder before running blat.

    \ \

    Credits

    \

    \ The \ \ 3.0 draft from JGI was used in the\ UCSC Fugu blat alignments. These data were provided freely by the JGI\ for use in this publication only.

    \ \

    References

    \

    Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64.

    \ \ compGeno 1 altColor 200,220,255\ color 0,60,120\ group compGeno\ longLabel Takifugu rubripes (Aug. 2002/fr1) Translated Blat Alignments\ otherDb fr1\ priority 150\ shortLabel Fugu Blat\ spectrum on\ track blatFr1\ type psl xeno\ visibility dense\ gpcrKnown Gpcr Known genePred Gpcr from gpcrdb and genewise 0 150 0 0 0 127 127 127 0 0 0 x 1 group x\ longLabel Gpcr from gpcrdb and genewise\ priority 150\ shortLabel Gpcr Known\ track gpcrKnown\ type genePred\ visibility hide\ gpcrTwinscan Gpcr Twinscan genePred Gpcr predictions based on Twinscan and Rachel Karchin's HMM 0 150 0 0 0 127 127 127 0 0 0 x 1 group x\ longLabel Gpcr predictions based on Twinscan and Rachel Karchin's HMM\ priority 150\ shortLabel Gpcr Twinscan\ track gpcrTwinscan\ type genePred\ visibility hide\ gpcrUcsc Gpcr UCSC genePred Gpcr Predictions Based on Synteny 0 150 0 0 0 127 127 127 0 0 0 x 1 group x\ longLabel Gpcr Predictions Based on Synteny\ priority 150\ shortLabel Gpcr UCSC\ track gpcrUcsc\ type genePred\ visibility hide\ rgdQtl RGD Human QTL bed 4 . Human Quantitative Trait Locus from RGD 0 150 12 12 120 133 133 187 0 0 0 http://rgd.mcw.edu/objectSearch/qtlReport.jsp?rgd_id=

    Description

    \

    \ A quantitative trait locus (QTL) is a polymorphic locus that contains alleles\ which differentially affect the expression of a continuously distributed \ phenotypic trait. Usually a QTL is a marker described by statistical \ association to quantitative variation in the particular phenotypic trait that\ is thought to be controlled by the cumulative action of alleles at multiple \ loci.

    \ \

    Credits

    \

    \ Thanks to the RGD for \ providing this annotation. RGD is funded by grant HL64541 entitled "Rat \ Genome Database", awarded to Dr. Howard J Jacob, Medical College of \ Wisconsin, from the National Heart Lung and Blood Institute \ (NHLBI) of the National \ Institutes of Health (NIH).\

    \ \

    References

    \

    Rapp JP. Genetic analysis of inherited hypertension in the rat. Physiol Rev. 2000 Jan;80(1):135-72.

    \ phenDis 1 color 12,12,120\ group phenDis\ longLabel $Organism Quantitative Trait Locus from RGD\ priority 150\ shortLabel RGD Human QTL\ track rgdQtl\ type bed 4 .\ url http://rgd.mcw.edu/objectSearch/qtlReport.jsp?rgd_id=\ visibility hide\ rgdRatQtl RGD Rat QTL bed 4 . Rat Quantitative Trait Locus from RGD Coarsely Mapped to Human 0 150.001 12 100 100 133 177 177 0 0 0 http://rgd.mcw.edu/objectSearch/qtlReport.jsp?rgd_id=

    Description

    \

    \ This track shows Rat quantitative trait loci (QTLs) from the \ Rat Genome Database (RGD) \ that have been coarsely mapped by UCSC to the Human genome using \ stringently filtered cross-species alignments. \ A quantitative trait locus (QTL) is a polymorphic locus that contains alleles\ which differentially affect the expression of a continuously distributed \ phenotypic trait. Usually a QTL is a marker described by statistical \ association to quantitative variation in the particular phenotypic trait that\ is thought to be controlled by the cumulative action of alleles at multiple \ loci.

    \

    \ For a comprehensive review of QTL mapping techniques in the rat, see Rapp, \ J.P. (2000) in the References section below.

    \

    \ To map the Rat QTLs to Human, UCSC's chained and netted blastz\ alignments of Rat to Human were filtered to retain only those with\ high chain scores (>=500,000). This removed many valid-but-short\ alignments and in general retained only very long chains (>10,000,\ usually >100,000 bp), so that only large regions could be mapped. This\ choice was made because QTLs in general are extremely large and\ approximate regions. After the alignment filtering, UCSC's liftOver\ program was used to map Rat regions to Human via the filtered\ alignments.
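The score-filtering step can be sketched over the chain file format, in which each chain starts with a header line `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id` followed by its alignment blocks. This stand-alone function is an illustration, not the pipeline UCSC used; its output would then be passed to liftOver.

```python
def filter_chains(lines, min_score=500_000):
    """Keep only chains whose header score is >= min_score.

    `lines` is an iterable of chain-file lines. Lines before the first
    header (such as '#' comments) are dropped in this sketch.
    """
    kept = []
    keep = False
    for line in lines:
        if line.startswith("chain "):
            # Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            keep = float(line.split()[1]) >= min_score
        if keep:
            kept.append(line)  # header, block lines, and the trailing blank line
    return kept
```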

    \

    To get a sense of how many genomic rearrangements between Rat and Human are in the region of a particular Rat QTL, you may want to view the Human Nets track in the Rat Nov. 2004 (Baylor 3.4/rn4) genome browser. In the position/search box, enter the name of the Rat QTL of interest.

    \ \

    Credits

    \

    \ Thanks to the RGD for \ providing the Rat QTLs. RGD is funded by grant HL64541 entitled "Rat \ Genome Database", awarded to Dr. Howard J Jacob, Medical College of \ Wisconsin, from the National Heart Lung and Blood Institute \ (NHLBI) of the National \ Institutes of Health (NIH).\

    \ \

    References

    \

    \ Rapp JP.\ Genetic analysis of inherited hypertension in the rat.\ Physiol Rev. 2000 Jan;80(1):135-72.

    \ phenDis 1 color 12,100,100\ group phenDis\ longLabel $o_Organism Quantitative Trait Locus from RGD Coarsely Mapped to $Organism\ otherDb rn4\ priority 150.001\ shortLabel RGD Rat QTL\ track rgdRatQtl\ type bed 4 .\ url http://rgd.mcw.edu/objectSearch/qtlReport.jsp?rgd_id=\ visibility hide\ netSyntenyFr1 Fugu Synteny netAlign fr1 chainFr1 Fugu (Aug. 2002 (JGI 3.0/fr1)) Synteny Using Chained/Netted Blastz 0 150.3 0 100 0 255 240 200 0 0 0 compGeno 0 altColor 255,240,200\ color 0,100,0\ group compGeno\ longLabel $o_Organism ($o_date) Synteny Using Chained/Netted Blastz\ otherDb fr1\ priority 150.3\ shortLabel Fugu Synteny\ track netSyntenyFr1\ type netAlign fr1 chainFr1\ visibility hide\ fuguPseudo Fugu Pseudo bed 12 . Takifugu rubripes (Aug. 2002/fr1) Translated Blat Alignments that overlap Processed Pseudogenes 0 151 0 60 120 200 220 255 1 0 0 compGeno 1 altColor 200,220,255\ color 0,60,120\ group compGeno\ longLabel Takifugu rubripes (Aug. 2002/fr1) Translated Blat Alignments that overlap Processed Pseudogenes\ priority 151\ shortLabel Fugu Pseudo\ spectrum on\ track fuguPseudo\ type bed 12 .\ visibility hide\ loweProbes Lowe's Probes bed 6 . Candidate Oligos for Stanford Microarray 0 151 0 0 0 127 127 127 0 0 1 chr22,

    Candidate Oligos for every Stanford Oligo Chip track

    \

    \ Oligos were chosen for every Sanger22 annotation on chr22 as\ well as about 2000 other genes. Two oligos were chosen with\ a 3' bias, two with a 5' bias, and two with no bias. For this\ purpose exons are defined to include 3' and 5' UTRs.

    \ \

    The strategy

    \

    \ These oligo selections are based on the following ideas:\

      \
    • Oligos should have minimal secondary structure, as they must be available for hybridization.
    • Oligos should be unique in the genome if possible: no repeats, and no Blat or Blast hits elsewhere in the genome.
    • If oligo-dT is used for RT priming, oligos should be at the 3' end of the gene transcript (including the UTR).
    • Oligos should have a uniform hybridization temperature if possible. All oligos must be hybridized at the same temperature, so the aim is to minimize cross-hybridization while maximizing signal.

    \

    Currently we don't have data to identify which parameters are more important than others. Also, some of these scores overlap (e.g., if Tm is limited, then high secondary structure is less likely). See below for histograms of these criteria.

    \ \
    \ \

    The Details:

    \ \

    The Algorithm

    \

    \

      \
    • Step through each exon at a step size proportional to the size of the exon, examining possible oligos and excluding areas that are RepeatMasked.
    • Score each oligo for: Tm difference, distance from 3' end, secondary structure, and an Affymetrix heuristic.
    • Look through the candidate probes, remembering the maximum value of each score.
    • Each score is then normalized by dividing by its maximum; the normalized scores are averaged, and the oligos are sorted to find the best overall score.
    • Oligos with the best combined normalized scores are blatted until one is found that has a blat score below a given threshold.
    • As oligos are chosen, candidate oligos that overlap those already chosen are discarded.
    • If no oligos pass the blat score threshold, or not enough oligos have been chosen, just pick the oligos that have the best combined score.
    \
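The selection loop above can be sketched as follows. Two assumptions are labeled in the code: every component score is assumed to be oriented so that higher is better before normalizing (for Tm difference or 3' distance that would mean inverting them first), and `blat_score` is a caller-supplied function standing in for an actual Blat run. The `Candidate` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    start: int                 # oligo start within the exon (0-based, hypothetical)
    end: int                   # oligo end, exclusive
    scores: Dict[str, float]   # score name -> value; ASSUMED higher is better


def combined_scores(cands: List[Candidate]) -> List[float]:
    """Normalize each score by its maximum over all candidates, then average."""
    names = list(cands[0].scores)
    maxima = {n: max(c.scores[n] for c in cands) for n in names}
    return [sum(c.scores[n] / maxima[n] if maxima[n] else 0.0 for n in names) / len(names)
            for c in cands]


def pick_oligos(cands: List[Candidate], n: int,
                blat_score: Callable[[Candidate], float], threshold: float) -> List[Candidate]:
    combined = combined_scores(cands)
    ranked = [cands[i] for i in sorted(range(len(cands)), key=lambda i: combined[i], reverse=True)]
    chosen: List[Candidate] = []

    def free(c: Candidate) -> bool:
        # Discard candidates overlapping an oligo already chosen.
        return all(c.end <= o.start or o.end <= c.start for o in chosen)

    for c in ranked:  # pass 1: best combined score that also passes the blat threshold
        if len(chosen) == n:
            break
        if free(c) and blat_score(c) < threshold:
            chosen.append(c)
    for c in ranked:  # pass 2 (fallback): fill up on combined score alone
        if len(chosen) == n:
            break
        if free(c):
            chosen.append(c)
    return chosen
```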

    \

    About the scores:

    \

    \

      \
    • Tm: Formulas for calculating Tm are taken from: "A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics," John SantaLucia, Jr. PNAS, Vol 95, pp 1460-1465, February 1998. A web version called Hyther exists.
    • Secondary Structure: Calculates the Gibbs free energy of the best secondary structure using libraries from the RNAstructure program.
    • Affy Heuristic: 1 if the oligo satisfies heuristics derived from those published by Affymetrix (Nature Biotechnology, vol. 14, Dec. 1996), 0 otherwise. The heuristic is as follows:

         no more than 9 A's in a window of 20
         no more than 9 T's in a window of 20
         no more than 8 C's in a window of 20
         no more than 8 G's in a window of 20

         no more than 6 A's in a window of 8
         no more than 6 T's in a window of 8
         no more than 5 C's in a window of 8
         no more than 5 G's in a window of 8

    • 3' Dist: Distance from the end of the oligo to the 3' end of the target sequence.
    • Blat Score: Blat score of the second most homologous region in the genome (the best hit is normally the oligo's own locus). If there are no inserts, this is approximately the number of matching base pairs.
    \
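
    The Affy heuristic can be expressed as a sliding-window base-composition check. This is a sketch under the assumption that every window of 20 (and of 8) consecutive bases must satisfy the limits listed above; the function name affy_heuristic is hypothetical, and an oligo shorter than a window size simply has no windows of that size to check.

```python
# Per-window limits from the Affy heuristic: window size -> {base: max count}.
AFFY_LIMITS = {
    20: {"A": 9, "T": 9, "C": 8, "G": 8},
    8: {"A": 6, "T": 6, "C": 5, "G": 5},
}


def affy_heuristic(oligo: str) -> int:
    """Return 1 if every window satisfies the base-count limits, else 0."""
    seq = oligo.upper()
    for window, limits in AFFY_LIMITS.items():
        # Slide a window of this size across the oligo, one base at a time.
        for i in range(len(seq) - window + 1):
            chunk = seq[i:i + window]
            for base, max_count in limits.items():
                if chunk.count(base) > max_count:
                    return 0
    return 1
```

    For example, a run of seven A's fails the 8-base window limit of six, so such an oligo scores 0.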

    Histograms of Scores

    Histograms are from the Stanford picked gene set.

    Secondary structure, measured as Gibbs free energy; higher scores are better.

    BLAT (similar to BLAST) scores; lower scores are better.

    Melting temperatures; values over 100 °C do occur in the algorithm.

    Percentage GC; not used in the algorithm but shown for reference.

    Please note that all coordinates are relative to the '+' strand, while all oligo sequences are written 5'->3'. This means that all sequences displayed are part of the sense strand. So if an oligo is represented in the database as being on the '-' strand and starts at 1 and ends at 5 (counting from 0) of 'atgcatgc', the '+' sequence of the probe would be 'tgcat'; because that reads 3'->5' on the '-' strand, the sequence shown is its reverse complement, 'atgca'.
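
    The strand convention can be checked with a short reverse-complement sketch. The helper names are hypothetical, and reading start/end as zero-based and inclusive is an assumption inferred from the worked example ('tgcat' is positions 1 through 5 of 'atgcatgc' when counting from 0), not a documented table format.

```python
# Maps each base to its complement, preserving case.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse so the result reads 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_sequence(plus_strand: str, start: int, end: int, strand: str) -> str:
    """Return the oligo 5'->3' given '+'-strand coordinates (zero-based, inclusive; assumed)."""
    plus = plus_strand[start:end + 1]
    return plus if strand == "+" else reverse_complement(plus)
```

    With the example above, probe_sequence("atgcatgc", 1, 5, "-") takes the '+' slice 'tgcat' and returns its reverse complement.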

    \ x 1 chromosomes chr22,\ group x\ longLabel Candidate Oligos for Stanford Microarray\ priority 151\ shortLabel Lowe's Probes\ track loweProbes\ type bed 6 .\ visibility hide\ jaxQtlMapped MGI Mouse QTL bed 4 . MGI Mouse Quantitative Trait Loci Coarsely Mapped to Human 0 151 0 0 0 127 127 127 0 0 0 http://www.informatics.jax.org/searches/accession_report.cgi?id=$$

    Description

    \

    \ This track shows Mouse quantitative trait loci (QTLs) from \ Mouse Genome Informatics (MGI) at the \ Jackson Laboratory \ that have been coarsely mapped by UCSC to the Human genome using \ stringently filtered cross-species alignments. \ A quantitative trait locus (QTL) is a polymorphic locus that contains alleles\ which differentially affect the expression of a continuously distributed \ phenotypic trait. Usually a QTL is a marker described by statistical \ association to quantitative variation in the particular phenotypic trait that\ is thought to be controlled by the cumulative action of alleles at multiple \ loci.

    \

    To map the Mouse QTLs to Human, UCSC's chained and netted blastz alignments of Mouse to Human were filtered to retain only those with a minimum length of 20,000 bases in both Mouse and Human and a minimum score of 10,000. This removed many valid-but-short alignments; the choice was made because QTLs are, in general, extremely large and approximate regions. After the alignment filtering, UCSC's liftOver program was used to map Mouse regions to Human via the filtered alignments.
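
    The alignment filter amounts to three threshold tests per alignment. The sketch below is hypothetical: the NetAlignment record and function name are illustrative, and it assumes the thresholds are inclusive (the text gives minimums but does not say whether a value exactly at the threshold is kept). The subsequent liftOver step is not modeled.

```python
from dataclasses import dataclass
from typing import List

MIN_ALIGNED_BASES = 20_000  # minimum length in both Mouse and Human
MIN_SCORE = 10_000          # minimum alignment score


@dataclass
class NetAlignment:
    """Hypothetical summary of one chained/netted Mouse-to-Human alignment."""
    mouse_span: int  # bases covered in the Mouse genome
    human_span: int  # bases covered in the Human genome
    score: float


def filter_for_qtl_mapping(alignments: List[NetAlignment]) -> List[NetAlignment]:
    """Keep only long, high-scoring alignments (inclusive thresholds assumed)."""
    return [a for a in alignments
            if a.mouse_span >= MIN_ALIGNED_BASES
            and a.human_span >= MIN_ALIGNED_BASES
            and a.score >= MIN_SCORE]
```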

    \

    \ For the purpose of cross-species mapping, MGI QTLs were divided into\ two categories: QTLs whose genomic coordinates span the entire\ confidence interval (often several million bases), and QTLs for which\ only the STS marker with the peak score was given, resulting in\ genomic coordinates for very small regions (most less than 300 bases).\ QTLs in the latter set were so small as to make mapping impossible in many \ cases, so their coordinates were padded by 50,000 bases before and \ after, for a total size of approximately 100,000 bases, a \ conservative proxy for the unknown confidence interval. The two \ categories of QTL are displayed in subtracks: MGI Mouse QTL for the \ unmodified QTLs and MGI Mouse QTL Padded for the single-marker QTLs \ that were padded to 100,000 bases.\
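
    Padding a single-marker QTL is simple interval arithmetic. In this sketch the function name is hypothetical, and clamping the padded interval to the chromosome bounds is an assumption not stated in the text.

```python
from typing import Tuple

PAD_BASES = 50_000  # bases added before and after a single-marker QTL


def pad_single_marker_qtl(start: int, end: int, chrom_size: int) -> Tuple[int, int]:
    """Extend a small QTL region by PAD_BASES on each side, clamped to the chromosome."""
    padded_start = max(0, start - PAD_BASES)
    padded_end = min(chrom_size, end + PAD_BASES)
    return padded_start, padded_end
```

    A 200-base marker region therefore grows to about 100,200 bases unless it lies near a chromosome end.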

    \

    To get a sense of how many genomic rearrangements between Mouse and Human are in the region of a particular Mouse QTL, you may want to view the Human Nets track in the Mouse Feb. 2006 (NCBI36/mm8) genome browser. In the position/search box, enter the name of the Mouse QTL of interest.

    \ \

    Credits

    \

    \ Thanks to \ MGI \ at the Jackson Laboratory, \ and Bob Sinclair in particular, for providing these data.

    \

    \ phenDis 1 compositeTrack on\ group phenDis\ longLabel MGI Mouse Quantitative Trait Loci Coarsely Mapped to $Organism\ otherDb mm8\ otherDbTable jaxQtl\ priority 151\ shortLabel MGI Mouse QTL\ track jaxQtlMapped\ type bed 4 .\ url http://www.informatics.jax.org/searches/accession_report.cgi?id=$$\ urlLabel MGI QTL Detail:\ visibility hide\ ecoresTetNig1 Tetraodon Ecores genePred Human(hg16)/Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)) Evolutionary Conserved Regions 0 151.1 0 60 120 127 157 187 0 0 0

    Description

    \ This track shows Evolutionary Conserved Regions computed by the \ \ Exofish program at Genoscope.\ Each singleton block corresponds to an "ecore"; two blocks connected by a thin line \ correspond to an "ecotig", a set of colinear ecores in a syntenic region. \ \

    Methods

    \ Genome-wide sequence comparisons were done at the protein-coding level between the genome sequences of human, Homo sapiens, and Tetraodon (green spotted pufferfish), Tetraodon nigroviridis, to detect evolutionarily conserved \ regions (ECORES).\ \

    Credits

    \ \ Thanks to Olivier Jaillon at Genoscope for contributing the data. \ \ compGeno 1 autoTranslate 0\ color 0,60,120\ group compGeno\ longLabel Human($db)/$o_Organism ($o_date) Evolutionary Conserved Regions\ otherDb tetNig1\ priority 151.1\ shortLabel Tetraodon Ecores\ track ecoresTetNig1\ type genePred\ visibility hide\ exoMouse Exonerate Mouse bed 6 + Mouse/Human Evolutionarily Conserved Regions (Exonerate) 0 152 100 50 0 255 240 200 1 0 0

    The Exonerate Mouse track shows regions of homology with mouse, based on Exonerate alignments of mouse random reads with the human genome. The data for this track were kindly provided by Guy Slater, Michele Clamp, and Ewan Birney at Ensembl.

    \ x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel Mouse/Human Evolutionarily Conserved Regions (Exonerate)\ priority 152\ shortLabel Exonerate Mouse\ spectrum on\ track exoMouse\ type bed 6 +\ visibility hide\ ecoresFr1 Fugu Ecores bed 12 . Human/Fugu (Aug. 2002/fr1) Evolutionary Conserved Regions 0 152.1 0 60 120 200 220 255 0 0 0

    Description

    \ This track shows Evolutionary Conserved Regions computed by the \ \ Exofish program at Genoscope.\ Each singleton block corresponds to an "ecore"; two blocks connected by a thin line \ correspond to an "ecotig", a set of colinear ecores in a syntenic region. \ \

    Methods

    \ Genome-wide sequence comparisons were done at the protein-coding level between the genome sequences\ of human, Homo sapiens, and Fugu (Japanese pufferfish), Takifugu \ rubripes, to detect evolutionarily conserved regions (ECORES).\ The sequence versions used in the comparison were \ Human (July 2003) and Fugu (August 2002). \ \ \

    Credits

    \ \ Thanks to Olivier Jaillon at Genoscope for contributing the data. \ \ compGeno 1 altColor 200,220,255\ color 0,60,120\ group compGeno\ longLabel Human/Fugu (Aug. 2002/fr1) Evolutionary Conserved Regions\ priority 152.1\ shortLabel Fugu Ecores\ track ecoresFr1\ type bed 12 .\ visibility hide\ ecoresTetraodon Tetraodon Ecores bed 12 . Human/Tetraodon Evolutionary Conserved Regions 0 152.2 0 60 120 200 220 255 0 0 0

    Description

    \ This track shows Evolutionary Conserved Regions computed by the \ \ Exofish program at Genoscope.\ Each singleton block corresponds to an "ecore"; two blocks connected by a thin line \ correspond to an "ecotig", a set of colinear ecores in a syntenic region. \ \

    Methods

    \ Genome-wide sequence comparisons were done at the protein-coding level between the genome sequences\ of human, Homo sapiens, and Tetraodon, Tetraodon nigroviridis, to detect evolutionarily conserved regions (ECORES).\ The sequence versions used in the comparison were \ Human (July 2003) and Tetraodon (March 2004). \ \ \

    Credits

    \ \ Thanks to Olivier Jaillon at Genoscope for contributing the data. \ \ compGeno 1 altColor 200,220,255\ color 0,60,120\ group compGeno\ longLabel Human/Tetraodon Evolutionary Conserved Regions\ priority 152.2\ shortLabel Tetraodon Ecores\ track ecoresTetraodon\ type bed 12 .\ visibility hide\ geneReviews GeneReviews bed 4 GeneReviews 0 155 0 80 0 127 167 127 0 0 0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

    Description

    \

    GeneReviews is an online collection of expert-authored, peer-reviewed articles that describe specific gene-related diseases. GeneReviews articles are searchable by disease name, gene symbol, protein name, author, or title. GeneReviews is supported by the National Institutes of Health and hosted at NCBI as part of GeneTests. The GeneReviews data underlying this track will be updated frequently.

    \

    The GeneReviews track allows the user to locate the NCBI GeneReviews resource quickly from the Genome Browser. The short name of the GeneReviews article and the names of its related diseases are displayed on the item details page, along with links to the GeneReviews article and to GeneTests search results for the related disease name. Similar information, when available, is provided on the details pages of items from the UCSC Genes, RefSeq Genes, and OMIM Genes tracks.

    Display Conventions

    \

    The GeneReviews track shows all available GeneReviews articles in the selected genomic region. RefSeq gene symbols related to the articles are used as the names of track items. Because all items are related to diseases, they are displayed in dark green in the browser window.

    References

    \

    Roberta A Pagon, Editor-in-chief, Thomas D Bird, Cynthia R Dolan,\ and Karen Stephens. GeneReviews.\ University of Washington, Seattle; 1993-. \

    \ \ phenDis 1 color 0, 80, 0\ group phenDis\ html geneReviews\ longLabel GeneReviews\ priority 155\ shortLabel GeneReviews\ track geneReviews\ type bed 4\ visibility hide\ mouseOrtho Mouse Ortholog bed 5 + Mouse Orthology Using Fgenesh++ Gene Predictions 0 157 0 100 0 255 240 200 0 0 0 x 1 altColor 255,240,200\ color 0,100,0\ group x\ longLabel Mouse Orthology Using Fgenesh++ Gene Predictions\ priority 157\ shortLabel Mouse Ortholog\ track mouseOrtho\ type bed 5 +\ visibility hide\ tblastFr1 tblastFr1 psl xeno Fugu (Aug. 2003/fr1) Best tblastn hit/hg16 knownGene Exon 0 158 0 0 0 127 127 127 1 0 0 x 1 group x\ longLabel Fugu (Aug. 2003/fr1) Best tblastn hit/hg16 knownGene Exon\ priority 158\ shortLabel tblastFr1\ spectrum on\ track tblastFr1\ type psl xeno\ visibility hide\ tblastGalGal2 tblastGalGal2 psl xeno galGal1 (galGal1) tblastn Hit/hg16 knownGene Exon 0 158 0 0 0 127 127 127 1 0 0 x 1 group x\ longLabel $o_Organism ($o_date) tblastn Hit/hg16 knownGene Exon\ otherDb galGal1\ priority 158\ shortLabel tblastGalGal2\ spectrum on\ track tblastGalGal2\ type psl xeno\ visibility hide\ mouseOrthoSeed Tight Ortholog bed 5 + Tight Mouse Orthology Using Fgenesh++ Gene Predictions (only reciprocal best) 0 158 0 100 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 0,100,0\ group x\ longLabel Tight Mouse Orthology Using Fgenesh++ Gene Predictions (only reciprocal best)\ priority 158\ shortLabel Tight Ortholog\ spectrum on\ track mouseOrthoSeed\ type bed 5 +\ visibility hide\ chainNetDanRer1 Zebrafish Chain/Net bed 3 Zebrafish (Nov. 2003 (Zv3/danRer1)), Chain and Net Alignments 0 159 0 0 0 100 50 0 1 0 0

    Description

    \

    Chain Track

    \

    \ The chain track shows alignments of zebrafish (Nov. 2003 (Zv3/danRer1)) to the\ human genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ zebrafish and human simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

    \ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ zebrafish assembly or an insertion in the human \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the human genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

    \

    \ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

    \ \

    Net Track

    \

    The net track shows the best zebrafish/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The zebrafish sequence used in this annotation is from the Nov. 2003 (Zv3/danRer1) assembly.

    \ \

    Display Conventions and Configuration

    \

    Chain Track

    \

    By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

    \

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    \ \

    Net Track

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.\
    • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
    • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
    • NonSyn - a match to a chromosome different from the gap in the \ level above.\
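
    The four categories follow directly from comparing a net item with the chain whose gap it fills. This is a simplified sketch of the definitions listed above, not netSyntenic's implementation: the function name is hypothetical, and a missing parent (None) is taken to mean the item sits on level 1.

```python
from typing import Optional


def classify_net_item(parent_chrom: Optional[str], parent_strand: Optional[str],
                      chrom: str, strand: str) -> str:
    """Classify a net item relative to the chain whose gap it fills (simplified)."""
    if parent_chrom is None:
        return "top"     # level 1: nothing above it
    if chrom != parent_chrom:
        return "nonSyn"  # aligns to a different chromosome than the level above
    # Same chromosome: same orientation is syntenic, opposite is inverted.
    return "syn" if strand == parent_strand else "inv"
```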

    \ \

    Methods

    \

    Chain track

    \

    \ Transposons that have been inserted since the zebrafish/human\ split were removed from the assemblies. The abbreviated genomes were\ aligned with lastz, and the transposons were added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single zebrafish chromosome and a single\ human chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks.\ \ The following matrix was used:

    \

          A     C     G     T
    A    91   -90   -25  -100
    C   -90   100  -100   -25
    G   -25  -100   100   -90
    T  -100   -25   -90    91
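
    The chaining step described above can be illustrated with a small dynamic program. This is a simplified sketch, not axtChain itself: it scans all earlier blocks in O(n²) instead of using a kd-tree, assumes all blocks come from one chromosome pair on one strand, and takes a caller-supplied gap-cost function (the hypothetical gap_cost parameter).

```python
from typing import Callable, List, Tuple

# A gapless block: (tStart, tEnd, qStart, qEnd, score), half-open coordinates.
Block = Tuple[int, int, int, int, float]


def best_chain(blocks: List[Block],
               gap_cost: Callable[[int, int], float]) -> Tuple[float, List[Block]]:
    """Return the highest-scoring chain of collinear blocks (simplified sketch)."""
    if not blocks:
        return 0.0, []
    # Sorting by target, then query start guarantees valid predecessors come first.
    blocks = sorted(blocks, key=lambda b: (b[0], b[2]))
    n = len(blocks)
    score = [b[4] for b in blocks]  # best chain score ending at each block
    back = [-1] * n                 # predecessor index within that best chain
    for i in range(n):
        t_start, _, q_start, _, block_score = blocks[i]
        for j in range(i):
            _, t_end, _, q_end, _ = blocks[j]
            # Block j may precede block i only if it ends first in both genomes.
            if t_end <= t_start and q_end <= q_start:
                candidate = score[j] + block_score - gap_cost(t_start - t_end,
                                                              q_start - q_end)
                if candidate > score[i]:
                    score[i], back[i] = candidate, j
    # Trace back from the block where the best chain ends.
    best = max(range(n), key=lambda k: score[k])
    best_score = score[best]
    chain: List[Block] = []
    while best != -1:
        chain.append(blocks[best])
        best = back[best]
    return best_score, chain[::-1]
```

    In this sketch, a chain whose best score falls below the minimum score (5000 in this track) would then be discarded.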

    \ \ \ Chains scoring below a minimum score of '5000' were discarded;\ the remaining chains are displayed in this track. The linear gap\ matrix used with axtChain:
    -linearGap=loose

    tablesize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000
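
    The table lists gap costs only at selected gap sizes. The sketch below assumes that costs for sizes in between are obtained by linear interpolation, and that sizes beyond the last position extend the final segment's slope; how axtChain actually evaluates the table (including the smallSize setting) is not described here, so treat this as an illustration of the table's shape rather than the program's code. The tGap row is identical to qGap.

```python
import bisect
from typing import List

# Values copied from the "loose" linear gap table.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
BOTH_GAP = [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000]


def interpolate_gap_cost(size: int, costs: List[int],
                         positions: List[int] = POSITIONS) -> float:
    """Linearly interpolate the cost of a gap of `size` bases (assumed scheme)."""
    if size <= positions[0]:
        return float(costs[0])
    i = bisect.bisect_left(positions, size)
    if i == len(positions):
        # Past the last entry: extend the final segment's slope (assumption).
        i = len(positions) - 1
    x0, x1 = positions[i - 1], positions[i]
    y0, y1 = costs[i - 1], costs[i]
    return y0 + (size - x0) * (y1 - y0) / (x1 - x0)
```

    For example, a 7-base gap in one species lies between positions 3 and 11, giving 400 + 4 × 50 / 8 = 425 under this scheme.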

    \ \

    Net track

    \

    \ Chains were derived from lastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
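
    The best-first placement performed by chainNet can be approximated as greedy interval filling on the target genome. This simplified sketch treats each chain as one solid span (ignoring its internal gaps, which is where lower-level chains would nest), does not record the level hierarchy, and uses a hypothetical (name, score, tStart, tEnd) record.

```python
from typing import List, Tuple

# Hypothetical chain summary: (name, score, tStart, tEnd) on the target, half-open.
Chain = Tuple[str, float, int, int]


def place_chains(chains: List[Chain]) -> List[Tuple[str, int, int]]:
    """Place chains best-first, trimming each to target regions not yet covered."""
    covered: List[Tuple[int, int]] = []
    placed: List[Tuple[str, int, int]] = []
    for name, _score, start, end in sorted(chains, key=lambda c: c[1], reverse=True):
        pieces = [(start, end)]
        for cov_start, cov_end in covered:
            remaining = []
            for p_start, p_end in pieces:
                if cov_end <= p_start or cov_start >= p_end:
                    remaining.append((p_start, p_end))  # no overlap: keep whole
                    continue
                if p_start < cov_start:
                    remaining.append((p_start, cov_start))  # left of covered span
                if cov_end < p_end:
                    remaining.append((cov_end, p_end))  # right of covered span
            pieces = remaining
        for piece in pieces:
            placed.append((name, piece[0], piece[1]))
            covered.append(piece)
    return placed
```

    A chain that falls entirely inside already-covered sequence contributes nothing, which mirrors the trimming described above.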

    \ \

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains and nets were created\ by Robert Baertsch and Jim Kent.

    \

    \ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap loose\ chainMinScore 5000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb danRer1\ priority 159\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetDanRer1\ type bed 3\ visibility hide\ chainNetXenTro3 xenTro3 Chain/Net bed 3 xenTro3 (xenTro3), Chain and Net Alignments 0 159 0 0 0 100 50 0 1 0 0

    Description

    \

    Chain Track

    \

    \ The chain track shows alignments of xenTro3 (xenTro3) to the\ human genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ xenTro3 and human simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

    \ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ xenTro3 assembly or an insertion in the human \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the human genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

    \

    \ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

    \ \

    Net Track

    \

    The net track shows the best xenTro3/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The xenTro3 sequence used in this annotation is from the xenTro3 assembly.

    \ \

    Display Conventions and Configuration

    \

    Chain Track

    \

    By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

    \

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    \ \

    Net Track

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.\
    • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
    • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
    • NonSyn - a match to a chromosome different from the gap in the \ level above.\

    \ \

    Methods

    \

    Chain track

    \

    \ Transposons that have been inserted since the xenTro3/human\ split were removed from the assemblies. The abbreviated genomes were\ aligned with lastz, and the transposons were added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single xenTro3 chromosome and a single\ human chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks.\ \ The following matrix was used:

    \

          A     C     G     T
    A    91   -90   -25  -100
    C   -90   100  -100   -25
    G   -25  -100   100   -90
    T  -100   -25   -90    91

    \ \ \ Chains scoring below a minimum score of '5000' were discarded;\ the remaining chains are displayed in this track. The linear gap\ matrix used with axtChain:
    -linearGap=loose

    tablesize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000

    \ \

    Net track

    \

    \ Chains were derived from lastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

    \ \

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains and nets were created\ by Robert Baertsch and Jim Kent.

    \

    \ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap loose\ chainMinScore 5000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb xenTro3\ priority 159\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetXenTro3\ type bed 3\ visibility hide\ chainNetDanRer1Viewchain Chain bed 3 Zebrafish (Nov. 2003 (Zv3/danRer1)), Chain and Net Alignments 3 159 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetDanRer1\ shortLabel Chain\ spectrum on\ track chainNetDanRer1Viewchain\ view chain\ visibility pack\ chainNetXenTro3Viewchain Chain bed 3 xenTro3 (xenTro3), Chain and Net Alignments 3 159 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetXenTro3\ shortLabel Chain\ spectrum on\ track chainNetXenTro3Viewchain\ view chain\ visibility pack\ chainNetDanRer1Viewnet Net bed 3 Zebrafish (Nov. 2003 (Zv3/danRer1)), Chain and Net Alignments 2 159 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetDanRer1\ shortLabel Net\ track chainNetDanRer1Viewnet\ view net\ visibility full\ chainNetXenTro3Viewnet Net bed 3 xenTro3 (xenTro3), Chain and Net Alignments 2 159 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetXenTro3\ shortLabel Net\ track chainNetXenTro3Viewnet\ view net\ visibility full\ blastzSelfUnmasked Self Unmasked psl xeno Blastz Self Join Without Repeats Masked (tandem repeats masked) 0 159 0 0 0 127 127 127 1 0 0 x 1 group x\ longLabel Blastz Self Join Without Repeats Masked (tandem repeats masked)\ priority 159\ shortLabel Self Unmasked\ spectrum on\ track blastzSelfUnmasked\ type psl xeno\ visibility hide\ twinscanMgc TwinScan MGC psl . 
TwinScan MGC Candidates 0 159 0 0 0 127 127 127 0 0 0 x 1 group x\ longLabel TwinScan MGC Candidates\ priority 159\ shortLabel TwinScan MGC\ track twinscanMgc\ type psl .\ visibility hide\ pHMM_5_WayTop01 0.1% Conserved bed 5 . Top 0.1 % of Human/Chimp/Mouse/Rat/Chicken PhyloHMM Cons 0 160 0 0 0 127 127 127 0 0 0 x 1 group x\ longLabel Top 0.1 % of Human/Chimp/Mouse/Rat/Chicken PhyloHMM Cons\ priority 160\ shortLabel 0.1% Conserved\ track pHMM_5_WayTop01\ type bed 5 .\ visibility hide\ pHMM_5_WayTop1 1% Conserved bed 5 . Top 1 % of Human/Chimp/Mouse/Rat/Chicken PhyloHMM Cons 0 160 0 0 0 127 127 127 0 0 0 x 1 group x\ longLabel Top 1 % of Human/Chimp/Mouse/Rat/Chicken PhyloHMM Cons\ priority 160\ shortLabel 1% Conserved\ track pHMM_5_WayTop1\ type bed 5 .\ visibility hide\ blastzSelf Self Blastz psl xeno hg16 Human Merged Blastz Self Alignments 0 160 100 50 0 255 240 200 1 0 0 varRep 1 altColor 255,240,200\ color 100,50,0\ group varRep\ longLabel Human Merged Blastz Self Alignments\ otherDb hg16\ priority 160\ shortLabel Self Blastz\ spectrum on\ track blastzSelf\ type psl xeno hg16\ visibility hide\ blastzBestSelf Self Best psl xeno hg16 Human Blastz Best-in-Genome Self Alignments 0 161 100 50 0 255 240 200 1 0 0 varRep 1 altColor 255,240,200\ color 100,50,0\ group varRep\ longLabel $Organism Blastz Best-in-Genome Self Alignments\ priority 161\ shortLabel Self Best\ spectrum on\ track blastzBestSelf\ type psl xeno hg16\ visibility hide\ unAnnotated unAnnotated bed 4 . Regions Not Annotated as Genes/mRNAs/ESTs/CpG/Repeats/Gaps 0 161 20 0 50 137 127 152 0 0 0 x 1 color 20,0,50\ group x\ longLabel Regions Not Annotated as Genes/mRNAs/ESTs/CpG/Repeats/Gaps\ priority 161\ shortLabel unAnnotated\ track unAnnotated\ type bed 4 .\ visibility hide\ netSyntenyGalGal2 Chicken Synteny netAlign galGal2 chainGalGal2 Chicken (Feb. 2004 (WUGSC 1.0/galGal2)) Synteny Using Chained/Netted Blastz 0 162 0 100 0 255 240 200 0 0 0

    Description

    \

    This track shows the chicken/human chains for syntenic \ regions of chicken (Feb. 2004 (WUGSC 1.0/galGal2) - galGal2) and human.

    \ \

    Methods

    \

    The Chicken/Human Alignment Net track (netGalGal2) was processed \ with netFilter -syn to produce this set of syntenic alignments.

    \ \

    Credits

    \

    The chainNet, netSyntenic, netFilter and netClass programs were\ developed at the University of California\ at Santa Cruz by Jim Kent.\ For more information, see \ Kent WJ, Baertsch R, Hinrichs A, Miller W, and Haussler D (2003). \ Evolution's cauldron: \ Duplication, deletion, and rearrangement in the mouse and human genomes. \ Proc Natl Acad Sci USA 100(20):11484-11489 Sep 30 2003.\ \ compGeno 0 altColor 255,240,200\ color 0,100,0\ group compGeno\ longLabel $o_Organism ($o_date) Synteny Using Chained/Netted Blastz\ otherDb galGal2\ priority 162\ shortLabel Chicken Synteny\ track netSyntenyGalGal2\ type netAlign galGal2 chainGalGal2\ visibility hide\ blastzTightSelf Tight Self psl xeno hg16 Blastz Tight Subset of Best Self Alignments 0 162 100 50 0 255 240 200 1 0 0 varRep 1 altColor 255,240,200\ color 100,50,0\ group varRep\ longLabel Blastz Tight Subset of Best Self Alignments\ otherDb hg16\ priority 162\ shortLabel Tight Self\ spectrum on\ track blastzTightSelf\ type psl xeno hg16\ visibility hide\ ancientR Ancient Repeats bed 12 . Human/Mouse Ancient Repeats 0 163 0 0 0 127 127 127 1 0 0

    Display

    \

    This track displays alignments of the current mouse assembly (phusion.3) against regions of the human genome contained in ancient copies of transposable elements. In this case "ancient" means that RepeatMasker's annotation indicates that the copy was fixed as an interspersed repeat in a common ancestor of human and mouse. These regions are of interest because they, more likely than any other region, have not been under functional constraint. Each block in the alignment is displayed as a colored block on the track, with a line connecting all the blocks. The color of each alignment indicates the percent identity of aligned residues over all blocks of the alignment: 50% identity and below is lightly colored, and the color gets linearly darker as the percent identity approaches 100%. In the alignments, lower-case letters indicate that RepeatMasker annotated them as an interspersed repeat. Because of the high substitution rate in the mouse lineage, the element was often recognized only in the human genome. The original alignments are often much longer, but only the region within the repeat is displayed.
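
    The identity-to-shade rule can be written as a clamped linear ramp. In this sketch the grayscale endpoints (230 for the lightest shade at 50% identity or below, 0 for black at 100%) and the function name are hypothetical choices for illustration; the browser's actual palette is not specified here.

```python
def identity_to_gray(percent_identity: float, lightest: int = 230, darkest: int = 0) -> int:
    """Map percent identity to a gray level: light at <=50%, darkening linearly to 100%."""
    # Fraction of the way from 50% to 100% identity, clamped to [0, 1].
    t = min(max((percent_identity - 50.0) / 50.0, 0.0), 1.0)
    return round(lightest + (darkest - lightest) * t)
```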

    Methods

    \

    The sequences were aligned with blastz (discontiguous exact seeds,\ ungapped extension, local alignments via dynamic programming) and\ postprocessed for single coverage.\ \

    Data

    \

    Human sequence from:

    \ \ http://genome-test.cse.ucsc.edu/gs.8/oo.33/chromFa.zip\

    Mouse sequence from phusion.3:

    \ \ ftp://ftp.ncbi.nlm.nih.gov/pub/TraceDB/mus_musculus/ClipReads/Assemblies/Sanger_Oct15/\

    Repeats from:

    \ \ http://genome.ucsc.edu/goldenPath/06aug2001/database/
    \ (chrN_rmsk.txt.gz for chromosome N)\
    \

    Credits

    \ Alignments contributed by Scott Schwartz. See \ http://bio.cse.psu.edu/genome/hummus/2001-12-16/aar/README.\ x 1 group x\ longLabel Human/Mouse Ancient Repeats\ priority 163\ shortLabel Ancient Repeats\ spectrum on\ track ancientR\ type bed 12 .\ visibility hide\ blastzBestGalGal2 Chicken Best psl xeno galGal2 Chicken (Feb. 2004 (WUGSC 1.0/galGal2)) Blastz Best-in-Genome 0 163 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz Best-in-Genome\ otherDb galGal2\ priority 163\ shortLabel Chicken Best\ spectrum on\ track blastzBestGalGal2\ type psl xeno galGal2\ visibility hide\ blatChicken Gg0 Blat psl xeno Chicken (gg0) Translated Blat Alignments 0 164 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel Chicken (gg0) Translated Blat Alignments\ priority 164\ shortLabel Gg0 Blat\ spectrum on\ track blatChicken\ type psl xeno\ visibility hide\ brMaf brMaf wigMaf brMaf 0 165 0 0 0 127 127 127 0 0 0 x 1 group x\ irows on\ longLabel brMaf\ priority 165\ sGroup_mammal monodelphis platypus\ sGroup_placental rat mouse rabbit cow dog rfbat hedgehog armadillo elephant tenrec\ sGroup_primate chimp baboon macaque marmoset galago\ sGroup_vertebrate chicken xenopus zebrafish tetraodon fugu\ shortLabel brMaf\ speciesGroups primate placental mammal vertebrate\ summary brMafSum\ track brMaf\ type wigMaf\ visibility hide\ blastzGg0 Gg0 Blastz psl xeno Chicken (Gg0-contigs, 5.2x coverage) Blastz 0 165 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel Chicken (Gg0-contigs, 5.2x coverage) Blastz\ priority 165\ shortLabel Gg0 Blastz\ spectrum on\ track blastzGg0\ type psl xeno\ visibility hide\ xTBAmaf xTBAmaf wigMaf xTBAmaf 0 165 0 0 0 127 127 127 0 0 0 x 1 group x\ irows on\ longLabel xTBAmaf\ priority 165\ sGroup_mammal monDom1 platypus\ sGroup_placental rn3 mm6 rabbit cow canFam1 rfbat hedgehog 
armadillo elephant tenrec\ sGroup_primate panTro1 baboon rheMac1 marmoset galago\ sGroup_vertebrate galGal2 xenTro1 danRer2 tetNig1 fr1\ shortLabel xTBAmaf\ speciesGroups primate placental mammal vertebrate\ summary xTBAmafSum\ track xTBAmaf\ type wigMaf\ visibility hide\ blastzBestGg0 Gg0 Best psl xeno Chicken (Gg0-contigs, 5.2x coverage) Blastz Best-in-Genome 0 166 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel Chicken (Gg0-contigs, 5.2x coverage) Blastz Best-in-Genome\ priority 166\ shortLabel Gg0 Best\ spectrum on\ track blastzBestGg0\ type psl xeno\ visibility hide\ syntenyNetSelf Self Syntenic Net netAlign hg16 chainSelf Syntenic Self Alignment Net 0 166 0 0 0 127 127 127 1 0 0 varRep 0 group varRep\ longLabel Syntenic Self Alignment Net\ otherDb hg16\ priority 166\ shortLabel Self Syntenic Net\ spectrum on\ track syntenyNetSelf\ type netAlign hg16 chainSelf\ visibility hide\ fosmidDiscordant Fosmid Discordants bed 4 . Fosmid Discordants 1 170 0 0 0 127 127 127 0 0 0

    Description

    \

    \ \ This track shows 285 sites of intermediate-sized structural variation (ISV) detected by \ mapping paired-end sequences from a human fosmid DNA library against the July 2003 assembly (build34) \ of the human genome. \ \ The technique has an average resolution of ~8 kb, and the detected sites include 56 inversions \ that are not detectable by array-based approaches. \ \ However, because of the physical constraints of fosmid insert size, this technique is unable to \ detect insertions larger than 40 kb.\ \

    Type Abbreviation Annotation:

    \
    \
    First Letter = Rearrangement Type\
      D->Deletion \
      I->Insertion\
      V->Inversion\
    \
    Second Letter = Association with repeat or duplication\
      R->Human Specific Repeat\
      D->Duplication\
      N->Neither (unique)\
    \
    Third Letter = Second Haplotype Support\
      N->Variant Site lacking support from the human genome reference\
      S->Variant Site with support from the human genome reference \
    
    \ \
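The three-letter type codes described above can be expanded mechanically. The sketch below transcribes the lookup tables directly from that annotation; the function name `decode_type` and the assumption that each item's type field holds exactly three letters are hypothetical.

```python
# Lookup tables transcribed from the type abbreviation annotation.
REARRANGEMENT = {"D": "Deletion", "I": "Insertion", "V": "Inversion"}
REPEAT_ASSOC = {
    "R": "Human Specific Repeat",
    "D": "Duplication",
    "N": "Neither (unique)",
}
HAPLOTYPE_SUPPORT = {
    "N": "lacking support from the human genome reference",
    "S": "with support from the human genome reference",
}


def decode_type(code: str) -> tuple[str, str, str]:
    """Expand a three-letter type code such as 'DRN' into its three parts."""
    if len(code) != 3:
        raise ValueError(f"expected a three-letter code, got {code!r}")
    first, second, third = code.upper()
    return REARRANGEMENT[first], REPEAT_ASSOC[second], HAPLOTYPE_SUPPORT[third]


print(decode_type("VNS"))
```

Note that the letters "D" and "N" mean different things depending on their position, which is why each position gets its own table.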

    References

    \

    \ E Tuzun, AJ Sharp, JA Bailey, R Kaul, VA Morrison, LM Pertz, E Haugen, H Hayden, D Albertson, \ D Pinkel, MV Olson, EE Eichler (2005) \ Fine-scale structural variation of the human genome. Nature Genet\ varRep 1 group varRep\ longLabel Fosmid Discordants\ priority 170\ shortLabel Fosmid Discordants\ track fosmidDiscordant\ type bed 4 .\ visibility dense\ ratChain Rat Chain chain rn2 rn2 (rn2) Chained Alignments 0 170 100 50 0 255 240 200 1 0 0

    Description

    \

    \ This track shows rat/human genomic alignments using\ a gap scoring system that allows longer gaps than traditional\ affine gap scoring systems. It can also tolerate gaps\ in both rat and human simultaneously. These "double-sided"\ gaps can be caused by local inversions and overlapping deletions\ in both species.

    \

    \ The chain track displays boxes joined together by either single or \ double lines. The boxes represent aligning regions. \ Single lines indicate gaps that are largely due to a deletion in the \ rn2 assembly or an insertion in the human assembly.\ Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one \ species. In cases where there are multiple \ chains over a particular portion of the human genome, chains with \ single-lined gaps are often due to processed pseudogenes, while chains \ with double-lined gaps are more often due to paralogs and non-processed \ pseudogenes.
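The single-line versus double-line distinction above depends on whether a gap between two consecutive blocks contains sequence in one species or in both. A minimal sketch, assuming blocks are given as half-open human (t) and rat (q) coordinates; the `Block` type and the rule "any sequence on both sides means double" are illustrative assumptions, and the browser may apply its own thresholds.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """One ungapped aligned block; t = human (target), q = rat (query)."""
    t_start: int
    t_end: int
    q_start: int
    q_end: int


def classify_gaps(blocks: list[Block]) -> list[str]:
    """Label each gap between consecutive blocks.

    single: unaligned sequence on only one side of the gap
    double: both species have unaligned sequence in the gap
    none:   the blocks abut in both species
    """
    labels = []
    for prev, nxt in zip(blocks, blocks[1:]):
        t_gap = nxt.t_start - prev.t_end
        q_gap = nxt.q_start - prev.q_end
        if t_gap > 0 and q_gap > 0:
            labels.append("double")
        elif t_gap > 0 or q_gap > 0:
            labels.append("single")
        else:
            labels.append("none")
    return labels


chain = [Block(0, 100, 0, 100), Block(150, 250, 100, 200), Block(400, 500, 260, 360)]
print(classify_gaps(chain))
```

In the example, the first gap has 50 extra human bases and no rat bases (a single line), while the second has unaligned sequence in both species (a double line).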

    \ \

    Methods

    \

    \ Transposons that were inserted since the rat/human\ split were removed, and the resulting abbreviated genomes were\ aligned with blastz. The transposons were then put back into the\ alignments. The resulting alignments were converted into axt format\ and fed into axtChain. AxtChain groups all \ the alignments between a single rat chromosome and a single human chromosome\ and builds a kd-tree from all the gapless subsections\ (blocks) of the alignments. Next, maximally scoring chains of these\ blocks were found by running a dynamic program over the kd-tree. Chains\ scoring below a threshold were discarded; the remaining chains are\ displayed here.
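The chaining step above finds maximally scoring sequences of co-linear blocks. The sketch below shows the same dynamic-programming idea in its simplest O(n²) form. It deliberately swaps out two parts of the described method: it scans all earlier blocks instead of querying a kd-tree, and its `gap_cost` is a hypothetical stand-in, not axtChain's gap scoring. Names such as `ScoredBlock` and `best_chain` are illustrative.

```python
from typing import NamedTuple


class ScoredBlock(NamedTuple):
    """A gapless block: human (t) and rat (q) coordinates plus its score."""
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    score: float


def gap_cost(prev: ScoredBlock, nxt: ScoredBlock) -> float:
    # Hypothetical stand-in cost that grows with the larger gap;
    # axtChain uses its own gap scoring, not this formula.
    t_gap = nxt.t_start - prev.t_end
    q_gap = nxt.q_start - prev.q_end
    return 10.0 + 0.5 * max(t_gap, q_gap)


def best_chain(blocks: list[ScoredBlock]) -> tuple[float, list[ScoredBlock]]:
    """Return the highest-scoring chain of co-linear blocks (O(n^2) DP)."""
    if not blocks:
        return 0.0, []
    blocks = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in blocks]  # best chain score ending at each block
    back = [-1] * len(blocks)         # predecessor index, -1 = chain start
    for j, bj in enumerate(blocks):
        for i in range(j):
            bi = blocks[i]
            # bi can precede bj only if it ends first in both species.
            if bi.t_end <= bj.t_start and bi.q_end <= bj.q_start:
                cand = best[i] - gap_cost(bi, bj) + bj.score
                if cand > best[j]:
                    best[j], back[j] = cand, i
    end = max(range(len(blocks)), key=best.__getitem__)
    top = best[end]
    chain = []
    while end != -1:
        chain.append(blocks[end])
        end = back[end]
    return top, chain[::-1]


blocks = [
    ScoredBlock(0, 100, 0, 100, 100),
    ScoredBlock(120, 220, 110, 210, 100),
    ScoredBlock(50, 150, 500, 600, 30),
]
score, chain = best_chain(blocks)
print(score, [b.t_start for b in chain])
```

A score threshold, as in the method above, would simply discard chains whose `score` falls below some cutoff.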

    \ \

    Credits

    \

    \ Blastz was developed at Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

    \

    \ The axtChain program was developed at the University of California\ at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.\

    \

    \ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

    \ \

    References

    \

    \ Chiaromonte, F., Yap, V.B., Miller, W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput 2002, 115-26 (2002).

    \

    \ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

    \

    \ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., \ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 13(1), 103-7 (2003).

    \ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Chained Alignments\ otherDb rn2\ priority 170\ shortLabel Rat Chain\ spectrum on\ track ratChain\ type chain rn2\ visibility hide\ slamNonCodingRat Slam Non-Coding Rat bed 5 . Slam Predictions of Human/Rat Conserved Non-Coding Regions 0 170 30 130 210 200 220 255 1 0 0

    Description and Credits

    \

    \ Slam predicts coding exons and conserved noncoding regions in a pair of\ homologous DNA sequences, incorporating both statistical sequence properties\ and degree of conservation into predictions. The model is symmetric and the\ same structure (with possibly different lengths) is predicted in both\ sequences.

    \

    \ The CNS (conserved non-coding sequence) predictions are ab initio\ predictions of conserved regions that do not fit into a gene structure.\ Thus, SLAM does not simply predict conserved regions to be coding;\ it classifies such regions according to an overall probabilistic model\ of gene structure. The set of SLAM CNS predictions is therefore highly\ enriched for conserved non-coding regions.

    \

    \ More information and a web server can be found on the \ Slam website.

    \ \

    References

    \

    \ Alexandersson, M., Cawley, S., and Pachter, L. \ SLAM - Cross-species gene finding and alignment with a \ generalized pair hidden Markov model. \ Genome Res. 13(3), 496-502 (2003).

    \

    \ Cawley, S., Pachter, L., and Alexandersson, M. \ SLAM web server for comparative gene finding and alignment.\ Nucleic Acids Res. 31(13), 3507-3509 (2003).

    \

    \ Pachter, L., Alexandersson, M., and Cawley, S. \ Applications of generalized pair hidden Markov models to \ alignment and gene finding problems. \ J Comput Biol. 9(2), 389-99 (2002).

    \

    \ Pachter, L., Alexandersson, M., and Cawley, S. Applications of generalized \ pair hidden Markov models to alignment and gene finding problems. \ Proceedings of the Fifth Annual International Conference on Computational \ Molecular Biology (RECOMB 2001) (2001).

    \ \ compGeno 1 altColor 200,220,255\ color 30,130,210\ group compGeno\ longLabel Slam Predictions of Human/Rat Conserved Non-Coding Regions\ priority 170\ shortLabel Slam Non-Coding Rat\ spectrum on\ track slamNonCodingRat\ type bed 5 .\ visibility hide\ encodeRegionsLiftOver liftOver Regions bed 4 . liftOver ENCODE Region Orthologs (Freeze 3) 0 170.1 0 200 0 127 227 127 0 0 0 encode 1 color 0,200,0\ group encode\ longLabel liftOver ENCODE Region Orthologs (Freeze 3)\ priority 170.1\ shortLabel liftOver Regions\ track encodeRegionsLiftOver\ type bed 4 .\ visibility hide\ encodeRegionsMercator Mercator Regions bed 4 . Mercator ENCODE Region Orthologs (Freeze 3) 0 170.2 0 0 200 127 127 227 0 0 0 encode 1 color 0,0,200\ group encode\ longLabel Mercator ENCODE Region Orthologs (Freeze 3)\ priority 170.2\ shortLabel Mercator Regions\ track encodeRegionsMercator\ type bed 4 .\ visibility hide\ encodeRegionsMercatorMerged Mercator Regions bed 4 . Merged Mercator ENCODE Region Orthologs (Freeze 3) 0 170.3 0 0 200 127 127 227 0 0 0 encode 1 color 0,0,200\ group encode\ longLabel Merged Mercator ENCODE Region Orthologs (Freeze 3)\ priority 170.3\ shortLabel Mercator Regions\ track encodeRegionsMercatorMerged\ type bed 4 .\ visibility hide\ encodeRegionsConsensus ENCODE Region Consensus bed 4 . Consensus Orthology of ENCODE Regions from LiftOver and Mercator (Freeze 3) 0 170.4 150 100 30 202 177 142 0 0 0 encode 1 color 150,100,30\ group encode\ longLabel Consensus Orthology of ENCODE Regions from LiftOver and Mercator (Freeze 3)\ priority 170.4\ shortLabel ENCODE Region Consensus\ track encodeRegionsConsensus\ type bed 4 .\ visibility hide\ encodeRegions2 ENCODE Region Consensus (Freeze 2) bed 4 . 
Consensus Orthology of ENCODE Regions from liftOver and Mercator (Freeze 2) 3 170.5 200 0 0 227 127 127 0 0 0 encode 1 color 200,0,0\ group encode\ longLabel Consensus Orthology of ENCODE Regions from liftOver and Mercator (Freeze 2)\ priority 170.5\ shortLabel ENCODE Region Consensus (Freeze 2)\ track encodeRegions2\ type bed 4 .\ visibility pack\ ratNet Rat Net netAlign rn2 ratChain rn2 (rn2) Alignment Net 0 171 0 0 0 127 127 127 1 0 0

    Description

    \

    \ This track shows the best rat/human chain for \ every part of the human genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. \ \

    Display Conventions and Configuration

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.\
    • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
    • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
    • NonSyn - a match to a chromosome different from the gap in the \ level above.\
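The four item categories listed above can be expressed as a small decision function. This is a sketch of the definitions only: `classify_net_item` and its parameters are hypothetical, "strand" stands for the orientation of the rat alignment relative to human, and the real netSyntenic program may apply additional criteria (such as distance) that are not shown here.

```python
def classify_net_item(level, parent_chrom, parent_strand, chrom, strand):
    """Assign Top/Syn/Inv/NonSyn following the category definitions.

    level: 1 for top-level chains, 2+ for chains filling a gap above them.
    parent_chrom/parent_strand: rat chromosome and strand of the chain
    whose gap is being filled (ignored at level 1).
    """
    if level == 1:
        return "Top"
    if chrom != parent_chrom:
        return "NonSyn"  # fills the gap from a different chromosome
    if strand != parent_strand:
        return "Inv"  # same chromosome, opposite orientation
    return "Syn"  # same chromosome, same orientation


print(classify_net_item(2, "chr3", "+", "chr3", "-"))
```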

    \ \

    Methods

    \

    \ Chains were derived from blastz alignments, using the methods\ described on the chain track description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
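The chainNet placement step above (highest-scoring chains first, each trimmed to sections not yet covered) can be sketched in human coordinates only. This simplification ignores the rat coordinates, the level hierarchy, and the netSyntenic/netClass annotation; the names `subtract` and `build_net` are hypothetical. Only aligned blocks count as covered, so gaps inside a placed chain remain available to lower-scoring chains.

```python
def subtract(blocks, covered):
    """Trim each [start, end) block so it excludes every covered range."""
    result = []
    for start, end in blocks:
        pieces = [(start, end)]
        for c_start, c_end in covered:
            trimmed = []
            for s, e in pieces:
                if c_end <= s or c_start >= e:
                    trimmed.append((s, e))  # no overlap: keep the whole piece
                    continue
                if s < c_start:
                    trimmed.append((s, c_start))  # part left of the covered range
                if c_end < e:
                    trimmed.append((c_end, e))  # part right of the covered range
            pieces = trimmed
        result.extend(pieces)
    return result


def build_net(chains):
    """Place chains best-first, keeping only parts not already covered.

    chains: list of (name, score, [(t_start, t_end), ...]) in human coordinates.
    Returns [(name, kept_blocks), ...] in placement order.
    """
    covered, placed = [], []
    for name, score, blocks in sorted(chains, key=lambda c: c[1], reverse=True):
        kept = subtract(blocks, covered)
        if kept:
            placed.append((name, kept))
            covered.extend(kept)
    return placed


chains = [
    ("a", 1000, [(0, 100), (300, 400)]),
    ("b", 500, [(50, 250)]),
    ("c", 200, [(60, 90)]),
]
print(build_net(chains))
```

Here chain "b" is trimmed to the part that falls in the gap of chain "a" (so it would sit at level 2), and chain "c" is dropped because it lies entirely inside an already covered block.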

    \ \

    Credits

    \

    \ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California\ Santa Cruz by Jim Kent.

    \

    \ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

    \

    \ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

    \ \

    References

    \

    \ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

    \

    \ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

    \ \ \ \ compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Alignment Net\ otherDb rn2\ priority 171\ shortLabel Rat Net\ spectrum on\ track ratNet\ type netAlign rn2 ratChain\ visibility hide\ syntenyRat Rat Synteny bed 4 + Rat (June 2003 (Baylor 3.1/rn3)) Synteny Using Blastz Single Coverage (100k window) 0 172 0 100 0 255 240 200 0 0 0

    Description

    \

    \ This track shows syntenic (corresponding) regions between human and rat chromosomes. \

    Methods

    \

    \ We passed a non-overlapping 100 kb window over the human genome and, using the blastz best-in-genome rat \ alignments, looked for high-scoring windows in which at least 40% of the bases aligned \ to the same region in rat. The 100 kb segments were joined if they agreed in direction and\ were within 500 kb of each other in the human genome and within 4 Mb of each other in the rat genome. \ Gaps between syntenic anchors were filled if the bases between the two flanking regions agreed with \ the synteny (direction and rat location). Finally, the syntenic blocks were extended to include those \ areas.
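The first step of the method above (finding 100 kb windows in which at least 40% of the bases align to the same rat region) can be sketched as follows. The input format `(h_start, h_end, rat_chrom, strand)` and the function name `find_anchors` are hypothetical. "Same region in rat" is simplified here to "same rat chromosome and strand", and only full windows are tested.

```python
from collections import defaultdict

WINDOW = 100_000
MIN_FRACTION = 0.40


def find_anchors(alignments, genome_len, window=WINDOW, min_fraction=MIN_FRACTION):
    """Return (win_start, rat_chrom, strand) for windows passing the 40% test.

    alignments: iterable of (h_start, h_end, rat_chrom, strand) in human coords.
    """
    tallies = defaultdict(lambda: defaultdict(int))
    for h_start, h_end, rat_chrom, strand in alignments:
        # Split each alignment across the windows it overlaps.
        pos = h_start
        while pos < h_end:
            win = pos // window
            win_end = min(h_end, (win + 1) * window)
            tallies[win][(rat_chrom, strand)] += win_end - pos
            pos = win_end
    anchors = []
    for win in range(genome_len // window):  # full windows only
        counts = tallies.get(win)
        if not counts:
            continue
        # Take the rat chromosome/strand with the most aligned bases.
        (rat_chrom, strand), bases = max(counts.items(), key=lambda kv: kv[1])
        if bases >= min_fraction * window:
            anchors.append((win * window, rat_chrom, strand))
    return anchors


aligns = [
    (0, 50_000, "chr5", "+"),
    (100_000, 125_000, "chr5", "+"),
    (125_000, 150_000, "chr2", "-"),
    (180_000, 260_000, "chr5", "+"),
    (300_000, 330_000, "chr7", "+"),
]
print(find_anchors(aligns, 400_000))
```

The last window fails the test because only 30% of its bases align, so it produces no anchor.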

    \

    Credits

    \

    \ Contact Robert \ Baertsch at UCSC for more information about this track.\ compGeno 1 altColor 255,240,200\ color 0,100,0\ group compGeno\ longLabel $o_Organism ($o_date) Synteny Using Blastz Single Coverage (100k window)\ otherDb rn3\ priority 172\ shortLabel Rat Synteny\ track syntenyRat\ type bed 4 +\ visibility hide\ blastzRn3 Rat Blastz psl xeno rn3 Rat (June 2003 (Baylor 3.1/rn3)) Blastz Alignments 0 173 100 50 0 255 240 200 1 0 0

    Description

    \

    \ This track displays blastz alignments of the rat assembly (rn3,\ June 2003 (Baylor 3.1/rn3)) to the human genome. The track has an optional feature that\ color codes alignments to indicate the chromosomes from which they are\ derived in the aligning assembly. To activate the color feature, click\ the on radio button next to "Color track based on\ chromosome" on the track description page.

    \ \

    Methods

    \

    \ Blastz was run with 12-of-19 seeds (12 matching positions within a 19-base window), and hits were scored using:\

    \
          A     C     G     T\
    A    91  -114   -31  -123\
    C  -114   100  -125   -31\
    G   -31  -125   100  -114\
    T  -123   -31  -114    91\
    \
    O = 400, E = 30, K = 3000, L = 3000, M = 50\
    
    \
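To show how the matrix above is applied, the sketch below scores a gapped pairwise alignment column by column. It assumes that O and E are the gap-open and gap-extension penalties and that a gap of length k costs O + E·k; the function name `score_alignment` is hypothetical. Note that the matrix is symmetric and penalizes transitions (A/G, C/T, -31) much less than transversions.

```python
# Blastz substitution scores from the table above (rows/cols in A, C, G, T order).
BASES = "ACGT"
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
GAP_OPEN = 400   # O, assumed to be the gap-open penalty
GAP_EXTEND = 30  # E, assumed to be the per-base gap-extension penalty


def score_alignment(top: str, bottom: str) -> int:
    """Score a gapped pairwise alignment; '-' marks a gap.

    A gap of length k costs GAP_OPEN + GAP_EXTEND * k. Characters other
    than A, C, G, T and '-' raise ValueError.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    in_gap = None  # which row is currently gapped: 'top', 'bottom', or None
    for a, b in zip(top.upper(), bottom.upper()):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if in_gap != row:
                score -= GAP_OPEN  # opening a new gap
                in_gap = row
            score -= GAP_EXTEND
        else:
            in_gap = None
            score += MATRIX[BASES.index(a)][BASES.index(b)]
    return score


print(score_alignment("AC-GT", "ACTGT"))
```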

    \

    \ A second pass was made at reduced stringency (7mer seeds and\ MSP threshold of K=2200) in an attempt to fill in gaps of up to about 10 kb.\ Lineage-specific repeats were abridged during this alignment.
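A 12-of-19 seed requires exact matches at 12 fixed positions inside a 19-base window and ignores the other 7. The sketch below finds such seed hits with a hash index. The pattern string `SEED` is assumed for illustration, and the sketch requires exact matches at all 12 positions, omitting any refinements the real program applies.

```python
# Assumed 12-of-19 spaced seed: '1' = position must match, '0' = don't care.
SEED = "1110100110010101111"
CARE = [i for i, c in enumerate(SEED) if c == "1"]


def seed_key(seq: str, pos: int) -> str:
    """Extract the 12 'care' bases of the seed window starting at pos."""
    return "".join(seq[pos + i] for i in CARE)


def seed_hits(target: str, query: str) -> list[tuple[int, int]]:
    """Return (target_pos, query_pos) pairs where the spaced seed matches."""
    span = len(SEED)
    index = {}
    for t in range(len(target) - span + 1):
        index.setdefault(seed_key(target, t), []).append(t)
    hits = []
    for q in range(len(query) - span + 1):
        for t in index.get(seed_key(query, q), []):
            hits.append((t, q))
    return hits


target = "GATTACAGCTTAGCCATGC"
print(seed_hits(target, "GATGACAGCTTAGCCATGC"))  # mismatch at a don't-care position
print(seed_hits(target, "CATTACAGCTTAGCCATGC"))  # mismatch at a care position
```

Because the seed ignores some positions, a single substitution at a don't-care position still produces a hit, which makes spaced seeds more sensitive than contiguous ones of the same weight.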

    \ \

    Using the Filter

    \

    \ This track has a filter that can be used to change the display mode,\ turn on the chromosome color track, or filter the display output by\ chromosome. The filter is located at the top of the track description page,\ which is accessed via the small button to the left of the track's graphical\ display or through the link on the track's control menu.\

      \
    • Color track: To display the chromosome color track, click the\ on button next to "Color track based on chromosome".\ When the color track is activated, each of the items within the annotation\ track will be colored to show the chromosome in the aligning genome\ assembly from which the alignment originated.\
    • Chromosome filter: To display only alignments from a specific\ chromosome in the aligning assembly, type the chromosome name (in the\ form chrN) in the text box to the right of "Filter by\ chromosome". For example, to display alignments from chromosome 6,\ type "chr6".\
    \

    \ When you have finished configuring the filter, click the Submit\ button.

    \ \

    Credits

    \

    \ These alignments were contributed by Scott Schwartz of the\ Penn State Bioinformatics\ Group. The best-in-genome filtering was done using UCSC's\ axtBest program.

    \ \

    References

    \

    \ Chiaromonte, F., Yap, V.B., Miller, W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput 2002, 115-26 (2002).

    \

    \ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-mouse alignments with BLASTZ.\ Genome Res. 13(1), 103-107 (2003).

    \ \ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz Alignments\ otherDb rn3\ priority 173\ shortLabel Rat Blastz\ spectrum on\ track blastzRn3\ type psl xeno rn3\ visibility hide\ blastzBestRn3 Rat Best psl xeno rn3 Rat (June 2003 (Baylor 3.1/rn3)) Blastz Best-in-Genome Alignments 0 174 100 50 0 255 240 200 1 0 0

    Description

    \

    \ This track shows blastz alignments of the rat assembly\ (rn3, June 2003 (Baylor 3.1/rn3)) to the human genome, filtered to display only the \ best alignment for any given region of the human genome. The track has\ an optional feature that color codes alignments to indicate the chromosomes\ from which they are derived in the aligning assembly. To activate the color\ feature, click the on button next to "Color track\ based on chromosome" on the track description page.

    \ \

    Methods

    \

    \ Blastz was run with 12-of-19 seeds (12 matching positions within a 19-base window), and hits were scored using:\

    \
          A     C     G     T\
    A    91  -114   -31  -123\
    C  -114   100  -125   -31\
    G   -31  -125   100  -114\
    T  -123   -31  -114    91\
    \
    O = 400, E = 30, K = 3000, L = 3000, M = 50\
    

    \

    \ A second pass was made at reduced stringency (7mer seeds and\ MSP threshold of K=2200) in an attempt to fill in gaps of up to about 10 kb.\ Lineage-specific repeats were abridged during this alignment.

    \ \

    Using the Filter

    \

    \ This track has a filter that can be used to change the display mode,\ turn on the chromosome color track, or filter the display output by\ chromosome. The filter is located at the top of the track description page,\ which is accessed via the small button to the left of the track's graphical\ display or through the link on the track's control menu.\

      \
    • Color track: To display the chromosome color track, click the\ on button next to "Color track based on chromosome".\ When the color track is activated, each of the items within the annotation\ track will be colored to show the chromosome in the aligning genome\ assembly from which the alignment originated.\
    • Chromosome filter: To display only alignments from a specific\ chromosome in the aligning assembly, type the chromosome name (in the\ form chrN) in the text box to the right of "Filter by\ chromosome". For example, to display alignments from chromosome 6,\ type "chr6".\
    \

    \ When you have finished configuring the filter, click the Submit\ button.

    \ \

    Credits

    \

    \ These alignments were contributed by Scott Schwartz of the\ Penn State Bioinformatics\ Group. The best-in-genome filtering was done using UCSC's\ axtBest program.

    \ \

    References

    \

    \ Chiaromonte, F., Yap, V.B., Miller, W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput 2002, 115-26 (2002).

    \

    \ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-mouse alignments with BLASTZ.\ Genome Res. 13(1), 103-107 (2003).

    \ \ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz Best-in-Genome Alignments\ otherDb rn3\ priority 174\ shortLabel Rat Best\ spectrum on\ track blastzBestRn3\ type psl xeno rn3\ visibility hide\ blastzTightRn3 Rat Tight psl xeno rn3 Rat (June 2003 (Baylor 3.1/rn3)) Blastz Tight Subset of Best Alignments 0 175 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz Tight Subset of Best Alignments\ otherDb rn3\ priority 175\ shortLabel Rat Tight\ spectrum on\ track blastzTightRn3\ type psl xeno rn3\ visibility hide\ syntenyRn3 Rat Synteny bed 4 + Rat (June 2003 (Baylor 3.1/rn3)) Synteny Using Blastz Single Coverage (100k window) 0 178 0 100 0 255 240 200 0 0 0

    Description

    \

    \ This track shows syntenic (corresponding) regions between human and rat chromosomes. The June 2003 (rn3) assembly of the rat genome was used to produce this annotation.

    \ \

    Methods

    \

    \ We passed a non-overlapping 100 kb window over the human genome and, using the blastz best-in-genome rat \ alignments, looked for high-scoring windows in which at least 40% of the bases aligned \ to the same region in rat. The 100 kb segments were joined if they agreed in direction and\ were within 500 kb of each other in the human genome and within 4 Mb of each other in the rat genome. \ Gaps between syntenic anchors were filled if the bases between the two flanking regions agreed with \ the synteny (direction and rat location). Finally, the syntenic blocks were extended to include those \ areas.
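The joining step above (merging anchor windows that agree in direction and lie within 500 kb in human and 4 Mb in rat) can be sketched as a single pass over anchors sorted by human position. The anchor tuple layout and the name `merge_anchors` are hypothetical; "agree in direction" is simplified to "same rat chromosome and strand", distances are measured between window start positions, and all anchors are assumed to come from one human chromosome.

```python
MAX_HUMAN_GAP = 500_000
MAX_RAT_GAP = 4_000_000


def merge_anchors(anchors, max_human_gap=MAX_HUMAN_GAP, max_rat_gap=MAX_RAT_GAP):
    """Group anchor windows into syntenic blocks.

    anchors: list of (human_pos, rat_chrom, rat_pos, strand), any order.
    Returns a list of blocks, each a list of anchors in human order.
    """
    blocks = []
    for anchor in sorted(anchors):
        h_pos, rat_chrom, rat_pos, strand = anchor
        if blocks:
            last_h, last_chrom, last_rat, last_strand = blocks[-1][-1]
            if (rat_chrom == last_chrom and strand == last_strand
                    and h_pos - last_h <= max_human_gap
                    and abs(rat_pos - last_rat) <= max_rat_gap):
                blocks[-1].append(anchor)  # extend the current block
                continue
        blocks.append([anchor])  # start a new block
    return blocks


anchors = [
    (0, "chr5", 1_000_000, "+"),
    (100_000, "chr5", 1_100_000, "+"),
    (700_000, "chr5", 1_700_000, "+"),
    (800_000, "chr2", 5_000_000, "-"),
]
print([len(block) for block in merge_anchors(anchors)])
```

The third anchor starts a new block because it is 600 kb from the previous one in human, and the fourth because it maps to a different rat chromosome.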

    \ \

    Credits

    \

    \ Contact Robert \ Baertsch at UCSC for more information about this track.\ Thanks to the Rat Genome Sequencing Consortium for providing the rat sequence \ data. For more information, see the Baylor College Human Genome Sequencing Center\ Rat Genome Project page.

    \ compGeno 1 altColor 255,240,200\ color 0,100,0\ group compGeno\ longLabel $o_Organism ($o_date) Synteny Using Blastz Single Coverage (100k window)\ otherDb rn3\ priority 178\ shortLabel Rat Synteny\ track syntenyRn3\ type bed 4 +\ visibility hide\ syntenyNetRn3 Rat Syntenic Net netAlign rn3 chainRn3 Rat (June 2003 (Baylor 3.1/rn3)) Syntenic Alignment Net 0 179 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb rn3\ priority 179\ shortLabel Rat Syntenic Net\ spectrum on\ track syntenyNetRn3\ type netAlign rn3 chainRn3\ visibility hide\ mouseSyn NCBI Synteny bed 4 + Corresponding Chromosome in Mouse (NCBI) 0 180 120 70 30 187 162 142 0 0 0

    Description

    \

    This track shows syntenic (corresponding) regions between human and mouse\ chromosomes.

    \

    Method

    \

    This track was created by looking for homology to known mouse genes in the draft \ assembly. The mouse data are provided at the chromosome level (not cytoband).

    \

    Credits

    \

    The data for this track were kindly provided by Deanna Church at NCBI. Refer to the \ NCBI Homology site for more\ details.

    \ \


    \

    This track is produced from mouse sequence data provided by the \ Mouse Genome Sequencing Consortium. \ compGeno 1 altColor 0,0,0\ color 120,70,30\ group compGeno\ longLabel Corresponding Chromosome in Mouse (NCBI)\ priority 180\ shortLabel NCBI Synteny\ track mouseSyn\ type bed 4 +\ visibility hide\ slamNonCodingMouse Slam Non-Coding Mouse bed 5 . Slam Predictions of Human/Mouse Conserved Non-Coding Regions 0 180 30 130 210 200 220 255 1 0 0

    Description and Credits

    \

    \ Slam predicts coding exons and conserved noncoding regions in a pair of\ homologous DNA sequences, incorporating both statistical sequence properties\ and degree of conservation into predictions. The model is symmetric and the\ same structure (with possibly different lengths) is predicted in both\ sequences.

    \

    \ The CNS (conserved non-coding sequence) predictions are ab initio\ predictions of conserved regions that do not fit into a gene structure.\ Thus, SLAM does not simply predict conserved regions to be coding;\ it classifies such regions according to an overall probabilistic model\ of gene structure. The set of SLAM CNS predictions is therefore highly\ enriched for conserved non-coding regions.

    \

    \ More information and a web server can be found on the \ Slam website.

    \ \

    References

    \

    \ Alexandersson, M., Cawley, S., and Pachter, L. \ SLAM - Cross-species gene finding and alignment with a \ generalized pair hidden Markov model. \ Genome Res. 13(3), 496-502 (2003).

    \

    \ Cawley, S., Pachter, L., and Alexandersson, M. \ SLAM web server for comparative gene finding and alignment.\ Nucleic Acids Res. 31(13), 3507-3509 (2003).

    \

    \ Pachter, L., Alexandersson, M., and Cawley, S. \ Applications of generalized pair hidden Markov models to \ alignment and gene finding problems. \ J Comput Biol. 9(2), 389-99 (2002).

    \

    \ Pachter, L., Alexandersson, M., and Cawley, S. Applications of generalized \ pair hidden Markov models to alignment and gene finding problems. \ Proceedings of the Fifth Annual International Conference on Computational \ Molecular Biology (RECOMB 2001) (2001).

    \ \ compGeno 1 altColor 200,220,255\ color 30,130,210\ group compGeno\ longLabel Slam Predictions of Human/Mouse Conserved Non-Coding Regions\ priority 180\ shortLabel Slam Non-Coding Mouse\ spectrum on\ track slamNonCodingMouse\ type bed 5 .\ visibility hide\ mouseSynWhd Mouse Synteny bed 6 + Whitehead Corresponding Chromosome in Mouse (300k window) 0 181 120 70 30 187 162 142 0 0 0

    Description

    \

    \ This track shows orthologous (syntenic) regions between mouse and human\ chromosomes.\

    \ See \ \ http://www-genome.wi.mit.edu/mouse/synteny/index.html \ for genomic dotplots and additional information or the following site for\ an alternative synteny map based on orthologous genes:\ \ http://www.ncbi.nlm.nih.gov/Homology/ .\ \

    Credits

    \

    \ The data for this track are kindly provided by \ Michael Kamal \ at the \ \ Whitehead Institute. \ Mouse sequence data is provided by the \ Mouse Genome Sequencing Consortium. \ \ compGeno 1 altColor 0,0,0\ color 120,70,30\ group compGeno\ longLabel Whitehead Corresponding Chromosome in Mouse (300k window)\ priority 181\ shortLabel Mouse Synteny\ track mouseSynWhd\ type bed 6 +\ visibility hide\ blastzMm4 mm4 Blastz psl xeno mm4 mm4 (mm4) Blastz All Alignments 0 182 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz All Alignments\ otherDb mm4\ priority 182\ shortLabel $o_Organism Blastz\ spectrum on\ track blastzMm4\ type psl xeno mm4\ visibility hide\ blastzMm5 Mouse Blastz psl xeno mm5 mm5 (mm5) Blastz All Alignments 0 182 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz All Alignments\ otherDb mm5\ priority 182\ shortLabel Mouse Blastz\ spectrum on\ track blastzMm5\ type psl xeno mm5\ visibility hide\ blastzBestMm4 mm4 Best psl xeno mm4 mm4 (mm4) Blastz Best-in-Genome Alignments 0 183 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz Best-in-Genome Alignments\ otherDb mm4\ priority 183\ shortLabel $o_Organism Best\ spectrum on\ track blastzBestMm4\ type psl xeno mm4\ visibility hide\ blastzBestMm5 Mouse Best psl xeno mm5 mm5 (mm5) Blastz Best-in-Genome Alignments 0 183 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz Best-in-Genome Alignments\ otherDb mm5\ priority 183\ shortLabel Mouse Best\ spectrum on\ track blastzBestMm5\ type psl xeno mm5\ visibility hide\ blastzTightMm4 mm4 Tight psl xeno mm4 mm4 (mm4) Blastz Tight Subset of Best Alignments 0 184 100 50 0 255 240 200 1 0 0

    Description

    \

    \ This track displays blastz alignments of the mouse assembly \ (mm4) to the human genome, filtered by axtBest and \ subsetAxt with very stringent constraints as described below.\ The track has an optional feature that color codes alignments to indicate \ the chromosomes from which they are derived in the aligning assembly. To \ activate the color feature, click the on button next to \ "Color track based on chromosome" on the track description page.

    \

    \ Each item in the display is identified by the chromosome, strand, and \ location of the match (in thousands).

    \ \

    Methods

    \

    \ Blastz was run with 12-of-19 seeds (12 matching positions within a 19-base window), and hits were scored using:\

    \
          A     C     G     T\
    A    91  -114   -31  -123\
    C  -114   100  -125   -31\
    G   -31  -125   100  -114\
    T  -123   -31  -114    91\
    \
    O = 400, E = 30, K = 3000, L = 3000, M = 50\
    

    \

    \ A second pass was made at reduced stringency (7mer seeds and\ MSP threshold of K=2200) in an attempt to fill in gaps of up to about 10 kb.\ Lineage-specific repeats were abridged during this alignment.

    AxtBest was used to select only the best alignment for any given region of the genome. SubsetAxt was then run on the axtBest-filtered alignments with this matrix:

          A     C     G     T
    A   100  -200  -100  -200
    C  -200   100  -200  -100
    G  -100  -200   100  -200
    T  -200  -100  -200   100

    with a gap-open penalty of 2000 and a gap-extension penalty of 50. The minimum score threshold was 3400.
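
    The stringent rescoring step can be sketched as a simple filter: rescore each axtBest-filtered alignment with the matrix above (matches +100, transitions -100, transversions -200), a gap-open penalty of 2000 and a gap-extension penalty of 50, and keep alignments reaching 3400. This is a simplified illustration under two assumptions: a gap of length k costs 2000 + 50*k, and whole alignments are kept or dropped, whereas subsetAxt may instead extract high-scoring sub-regions. The names `rescore` and `tight_subset` are hypothetical.

```python
BASES = "ACGT"
# subsetAxt matrix: matches +100, transitions (A<->G, C<->T) -100, transversions -200.
TIGHT_MATRIX = [
    [100, -200, -100, -200],
    [-200, 100, -200, -100],
    [-100, -200, 100, -200],
    [-200, -100, -200, 100],
]
GAP_OPEN = 2000
GAP_EXTEND = 50
MIN_SCORE = 3400


def rescore(top: str, bottom: str) -> int:
    """Rescore an aligned pair ('-' = gap), assuming a k-base gap costs OPEN + k*EXTEND."""
    score = 0
    gap_row = None
    for a, b in zip(top.upper(), bottom.upper()):
        if a == "-" or b == "-":
            row = 0 if a == "-" else 1
            if gap_row != row:
                score -= GAP_OPEN
                gap_row = row
            score -= GAP_EXTEND
        else:
            gap_row = None
            score += TIGHT_MATRIX[BASES.index(a)][BASES.index(b)]
    return score


def tight_subset(alignments):
    """Keep (top, bottom) pairs whose rescored value reaches MIN_SCORE."""
    return [(t, b) for t, b in alignments if rescore(t, b) >= MIN_SCORE]


pairs = [("A" * 34, "A" * 34), ("A" * 33, "A" * 33)]
print(len(tight_subset(pairs)))  # 1: only the 34-base perfect match reaches 3400
```

    Under these settings an alignment needs at least 34 net matching bases to survive, and a single gap wipes out the credit of about 20 matches.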

    Using the Filter

    This track has a filter that can be used to change the display mode, turn on the chromosome color track, or filter the display output by chromosome. The filter is located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu.

    • Color track: To display the chromosome color track, click the "on" button next to "Color track based on chromosome". When the color track is activated, each item in the annotation track is colored to show the chromosome in the aligning genome assembly from which the alignment originated.
    • Chromosome filter: To display only alignments from a specific chromosome in the aligning assembly, type the chromosome name (in the form chrN) in the text box to the right of "Filter by chromosome". For example, to display alignments from chromosome 6, type "chr6".

    When you have finished configuring the filter, click the Submit button.

    Credits

    The alignments were contributed by Scott Schwartz of the Penn State Bioinformatics Group. The best-in-genome filtering was done with UCSC's axtBest and subsetAxt programs. Mouse sequence data were provided by the Mouse Genome Sequencing Consortium.

    References

    Chiaromonte, F., Yap, V.B., and Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput, 115-126 (2002).

    Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-mouse alignments with BLASTZ. Genome Res. 13(1), 103-107 (2003).

    \ \ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz Tight Subset of Best Alignments\ otherDb mm4\ priority 184\ shortLabel $o_Organism Tight\ spectrum on\ track blastzTightMm4\ type psl xeno mm4\ visibility hide\ blastzTightMm5 Mouse Tight psl xeno mm5 mm5 (mm5) Blastz Tight Subset of Best Alignments 0 184 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz Tight Subset of Best Alignments\ otherDb mm5\ priority 184\ shortLabel Mouse Tight\ spectrum on\ track blastzTightMm5\ type psl xeno mm5\ visibility hide\ netMm4 mm4 Net netAlign mm4 chainMm4 mm4 (mm4) Alignment Net 0 186 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Alignment Net\ otherDb mm4\ priority 186\ shortLabel $o_Organism Net\ spectrum on\ track netMm4\ type netAlign mm4 chainMm4\ visibility hide\ syntenyMm4 mm4 Synteny bed 4 . mm4 (mm4) Synteny Using Blastz Single Coverage (100k window) 0 187 0 100 0 255 240 200 0 0 0

    Description

    This track shows syntenic (corresponding) regions between human and mouse chromosomes. The Oct. 2003 (mm4) assembly of the mouse genome was used to produce this annotation.

    Methods

    We passed a non-overlapping 100 kb window over the genome and, using the blastz best-in-mouse-genome alignments, looked for high-scoring regions in which at least 40% of the bases aligned with the same region in mouse. These 100 kb segments were joined together if they agreed in direction and were within 500 kb of each other in the human genome and within 4 Mb of each other in the mouse genome. Gaps between syntenic anchors were then joined if the bases between two flanking regions agreed with the synteny (direction and mouse location). Finally, we extended the syntenic block to include those areas.
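
    The windowing and joining rules above can be sketched as follows. This is a minimal illustration under stated assumptions: each 100 kb window is assumed to have been summarized beforehand (by a hypothetical upstream step) as the fraction of its bases aligning to one mouse region, that region's chromosome, a representative mouse coordinate, and an orientation; "agreeing in direction" is taken to mean the same orientation; and the final gap-filling and extension step is not modeled. The `Window` and `Block` records are hypothetical.

```python
from dataclasses import dataclass
from typing import List

WINDOW = 100_000           # non-overlapping human window size
MIN_ALIGNED = 0.40         # at least 40% of bases align to the same mouse region
MAX_HUMAN_GAP = 500_000    # join segments within 500 kb in human...
MAX_MOUSE_GAP = 4_000_000  # ...and within 4 Mb in mouse


@dataclass
class Window:
    human_start: int         # start of the window on one human chromosome
    aligned_fraction: float  # fraction of window bases aligned to the mouse region
    mouse_chrom: str
    mouse_pos: int           # representative mouse coordinate (hypothetical summary)
    strand: str              # "+" or "-": orientation relative to human


@dataclass
class Block:
    human_start: int
    human_end: int
    mouse_chrom: str
    mouse_start: int
    mouse_end: int
    strand: str


def synteny_blocks(windows: List[Window]) -> List[Block]:
    blocks: List[Block] = []
    for w in sorted(windows, key=lambda x: x.human_start):
        if w.aligned_fraction < MIN_ALIGNED:
            continue  # not a syntenic anchor
        end = w.human_start + WINDOW
        if blocks:
            b = blocks[-1]
            # Distance from this window's mouse position to the block's mouse span.
            mouse_gap = max(b.mouse_start - w.mouse_pos, w.mouse_pos - b.mouse_end, 0)
            if (b.mouse_chrom == w.mouse_chrom and b.strand == w.strand
                    and w.human_start - b.human_end <= MAX_HUMAN_GAP
                    and mouse_gap <= MAX_MOUSE_GAP):
                b.human_end = end
                b.mouse_start = min(b.mouse_start, w.mouse_pos)
                b.mouse_end = max(b.mouse_end, w.mouse_pos)
                continue
        blocks.append(Block(w.human_start, end, w.mouse_chrom,
                            w.mouse_pos, w.mouse_pos, w.strand))
    return blocks


windows = [
    Window(0, 0.62, "chr6", 1_000_000, "+"),
    Window(100_000, 0.15, "chr6", 1_080_000, "+"),  # below 40%: skipped
    Window(200_000, 0.55, "chr6", 1_150_000, "+"),  # joins the first block
    Window(300_000, 0.70, "chr2", 5_000_000, "+"),  # different chromosome: new block
]
for block in synteny_blocks(windows):
    print(block)
```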

    Credits

    Contact Robert Baertsch at UCSC for more information about this track. Thanks to the Mouse Genome Sequencing Consortium for providing the mouse sequence data.

    \ compGeno 1 altColor 255,240,200\ color 0,100,0\ group compGeno\ longLabel $o_Organism ($o_date) Synteny Using Blastz Single Coverage (100k window)\ otherDb mm4\ priority 187\ shortLabel $o_Organism Synteny\ track syntenyMm4\ type bed 4 .\ visibility hide\ syntenyMm5 Mm5 Synteny bed 4 . mm5 (mm5) Synteny Using Blastz Single Coverage (100k window) 0 187 0 100 0 255 240 200 0 0 0 compGeno 1 altColor 255,240,200\ color 0,100,0\ group compGeno\ longLabel $o_Organism ($o_date) Synteny Using Blastz Single Coverage (100k window)\ otherDb mm5\ priority 187\ shortLabel Mm5 Synteny\ track syntenyMm5\ type bed 4 .\ visibility hide\ syntenyNetMm4 mm4 Syntenic Net netAlign mm4 chainMm4 mm4 (mm4) Syntenic Alignment Net 0 188 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb mm4\ priority 188\ shortLabel $o_Organism Syntenic Net\ spectrum on\ track syntenyNetMm4\ type netAlign mm4 chainMm4\ visibility hide\ syntenyNetMm5 Mm5 Syntenic Net netAlign mm5 chainMm5 mm5 (mm5) Syntenic Alignment Net 0 188 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb mm5\ priority 188\ shortLabel Mm5 Syntenic Net\ spectrum on\ track syntenyNetMm5\ type netAlign mm5 chainMm5\ visibility hide\ syntenyCow Cow Synteny bed 6 . Cow Synteny Using RH Mapping 0 188.7 0 100 0 255 240 200 0 0 0

    Description

    This track depicts human-cattle synteny segments defined on the basis of a cattle-human comparative map containing 3,200 BAC-end sequences and EST markers that have a single significant hit (E-value less than 0.00001) in the human genome sequence (hg15), as determined by the TimeLogic Tera-BLASTn program (Everts-van der Wind et al., 2005). The synteny blocks were defined according to the rules described in Murphy et al. (2005).
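
    The marker-selection criterion can be expressed as a short filter. The sketch assumes hits are available as hypothetical (marker, human chromosome, position, E-value) tuples and keeps only markers with exactly one hit below the 0.00001 cutoff; it does not reproduce Tera-BLASTn or the synteny-block rules of Murphy et al.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

E_VALUE_CUTOFF = 1e-5  # "E-value less than 0.00001"

Hit = Tuple[str, str, int, float]  # (marker_id, human_chrom, human_pos, e_value)


def uniquely_placed(hits: Iterable[Hit]) -> Dict[str, Tuple[str, int]]:
    """Map each marker with exactly one significant hit to its human location."""
    significant = defaultdict(list)
    for marker, chrom, pos, e_value in hits:
        if e_value < E_VALUE_CUTOFF:
            significant[marker].append((chrom, pos))
    return {m: locs[0] for m, locs in significant.items() if len(locs) == 1}


hits = [
    ("BAC1", "chr1", 1_200_000, 1e-30),
    ("BAC2", "chr2", 500_000, 1e-20),
    ("BAC2", "chr7", 900_000, 1e-12),  # second significant hit: BAC2 is ambiguous
    ("EST3", "chr3", 700_000, 0.01),   # not significant
]
print(uniquely_placed(hits))  # {'BAC1': ('chr1', 1200000)}
```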

    Credits

    Thanks to Harris Lewin, Denis Larkin, and Annelie Everts-van der Wind of the University of Illinois at Urbana-Champaign for providing these data.

    References

    Everts-van der Wind, A., Larkin, D., Green, C., Elliott, J., Olmstead, C., Chiu, R., Schein, J., Marra, M., Womack, J., and Lewin, H. A high-resolution whole-genome cattle-human comparative map reveals details of mammalian chromosome evolution. Proc Natl Acad Sci 102(51), 18526-18531 (2005).

    Murphy, W., Larkin, D., Everts-van der Wind, A., Bourque, G., Tesler, G., Auvil, L., Beever, J., Chowdhary, B., Galibert, F., Gatzke, L., Hitte, C., Meyers, S., Milan, D., Ostrander, E., Pape, G., Parker, H., Raudsepp, T., Rogatcheva, M., Schook, L., Skow, L., Welge, M., Womack, J., O'Brien, S., Pevzner, P., and Lewin, H. Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps. Science 309(5734), 613-617 (2005).

    \ compGeno 1 altColor 255,240,200\ color 0,100,0\ group compGeno\ longLabel Cow Synteny Using RH Mapping\ priority 188.70\ shortLabel Cow Synteny\ track syntenyCow\ type bed 6 .\ visibility hide\ blastzMm3 Mm3 Blastz psl xeno mm3 mm3 (mm3) Blastz All Alignments 0 189.1 100 50 0 255 240 200 1 0 0

    Description

    This track displays blastz alignments of the mouse mm3 assembly to the human genome. The track has an optional feature that color-codes alignments to indicate the chromosome from which they are derived in the aligning assembly. To activate the color feature, click the "on" radio button next to "Color track based on chromosome" on the track description page.

    Methods

    For blastz, a spaced seed requiring 12 matching positions within a 19-base window (a "12 of 19" seed) was used, and hits were then scored using this matrix:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    O = 400, E = 30, K = 3000, L = 3000, M = 50

    A second pass was made at reduced stringency (7-mer seeds and an MSP threshold of K = 2200) to attempt to fill in gaps of up to about 10 kb. Lineage-specific repeats were abridged during this alignment.

    Using the Filter

    This track has a filter that can be used to change the display mode, turn on the chromosome color track, or filter the display output by chromosome. The filter is located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu.

    • Color track: To display the chromosome color track, click the "on" button next to "Color track based on chromosome". When the color track is activated, each item in the annotation track is colored to show the chromosome in the aligning genome assembly from which the alignment originated.
    • Chromosome filter: To display only alignments from a specific chromosome in the aligning assembly, type the chromosome name (in the form chrN) in the text box to the right of "Filter by chromosome". For example, to display alignments from chromosome 6, type "chr6".

    When you have finished configuring the filter, click the Submit button.
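
    Outside the browser, the same chromosome filter can be applied to downloaded alignments. The sketch below assumes plain PSL text with 21 tab-separated columns (database dumps of these tables may carry an extra leading bin column) and that the aligning mouse assembly is the query side, so its chromosome is in the qName column. `filter_by_chrom` and `make_psl` are hypothetical helpers; `make_psl` only builds test records.

```python
from typing import Iterable, Iterator

QNAME = 9  # 0-based column of qName in a 21-column PSL line (no leading bin column)


def filter_by_chrom(psl_lines: Iterable[str], chrom: str) -> Iterator[str]:
    """Yield PSL lines whose query chromosome (the aligning assembly) equals chrom."""
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 21:
            continue  # skip headers and anything that is not a plain PSL record
        if fields[QNAME] == chrom:
            yield line


def make_psl(q_chrom: str, t_chrom: str = "chr1") -> str:
    """Build a minimal one-block PSL record for the example below."""
    return "\t".join(["100", "5", "0", "0", "0", "0", "0", "0", "+",
                      q_chrom, "149000000", "10", "115",
                      t_chrom, "246000000", "2000", "2105",
                      "1", "105,", "10,", "2000,"]) + "\n"


records = [make_psl("chr6"), make_psl("chr2"), make_psl("chr6")]
print(len(list(filter_by_chrom(records, "chr6"))))  # 2
```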

    Credits

    These alignments were contributed by Scott Schwartz of the Penn State Bioinformatics Group. The best-in-genome filtering was done using UCSC's axtBest program.

    References

    Chiaromonte, F., Yap, V.B., and Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput, 115-126 (2002).

    Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-mouse alignments with BLASTZ. Genome Res. 13(1), 103-107 (2003).

    \ \ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz All Alignments\ otherDb mm3\ priority 189.1\ shortLabel Mm3 Blastz\ spectrum on\ track blastzMm3\ type psl xeno mm3\ visibility hide\ blastzBestMm3 Mm3 Best psl xeno mm3 mm3 (mm3) Blastz Best-in-Genome Alignments 0 189.2 100 50 0 255 240 200 1 0 0

    Description

    This track shows blastz alignments of the mouse mm3 assembly to the human genome, filtered to display only the best alignment for any given region of the human genome. The track has an optional feature that color-codes alignments to indicate the chromosome from which they are derived in the aligning assembly. To activate the color feature, click the "on" button next to "Color track based on chromosome" on the track description page.

    Methods

    For blastz, a spaced seed requiring 12 matching positions within a 19-base window (a "12 of 19" seed) was used, and hits were then scored using this matrix:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    O = 400, E = 30, K = 3000, L = 3000, M = 50

    A second pass was made at reduced stringency (7-mer seeds and an MSP threshold of K = 2200) to attempt to fill in gaps of up to about 10 kb. Lineage-specific repeats were abridged during this alignment.
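
    The "12 of 19" seed is a spaced seed: within each 19-base window, only 12 fixed positions have to match. The minimal sketch below indexes the target's spaced words in a dictionary and looks up each query window. It assumes the pattern 1110100110010101111, the 12-of-19 pattern documented for lastz (the successor to blastz); whether this track used exactly that pattern is an assumption, and transition-tolerant seeding and extension are omitted.

```python
from typing import Dict, List, Tuple

# Assumed 12-of-19 pattern: '1' positions must match, '0' positions are ignored.
SEED = "1110100110010101111"
CARE = [i for i, c in enumerate(SEED) if c == "1"]
assert len(SEED) == 19 and len(CARE) == 12


def spaced_word(seq: str, start: int) -> str:
    """Bases of seq at the seed's '1' positions, for the window starting at start."""
    return "".join(seq[start + i] for i in CARE)


def seed_hits(target: str, query: str) -> List[Tuple[int, int]]:
    """Return (target_pos, query_pos) pairs whose 19-base windows match at all 12 care positions."""
    span = len(SEED)
    index: Dict[str, List[int]] = {}
    for t in range(len(target) - span + 1):
        index.setdefault(spaced_word(target, t), []).append(t)
    return [(t, q)
            for q in range(len(query) - span + 1)
            for t in index.get(spaced_word(query, q), [])]


target = "ACGTTGCAACGTGGCCTTA"
print(seed_hits(target, "ACGATGCAACGTGGCCTTA"))  # [(0, 0)]: mismatch at a '0' position
print(seed_hits(target, "TCGTTGCAACGTGGCCTTA"))  # []: mismatch at a '1' position
```

    Tolerating mismatches at the "don't care" positions is what lets spaced seeds find diverged human-mouse matches that a contiguous 12-mer would miss.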

    Using the Filter

    This track has a filter that can be used to change the display mode, turn on the chromosome color track, or filter the display output by chromosome. The filter is located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu.

    • Color track: To display the chromosome color track, click the "on" button next to "Color track based on chromosome". When the color track is activated, each item in the annotation track is colored to show the chromosome in the aligning genome assembly from which the alignment originated.
    • Chromosome filter: To display only alignments from a specific chromosome in the aligning assembly, type the chromosome name (in the form chrN) in the text box to the right of "Filter by chromosome". For example, to display alignments from chromosome 6, type "chr6".

    When you have finished configuring the filter, click the Submit button.

    Credits

    These alignments were contributed by Scott Schwartz of the Penn State Bioinformatics Group. The best-in-genome filtering was done using UCSC's axtBest program.

    References

    Chiaromonte, F., Yap, V.B., and Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput, 115-126 (2002).

    Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-mouse alignments with BLASTZ. Genome Res. 13(1), 103-107 (2003).

    \ \ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz Best-in-Genome Alignments\ otherDb mm3\ priority 189.2\ shortLabel Mm3 Best\ spectrum on\ track blastzBestMm3\ type psl xeno mm3\ visibility hide\ blastzTightMm3 Mm3 Tight psl xeno mm3 mm3 (mm3) Blastz Tight Subset of Best Alignments 0 189.3 100 50 0 255 240 200 1 0 0

    Description

    This track displays blastz alignments of the mouse mm3 assembly to the human genome, filtered by axtBest and subsetAxt with very stringent constraints, as described below. The track has an optional feature that color-codes alignments to indicate the chromosome from which they are derived in the aligning assembly. To activate the color feature, click the "on" button next to "Color track based on chromosome" on the track description page.

    Each item in the display is identified by the chromosome, strand, and location of the match (in thousands of bases).

    Methods

    For blastz, a spaced seed requiring 12 matching positions within a 19-base window (a "12 of 19" seed) was used, and hits were then scored using this matrix:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    O = 400, E = 30, K = 3000, L = 3000, M = 50

    A second pass was made at reduced stringency (7-mer seeds and an MSP threshold of K = 2200) to attempt to fill in gaps of up to about 10 kb. Lineage-specific repeats were abridged during this alignment.

    AxtBest was used to select only the best alignment for any given region of the genome. SubsetAxt was then run on the axtBest-filtered alignments with this matrix:

          A     C     G     T
    A   100  -200  -100  -200
    C  -200   100  -200  -100
    G  -100  -200   100  -200
    T  -200  -100  -200   100

    with a gap-open penalty of 2000 and a gap-extension penalty of 50. The minimum score threshold was 3400.

    Using the Filter

    This track has a filter that can be used to change the display mode, turn on the chromosome color track, or filter the display output by chromosome. The filter is located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu.

    • Color track: To display the chromosome color track, click the "on" button next to "Color track based on chromosome". When the color track is activated, each item in the annotation track is colored to show the chromosome in the aligning genome assembly from which the alignment originated.
    • Chromosome filter: To display only alignments from a specific chromosome in the aligning assembly, type the chromosome name (in the form chrN) in the text box to the right of "Filter by chromosome". For example, to display alignments from chromosome 6, type "chr6".

    When you have finished configuring the filter, click the Submit button.

    Credits

    The alignments were contributed by Scott Schwartz of the Penn State Bioinformatics Group. The best-in-genome filtering was done with UCSC's axtBest and subsetAxt programs. Mouse sequence data were provided by the Mouse Genome Sequencing Consortium.

    References

    Chiaromonte, F., Yap, V.B., and Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput, 115-126 (2002).

    Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-mouse alignments with BLASTZ. Genome Res. 13(1), 103-107 (2003).

    \ \ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Blastz Tight Subset of Best Alignments\ otherDb mm3\ priority 189.3\ shortLabel Mm3 Tight\ spectrum on\ track blastzTightMm3\ type psl xeno mm3\ visibility hide\ netMm3 Mm3 Net netAlign mm3 chainMm3 mm3 (mm3) Alignment Net 0 189.5 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Alignment Net\ otherDb mm3\ priority 189.5\ shortLabel Mm3 Net\ spectrum on\ track netMm3\ type netAlign mm3 chainMm3\ visibility hide\ syntenyNetMm3 Mm3 Syntenic Net netAlign mm3 chainMm3 mm3 (mm3) Syntenic Alignment Net 0 189.6 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb mm3\ priority 189.6\ shortLabel Mm3 Syntenic Net\ spectrum on\ track syntenyNetMm3\ type netAlign mm3 chainMm3\ visibility hide\ multizPanTro1Rm1 3 Primate maf Human/Chimp/Rhesus Multiz Al. (panTro1 - macaca mulatta trimmed reads) Reciprocal Best 0 190 0 0 0 127 127 127 0 0 0 compGeno 0 group compGeno\ longLabel Human/Chimp/Rhesus Multiz Al. (panTro1 - macaca mulatta trimmed reads) Reciprocal Best\ priority 190\ shortLabel 3 Primate\ track multizPanTro1Rm1\ type maf\ visibility hide\ chainNetMacEug1 macEug1 Chain/Net bed 3 Wallaby (Nov. 2007 (Baylor 1.0/macEug1)), Chain and Net Alignments 0 191 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of wallaby (Nov. 2007 (Baylor 1.0/macEug1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both wallaby and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the wallaby assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands of bases) of the match for each alignment.

    Net Track

    The net track shows the best wallaby/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The wallaby sequence used in this annotation is from the Nov. 2007 (Baylor 1.0/macEug1) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on the chromosome to which they map in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases, gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments and the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.
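
    The four categories can be restated as a small decision function. This is a simplified sketch that follows only the one-line definitions above, using a hypothetical `NetItem` record; the netSyntenic program may apply additional criteria (for example, distance limits) that are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetItem:
    level: int                          # display level; 1 is the top level
    chrom: str                          # chromosome in the aligning (wallaby) assembly
    strand: str                         # "+" or "-"
    parent: Optional["NetItem"] = None  # the chain whose gap this item fills


def net_type(item: NetItem) -> str:
    """Classify a net item as Top, Syn, Inv, or NonSyn."""
    if item.level == 1 or item.parent is None:
        return "Top"
    if item.chrom != item.parent.chrom:
        return "NonSyn"
    return "Syn" if item.strand == item.parent.strand else "Inv"


top = NetItem(1, "chr5", "+")
print(net_type(top))                           # Top
print(net_type(NetItem(2, "chr5", "+", top)))  # Syn
print(net_type(NetItem(2, "chr5", "-", top)))  # Inv
print(net_type(NetItem(2, "chr9", "+", top)))  # NonSyn
```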

    Methods

    Chain track

    Transposons that have been inserted since the wallaby/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single wallaby chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following substitution matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The following linear gap matrix was used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
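
    The gap table lists costs only at selected gap lengths (qGap and tGap for gaps in one species, bothGap for double-sided gaps). The sketch below shows one way to read it, assuming costs for intermediate lengths come from linear interpolation between neighbouring `position` entries and that lengths beyond 252,111 bases extend the slope of the last two entries. The interpolation scheme is an assumption, not a transcription of axtChain's source, and `gap_cost` is a hypothetical name.

```python
from bisect import bisect_left
from typing import Sequence

# "-linearGap=medium" table from above (tGap is identical to qGap).
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def gap_cost(size: int, costs: Sequence[int] = Q_GAP) -> float:
    """Piecewise-linear gap cost read from the table (assumed interpolation scheme)."""
    if size < 1:
        raise ValueError("gap size must be at least 1")
    i = bisect_left(POSITIONS, size)
    if i < len(POSITIONS) and POSITIONS[i] == size:
        return float(costs[i])  # exact table entry
    if i == len(POSITIONS):
        i -= 1                  # past the end: extend the last segment's slope
    x0, x1 = POSITIONS[i - 1], POSITIONS[i]
    y0, y1 = costs[i - 1], costs[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


print(gap_cost(3))            # 450.0
print(gap_cost(7))            # 525.0, halfway between the 3 and 11 entries
print(gap_cost(7, BOTH_GAP))  # 925.0
```

    Because the cost grows roughly logarithmically with gap length rather than linearly, long gaps are penalized far less than under an affine scheme, which is what lets chains span large insertions.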

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to determine how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
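
    The placement step can be sketched on a single target chromosome. This simplified illustration represents each chain by a hypothetical (id, score, ungapped target blocks) tuple, places chains in decreasing score order, trims each to the sections not already covered by a higher-scoring chain's blocks, and records as parent the smallest already-placed chain whose span encloses it. It ignores the query side, minimum fill sizes, and the netSyntenic and netClass steps.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the target (human) chromosome


def subtract(start: int, end: int, covered: List[Interval]) -> List[Interval]:
    """Return the parts of [start, end) not overlapped by any covered interval."""
    pieces = [(start, end)]
    for cs, ce in covered:
        remaining = []
        for s, e in pieces:
            if ce <= s or cs >= e:  # no overlap
                remaining.append((s, e))
                continue
            if s < cs:
                remaining.append((s, cs))  # part left of the covered interval
            if ce < e:
                remaining.append((ce, e))  # part right of the covered interval
        pieces = remaining
    return pieces


def place_chains(chains: List[Tuple[str, int, List[Interval]]]
                 ) -> Dict[str, Tuple[List[Interval], Optional[str]]]:
    """Greedy net construction; returns {chain_id: (kept_blocks, parent_id)}."""
    covered: List[Interval] = []
    spans: List[Tuple[int, int, str]] = []
    net: Dict[str, Tuple[List[Interval], Optional[str]]] = {}
    for chain_id, _score, blocks in sorted(chains, key=lambda c: c[1], reverse=True):
        kept = [p for s, e in blocks for p in subtract(s, e, covered)]
        if not kept:
            continue  # fully covered by higher-scoring chains
        lo, hi = min(s for s, _ in kept), max(e for _, e in kept)
        enclosing = [(e - s, cid) for s, e, cid in spans if s <= lo and hi <= e]
        parent = min(enclosing)[1] if enclosing else None
        covered.extend(kept)
        spans.append((lo, hi, chain_id))
        net[chain_id] = (kept, parent)
    return net


chains = [
    ("A", 5000, [(0, 100), (300, 400)]),  # top-level chain with a gap at 100-300
    ("B", 2000, [(120, 280)]),            # fills A's gap
    ("C", 1000, [(50, 150)]),             # overlaps A and B; trimmed to 100-120
]
for cid, (kept, parent) in place_chains(chains).items():
    print(cid, kept, parent)
```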

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California at Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb macEug1\ priority 191\ settingsByView chain:spectrum=on\ shortLabel $o_db Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetMacEug1\ type bed 3\ visibility hide\ visibilityViewDefaults chain=pack net=dense\ chainNetLoxAfr3 Elephant Chain/Net bed 3 Elephant (Jul. 2009 (Broad/loxAfr3)), Chain and Net Alignments 0 191 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of elephant (Jul. 2009 (Broad/loxAfr3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both elephant and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the elephant assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands of bases) of the match for each alignment.

    Net Track

    The net track shows the best elephant/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The elephant sequence used in this annotation is from the Jul. 2009 (Broad/loxAfr3) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on the chromosome to which they map in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases, gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments and the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the elephant/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single elephant chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following substitution matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The following linear gap matrix was used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
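
    The chaining step rests on a dynamic program over gapless blocks. axtChain speeds up the search for predecessor blocks with a kd-tree; the minimal sketch below substitutes a plain O(n^2) scan over all earlier blocks, which finds the same optimum for a given gap-cost function but scales poorly. The gap cost is passed in as a function; `simple_gap` is a hypothetical stand-in for illustration, whereas the actual run used the linear gap table above.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int, int, int]  # (t_start, t_end, q_start, q_end, score)


def best_chain(blocks: List[Block],
               gap_cost: Callable[[int, int], float]) -> Tuple[float, List[Block]]:
    """Highest-scoring chain of blocks that are colinear in both genomes."""
    if not blocks:
        return 0.0, []
    blocks = sorted(blocks)                # order by target start
    best = [float(b[4]) for b in blocks]   # best chain score ending at each block
    back = [-1] * len(blocks)              # predecessor index; -1 = chain starts here
    for i, (ts, _te, qs, _qe, score) in enumerate(blocks):
        for j in range(i):
            _pts, pte, _pqs, pqe, _ = blocks[j]
            if pte <= ts and pqe <= qs:  # block j ends before block i in both genomes
                candidate = best[j] + score - gap_cost(ts - pte, qs - pqe)
                if candidate > best[i]:
                    best[i], back[i] = candidate, j
    i = max(range(len(blocks)), key=best.__getitem__)
    total, chain = best[i], []
    while i != -1:
        chain.append(blocks[i])
        i = back[i]
    return total, chain[::-1]


def simple_gap(dt: int, dq: int) -> float:
    """Hypothetical affine-style gap cost, for illustration only."""
    return 400 + 30 * (dt + dq)


blocks = [
    (0, 100, 0, 100, 9000),
    (150, 250, 160, 260, 9000),
    (120, 200, 5000, 5080, 7000),  # far away in the query: too costly to link
]
print(best_chain(blocks, simple_gap))
# (14300.0, [(0, 100, 0, 100, 9000), (150, 250, 160, 260, 9000)])
```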

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to determine how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
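
    The N-counting part of the netClass step can be sketched for a single gap. This is only an illustration of the bookkeeping described above, with hypothetical function names and in-memory sequences; netClass also reports transposon content, which is not modeled here.

```python
from typing import Dict, Tuple

Region = Tuple[int, int]  # half-open [start, end) in one sequence


def n_content(seq: str, region: Region) -> Tuple[int, float]:
    """Count N bases in seq[start:end]; return (count, fraction of the region)."""
    start, end = region
    piece = seq[start:end].upper()
    count = piece.count("N")
    return count, (count / len(piece) if piece else 0.0)


def annotate_gap(target: str, t_region: Region,
                 query: str, q_region: Region) -> Dict[str, Tuple[int, float]]:
    """Report how much of a gap consists of sequencing gaps in each species."""
    return {"target": n_content(target, t_region),
            "query": n_content(query, q_region)}


human = "ACGTNNNNACGT"
elephant = "ACGTACnnACGT"
print(annotate_gap(human, (4, 8), elephant, (4, 8)))
# {'target': (4, 1.0), 'query': (2, 0.5)}
```

    A gap that is mostly N in one species is likely an assembly artifact rather than a real deletion, which is why this annotation helps interpret gaps in the net.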

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb loxAfr3\ priority 191\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetLoxAfr3\ type bed 3\ visibility hide\ chainNetLoxAfr3Viewchain Chain bed 3 Elephant (Jul. 2009 (Broad/loxAfr3)), Chain and Net Alignments 3 191 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetLoxAfr3\ shortLabel Chain\ spectrum on\ track chainNetLoxAfr3Viewchain\ view chain\ visibility pack\ macaca Macaca Blastz maf Human/Rhesus Blastz (panTro1 - macaca mulatta trimmed reads) Reciprocal Best 0 191 0 0 0 127 127 127 0 0 0 compGeno 0 group compGeno\ longLabel Human/Rhesus Blastz (panTro1 - macaca mulatta trimmed reads) Reciprocal Best\ priority 191\ shortLabel Macaca Blastz\ track macaca\ type maf\ visibility hide\ chainNetLoxAfr3Viewnet Net bed 3 Elephant (Jul. 
2009 (Broad/loxAfr3)), Chain and Net Alignments 2 191 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetLoxAfr3\ shortLabel Net\ track chainNetLoxAfr3Viewnet\ view net\ visibility full\ syntenyNetCanFam1 Dog Syntenic Net netAlign canFam1 chainCanFam1 Dog (July 2004 (Broad/canFam1)) Syntenic Alignment Net 0 195.3 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Syntenic Alignment Net\ otherDb canFam1\ priority 195.3\ shortLabel Dog Syntenic Net\ spectrum on\ track syntenyNetCanFam1\ type netAlign canFam1 chainCanFam1\ visibility hide\ chainCanHg12 canHg12 Chain chain canHg12 canHg12 (canHg12) Chained Alignments 0 200 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Chained Alignments\ otherDb canHg12\ priority 200\ shortLabel $o_Organism Chain\ spectrum on\ track chainCanHg12\ type chain canHg12\ visibility hide\ chainBorEut13 borEut13 Chain chain borEut13 borEut13 (borEut13) Chained Alignments 0 200 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Chained Alignments\ otherDb borEut13\ priority 200\ shortLabel $o_Organism Chain\ spectrum on\ track chainBorEut13\ type chain borEut13\ visibility hide\ regPotential2X 2x Reg Potential wig -0.0732 0.153 2-Way Regulatory Potential - Human (hg16), Mouse (Oct. 2003/mm4) 0 200 0 128 255 255 128 0 0 0 0

    Description

    This track displays the 2-way regulatory potential (RP) score computed from alignments of human (hg16, Jul. '03) and mouse (mm4, Oct. '03). A 3-way RP score, computed from alignments of human, mouse, and rat, is also available on this browser.

    RP scores compare frequencies of short alignment patterns between known regulatory elements and neutral DNA. Preliminary results from a calibration study investigating the sensitivity and specificity of 2-way RP scores on the hemoglobin beta gene cluster suggest using a threshold just above 0 for identifying new putative regulatory elements.

    The default viewing range for this track is from 0.00 to 0.01. Score values below the 0.00 default lower limit indicate resemblance to alignment patterns typical of neutral DNA. Score values above the 0.01 default upper limit indicate very marked resemblance to alignment patterns typical of regulatory elements in the training set. The range of RP scores from 0.00 to 0.01 contains the prediction threshold suggested by calibration studies and provides an effective visualization of the score for most genomic loci. However, the user can specify different viewing ranges if desired. Note: absence of a score value at a given location indicates lack of a 2-way alignment.

    This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options.

    Methods

    The comparison employs log-ratios of transition probabilities from two Markov models. Training the score entails several steps:

    • selecting an appropriate alphabet (alignment column symbols) for the Markov models
    • selecting an order (length of the patterns = order + 1)
    • estimating their transition probabilities from alignment data for known regulatory elements and ancestral repeats

    The 2-way RP score uses a 5-symbol alphabet and order 5.
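Below is a minimal Python sketch of the per-position log-ratio score. It assumes fixed-order Markov models estimated by counting, with add-one (pseudocount) smoothing. The two-letter alphabet "AB" is a placeholder for the real alignment-column alphabet, which is not listed here. `train_markov` and `log_ratios` are hypothetical names.

```python
import math
from collections import defaultdict


def train_markov(sequences, order, alphabet, pseudocount=1.0):
    """Return P(symbol | previous `order` symbols); pseudocount smoothing is assumed."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, symbol):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(alphabet)
        return (ctx.get(symbol, 0.0) + pseudocount) / total

    return prob


def log_ratios(seq, order, p_reg, p_neu):
    """log(P_regulatory / P_neutral) for each position that has a full context."""
    return [
        math.log(p_reg(seq[i - order:i], seq[i]) / p_neu(seq[i - order:i], seq[i]))
        for i in range(order, len(seq))
    ]


# Toy training data over a placeholder two-symbol alphabet.
p_reg = train_markov(["ABABAB"], order=1, alphabet="AB")
p_neu = train_markov(["AAAAAA"], order=1, alphabet="AB")
print(log_ratios("ABA", 1, p_reg, p_neu))
```

With these toy models, the A-to-B transition has probability 0.8 under the "regulatory" model and 1/7 under the "neutral" one. Its log-ratio is therefore log 5.6, which is positive.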

    In the track, score values are displayed using a system of overlapping windows of size 100 bp along aligned portions of the human sequence. Log-ratios are added over positions in a window, and the sum is normalized for length.
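The windowing step can be sketched in Python as follows. The text says only that the 100 bp windows overlap, so the 50 bp step is an assumption. The input is assumed to be per-position log-ratios for one contiguous aligned stretch. `window_scores` is a hypothetical name.

```python
def window_scores(log_ratios, window=100, step=50):
    """(start, score) for overlapping windows: sum of log-ratios / window length.

    ASSUMPTION: `step` is illustrative. A tail shorter than one step may be left
    unscored, and a stretch shorter than `window` yields one shorter window.
    """
    last_start = max(len(log_ratios) - window, 0)
    scores = []
    for start in range(0, last_start + 1, step):
        chunk = log_ratios[start:start + window]
        if chunk:
            scores.append((start, sum(chunk) / len(chunk)))
    return scores


print(window_scores([1.0] * 100 + [-1.0] * 100))
```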

    Credits

    Work on RP scores is performed by members of the Comparative Genomics and Bioinformatics Center at Penn State University. More information on this research and the collection of known regulatory elements used in training the score can be found at this site.

    Mouse sequence data were provided by the Mouse Sequencing Consortium. The alignment data were created in collaboration with the UCSC Genome Bioinformatics group.

    References

    Elnitski L, Hardison R, Li J, Yang S, Kolbe D, Eswara P, O'Connor M, Schwartz S, Miller W, Chiaromonte F. Distinguishing regulatory DNA from neutral sites. Genome Res. 2003 Jan;13(1):64-72.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ regulation 0 altColor 255,128,0\ autoScale Off\ color 0,128,255\ graphTypeDefault Bar\ gridDefault OFF\ group regulation\ longLabel 2-Way Regulatory Potential - Human (hg16), Mouse (Oct. 2003/mm4)\ maxHeightPixels 128:36:16\ priority 200\ shortLabel 2x Reg Potential\ spanList 5\ track regPotential2X\ type wig -0.0732 0.153\ viewLimits 0.0:0.01\ visibility hide\ regPotential3X 3x Reg Potential wig -0.0983 0.210 3-Way Regulatory Potential - Human, Mouse (Feb. 2003/mm3), Rat (June 2003/rn3) 0 200 0 128 255 255 128 0 0 0 0

    Description

    This track displays the 3-way regulatory potential (RP) score computed from alignments of human (hg16, Jul. '03), mouse (mm3, Feb. '03), and rat (rn3, Jun. '03). A 2-way RP score, computed from alignments of human and mouse only, is also available on this browser.

    RP scores compare frequencies of short alignment patterns between known regulatory elements and neutral DNA. Preliminary results from a calibration study investigating the sensitivity and specificity of 3-way RP scores on the hemoglobin beta gene cluster suggest using a threshold of ~0.0006 for identifying new putative regulatory elements.
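One way to turn windowed scores into candidate elements is to merge overlapping windows whose score exceeds the calibration threshold. The Python sketch below assumes windows sorted by start and uses the approximate 0.0006 value quoted above. `call_regions` is a hypothetical name, and the merging rule is an illustration, not the published procedure.

```python
def call_regions(scored_windows, window=100, threshold=0.0006):
    """Merge windows scoring above `threshold` into (start, end) regions.

    `scored_windows` is a list of (start, score) pairs sorted by start.
    """
    regions = []
    for start, score in scored_windows:
        if score <= threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))  # overlaps previous
        else:
            regions.append((start, end))
    return regions


print(call_regions([(0, 0.001), (50, 0.002), (100, 0.0), (200, 0.01)]))
```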

    The default viewing range for this track is from 0.00 to 0.01. Score values below the 0.00 default indicate resemblance to alignment patterns typical of neutral DNA. Score values above the 0.01 default indicate very marked resemblance to alignment patterns typical of regulatory elements in the training set. The range of RP scores from 0.00 to 0.01 contains the prediction threshold suggested by calibration studies and provides an effective visualization of the score for most genomic loci. However, the user can specify different viewing ranges if desired. Note: absence of a score value at a given location indicates lack of a 3-way alignment.

    This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options.

    Methods

    The comparison employs log-ratios of transition probabilities from two Markov models. Training the score entails several steps:

    • selecting an appropriate alphabet (alignment column symbols) for the Markov models
    • selecting an order (length of the patterns = order + 1)
    • estimating their transition probabilities from alignment data for known regulatory elements and ancestral repeats

    The 3-way RP score uses a 10-symbol alphabet and order 2.
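Pattern length is order + 1, so the number of distinct patterns is (alphabet size)^(order + 1). The short calculation below compares the 2-way and 3-way settings using a hypothetical `markov_size` helper. It counts free parameters as alphabet^order * (alphabet - 1), on the standard assumption that each context's transition probabilities sum to one.

```python
def markov_size(alphabet_size: int, order: int):
    """Number of patterns (length order + 1) and free transition probabilities."""
    patterns = alphabet_size ** (order + 1)
    # Each of alphabet_size**order contexts has alphabet_size probabilities summing to 1.
    free_params = alphabet_size ** order * (alphabet_size - 1)
    return patterns, free_params


print("2-way:", markov_size(5, 5))   # (15625, 12500)
print("3-way:", markov_size(10, 2))  # (1000, 900)
```

The 2-way model has 15,625 patterns and the 3-way model has 1,000. The larger 3-way alphabet is paired with a much lower order.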

    In the track, score values are displayed using a system of overlapping windows of size 100 bp along aligned portions of the human sequence. Log-ratios are added over positions in a window, and the sum is normalized for length.

    Credits

    Work on RP scores is performed by members of the Comparative Genomics and Bioinformatics Center at Penn State University. More information on this research and the collection of known regulatory elements used in training the score can be found at this site.

    Mouse and rat sequence data were provided by the Mouse and Rat Sequencing Consortia. The alignment data were created in collaboration with the UCSC Genome Bioinformatics group.

    References

    Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D, Miller W. Aligning Multiple Genomic Sequences with the Threaded Blockset Aligner. Genome Res. 2004 Apr;14(4):708-15.

    Kolbe D, Taylor J, Elnitski L, Eswara P, Li J, Miller W, Hardison RC, Chiaromonte F. Regulatory potential scores from genome-wide three-way alignments of human, mouse, and rat. Genome Res. 2004 Apr;14(4):700-7.

    \ \ regulation 0 altColor 255,128,0\ autoScale Off\ color 0,128,255\ graphTypeDefault Bar\ gridDefault OFF\ group regulation\ longLabel 3-Way Regulatory Potential - Human, Mouse (Feb. 2003/mm3), Rat (June 2003/rn3)\ maxHeightPixels 128:36:16\ priority 200\ shortLabel 3x Reg Potential\ spanList 5\ track regPotential3X\ type wig -0.0983 0.210\ viewLimits 0.0:0.01\ visibility hide\ regPotential7X 7X Reg Potential wig 0.0 1.0 ESPERR Regulatory Potential (7 Species) 0 200 0 128 255 255 128 0 0 0 0

    Description

    This track displays regulatory potential (RP) scores computed from alignments of human, chimpanzee (panTro2), macaque (rheMac2), mouse (mm8), rat (rn4), dog (canFam2), and cow (bosTau2).

    RP scores compare frequencies of short alignment patterns between known regulatory elements and neutral DNA. The sensitivity and specificity of RP scores were calibrated on the hemoglobin beta gene cluster. These results suggest a threshold of ~0.00 for the identification of new putative regulatory elements.

    The default viewing range for this track is from 0.0 to 0.1. Score values below the 0.0 default lower limit indicate resemblance to alignment patterns typical of neutral DNA, while score values above the 0.1 default upper limit indicate very marked resemblance to alignment patterns typical of regulatory elements in the training set. The range of RP scores from 0.0 to 0.1 contains the prediction threshold suggested by calibration studies and provides an effective visualization of the score for most genomic loci. However, the user can specify different viewing ranges if desired.

    Note: absence of a score value at a given location indicates lack of sufficient alignment. Scores are computed only for regions of the reference genome with no stretch of more than 100 bases that lacks alignment in at least three non-human species.
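The coverage rule in the note above can be read as follows. At each reference base, count the non-human species that align there. Reject the region if any run of more than 100 bases has fewer than three aligned species. The Python sketch below implements that reading. The per-base count input and the function names are assumptions.

```python
def longest_poor_run(aligned_counts, min_species=3):
    """Longest run of bases aligned in fewer than `min_species` non-human species."""
    longest = current = 0
    for n in aligned_counts:
        current = current + 1 if n < min_species else 0
        longest = max(longest, current)
    return longest


def is_scoreable(aligned_counts, min_species=3, max_run=100):
    """True if no stretch longer than `max_run` bases lacks sufficient alignment."""
    return longest_poor_run(aligned_counts, min_species) <= max_run


region = [6] * 50 + [2] * 100 + [4] * 50  # a 100-base poorly aligned stretch
print(is_scoreable(region))
```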

    This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options.

    Methods

    The comparison employs log-ratios of transition probabilities from two variable-order Markov models. Training the score entails several steps:

    • selecting an appropriate alphabet (alignment column symbols) for the Markov models
    • selecting a maximal order (length of the longest patterns = order + 1)
    • estimating their transition probabilities from alignment data for known regulatory elements and ancestral repeats

    The scores in this track are computed using a maximal order of 2.
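Variable-order Markov models can be built in several ways. The sketch below uses one simple variant, chosen for illustration; it is not necessarily the method behind this track. For each prediction it starts from the longest context, up to the maximal order of 2. It backs off to shorter contexts until it finds one seen at least `min_count` times. `train` and `vom_prob` are hypothetical names, and the add-one smoothing is an assumption.

```python
from collections import Counter


def train(sequences, max_order):
    """Count (context, next symbol) pairs for every context length 0..max_order."""
    counts, totals = Counter(), Counter()
    for seq in sequences:
        for i in range(len(seq)):
            for k in range(min(i, max_order) + 1):
                context = seq[i - k:i]
                counts[(context, seq[i])] += 1
                totals[context] += 1
    return counts, totals


def vom_prob(counts, totals, history, symbol, alphabet, max_order=2, min_count=5):
    """P(symbol | history) from the longest context seen at least `min_count` times.

    ASSUMPTION: simple back-off with add-one smoothing, for illustration only.
    """
    for k in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - k:]
        if totals[context] >= min_count or k == 0:
            return (counts[(context, symbol)] + 1) / (totals[context] + len(alphabet))


counts, totals = train(["ABABABABAB"], max_order=2)
print(vom_prob(counts, totals, "BA", "B", alphabet="AB"))
```

Here the order-2 context "BA" occurs only 4 times, so the sketch backs off to the order-1 context "A". That context occurs 5 times and is always followed by B, which gives (5 + 1) / (5 + 2) = 6/7.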

    In the track, score values are displayed using a system of overlapping windows of size 100 bp along sufficiently alignable portions of the human sequence. Log-ratios are added over positions in a window, and the sum is normalized for length.

    Credits

    Work on RP scores is performed by members of the Comparative Genomics and Bioinformatics Center at Penn State University. More information on this research and the collection of known regulatory elements used in training the score can be found at this site.

    References

    King DC, Taylor J, Elnitski L, Chiaromonte F, Miller W, Hardison RC. Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res. 2005 Aug;15(8):1051-60.

    Kolbe D, Taylor J, Elnitski L, Eswara P, Li J, Miller W, Hardison RC, Chiaromonte F. Regulatory potential scores from genome-wide three-way alignments of human, mouse, and rat. Genome Res. 2004 Apr;14(4):700-7.

    \ regulation 0 altColor 255,128,0\ autoScale Off\ color 0,128,255\ graphTypeDefault Bar\ gridDefault OFF\ group regulation\ longLabel ESPERR Regulatory Potential (7 Species)\ maxHeightPixels 128:36:16\ priority 200\ shortLabel 7X Reg Potential\ spanList 1\ track regPotential7X\ type wig 0.0 1.0\ viewLimits 0.0:0.10\ visibility hide\ windowingFunction mean\ blastSacCer1SG Yeast Proteins psl protein Yeast Proteins from SGD Mapped by Chained tBLASTn 0 200 0 0 0 127 127 127 0 0 0 http://db.yeastgenome.org/cgi-bin/SGD/locus.pl?locus= genes 1 blastRef sacCer1.blastSGRef00\ colorChromDefault off\ group genes\ longLabel Yeast Proteins from SGD Mapped by Chained tBLASTn\ pred sacCer1.blastSGPep00\ priority 200\ shortLabel Yeast Proteins\ track blastSacCer1SG\ type psl protein\ url http://db.yeastgenome.org/cgi-bin/SGD/locus.pl?locus=\ visibility hide\ cgapSage CGAP SAGE bed 8 + CGAP Long SAGE 0 200.1 0 0 0 127 127 127 0 0 0

    Description

    This track displays genomic mappings for human LongSAGE tags from The Cancer Genome Anatomy Project (CGAP). SAGE (Serial Analysis of Gene Expression) [Velculescu 1995] is a quantitative technique for measuring gene expression. For a brief overview of SAGE, see the CGAP SAGE information page.

    Display Conventions and Configuration

    Genomic mappings of 17-base LongSAGE tags are displayed. Tag counts are normalized to tags per million (TPM) in each tissue or library. Tags with higher TPM are more darkly shaded. The CATG restriction site before the start of the tag is rendered as a thick line; the 17 bases of the tag are drawn as a thinner line. Thus the thin end of the tag points in the direction of transcription. The track display modes are:

    • dense - Draws locations of mapped tags on a single line.
    • squish - Draws one item per tag per library without labels.
    • pack - Draws one item per tag per tissue with labels. The label includes the number of libraries of each tissue type containing the tag. Clicking on an item lists the libraries containing the tag, with the libraries from the selected tissue in bold. Clicking on a library in the list displays detailed information about that library.
    • full - Draws one item per tag per library. Clicking on an item displays information about the library, along with other libraries containing the tag.

    The track can be configured to display only tags from a selected tissue.
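The tags-per-million normalization described under Display Conventions scales each tag count by the library's total tag count. Below is a minimal Python sketch. The tag sequences and counts are hypothetical, and `tags_per_million` is an illustrative name.

```python
def tags_per_million(tag_counts):
    """Scale raw tag counts from one library so they sum to one million."""
    total = sum(tag_counts.values())
    if total == 0:
        return {tag: 0.0 for tag in tag_counts}
    return {tag: count * 1_000_000 / total for tag, count in tag_counts.items()}


# Hypothetical 17-base tags and counts from a single library.
library = {"AAACCCGGGTTTAAACC": 1, "GGGTTTAAACCCGGGTT": 3}
print(tags_per_million(library))
```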

    Methods

    Tag and library data, along with genomic mappings, were obtained from The Cancer Genome Anatomy Project.

    Information about the various SAGE libraries, data downloads, and other tools for exploring and analyzing these data is available from the CGAP SAGE Genie web site.

    Mapping SAGE tags to the human genome

    The goal of the SAGE tag mapping is to identify the genomic loci of the associated mRNAs. Because tags that map to multiple loci cannot be disambiguated, only unique genomic mappings are kept. To compensate for polymorphisms between the reference genome and the mRNA libraries, SNPs are considered by the mapping algorithm.

    For each position in the genome on both strands, all possible 21-mers, given all combinations of SNPs, were considered. The 21-mers beginning with CATG (the 4-base CATG site followed by the 17-base tag) were generated for use in mapping. Only 21-mers that were unique across the genome were used in placing SAGE tags.
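The k-mer enumeration can be sketched in Python. The sketch slides a window over a toy sequence and expands each SNP into its reference and alternate alleles. It checks both strands for a leading CATG and keeps only k-mers that occur once. The SNP representation (0-based position mapped to an alternate base) and the function names are assumptions. The toy example uses k = 8 for readability, whereas the track itself used k = 21; the 17-base tag is the k-mer after its first four bases.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def catg_kmers(genome, snps, k=21):
    """Yield (kmer, position, strand) for every k-mer that starts with CATG.

    `snps` maps a 0-based position to an alternate base (assumed format). Every
    combination of reference and alternate alleles in a window is tried, which
    grows exponentially with the number of SNPs in one window.
    """
    for pos in range(len(genome) - k + 1):
        window = genome[pos:pos + k]
        choices = [(b, snps[pos + j]) if pos + j in snps else (b,)
                   for j, b in enumerate(window)]
        for variant in product(*choices):
            fwd = "".join(variant)
            for kmer, strand in ((fwd, "+"), (revcomp(fwd), "-")):
                if kmer.startswith("CATG"):
                    yield kmer, pos, strand


def unique_kmers(genome, snps, k=21):
    """Keep only CATG k-mers that occur exactly once across the genome."""
    hits = list(catg_kmers(genome, snps, k))
    counts = Counter(kmer for kmer, _, _ in hits)
    return {kmer: (pos, strand) for kmer, pos, strand in hits if counts[kmer] == 1}


# Toy example with k = 8; CATGAAAA occurs twice, so it is not unique.
print(unique_kmers("CATGAAAACATGAAAA", {}, k=8))
print(unique_kmers("CATGAAAACATGAAAA", {12: "T"}, k=8))
```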

    Only SNPs from dbSNP with the following characteristics were used:

    • single-base
    • maps to a single genomic location
    • reference allele matches reference genome
    • does not occur in a tandem repeat
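The four SNP criteria above translate directly into a filter. The `Snp` record and its fields are hypothetical stand-ins, since the dbSNP fields actually used are not specified.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    # Hypothetical fields; the actual dbSNP columns used are not specified here.
    ref_allele: str
    alt_allele: str
    genome_base: str        # base found in the reference genome at this position
    mapping_count: int      # number of genomic locations the SNP maps to
    in_tandem_repeat: bool


def usable_for_mapping(snp: Snp) -> bool:
    return (
        len(snp.ref_allele) == 1 and len(snp.alt_allele) == 1  # single-base
        and snp.mapping_count == 1                             # maps to one location
        and snp.ref_allele == snp.genome_base                  # matches reference genome
        and not snp.in_tandem_repeat                           # not in a tandem repeat
    )


print(usable_for_mapping(Snp("A", "G", "A", 1, False)))
```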

    Human embryonic stem cell (ESC) library construction

    Detailed information regarding the human ESC lines used in this study can be found at http://stemcells.nih.gov and in Hirst et al. 2007. The ESC tags were generated from RNA purified from human ESCs grown under conditions that keep them in an undifferentiated state.

    A complete set of embryonic stem cell LongSAGE tags is available through the CGAP web portal.

    Credits

    Many thanks to Martin Hirst of Canada's Michael Smith Genome Sciences Centre for his assistance in developing this track.

    The LongSAGE data and genomic mappings were provided by The Cancer Genome Anatomy Project of the National Cancer Institute, U.S. National Institutes of Health.

    The human embryonic stem cell library was supported by funds from the National Cancer Institute, National Institutes of Health, under Contract No. N01-C0-12400 and by grants from Genome Canada, Genome British Columbia, and the Canadian Stem Cell Network.

    References

    Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. Serial analysis of gene expression. Science. 1995 Oct 20;270(5235):484-7.

    Hirst M, Delaney A, Rogers SA, Schnerch A, Persaud DR, O'Connor MD, Zeng T, Moksa M, Fichter K, Mah D, et al. LongSAGE profiling of nine human embryonic stem cell lines. Genome Biol. 2007 Jun 14;8(6):R113.

    Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE. Using the transcriptome to annotate the genome. Nat Biotechnol. 2002 May;20(5):508-12.

    Siddiqui AS, Khattra J, Delaney AD, Zhao Y, Astell C, Asano J, Babakaiff R, Barber S, Beland J, Bohacec S, et al. A mouse atlas of gene expression: Large-scale digital gene-expression profiles from precisely defined developing C57BL/6J mouse tissues and cells. Proc Natl Acad Sci U S A. 2005 Dec 20;102(51):18485-90.

    Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla N, Prabhu AL, Ma K, et al. Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines. Genome Res. 2007 Jan;17(1):108-16.

    Lal A, Lash AE, Altschul SF, Velculescu V, Zhang L, McLendon RE, Marra MA, Prange C, Morin PJ, Polyak K, et al. A public database for gene expression in human cancers. Cancer Res. 1999 Nov 1;59(21):5403-7.

    Riggins GJ, Strausberg RL. Genome and genetic resources from the Cancer Genome Anatomy Project. Hum Mol Genet. 2001 Apr;10(7):663-7.

    Boon K, Osorio EC, Greenhut SF, Schaefer CF, Shoemaker J, Polyak K, Morin PJ, Buetow KH, Strausberg RL, De Souza SJ, Riggins GJ. An anatomy of normal and malignant gene expression. Proc Natl Acad Sci U S A. 2002 Aug 20;99(17):11287-92.

    Liang P. SAGE Genie: a suite with panoramic view of gene expression. Proc Natl Acad Sci U S A. 2002 Sep 3;99(18):11547-8.

    \ rna 1 group rna\ longLabel CGAP Long SAGE\ priority 200.1\ shortLabel CGAP SAGE\ track cgapSage\ type bed 8 +\ visibility hide\ blastzBestMm3X Mm3X Best psl xeno mm3X Mm3X Blastz Best-in-Genome Alignments 0 200.8 0 0 0 127 127 127 1 0 0 This track is created by running simpleChain on the\ results from running axtBest on the alignments from the\ standard blastz process.\ x 1 group x\ longLabel Mm3X Blastz Best-in-Genome Alignments\ otherDb mm3X\ priority 200.8\ shortLabel Mm3X Best\ spectrum on\ track blastzBestMm3X\ type psl xeno mm3X\ visibility hide\ blastDm1FB D. mel. Proteins psl protein D. melanogaster Proteins 0 201 0 0 0 127 127 127 0 0 0 http://flybase.bio.indiana.edu/.bin/fbidq.html?

    Description

    This track contains tBLASTn alignments of the peptides from the predicted and known genes identified in D. melanogaster FlyBase as of 24 July 2004.

    Methods

    First, the predicted proteins from the D. melanogaster FlyBase track were aligned with the D. melanogaster genome using the blat program to discover exon boundaries. Next, the amino acid sequences that make up each exon were aligned with the human sequence using the tBLASTn program. Finally, the putative human exons were chained together using an organism-specific maximum gap size but no gap penalty. The single best exon chains extending over more than 60% of the query protein were included. Exon chains that extended over 60% of the query and matched at least 60% of the protein's amino acids were also included.
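The two inclusion rules can be expressed as a filter over candidate exon chains. This Python sketch assumes each chain carries three values:

- a score
- its fractional coverage of the query protein
- the fraction of the protein's amino acids it matches

Reading "the single best" as the highest-scoring chain per protein is an interpretation. `ExonChain` and `select_chains` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExonChain:
    name: str
    protein: str      # query protein ID
    score: float
    coverage: float   # fraction of the query protein spanned by the chain
    identity: float   # fraction of the protein's amino acids matched


def select_chains(chains: List[ExonChain], min_cov=0.60, min_id=0.60) -> List[ExonChain]:
    # "Single best" read as the highest-scoring chain per protein (interpretation).
    best = {}
    for c in chains:
        if c.protein not in best or c.score > best[c.protein].score:
            best[c.protein] = c
    kept = []
    for c in chains:
        if c.coverage <= min_cov:
            continue  # must extend over more than 60% of the query
        if c is best[c.protein] or c.identity >= min_id:
            kept.append(c)
    return kept


chains = [
    ExonChain("a", "p1", 10, 0.70, 0.30),
    ExonChain("b", "p1", 5, 0.80, 0.65),
    ExonChain("c", "p1", 3, 0.90, 0.50),
    ExonChain("d", "p2", 9, 0.50, 0.90),
]
print([c.name for c in select_chains(chains)])
```

In the example:

- Chain "a" passes as the best chain for p1.
- Chain "b" passes on identity.
- Chain "c" fails the identity rule.
- Chain "d" fails the coverage rule.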

    Credits

    tBLASTn is part of the NCBI BLAST tool set. For more information on BLAST, see Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10.

    Blat was written by Jim Kent. The remaining utilities required to produce this track were written by Jim Kent or Brian Raney.

    \ genes 1 blastRef dm1.blastFBRef00\ colorChromDefault off\ group genes\ longLabel D. melanogaster Proteins\ pred dm1.blastFBPep00\ priority 201\ shortLabel D. mel. Proteins\ track blastDm1FB\ type psl protein\ url http://flybase.bio.indiana.edu/.bin/fbidq.html?\ visibility hide\ blastzBestMm2X Mm2X Best psl xeno mm2X Mm2X Blastz Best-in-Genome Alignments 0 201.8 0 0 0 127 127 127 1 0 0 This track is created by running simpleChain on the\ results from running axtBest on the alignments from the\ standard blastz process.\ x 1 group x\ longLabel Mm2X Blastz Best-in-Genome Alignments\ otherDb mm2X\ priority 201.8\ shortLabel Mm2X Best\ spectrum on\ track blastzBestMm2X\ type psl xeno mm2X\ visibility hide\ hetChimp Chimp Heterozygosity wig 0.0 7.3 Chimp (Nov. 2003/panTro1) Single Nucleotide Mutation Rate %, 10Kb Bins 0 207 0 128 255 255 128 0 0 0 0

    Description

    This track shows the fixed alignment differences between the chimp and human genomes as percent difference in 10 kb windows. The differences were detected by using ssahaSNP (1) to align all chimp reads to the human assembly. Uniquely placed reads were scanned for high-quality single-nucleotide differences; indels were not considered. For each 10 kb bin, the number of single-nucleotide differences is summed and divided by the number of aligned bases that meet the same quality thresholds used for variation detection. Measured in this way, the two genomes differ by 1.2% on average.
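The per-bin percentage can be sketched in Python. The sketch assumes the inputs are 0-based reference positions of two kinds: high-quality differences, and high-quality aligned bases. The text does not say whether the differing bases themselves count in the denominator. Here they count only if they also appear in `aligned_positions`. `percent_difference` is a hypothetical name.

```python
from collections import Counter


def percent_difference(diff_positions, aligned_positions, bin_size=10_000):
    """Percent difference per bin: 100 * differences / aligned high-quality bases.

    Bins with no aligned bases are omitted (no value is reported there).
    """
    diffs = Counter(p // bin_size for p in diff_positions)
    aligned = Counter(p // bin_size for p in aligned_positions)
    return {b: 100.0 * diffs[b] / n for b, n in sorted(aligned.items())}


aligned = list(range(0, 100)) + list(range(10_000, 10_100))
print(percent_difference([5, 15, 10_005], aligned))
```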

    The analysis for the track was prepared by Jim Mullikin.

    Chimp trace data were generated by WIBR and WUGSC through an NHGRI sequencing grant and obtained from the NCBI Trace Archive.

    The track display was prepared by Hiram Clawson (hiram@soe.ucsc.edu).

    Known bugs: The data scale display to the left of the track does not function correctly when auto-scaling is selected. It also shows only integers, even when fractional levels are requested.


    1) Ning Z, Cox AJ, Mullikin JC. SSAHA: a fast search method for large DNA databases. Genome Res. 2001 Oct;11(10):1725-9.

    \ compGeno 0 altColor 255,128,0\ autoScale Off\ color 0,128,255\ group compGeno\ longLabel Chimp (Nov. 2003/panTro1) Single Nucleotide Mutation Rate %, 10Kb Bins\ maxHeightPixels 128\ priority 207\ shortLabel Chimp Heterozygosity\ track hetChimp\ type wig 0.0 7.3\ visibility hide\ chainPt0 Pt0 Chain chain seq Chimp (Nov. 2003/pt0) Scaffold Chained Alignments 0 208 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel Chimp (Nov. 2003/pt0) Scaffold Chained Alignments\ otherDb pt0\ priority 208\ shortLabel Pt0 Chain\ spectrum on\ track chainPt0\ type chain seq\ visibility hide\ chainNetMm5 mm5 Chain/Net bed 3 mm5 (mm5), Chain and Net Alignments 0 209 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of mm5 (mm5) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mm5 and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mm5 assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best mm5/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The mm5 sequence used in this annotation is from the mm5 (mm5) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.
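The four categories above can be read as a simplified classification rule, sketched below in Python. It compares each item only with the chromosome and strand of the gap it fills. netSyntenic considers more than this, and `classify` is a hypothetical helper.

```python
def classify(level, chrom, strand, parent_chrom=None, parent_strand=None):
    """Simplified net item type from level and the gap it fills (illustrative)."""
    if level == 1:
        return "Top"
    if chrom != parent_chrom:
        return "NonSyn"  # different chromosome from the gap above
    return "Syn" if strand == parent_strand else "Inv"


print(classify(2, "chr4", "-", parent_chrom="chr4", parent_strand="+"))
```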

    Methods

    Chain track

    The chain track was built in these steps:

    1. Transposons that have been inserted since the mm5/human split were removed from the assemblies.
    2. The abbreviated genomes were aligned with lastz, and the transposons were added back in.
    3. The resulting alignments were converted into axt format using the lavToAxt program.
    4. The axt alignments were fed into axtChain. It groups all alignments between a single mm5 chromosome and a single human chromosome. For each group, it builds a kd-tree from the gapless subsections (blocks) of the alignments.
    5. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks.

    The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91
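Scoring an ungapped block with this matrix is a sum of per-column lookups. Below is a minimal Python sketch. It assumes only A, C, G and T in either case; the text does not say how N and other characters are scored, so the sketch raises an error for them. `block_score` is a hypothetical name.

```python
BASES = "ACGT"
# Rows: target base; columns: query base (values from the matrix above).
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]


def block_score(target: str, query: str) -> int:
    """Sum of substitution scores over an ungapped block (ACGT only)."""
    if len(target) != len(query):
        raise ValueError("ungapped block sides must have equal length")
    return sum(
        MATRIX[BASES.index(t)][BASES.index(q)]  # raises ValueError for N etc.
        for t, q in zip(target.upper(), query.upper())
    )


print(block_score("ACGT", "ACGT"))  # 382
print(block_score("AG", "GA"))      # -62
```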

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
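The chaining step and the 3000 cutoff can be illustrated with a plain quadratic dynamic program. This sketch is a simplification. axtChain uses kd-trees to find predecessor blocks efficiently, while this sketch scans every earlier block. Here `gap_cost` is a caller-supplied stand-in for the linear gap table. `Block`, `best_chain` and `toy_gap_cost` are hypothetical names.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    score: int  # e.g. from the substitution matrix


def best_chain(
    blocks: List[Block],
    gap_cost: Callable[[int, int], float],
    min_score: float = 3000,
) -> Optional[Tuple[float, List[Block]]]:
    """Highest-scoring chain of co-linear blocks, or None if it scores below min_score."""
    if not blocks:
        return None
    blocks = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [float(b.score) for b in blocks]  # best chain score ending at each block
    prev = [-1] * len(blocks)
    for j, bj in enumerate(blocks):
        for i in range(j):  # quadratic scan instead of a kd-tree search
            bi = blocks[i]
            if bi.t_end <= bj.t_start and bi.q_end <= bj.q_start:
                cand = best[i] + bj.score - gap_cost(bj.t_start - bi.t_end, bj.q_start - bi.q_end)
                if cand > best[j]:
                    best[j], prev[j] = cand, i
    end = max(range(len(blocks)), key=best.__getitem__)
    score = best[end]
    if score < min_score:
        return None  # chains below the minimum score are discarded
    chain = []
    while end != -1:
        chain.append(blocks[end])
        end = prev[end]
    return score, chain[::-1]


def toy_gap_cost(dt: int, dq: int) -> float:
    return 100 + dt + dq  # hypothetical stand-in for the linear gap table


a = Block(0, 100, 0, 100, 2000)
b = Block(150, 250, 160, 260, 2000)
print(best_chain([a, b], toy_gap_cost))
print(best_chain([a], toy_gap_cost))
```

Chaining the two toy blocks scores 2000 + 2000 - 210 = 3790, which passes the cutoff. Either block alone (2000) would be discarded.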

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
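The N-content part of netClass's annotation amounts to counting N bases on the target and query sides of a gap or chain. Below is a trivial Python sketch. The names and the report format are hypothetical.

```python
def n_fraction(seq: str) -> float:
    """Fraction of bases that are N (sequencing gaps); 0.0 for an empty span."""
    if not seq:
        return 0.0
    return sum(base in "Nn" for base in seq) / len(seq)


def n_summary(target_seq: str, query_seq: str) -> dict:
    """N content on each side of a gap or chain (hypothetical report format)."""
    return {"tN": n_fraction(target_seq), "qN": n_fraction(query_seq)}


print(n_summary("ACGTNNNN", "ACGT"))
```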

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb mm5\ priority 209\ settingsByView chain:spectrum=on\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetMm5\ type bed 3\ visibility hide\ visibilityViewDefaults chain=pack net=dense\ netRxBestPt0 Pt0 Net netAlign seq chainPt0 Chimp (Nov. 2003/pt0) Reciprocal Best Net (Colored by Scaffold) 0 209 0 0 0 127 127 127 1 0 0

    Description

    \

    \ This track shows the "reciprocal best" human/panTro0 \ alignment net.\ It is useful for finding orthologous regions and for studying genome\ rearrangement.

    \ \

    Display Conventions and Configuration

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.\
    • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
    • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
    • NonSyn - a match to a chromosome different from the gap in the \ level above.\
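The four categories can be expressed as a small decision function. In this sketch, a chain is reduced to a hypothetical (chromosome, strand) pair in the aligning organism. `parent` is the chain whose gap it fills, or `None` at level 1. The real netSyntenic program also weighs factors such as distance, and those are not modelled here.

```python
from typing import Optional, Tuple

# Hypothetical minimal chain description: (aligning chromosome, strand).
ChainLoc = Tuple[str, str]

def net_type(child: ChainLoc, parent: Optional[ChainLoc]) -> str:
    """Return Top, Syn, Inv, or NonSyn following the four definitions above."""
    if parent is None:
        return "Top"                  # level 1: nothing above it
    child_chrom, child_strand = child
    parent_chrom, parent_strand = parent
    if child_chrom != parent_chrom:
        return "NonSyn"               # different chromosome from the gap above
    if child_strand != parent_strand:
        return "Inv"                  # same chromosome, opposite orientation
    return "Syn"                      # same chromosome, same orientation
```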

    \

    \ Note: The panTro0 data set used to generate this track is based on \ scaffolds rather than chromosomes. Because of this, the track coloring scheme \ does not correspond to the chromosome color key displayed on the track page. \ A somewhat random scheme (scaffold# modulo #colors) was used to color this\ track simply to make the scaffold boundaries easier to distinguish.
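The coloring rule in the note (scaffold number modulo the number of colors) might look like the following. The palette and the `scaffold_N` naming pattern are assumptions for illustration. Only the modulo rule comes from the text.

```python
import re

# Hypothetical four-color palette; the browser's real palette is not given here.
PALETTE = ["red", "green", "blue", "orange"]

def scaffold_color(scaffold_name: str, palette=PALETTE) -> str:
    """Color by scaffold number modulo the number of available colors."""
    match = re.search(r"\d+", scaffold_name)  # first run of digits = scaffold number
    if match is None:
        raise ValueError(f"no scaffold number in {scaffold_name!r}")
    return palette[int(match.group()) % len(palette)]
```

Consecutive scaffolds get different colors, which is all the scheme is meant to achieve. The colors themselves carry no chromosomal meaning.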

    \ \

    Methods

    \

    \ These alignments were generated using blastz and blat alignments of \ panTro0 genomic sequence from the 13 Nov. 2003 Arachne panTro0 draft \ assembly. The initial alignments were generated using blastz on \ repeatmasked sequence using the following human/panTro0 scoring \ matrix:\

    \
         A    C    G    T\
    A   100 -300 -150 -300\
    C  -300  100 -300 -150\
    G  -150 -300  100 -300\
    T  -300 -150 -300  100\
    \
    K = 4500, L = 3000,  Y=3400, H=2000\
    

    \
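To make the matrix concrete, the sketch below scores an ungapped block by summing the matrix entries for each aligned base pair. It covers only the substitution scores. Blastz's K, L, Y, and H thresholds and its handling of ambiguous bases are not modelled, and `block_score` is a hypothetical helper name.

```python
# Substitution scores from the human/panTro0 blastz matrix above.
MATRIX = {
    "A": {"A": 100, "C": -300, "G": -150, "T": -300},
    "C": {"A": -300, "C": 100, "G": -300, "T": -150},
    "G": {"A": -150, "C": -300, "G": 100, "T": -300},
    "T": {"A": -300, "C": -150, "G": -300, "T": 100},
}

def block_score(target: str, query: str) -> int:
    """Sum substitution scores over an ungapped block of equal-length sequences."""
    if len(target) != len(query):
        raise ValueError("an ungapped block must have equal-length sequences")
    # upper() folds soft-masked (lowercase) bases; N or other codes raise KeyError
    return sum(MATRIX[t][q] for t, q in zip(target.upper(), query.upper()))
```

A single A/G transition in a four-base block costs far less (-150) than a transversion (-300). In this example, the score drops from 400 to 150.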

    \ The resulting alignments were processed by the axtChain program, which\ organizes all the alignments between a single panTro0 scaffold\ and a single human chromosome into a group and makes a kd-tree out\ of all the gapless subsections (blocks) of the alignments.\ The maximally-scoring chains of these blocks were found by running a\ dynamic program over the kd-tree. Chains scoring below a certain\ threshold were discarded.
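The chaining step can be illustrated with the classic quadratic dynamic program over gapless blocks. axtChain uses a kd-tree to avoid comparing every pair of blocks. This sketch compares all pairs instead. `toy_gap_cost` is a made-up linear penalty that stands in for axtChain's gap table.

```python
from typing import List, NamedTuple, Optional

class Block(NamedTuple):
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    score: int

def toy_gap_cost(a: Block, b: Block) -> int:
    # Hypothetical penalty of 10 per unaligned base on either side;
    # axtChain itself uses a piecewise-linear gap table instead.
    return 10 * ((b.t_start - a.t_end) + (b.q_start - a.q_end))

def best_chain_score(blocks: List[Block], min_score: int) -> Optional[int]:
    """Best score of any chain of co-linear blocks, or None if below min_score."""
    blocks = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best: List[int] = []  # best[j] = best chain score ending at blocks[j]
    for j, bj in enumerate(blocks):
        score = bj.score  # chain consisting of bj alone
        for i in range(j):
            bi = blocks[i]
            # bi can precede bj only if it ends before bj starts on both sequences
            if bi.t_end <= bj.t_start and bi.q_end <= bj.q_start:
                score = max(score, best[i] + bj.score - toy_gap_cost(bi, bj))
        best.append(score)
    top = max(best, default=0)
    return top if top >= min_score else None
```

The threshold argument mirrors the discard step in the text. With the blocks used in the test below, the best chain links the first two blocks for 5000 + 4000 - 150 = 8850. That chain survives a threshold of 3000 but not one of 10000.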

    \

    \ To place additional panTro0 scaffolds that weren't initially aligned by \ blastz, a DNA blat of the unmasked sequence was performed. The resulting\ blat alignments were also chained, and then merged with the\ blastz-based chains produced in the previous step to produce "all \ chains".

    \

    \ These chains were sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain.

    \

    \ Due to the draft nature of this initial genome assembly,\ this net track (and the companion chain track) was generated using\ a "reciprocal best" strategy. This strategy attempts to minimize\ paralog fill-in for missing orthologous panTro0 sequence by filtering\ from the human net all sequences not found in the panTro0 side of the\ net. After generating the human alignment net, \ the subset of chains in the panTro0 reference net was extracted\ and used for an additional netting step, which was then filtered\ for non-syntenic sequences smaller than 50 bases.
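A rough sketch of the reciprocal-best filter follows, under several simplifying assumptions. Each human-side net fill is a hypothetical (chain ID, type, length) record. The second netting step is approximated by keeping only the chains that also appear in the panTro0-referenced net. The phrase "filtered for non-syntenic sequences smaller than 50 bases" is read as dropping NonSyn fills shorter than 50 bases.

```python
from typing import Iterable, List, Tuple

# Hypothetical net fill record: (chain ID, net type, aligned length in bases).
Fill = Tuple[int, str, int]

def reciprocal_best(human_net: Iterable[Fill], chimp_net_chain_ids: Iterable[int],
                    min_nonsyn: int = 50) -> List[Fill]:
    keep_ids = set(chimp_net_chain_ids)
    result = []
    for chain_id, kind, length in human_net:
        if chain_id not in keep_ids:
            continue  # chain was not chosen when netting from the chimp side
        if kind == "NonSyn" and length < min_nonsyn:
            continue  # assumed reading: drop short non-syntenic fills
        result.append((chain_id, kind, length))
    return result
```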

    \ \

    Credits

    \

    \ The panTro0 sequence used in this track was obtained from the 13 Nov. 2003\ Arachne assembly. We'd like to thank the National Human Genome Research \ Institute (NHGRI), the Eli & Edythe L. Broad Institute at MIT/Harvard, and \ Washington University School of Medicine for providing this sequence.

    \

    \ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

    \

    \ Blastz was developed at Pennsylvania State University by\ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

    \ \

    References

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ \ \ \ compGeno 0 group compGeno\ longLabel Chimp (Nov. 2003/pt0) Reciprocal Best Net (Colored by Scaffold)\ otherDb panTro0\ priority 209\ shortLabel Pt0 Net\ spectrum on\ track netRxBestPt0\ type netAlign seq chainPt0\ visibility hide\ rhb100i96 Recent Human bed 4 + Conserved w/ prob>0 of increased substitution rate in human 1 210 0 0 0 127 127 127 0 0 0 \ Recent Human non-CpG Substitutions and Indels\ hg16-mm3-rn3-pt0
    \ Collected and merged blocks of 100 bp with 96% identity among mouse, rat, and chimp (mm-rn-pt).

    \ Compared human to these blocks.
    \ Computed the probability of rejecting the null hypothesis of an equal substitution rate across the whole tree:
    \ * using a phyloHMM fit on phastCons conserved regions
    \ * alternative hypothesis: the human branch has a faster rate
    \ * deleted all columns containing gap characters from the probability computation, keeping only columns with a base in all four species.

    \ Only blocks with probability > 0 of rejecting the null hypothesis are included.\ \ compGeno 1 group compGeno\ longLabel Conserved w/ prob>0 of increased substitution rate in human\ priority 210\ shortLabel Recent Human\ track rhb100i96\ type bed 4 +\ visibility dense\ chainNetPanTro1 panTro1 Chain/Net bed 3 Chimp (Nov. 2003 (CGSC 1.1/panTro1)), Chain and Net Alignments 0 210.3 0 0 0 100 50 0 0 0 0

    Description

    \

    Chain Track

    \

    \ The chain track shows alignments of chimp (Nov. 2003 (CGSC 1.1/panTro1)) to the\ human genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ chimp and human simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

    \ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ chimp assembly or an insertion in the human \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the human genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.
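The difference between single and double lines depends on which sequences have unaligned bases between two consecutive blocks of a chain. This sketch assumes that any nonzero gap on both sides is drawn as a double line. The text speaks of "substantial" sequence in both species, so the browser's actual drawing rule may use a different threshold.

```python
def gap_kind(t_gap: int, q_gap: int) -> str:
    """Classify the gap between two consecutive chain blocks.

    t_gap and q_gap are the unaligned base counts on the human (target) and
    aligning-organism (query) sides, i.e. next.start - previous.end on each.
    """
    if t_gap < 0 or q_gap < 0:
        raise ValueError("blocks in a chain must not overlap")
    if t_gap > 0 and q_gap > 0:
        return "double"  # sequence on both sides: inversion, overlapping deletions, ...
    if t_gap > 0 or q_gap > 0:
        return "single"  # sequence on one side only: insertion or deletion
    return "none"        # blocks abut on both sequences
```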

    \

    \ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.
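To illustrate the naming scheme, a label can be assembled from three parts: the aligning chromosome, the strand, and the start position divided by 1,000. The exact label layout below is an assumption, not the browser's documented format.

```python
def chain_item_name(chrom: str, strand: str, start: int) -> str:
    """Label a chain by aligning chromosome, strand, and start in thousands."""
    # Hypothetical layout such as "chr4 + 1234k"; the browser's exact text may differ.
    return f"{chrom} {strand} {start // 1000}k"
```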

    \ \

    Net Track

    \

    \ The net track shows the best chimp/human chain for \ every part of the human genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The chimp sequence used in this annotation is from\ the Nov. 2003 (CGSC 1.1/panTro1) (panTro1) assembly.

    \ \

    Display Conventions and Configuration

    \

    Chain Track

    \

    By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

    \

    \ To display only the chains of one chromosome in the aligning\ organism, enter the name of that chromosome (e.g. chr4) in the box next to: \ Filter by chromosome.

    \ \

    Net Track

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.\
    • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
    • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
    • NonSyn - a match to a chromosome different from the gap in the \ level above.\

    \ \

    Methods

    \

    Chain track

    \

    \ Transposons that have been inserted since the chimp/human\ split were removed from the assemblies. The abbreviated genomes were\ aligned with lastz, and the transposons were added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single chimp chromosome and a single\ human chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks.\ \ The following matrix was used:

    \

    \
          A     C     G     T\
    A    90  -330  -236  -356\
    C  -330   100  -318  -236\
    G  -236  -318   100  -330\
    T  -356  -236  -330    90\
    \

    \ \ \ Chains scoring below a minimum score of '3000' were discarded;\ the remaining chains are displayed in this track. The linear gap\ matrix used with axtChain:
    \
    -linearGap=medium\
    \
    tableSize    11\
    smallSize   111\
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111\
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900\
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900\
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300\
    
    \
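The table gives gap costs only at the listed positions. This sketch assumes three things: linear interpolation between neighboring positions, the same row for every gap size, and an extension of the final segment's slope beyond the last position. `gap_cost` is a hypothetical helper, not part of axtChain.

```python
from bisect import bisect_right

# Values copied from the -linearGap=medium table above.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
MEDIUM = {
    "qGap": [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900],
    "tGap": [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900],
    "bothGap": [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300],
}

def gap_cost(size: int, row: str, table=MEDIUM, positions=POSITIONS) -> float:
    """Piecewise-linear gap cost read from one row of the table (assumed model)."""
    if size < 1:
        raise ValueError("gap size must be at least 1")
    costs = table[row]
    if size >= positions[-1]:
        # Assumption: continue the slope of the last segment past the table's end.
        slope = (costs[-1] - costs[-2]) / (positions[-1] - positions[-2])
        return costs[-1] + slope * (size - positions[-1])
    i = bisect_right(positions, size) - 1  # positions[i] <= size < positions[i + 1]
    x0, x1 = positions[i], positions[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)
```

For example, a 61-base gap falls halfway between positions 11 and 111 in the qGap row, so its cost is 750.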

    \ \

    Net track

    \

    \ Chains were derived from lastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
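At its core, netClass's tally of how much of a gap or chain contains Ns or transposons is an interval-overlap count. The sketch below counts how many bases of a region are covered by a list of annotated intervals, such as runs of N or repeat elements. It assumes the intervals are half-open and do not overlap each other. The real program's handling of assembly-gap and repeat annotations is not reproduced.

```python
from typing import Iterable, Tuple

def overlap_bases(start: int, end: int, intervals: Iterable[Tuple[int, int]]) -> int:
    """Bases of the half-open region [start, end) covered by the given intervals.

    Assumes the intervals do not overlap one another, so their overlaps can be summed.
    """
    total = 0
    for s, e in intervals:
        total += max(0, min(end, e) - max(start, s))  # 0 when disjoint
    return total
```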

    \ \

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains and nets were created\ by Robert Baertsch and Jim Kent.

    \

    \ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 90,-330,-236,-356,-330,100,-318,-236,-236,-318,100,-330,-356,-236,-330,90\ matrixHeader A, C, G, T\ noInherit on\ otherDb panTro1\ priority 210.3\ shortLabel $o_db Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetPanTro1\ type bed 3\ visibility hide\ chainNetCaePb1 C. brenneri Chain/Net bed 3 C. brenneri (Jan. 2007 (WUGSC 4.0/caePb1)), Chain and Net Alignments 0 210.3 0 0 0 255 255 0 1 0 0

    Description

    \

    Chain Track

    \

    \ The chain track shows alignments of C. brenneri (Jan. 2007 (WUGSC 4.0/caePb1)) to the\ human genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ C. brenneri and human simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

    \ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ C. brenneri assembly or an insertion in the human \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the human genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

    \

    \ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

    \ \

    Net Track

    \

    \ The net track shows the best C. brenneri/human chain for \ every part of the human genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The C. brenneri sequence used in this annotation is from\ the Jan. 2007 (WUGSC 4.0/caePb1) (caePb1) assembly.

    \ \

    Display Conventions and Configuration

    \

    Chain Track

    \

    By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

    \

    \ To display only the chains of one chromosome in the aligning\ organism, enter the name of that chromosome (e.g. chr4) in the box next to: \ Filter by chromosome.

    \ \

    Net Track

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.\
    • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
    • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
    • NonSyn - a match to a chromosome different from the gap in the \ level above.\

    \ \

    Methods

    \

    Chain track

    \

    \ Transposons that have been inserted since the C. brenneri/human\ split were removed from the assemblies. The abbreviated genomes were\ aligned with lastz, and the transposons were added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single C. brenneri chromosome and a single\ human chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks.\ \ The following matrix was used:

    \

    \
          A     C     G     T\
    A    91  -114   -31  -123\
    C  -114   100  -125   -31\
    G   -31  -125   100  -114\
    T  -123   -31  -114    91\
    \

    \ \ \ Chains scoring below a minimum score of '1000' were discarded;\ the remaining chains are displayed in this track. The linear gap\ matrix used with axtChain:
    \
    -linearGap=loose\
    \
    tablesize    11\
    smallSize   111\
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111\
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600\
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600\
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000\
    
    \
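The linear gap matrices above share a simple whitespace-separated layout, so a short parser can load them. This is a hypothetical helper for the text as printed here, not a reader for any official file format. It lowercases row names so that both `tableSize` and `tablesize` are accepted. It also checks that every row has `tablesize` entries.

```python
LOOSE = """
tablesize    11
smallSize   111
position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000
"""

def parse_linear_gap(text: str) -> dict:
    """Parse a linear gap table into {lowercased row name: list of ints}."""
    rows = {}
    for line in text.splitlines():
        fields = line.replace("\\", " ").split()  # tolerate trailing backslashes
        if not fields or fields[0].startswith("-"):
            continue  # skip blank lines and a "-linearGap=..." option line
        rows[fields[0].lower()] = [int(v) for v in fields[1:]]
    size = rows["tablesize"][0]
    for name in ("position", "qgap", "tgap", "bothgap"):
        if len(rows[name]) != size:
            raise ValueError(f"{name} has {len(rows[name])} entries, expected {size}")
    return rows
```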

    \ \

    Net track

    \

    \ Chains were derived from lastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

    \ \

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains and nets were created\ by Robert Baertsch and Jim Kent.

    \

    \ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb caePb1\ priority 210.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetCaePb1\ type bed 3\ visibility hide\ chainNetCaePb2 C. brenneri Chain/Net bed 3 C. brenneri (Feb. 2008 (WUGSC 6.0.1/caePb2)), Chain and Net Alignments 0 210.3 0 0 0 255 255 0 1 0 0

    Description

    \

    Chain Track

    \

    \ The chain track shows alignments of C. brenneri (Feb. 2008 (WUGSC 6.0.1/caePb2)) to the\ human genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ C. brenneri and human simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

    \ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ C. brenneri assembly or an insertion in the human \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the human genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

    \

    \ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

    \ \

    Net Track

    \

    \ The net track shows the best C. brenneri/human chain for \ every part of the human genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The C. brenneri sequence used in this annotation is from\ the Feb. 2008 (WUGSC 6.0.1/caePb2) (caePb2) assembly.

    \ \

    Display Conventions and Configuration

    \

    Chain Track

    \

    By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

    \

    \ To display only the chains of one chromosome in the aligning\ organism, enter the name of that chromosome (e.g. chr4) in the box next to: \ Filter by chromosome.

    \ \

    Net Track

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.\
    • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
    • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
    • NonSyn - a match to a chromosome different from the gap in the \ level above.\

    \ \

    Methods

    \

    Chain track

    \

    \ Transposons that have been inserted since the C. brenneri/human\ split were removed from the assemblies. The abbreviated genomes were\ aligned with lastz, and the transposons were added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single C. brenneri chromosome and a single\ human chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks.\ \ The following matrix was used:

    \

    \
          A     C     G     T\
    A    91  -114   -31  -123\
    C  -114   100  -125   -31\
    G   -31  -125   100  -114\
    T  -123   -31  -114    91\
    \

    \ \ \ Chains scoring below a minimum score of '1000' were discarded;\ the remaining chains are displayed in this track. The linear gap\ matrix used with axtChain:
    \
    -linearGap=loose\
    \
    tablesize    11\
    smallSize   111\
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111\
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600\
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600\
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000\
    
    \

    \ \

    Net track

    \

    \ Chains were derived from lastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

    \ \

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains and nets were created\ by Robert Baertsch and Jim Kent.

    \

    \ The chainNet, netSyntenic, and netClass programs were\ developed at the University of California at\ Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb caePb2\ priority 210.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetCaePb2\ type bed 3\ visibility hide\ chainNetPanTro2 Chimp Chain/Net bed 3 Chimp (Mar. 2006 (CGSC 2.1/panTro2)), Chain and Net Alignments 0 210.3 0 0 0 100 50 0 0 0 0

    Description

    Chain Track

    The chain track shows alignments of chimp (Mar. 2006 (CGSC 2.1/panTro2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chimp and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chimp assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best chimp/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The chimp sequence used in this annotation is from the Mar. 2006 (CGSC 2.1/panTro2) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - a line-up on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.
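
The four item types can be illustrated with a minimal Python sketch. It assumes a simplified model in which each net fill records only the chromosome and strand it maps to in the aligning organism. The `Fill` class and `classify` function are hypothetical names, and the real netSyntenic program may apply additional criteria (for example, distance along the chromosome) that this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fill:
    """Hypothetical, simplified net fill: where a chain lands in the aligning organism."""
    q_chrom: str  # chromosome in the aligning organism
    strand: str   # '+' or '-' relative to the human sequence


def classify(child: Fill, parent: Optional[Fill]) -> str:
    """Label a fill as Top, Syn, Inv or NonSyn using chromosome and strand only."""
    if parent is None:
        return "Top"     # level 1: no chain above it
    if child.q_chrom != parent.q_chrom:
        return "NonSyn"  # fills the gap from a different chromosome
    if child.strand != parent.strand:
        return "Inv"     # same chromosome, opposite orientation
    return "Syn"         # same chromosome, same orientation


top = Fill("chr1", "+")
print(classify(Fill("chr1", "-"), top))  # Inv
```

Here `parent` stands for the higher-level chain whose gap the fill occupies.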

    Methods

    Chain track

    Transposons that have been inserted since the chimp/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single chimp chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    90  -330  -236  -356
    C  -330   100  -318  -236
    G  -236  -318   100  -330
    T  -356  -236  -330    90
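
To make the table concrete, the sketch below scores a single gapless block by summing the matrix entry for each aligned base pair. It is a minimal illustration that assumes case-insensitive input restricted to A, C, G and T. `block_score` is a hypothetical helper, not part of axtChain, and it ignores gap costs and any special handling of masked or N bases.

```python
# Chimp/human substitution scores from the table above (rows and columns: A, C, G, T).
BASES = "ACGT"
CHIMP_MATRIX = [
    [90, -330, -236, -356],
    [-330, 100, -318, -236],
    [-236, -318, 100, -330],
    [-356, -236, -330, 90],
]


def block_score(human: str, chimp: str, matrix=CHIMP_MATRIX) -> int:
    """Sum substitution scores over one ungapped block (equal-length sequences)."""
    if len(human) != len(chimp):
        raise ValueError("an ungapped block needs sequences of equal length")
    total = 0
    for h, c in zip(human.upper(), chimp.upper()):
        # str.index raises ValueError for bases outside ACGT (e.g. N).
        total += matrix[BASES.index(h)][BASES.index(c)]
    return total


print(block_score("ACGT", "ACGT"))  # 380
print(block_score("ACGT", "ACAT"))  # 44: the G/A mismatch scores -236
```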

    Chains scoring below a minimum score of 5000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
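
The table lists gap costs only at selected gap lengths. The following minimal sketch shows how a cost for an intermediate length might be read from it. It assumes straight-line interpolation between neighboring positions and, beyond the last position, continuation of the final segment's slope. Both are assumptions about axtChain's behavior and are not stated on this page. Because the qGap and tGap rows are identical here, only qGap and bothGap are included, and `gap_cost` is a hypothetical name.

```python
from bisect import bisect_right

# Values copied from the -linearGap=medium table above.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def gap_cost(size: int, costs=Q_GAP, positions=POSITIONS) -> float:
    """Piecewise-linear gap cost read from the table (assumed interpolation)."""
    if size < positions[0]:
        raise ValueError("gap size must be at least 1")
    i = bisect_right(positions, size) - 1  # last listed position <= size
    if i >= len(positions) - 1:
        i = len(positions) - 2  # past the table: extend the final segment (assumption)
    x0, x1 = positions[i], positions[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


print(gap_cost(61))            # 750.0, halfway between the 11 and 111 entries
print(gap_cost(61, BOTH_GAP))  # 1150.0 for the same gap open in both species
```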

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
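
The placement step can be sketched in one dimension on the human sequence: sort chains by score, then let each chain claim only the parts of its span that no higher-scoring chain has already claimed. This is a simplified illustration under stated assumptions. Each chain is reduced to a single half-open (start, end) interval, and the level hierarchy, gaps within chains, and the netSyntenic and netClass annotations are omitted. The `greedy_net` and `subtract` functions are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the human sequence


def subtract(span: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of span not overlapped by any covered interval."""
    pieces = [span]
    for c_start, c_end in covered:
        next_pieces = []
        for s, e in pieces:
            if c_end <= s or c_start >= e:
                next_pieces.append((s, e))        # no overlap: keep whole piece
                continue
            if s < c_start:
                next_pieces.append((s, c_start))  # remainder left of the covered part
            if c_end < e:
                next_pieces.append((c_end, e))    # remainder right of the covered part
        pieces = next_pieces
    return pieces


def greedy_net(chains):
    """chains: (name, score, (start, end)) tuples. Highest score places first."""
    covered: List[Interval] = []
    placed = []
    for name, score, span in sorted(chains, key=lambda c: c[1], reverse=True):
        for piece in subtract(span, covered):  # trim to still-uncovered sections
            placed.append((name, piece))
            covered.append(piece)
    return placed


print(greedy_net([("low", 10, (0, 100)), ("high", 50, (20, 60))]))
# [('high', (20, 60)), ('low', (0, 20)), ('low', (60, 100))]
```

In this example the lower-scoring chain survives only as the two pieces not already covered by the higher-scoring one.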

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California at Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 5000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 90,-330,-236,-356,-330,100,-318,-236,-236,-318,100,-330,-356,-236,-330,90\ matrixHeader A, C, G, T\ noInherit on\ otherDb panTro2\ priority 210.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetPanTro2\ type bed 3\ visibility hide\ chainNetCaePb1Viewchain Chain bed 3 C. brenneri (Jan. 2007 (WUGSC 4.0/caePb1)), Chain and Net Alignments 3 210.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCaePb1\ shortLabel Chain\ spectrum on\ track chainNetCaePb1Viewchain\ view chain\ visibility pack\ chainNetCaePb2Viewchain Chain bed 3 C. brenneri (Feb. 2008 (WUGSC 6.0.1/caePb2)), Chain and Net Alignments 3 210.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCaePb2\ shortLabel Chain\ spectrum on\ track chainNetCaePb2Viewchain\ view chain\ visibility pack\ chainNetPanTro1Viewchain Chain bed 3 Chimp (Nov. 2003 (CGSC 1.1/panTro1)), Chain and Net Alignments 3 210.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetPanTro1\ shortLabel Chain\ spectrum on\ track chainNetPanTro1Viewchain\ view chain\ visibility pack\ chainNetPanTro2Viewchain Chain bed 3 Chimp (Mar. 2006 (CGSC 2.1/panTro2)), Chain and Net Alignments 3 210.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetPanTro2\ shortLabel Chain\ spectrum on\ track chainNetPanTro2Viewchain\ view chain\ visibility pack\ chainNetCaePb1Viewnet Net bed 3 C. brenneri (Jan. 2007 (WUGSC 4.0/caePb1)), Chain and Net Alignments 1 210.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCaePb1\ shortLabel Net\ track chainNetCaePb1Viewnet\ view net\ visibility dense\ chainNetCaePb2Viewnet Net bed 3 C. brenneri (Feb. 
2008 (WUGSC 6.0.1/caePb2)), Chain and Net Alignments 1 210.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCaePb2\ shortLabel Net\ track chainNetCaePb2Viewnet\ view net\ visibility dense\ chainNetPanTro1Viewnet Net bed 3 Chimp (Nov. 2003 (CGSC 1.1/panTro1)), Chain and Net Alignments 2 210.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetPanTro1\ shortLabel Net\ track chainNetPanTro1Viewnet\ view net\ visibility full\ chainNetPanTro2Viewnet Net bed 3 Chimp (Mar. 2006 (CGSC 2.1/panTro2)), Chain and Net Alignments 2 210.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetPanTro2\ shortLabel Net\ track chainNetPanTro2Viewnet\ view net\ visibility full\ allb100i96 chimp/mouse/rat conserved bed 4 + 100bp or longer, 96% or greater identical 1 215 0 0 0 127 127 127 0 0 0 \ Recent Human non-CpG Substitutions and Indels\ hg16-mm3-rn3-pt0
    Collected and merged blocks of at least 100 bp with 96% or greater identity among mouse, rat, and chimp (mm-rn-pt).

    \ All 34,945 blocks are included (see rh-b100i96 track for blocks with human diffs).\ \ \ compGeno 1 group compGeno\ longLabel 100bp or longer, 96% or greater identical\ priority 215\ shortLabel chimp/mouse/rat conserved\ track allb100i96\ type bed 4 +\ visibility dense\ chainNetGorGor1 gorGor1 Chain/Net bed 3 Gorilla (Oct. 2008 (Sanger 0.1/gorGor1)), Chain and Net Alignments 0 220.3 0 0 0 100 50 0 0 0 0

    Description

    Chain Track

    The chain track shows alignments of gorilla (Oct. 2008 (Sanger 0.1/gorGor1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both gorilla and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the gorilla assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best gorilla/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The gorilla sequence used in this annotation is from the Oct. 2008 (Sanger 0.1/gorGor1) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - a line-up on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the gorilla/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single gorilla chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91
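
One property of the table above can be checked mechanically. The matrix is symmetric, and both transition pairs (A/G and C/T) score -31, while every transversion scores between -114 and -125, so transitions are penalized much less than transversions. The short Python check below splits the off-diagonal entries into the two groups. The helper names are hypothetical.

```python
BASES = "ACGT"
GORILLA_MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine


def mismatch_scores(matrix):
    """Split off-diagonal scores into (transitions, transversions)."""
    ti, tv = [], []
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i == j:
                continue  # skip matches on the diagonal
            (ti if frozenset((a, b)) in TRANSITIONS else tv).append(matrix[i][j])
    return ti, tv


def is_symmetric(matrix) -> bool:
    return all(matrix[i][j] == matrix[j][i] for i in range(4) for j in range(4))


ti, tv = mismatch_scores(GORILLA_MATRIX)
print(sorted(set(ti)), sorted(set(tv)))  # [-31] [-125, -123, -114]
```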

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
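
The chaining step can be illustrated with a simplified dynamic program. The sketch below is not axtChain's algorithm. It replaces the kd-tree search with a plain O(n²) loop over all earlier blocks, and it takes the gap penalty as a caller-supplied function. The `toy_gap` cost in the example is invented for illustration; a real run would use the linear gap table above. The `Block`, `best_chain`, and `keep_chain` names are hypothetical, and the 3000 cutoff is the minimum chain score stated above.

```python
from typing import Callable, List, NamedTuple, Tuple


class Block(NamedTuple):
    t_start: int  # human (target) start
    t_end: int
    q_start: int  # aligning-organism (query) start
    q_end: int
    score: int    # substitution score of this gapless block


def best_chain(blocks: List[Block],
               gap_cost: Callable[[int, int], float]) -> Tuple[float, List[Block]]:
    """Highest-scoring co-linear chain of gapless blocks (simple O(n^2) DP)."""
    if not blocks:
        return 0.0, []
    blocks = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [float(b.score) for b in blocks]  # best chain score ending at block i
    prev = [-1] * len(blocks)                # predecessor index for traceback
    for i, bi in enumerate(blocks):
        for j in range(i):
            bj = blocks[j]
            if bj.t_end <= bi.t_start and bj.q_end <= bi.q_start:  # bj precedes bi in both
                cand = best[j] - gap_cost(bi.t_start - bj.t_end, bi.q_start - bj.q_end) + bi.score
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    i = max(range(len(blocks)), key=best.__getitem__)
    score, chain = best[i], []
    while i != -1:
        chain.append(blocks[i])
        i = prev[i]
    return score, chain[::-1]


MIN_SCORE = 3000  # chains scoring below this are discarded


def keep_chain(score: float) -> bool:
    return score >= MIN_SCORE


def toy_gap(dt: int, dq: int) -> float:
    return 100 + dt + dq  # invented penalty, for illustration only


blocks = [Block(0, 20, 0, 20, 2000), Block(30, 50, 30, 50, 2000), Block(25, 40, 100, 115, 500)]
score, chain = best_chain(blocks, toy_gap)
print(score, keep_chain(score))  # 3880.0 True
```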

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
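
Among other things, the netClass step records how many Ns fall in each gap in either species. The sketch below shows that bookkeeping in minimal form. It assumes the gap sequences from both species are already available as strings. The `n_summary` function and its dictionary keys are hypothetical and do not reproduce the netClass output format. Transposon classification, which also needs repeat annotations, is left out.

```python
def count_ns(seq: str) -> int:
    """Number of N bases (unsequenced positions) in a sequence."""
    return seq.upper().count("N")


def n_summary(human_gap: str, other_gap: str) -> dict:
    """Report how much of a net gap is N in each species."""
    t_n, q_n = count_ns(human_gap), count_ns(other_gap)
    return {
        "human_N": t_n,
        "other_N": q_n,
        "human_frac": t_n / len(human_gap) if human_gap else 0.0,
        "other_frac": q_n / len(other_gap) if other_gap else 0.0,
    }


print(n_summary("ACNNNNGT", "ACGT"))
# {'human_N': 4, 'other_N': 0, 'human_frac': 0.5, 'other_frac': 0.0}
```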

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California at Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb gorGor1\ priority 220.3\ shortLabel $o_db Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetGorGor1\ type bed 3\ visibility hide\ chainNetGorGor2 gorGor2 Chain/Net bed 3 gorGor2 (gorGor2), Chain and Net Alignments 0 220.3 0 0 0 100 50 0 0 0 0

    Description

    Chain Track

    The chain track shows alignments of gorGor2 to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both gorGor2 and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the gorGor2 assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.
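
The difference between single and double lines comes down to whether the space between two consecutive blocks skips sequence in one species or in both. The following minimal sketch assumes a strict zero/non-zero rule. `gap_kind` is a hypothetical helper, and the browser's actual drawing logic may use different thresholds.

```python
def gap_kind(t_gap: int, q_gap: int) -> str:
    """Classify the space between two consecutive blocks of a chain.

    t_gap: bases skipped in the human (target) sequence between the blocks.
    q_gap: bases skipped in the aligning organism between the blocks.
    """
    if t_gap < 0 or q_gap < 0:
        raise ValueError("blocks in a chain must not overlap")
    if t_gap == 0 and q_gap == 0:
        return "none"    # blocks abut in both species
    if t_gap == 0 or q_gap == 0:
        return "single"  # sequence skipped in one species only
    return "double"      # sequence skipped in both species


print(gap_kind(120, 0), gap_kind(300, 250))  # single double
```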

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best gorGor2/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The sequence used in this annotation is from the gorGor2 assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - a line-up on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the gorGor2/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single gorGor2 chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California at Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb gorGor2\ priority 220.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetGorGor2\ type bed 3\ visibility hide\ chainNetCaeRem3 C. remanei Chain/Net bed 3 C. remanei (May 2007 (WUGSC 15.0.1/caeRem3)), Chain and Net Alignments 0 220.3 0 0 0 255 255 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of C. remanei (May 2007 (WUGSC 15.0.1/caeRem3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both C. remanei and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the C. remanei assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best C. remanei/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The C. remanei sequence used in this annotation is from the May 2007 (WUGSC 15.0.1/caeRem3) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - a line-up on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the C. remanei/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single C. remanei chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=loose

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000
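
Comparing this loose table with the medium table used for the chimp and gorilla chain tracks shows how much cheaper long gaps are here. At a gap of 12,111 bases the loose qGap cost is 3,600, versus 22,900 for medium. The sketch below reads either table, assuming linear interpolation between listed positions (an assumption about axtChain, not stated on this page). The medium values are copied from the chimp and gorilla tracks, and `table_cost` is a hypothetical helper.

```python
from bisect import bisect_right

POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
LOOSE_Q = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
MEDIUM_Q = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]


def table_cost(size: int, costs) -> float:
    """qGap cost for a gap size inside the tabulated range (assumed interpolation)."""
    if not POSITIONS[0] <= size <= POSITIONS[-1]:
        raise ValueError("size outside the tabulated range")
    i = min(bisect_right(POSITIONS, size) - 1, len(POSITIONS) - 2)
    x0, x1 = POSITIONS[i], POSITIONS[i + 1]
    return costs[i] + (costs[i + 1] - costs[i]) * (size - x0) / (x1 - x0)


print(table_cost(10000, LOOSE_Q), table_cost(10000, MEDIUM_Q))  # 3072.25 18678.0
```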

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California at Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb caeRem3\ priority 220.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetCaeRem3\ type bed 3\ visibility hide\ chainNetGorGor1Viewchain Chain bed 3 Gorilla (Oct. 2008 (Sanger 0.1/gorGor1)), Chain and Net Alignments 3 220.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetGorGor1\ shortLabel Chain\ spectrum on\ track chainNetGorGor1Viewchain\ view chain\ visibility pack\ chainNetGorGor2Viewchain Chain bed 3 gorGor2 (gorGor2), Chain and Net Alignments 3 220.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetGorGor2\ shortLabel Chain\ spectrum on\ track chainNetGorGor2Viewchain\ view chain\ visibility pack\ chainNetCaeRem3Viewchain Chain bed 3 C. remanei (May 2007 (WUGSC 15.0.1/caeRem3)), Chain and Net Alignments 3 220.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCaeRem3\ shortLabel Chain\ spectrum on\ track chainNetCaeRem3Viewchain\ view chain\ visibility pack\ chainNetGorGor1Viewnet Net bed 3 Gorilla (Oct. 2008 (Sanger 0.1/gorGor1)), Chain and Net Alignments 2 220.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetGorGor1\ shortLabel Net\ track chainNetGorGor1Viewnet\ view net\ visibility full\ chainNetGorGor2Viewnet Net bed 3 gorGor2 (gorGor2), Chain and Net Alignments 2 220.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetGorGor2\ shortLabel Net\ track chainNetGorGor2Viewnet\ view net\ visibility full\ chainNetCaeRem3Viewnet Net bed 3 C. 
remanei (May 2007 (WUGSC 15.0.1/caeRem3)), Chain and Net Alignments 1 220.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCaeRem3\ shortLabel Net\ track chainNetCaeRem3Viewnet\ view net\ visibility dense\ chainNetOviAri1 Sheep Chain/Net bed 3 Sheep (Feb. 2010 (ISGC Ovis_aries_1.0/oviAri1)), Chain and Net Alignments 0 230 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of sheep (Feb. 2010 (ISGC Ovis_aries_1.0/oviAri1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both sheep and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the sheep assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best sheep/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The sheep sequence used in this annotation is from the Feb. 2010 (ISGC Ovis_aries_1.0/oviAri1) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - an alignment on the same chromosome as the gap in the level above it.
    • Inv - an alignment on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain Track

    Transposons that have been inserted since the sheep/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single sheep chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic programming algorithm was then run over the kd-trees to find the maximally scoring chains of these blocks. The following scoring matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91
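As an illustration of how this matrix is applied, the sketch below scores one ungapped block (one box in the chain display) by summing the substitution score of each aligned base pair. It is a simplified example, not the axtChain code: `score_block` is a hypothetical helper, it assumes two equal-length A/C/G/T strings with no gaps, lowercase bases are simply uppercased, and N handling is not modeled.

```python
# Substitution matrix copied from the table above (rows: human, columns: sheep).
BASES = "ACGT"
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
SCORE = {(a, b): MATRIX[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def score_block(target: str, query: str) -> int:
    """Sum substitution scores over one ungapped block (hypothetical helper)."""
    if len(target) != len(query):
        raise ValueError("an ungapped block needs sequences of equal length")
    # Lowercase (soft-masked) bases are scored like uppercase ones in this sketch.
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))


if __name__ == "__main__":
    # Four matches (A, C, G, T) plus one A/G transition: 91 + 100 + 100 + 91 - 31.
    print(score_block("ACGTA", "ACGTG"))  # prints 351
```

Transitions (A/G, C/T) cost far less than transversions here (-31 versus -114 to -125), which is why the matrix is not a simple match/mismatch scheme.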

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
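The table gives gap costs only at selected lengths. The sketch below assumes, as a reading of the table rather than a transcription of the axtChain source, that costs for intermediate lengths are linearly interpolated between neighboring positions and that lengths past 252111 continue along the final segment's slope. `gap_cost` is a hypothetical helper; qGap and tGap are identical in this table, and bothGap presumably applies to double-sided gaps.

```python
from bisect import bisect_right

# Values copied from the -linearGap=medium table above.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
T_GAP = Q_GAP  # identical to qGap in this table
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def gap_cost(length, costs):
    """Gap cost for `length` bases, linearly interpolated between table positions.

    Hypothetical helper; extrapolation past the last position is an assumption.
    """
    if length < POSITION[0]:
        raise ValueError("gap length must be at least 1")
    # Index of the last listed position <= length, capped so that i + 1 stays valid
    # (the cap is what extends the final segment beyond 252111).
    i = min(bisect_right(POSITION, length) - 1, len(POSITION) - 2)
    x0, x1 = POSITION[i], POSITION[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (length - x0) / (x1 - x0)


if __name__ == "__main__":
    print(gap_cost(61, Q_GAP))     # 750.0, halfway between 600 (at 11) and 900 (at 111)
    print(gap_cost(61, BOTH_GAP))  # 1150.0
```

The steep rise after position 111 is what separates "medium" from looser settings: long single-sided gaps become expensive quickly, so only well-supported chains bridge them.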

    Net Track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before or after the two organisms diverged.
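The best-first placement step can be illustrated with a greatly simplified sketch. Here each chain is reduced to a score and a list of ungapped blocks on the human (target) genome; chains are sorted by score and each block is trimmed to the target sequence not yet covered. Real chainNet also tracks the other genome, keeps the parent/child hierarchy, and applies its own size rules, none of which is modeled. `Chain`, `subtract`, and `place_chains` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    name: str
    score: int
    # Ungapped blocks as half-open (start, end) intervals on the human (target) genome.
    blocks: list = field(default_factory=list)


def subtract(piece, covered):
    """Return the parts of interval `piece` not overlapped by any interval in `covered`."""
    pieces = [piece]
    for c_start, c_end in covered:
        remaining = []
        for s, e in pieces:
            if s < c_start:
                remaining.append((s, min(e, c_start)))  # part left of the covered interval
            if e > c_end:
                remaining.append((max(s, c_end), e))    # part right of the covered interval
        pieces = remaining
    return pieces


def place_chains(chains):
    """Place chains best-first, trimming each block to still-uncovered target sequence."""
    covered = []
    placed = []
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        kept = [p for block in chain.blocks for p in subtract(block, covered)]
        if kept:
            placed.append((chain.name, kept))
            covered.extend(kept)
    return placed


if __name__ == "__main__":
    chains = [
        Chain("top", 5000, [(0, 100), (200, 300)]),
        Chain("filler", 800, [(90, 210)]),  # overlaps the gap between top's blocks
        Chain("dup", 300, [(20, 80)]),      # lies inside a block that is already placed
    ]
    for name, kept in place_chains(chains):
        print(name, kept)
    # top [(0, 100), (200, 300)]
    # filler [(100, 200)]
```

In this toy run the "filler" chain survives only inside the gap of "top", which mirrors how a level 2 chain comes to sit underneath the chain whose gap it fills.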

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb oviAri1\ priority 230\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetOviAri1\ type bed 3\ visibility hide\ chainNetOviAri1Viewchain Chain bed 3 Sheep (Feb. 2010 (ISGC Ovis_aries_1.0/oviAri1)), Chain and Net Alignments 3 230 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetOviAri1\ shortLabel Chain\ spectrum on\ track chainNetOviAri1Viewchain\ view chain\ visibility pack\ chainNetOviAri1Viewnet Net bed 3 Sheep (Feb. 2010 (ISGC Ovis_aries_1.0/oviAri1)), Chain and Net Alignments 2 230 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetOviAri1\ shortLabel Net\ track chainNetOviAri1Viewnet\ view net\ visibility full\ chainNetCb3 C. briggsae Chain/Net bed 3 C. briggsae (Jan. 2007 (WUGSC 1.0/cb3)), Chain and Net Alignments 0 230.3 0 0 0 255 255 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of C. briggsae (Jan. 2007 (WUGSC 1.0/cb3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both C. briggsae and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the C. briggsae assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best C. briggsae/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The C. briggsae sequence used in this annotation is from the Jan. 2007 (WUGSC 1.0/cb3) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - an alignment on the same chromosome as the gap in the level above it.
    • Inv - an alignment on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.
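The four item types can be expressed as a small decision function. This is a sketch of the rules exactly as worded in the list above, not the netSyntenic implementation, whose criteria may differ in detail. It assumes each item records its net level and the chromosome and strand it matches in the C. briggsae assembly, compared against the chain whose gap it fills; `NetItem` and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetItem:
    level: int     # 1 for top-level chains, 2 for chains filling their gaps, and so on
    q_chrom: str   # chromosome matched in the C. briggsae (query) assembly
    q_strand: str  # "+" or "-" relative to the human (target) sequence


def classify(item: NetItem, parent: Optional[NetItem]) -> str:
    """Return Top, Syn, Inv, or NonSyn following the category descriptions above."""
    if item.level == 1 or parent is None:
        return "Top"
    if item.q_chrom != parent.q_chrom:
        return "NonSyn"  # matches a different chromosome than the gap above it
    if item.q_strand == parent.q_strand:
        return "Syn"     # same chromosome, same orientation
    return "Inv"         # same chromosome, opposite orientation


if __name__ == "__main__":
    top = NetItem(1, "chrII", "+")
    print(classify(top, None))                      # Top
    print(classify(NetItem(2, "chrII", "+"), top))  # Syn
    print(classify(NetItem(2, "chrII", "-"), top))  # Inv
    print(classify(NetItem(2, "chrX", "+"), top))   # NonSyn
```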

    Methods

    Chain Track

    Transposons that have been inserted since the C. briggsae/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single C. briggsae chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic programming algorithm was then run over the kd-trees to find the maximally scoring chains of these blocks. The following scoring matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=loose

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000

    Net Track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before or after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb cb3\ priority 230.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetCb3\ type bed 3\ visibility hide\ chainNetPonAbe2 Orangutan Chain/Net bed 3 Orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)), Chain and Net Alignments 0 230.3 0 0 0 100 50 0 0 0 0

    Description

    Chain Track

    The chain track shows alignments of orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both orangutan and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the orangutan assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best orangutan/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The orangutan sequence used in this annotation is from the July 2007 (WUGSC 2.0.2/ponAbe2) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - an alignment on the same chromosome as the gap in the level above it.
    • Inv - an alignment on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain Track

    Transposons that have been inserted since the orangutan/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single orangutan chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic programming algorithm was then run over the kd-trees to find the maximally scoring chains of these blocks. The following scoring matrix was used:

          A     C     G     T
    A    90  -330  -236  -356
    C  -330   100  -318  -236
    G  -236  -318   100  -330
    T  -356  -236  -330    90
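The same matrix is stored in this track's configuration record as the setting `matrix 16 90,-330,-236,-356,...` together with `matrixHeader A, C, G, T`. Assuming the 16 values are listed row by row in header order (consistent with the table above), a short parser can rebuild the table from the setting. `parse_matrix` is a hypothetical helper for illustration, not a UCSC library function.

```python
def parse_matrix(setting, header):
    """Build a {(row_base, col_base): score} lookup from 'matrix' and 'matrixHeader' values.

    Hypothetical helper; assumes the values after the count are listed row by row.
    """
    count_text, values_text = setting.split(maxsplit=1)
    values = [int(v) for v in values_text.split(",")]
    bases = [b.strip() for b in header.split(",")]
    n = len(bases)
    if len(values) != int(count_text) or len(values) != n * n:
        raise ValueError("matrix size does not match matrixHeader")
    return {(bases[i], bases[j]): values[i * n + j] for i in range(n) for j in range(n)}


# Setting values copied from the chainNetPonAbe2 configuration record.
SETTING = "16 90,-330,-236,-356,-330,100,-318,-236,-236,-318,100,-330,-356,-236,-330,90"
HEADER = "A, C, G, T"

if __name__ == "__main__":
    m = parse_matrix(SETTING, HEADER)
    print(m[("A", "A")], m[("C", "G")], m[("A", "T")])  # 90 -318 -356
```

Because the matrix is symmetric, reading the values column by column would give the same lookup, so the row-order assumption does not affect the result for this track.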

    Chains scoring below a minimum score of 5000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300

    Net Track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before or after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 5000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 90,-330,-236,-356,-330,100,-318,-236,-236,-318,100,-330,-356,-236,-330,90\ matrixHeader A, C, G, T\ noInherit on\ otherDb ponAbe2\ priority 230.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetPonAbe2\ type bed 3\ visibility hide\ chainNetCb3Viewchain Chain bed 3 C. briggsae (Jan. 2007 (WUGSC 1.0/cb3)), Chain and Net Alignments 3 230.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCb3\ shortLabel Chain\ spectrum on\ track chainNetCb3Viewchain\ view chain\ visibility pack\ chainNetPonAbe2Viewchain Chain bed 3 Orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)), Chain and Net Alignments 3 230.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetPonAbe2\ shortLabel Chain\ spectrum on\ track chainNetPonAbe2Viewchain\ view chain\ visibility pack\ chainNetCb3Viewnet Net bed 3 C. briggsae (Jan. 2007 (WUGSC 1.0/cb3)), Chain and Net Alignments 1 230.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCb3\ shortLabel Net\ track chainNetCb3Viewnet\ view net\ visibility dense\ chainNetPonAbe2Viewnet Net bed 3 Orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)), Chain and Net Alignments 2 230.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetPonAbe2\ shortLabel Net\ track chainNetPonAbe2Viewnet\ view net\ visibility full\ chainNetDasNov1 dasNov1 Chain/Net bed 3 Armadillo (May 2005 (Broad/dasNov1)), Chain and Net Alignments 0 240 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of armadillo (May 2005 (Broad/dasNov1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both armadillo and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the armadillo assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.
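The single-line versus double-line distinction can be sketched from block coordinates: between two consecutive ungapped blocks of a chain, a gap is single-sided when only one genome skips bases and double-sided when both do. The sketch assumes half-open `(t_start, t_end, q_start, q_end)` tuples, sorted, with armadillo coordinates on the chain's strand. It does not model any size threshold the browser may use for "substantial" sequence, and `gap_types` is a hypothetical name.

```python
def gap_types(blocks):
    """Label each gap between consecutive ungapped blocks of one chain (hypothetical helper).

    `blocks` holds sorted, half-open (t_start, t_end, q_start, q_end) tuples.
    """
    labels = []
    for prev, nxt in zip(blocks, blocks[1:]):
        t_gap = nxt[0] - prev[1]  # human bases skipped between the two blocks
        q_gap = nxt[2] - prev[3]  # armadillo bases skipped between the two blocks
        if t_gap > 0 and q_gap > 0:
            kind = "double"  # unaligned sequence in both species
        elif t_gap > 0 or q_gap > 0:
            kind = "single"  # unaligned sequence in only one species
        else:
            kind = "none"    # blocks abut in both genomes
        labels.append((kind, t_gap, q_gap))
    return labels


if __name__ == "__main__":
    chain_blocks = [(0, 100, 0, 100), (150, 250, 100, 200), (300, 400, 230, 330)]
    print(gap_types(chain_blocks))
    # [('single', 50, 0), ('double', 50, 30)]
```

The first gap here has 50 human bases with no armadillo counterpart, the pattern the text attributes to a deletion in armadillo or an insertion in human.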

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best armadillo/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The armadillo sequence used in this annotation is from the May 2005 (Broad/dasNov1) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - an alignment on the same chromosome as the gap in the level above it.
    • Inv - an alignment on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain Track

    Transposons that have been inserted since the armadillo/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single armadillo chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic programming algorithm was then run over the kd-trees to find the maximally scoring chains of these blocks. The following scoring matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300

    Net Track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before or after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb dasNov1\ priority 240\ shortLabel $o_db Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetDasNov1\ type bed 3\ visibility hide\ chainNetDasNov1Viewchain Chain bed 3 Armadillo (May 2005 (Broad/dasNov1)), Chain and Net Alignments 3 240 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetDasNov1\ shortLabel Chain\ spectrum on\ track chainNetDasNov1Viewchain\ view chain\ visibility pack\ chainNetDasNov1Viewnet Net bed 3 Armadillo (May 2005 (Broad/dasNov1)), Chain and Net Alignments 2 240 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetDasNov1\ shortLabel Net\ track chainNetDasNov1Viewnet\ view net\ visibility full\ chainNetPapHam1 Baboon Chain/Net bed 3 Baboon (Nov. 2008 (Baylor 1.0/papHam1)), Chain and Net Alignments 0 240.3 0 0 0 100 50 0 0 0 0

    Description

    Chain Track

    The chain track shows alignments of baboon (Nov. 2008 (Baylor 1.0/papHam1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both baboon and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the baboon assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.
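The single-line versus double-line distinction depends only on whether sequence intervenes between two consecutive aligned blocks in one species or in both. The minimal Python sketch below illustrates this. It assumes a hypothetical `Block` record holding a start in each genome and a shared block size. The browser's actual drawing code may apply extra rules, for example for very short gaps.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """An ungapped aligned block (hypothetical representation for illustration)."""
    t_start: int  # start in the reference (human) sequence
    q_start: int  # start in the aligning (query) sequence
    size: int     # block length, identical in both sequences


def classify_gaps(blocks: List[Block]) -> List[str]:
    """Label each gap between consecutive blocks as 'single' or 'double'.

    A gap with intervening sequence on only one side is drawn as a single
    line; a gap with sequence on both sides is drawn as a double line.
    """
    labels = []
    for prev, nxt in zip(blocks, blocks[1:]):
        t_gap = nxt.t_start - (prev.t_start + prev.size)
        q_gap = nxt.q_start - (prev.q_start + prev.size)
        labels.append("double" if t_gap > 0 and q_gap > 0 else "single")
    return labels


chain = [Block(0, 0, 100), Block(150, 100, 50), Block(300, 260, 80)]
print(classify_gaps(chain))  # ['single', 'double']
```

The first gap has 50 bases of human sequence and none in the query, so it is single-sided. The second gap has sequence in both species, so it is double-sided.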

    In "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best baboon/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The baboon sequence used in this annotation is from the Nov. 2008 (Baylor 1.0/papHam1) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.
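The four categories can be read as a small decision rule. The Python sketch below encodes the list literally, using only the level, chromosome, and orientation of a fill relative to the gap it fills. This is a simplification: the real netSyntenic program may weigh additional information (such as proximity), and the function and parameter names here are hypothetical.

```python
from typing import Optional


def net_item_type(level: int, fill_chrom: str, fill_strand: str,
                  parent_chrom: Optional[str], parent_strand: Optional[str]) -> str:
    """Classify a net item as 'top', 'syn', 'inv', or 'nonSyn' (simplified).

    parent_chrom/parent_strand describe the chain whose gap this item fills;
    they are None for level-1 items.
    """
    if level == 1 or parent_chrom is None:
        return "top"      # best, longest match, shown on level 1
    if fill_chrom != parent_chrom:
        return "nonSyn"   # different chromosome from the gap above
    if fill_strand != parent_strand:
        return "inv"      # same chromosome, opposite orientation
    return "syn"          # same chromosome, same orientation


print(net_item_type(1, "chr4", "+", None, None))    # top
print(net_item_type(2, "chr4", "-", "chr4", "+"))   # inv
```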

    Methods

    Chain track

    Transposons that have been inserted since the baboon/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single baboon chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    90  -330  -236  -356
    C  -330   100  -318  -236
    G  -236  -318   100  -330
    T  -356  -236  -330    90
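A substitution matrix like the one above assigns a score to every aligned base pair. The score of an ungapped block is the sum over its columns. The Python sketch below shows that arithmetic for the baboon/human matrix. It is only an illustration: lastz and axtChain also handle N bases and masking in their own ways, and this sketch simply upper-cases its input and rejects anything other than A, C, G, or T.

```python
# Baboon/human substitution matrix from the table above (rows and columns A, C, G, T).
BASES = "ACGT"
MATRIX = [
    [90, -330, -236, -356],
    [-330, 100, -318, -236],
    [-236, -318, 100, -330],
    [-356, -236, -330, 90],
]
SCORE = {(a, b): MATRIX[i][j]
         for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def block_score(target: str, query: str) -> int:
    """Sum substitution scores over an ungapped block of equal-length sequences."""
    if len(target) != len(query):
        raise ValueError("an ungapped block needs equal-length sequences")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))


# Five matches and one A/G transition: 90 + 100 + 100 + 90 - 236 + 100
print(block_score("ACGTAC", "ACGTGC"))  # 244
```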

    Chains scoring below a minimum score of 5000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
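The gap table lists costs only at selected gap sizes. The Python sketch below estimates the cost of other sizes. It assumes the cost is linearly interpolated between the listed positions and extrapolated beyond the last position using the slope of the final segment. That is our reading of how a table like this is applied, not a statement of axtChain's exact code. Because tGap equals qGap in the medium table, only qGap and bothGap are reproduced.

```python
from bisect import bisect_right
from typing import List

# Rows copied from the -linearGap=medium table above.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def gap_cost(size: int, costs: List[int], positions: List[int] = POSITIONS) -> float:
    """Piecewise-linear gap cost (assumed interpolation scheme)."""
    if size < positions[0]:
        raise ValueError("gap size must be at least 1")
    i = bisect_right(positions, size) - 1   # last listed position <= size
    if i >= len(positions) - 1:
        i = len(positions) - 2              # extrapolate along the last segment
    x0, x1 = positions[i], positions[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


# A 61-base gap lies halfway between positions 11 and 111.
print(gap_cost(61, Q_GAP), gap_cost(61, BOTH_GAP))  # 750.0 1150.0
```

Under this reading, a gap with sequence in both species (bothGap) costs a fixed 400 more than the one-sided equivalent across the table.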

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description page, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
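The placement step can be pictured as a greedy pass over chains in descending score order. Each chain keeps only those parts of its aligned blocks that no higher-scoring chain already covers. The Python sketch below works on reference coordinates only and uses hypothetical names. It ignores the query side, the hierarchy bookkeeping, and any minimum-size rules that chainNet applies.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference


def subtract(interval: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of `interval` not overlapped by any covered interval."""
    pieces = [interval]
    for c_start, c_end in covered:
        next_pieces = []
        for s, e in pieces:
            if c_end <= s or c_start >= e:      # no overlap: keep as-is
                next_pieces.append((s, e))
                continue
            if s < c_start:                     # part left of the covered region
                next_pieces.append((s, c_start))
            if c_end < e:                       # part right of the covered region
                next_pieces.append((c_end, e))
        pieces = next_pieces
    return pieces


def greedy_net(chains: List[Tuple[str, int, List[Interval]]]) -> List[Tuple[str, Interval]]:
    """Place chains best-score-first, trimming their blocks to uncovered sequence."""
    placed: List[Tuple[str, Interval]] = []
    covered: List[Interval] = []
    for name, _score, blocks in sorted(chains, key=lambda c: c[1], reverse=True):
        for block in blocks:
            for piece in subtract(block, covered):
                placed.append((name, piece))
                covered.append(piece)
    return placed


chains = [
    ("top", 12000, [(0, 400), (600, 1000)]),  # has a gap at 400-600
    ("fill", 5000, [(350, 650)]),             # trimmed to fill that gap
    ("low", 3000, [(900, 1200)]),             # trimmed to the uncovered tail
]
print(greedy_net(chains))
# [('top', (0, 400)), ('top', (600, 1000)), ('fill', (400, 600)), ('low', (1000, 1200))]
```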

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 5000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 90,-330,-236,-356,-330,100,-318,-236,-236,-318,100,-330,-356,-236,-330,90\ matrixHeader A, C, G, T\ noInherit on\ otherDb papHam1\ priority 240.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetPapHam1\ type bed 3\ visibility hide\ chainNetPapHam1Viewchain Chain bed 3 Baboon (Nov. 2008 (Baylor 1.0/papHam1)), Chain and Net Alignments 3 240.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetPapHam1\ shortLabel Chain\ spectrum on\ track chainNetPapHam1Viewchain\ view chain\ visibility pack\ chainNetPapHam1Viewnet Net bed 3 Baboon (Nov. 2008 (Baylor 1.0/papHam1)), Chain and Net Alignments 2 240.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetPapHam1\ shortLabel Net\ track chainNetPapHam1Viewnet\ view net\ visibility full\ chainHg16 Human Chain chain hg16 Human (July 2003 (NCBI34/hg16)) Chained Alignments 0 250 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Chained Alignments\ otherDb hg16\ priority 250\ shortLabel $o_Organism Chain\ spectrum on\ track chainHg16\ type chain hg16\ visibility hide\ chainHg15 Human Chain Hg15 chain hg15 Human (Apr. 2003 (NCBI33/hg15)) Chained Alignments 1 250 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Chained Alignments\ otherDb hg15\ priority 250\ shortLabel $o_Organism Chain Hg15\ spectrum on\ track chainHg15\ type chain hg15\ visibility dense\ fox2ClipSeqCompViewclusters Clusters bed 3 . 
FOX2 adaptor-trimmed CLIP-seq reads 3 250 0 0 0 127 127 127 0 0 0 regulation 1 parent fox2ClipSeqComp\ shortLabel Clusters\ track fox2ClipSeqCompViewclusters\ view clusters\ visibility pack\ fox2ClipSeqCompViewdensity Density bed 3 . FOX2 adaptor-trimmed CLIP-seq reads 2 250 0 0 0 127 127 127 0 0 0 regulation 1 parent fox2ClipSeqComp\ shortLabel Density\ track fox2ClipSeqCompViewdensity\ view density\ viewLimitsMax 0:2401\ visibility full\ fox2ClipSeqComp FOX2 CLIP-seq bed 3 . FOX2 adaptor-trimmed CLIP-seq reads 0 250 0 0 0 127 127 127 0 0 0

    Description

    The FOX2 CLIP-seq track shows adaptor-trimmed CLIP-seq reads that mapped uniquely to the repeat-masked human genome (hg17). The reads were converted to hg18 coordinates using the UCSC LiftOver tool. Reads on the forward strand are displayed in blue; those on the reverse strand are shown in red.

    Methods

    Cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-seq) of the cell type-specific splicing regulator FOX2 (also known as RBM9) was performed in human embryonic stem cells. MosaikAligner was used to align the reads to the repeat-masked genome.

    Briefly, HUES6 human embryonic stem cells were treated with UV irradiation to stabilize in vivo protein-RNA interactions, followed by antibody-mediated precipitation of specific RNA-protein complexes. After the RNA was trimmed with nuclease, 3' RNA linkers were ligated and the RNA was 5' end-labeled with γ-32P-ATP; the protein-RNA adducts were then isolated by SDS-PAGE. Recovered RNA was ligated to a 5' linker before amplification by RT-PCR. Both linkers were designed to be compatible with Illumina 1G Genome Analyzer sequencing. Approximately 4 million reads were uniquely mapped to the repeat-masked human genome by MosaikAligner.

    To identify CLIP clusters, we performed the following steps: (i) CLIP reads were associated with protein-coding genes, each defined as the region from the annotated transcriptional start to the end of the gene locus. (ii) CLIP reads were separated into those sense and those antisense to the transcriptional direction of the gene. (iii) Sense CLIP reads were extended by 100 nt in the 5'-to-3' direction; the height at each nucleotide position is the number of extended reads that overlap that position. (iv) Let $n_h$ be the number of positions with height $h$, for $h = 1, 2, \ldots, H$, and let $N = \sum_{i=1}^{H} n_i$. For a particular height $h$, the probability of observing a height of at least $h$ is $P_h = \frac{1}{N}\sum_{i=h}^{H} n_i$. (v) We computed the background frequency by randomly placing the same number of extended reads within the gene for 100 iterations; this controls for the length of the gene and the number of reads. For each iteration, the count distribution and the probabilities for the randomly placed reads, $P_{h,\mathrm{random}}$, were generated as in step (iv). (vi) Our modified FDR for a peak height was computed as $\mathrm{FDR}(h) = (\mu_h + \sigma_h)/P_h$, where $\mu_h$ and $\sigma_h$ are the mean and standard deviation, respectively, of $P_{h,\mathrm{random}}$ across the 100 iterations. For each gene locus, we chose as the threshold peak height $h^*$ the smallest height such that $\mathrm{FDR}(h^*) < 0.001$. We identified FOX2 binding clusters by grouping nucleotide positions that satisfy $h > h^*$ and lie within 50 nt of each other.

    For further details of the method used to generate this annotation, please refer to Yeo et al. (2009).

    Credits

    Thanks to Gene Yeo at the University of California, San Diego for providing this annotation. For additional information on FOX2 CLIP-seq reads, please contact geneyeo@ucsd.edu directly.

    References

    Yeo GW, Coufal NG, Liang YL, Peng GE, Fu XD, Gage FH. An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat Struct Mol Biol. 2009 Jan 11;16:130-137.

    \ regulation 1 compositeTrack on\ dataVersion January 2009\ group regulation\ longLabel FOX2 adaptor-trimmed CLIP-seq reads\ priority 250\ shortLabel FOX2 CLIP-seq\ subGroup1 view Tracks reads=Reads density=Density clusters=Clusters\ track fox2ClipSeqComp\ type bed 3 .\ visibility hide\ fox2ClipSeqCompViewreads Reads bed 3 . FOX2 adaptor-trimmed CLIP-seq reads 3 250 0 0 0 127 127 127 0 0 0 regulation 1 parent fox2ClipSeqComp\ shortLabel Reads\ track fox2ClipSeqCompViewreads\ view reads\ visibility pack\ blastHg18KG Human Proteins psl protein Human Proteins Mapped by Chained tBLASTn 3 250.2 0 0 0 127 127 127 0 0 0

    Description

    This track contains tBLASTn alignments of the peptides from the predicted and known genes identified in the hg18 UCSC Genes track.

    Methods

    First, the predicted proteins from the human UCSC Genes track were aligned with the human genome using the Blat program to discover exon boundaries. Next, the amino acid sequences that make up each exon were aligned with the human sequence using the tBLASTn program. Finally, the putative human exons were chained together using an organism-specific maximum gap size but no gap penalty. The single best exon chains extending over more than 60% of the query protein were included. Exon chains that extended over 60% of the query and matched at least 60% of the protein's amino acids were also included.
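The two inclusion rules can be expressed as a filter. The Python sketch below is one reading of the text. The best chain per protein (by score) is kept if it covers more than 60% of the query. Any other chain is kept only if it also matches at least 60% of the protein's amino acids. The `ExonChain` record and its fields are hypothetical, and how coverage and identity are actually measured is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExonChain:
    protein: str
    score: float
    query_coverage: float  # fraction of the query protein spanned by the chain
    aa_identity: float     # fraction of the protein's amino acids matched


def select_chains(chains: List[ExonChain]) -> List[ExonChain]:
    """Apply the two inclusion rules described above (one possible reading)."""
    best: Dict[str, ExonChain] = {}
    for c in chains:
        if c.protein not in best or c.score > best[c.protein].score:
            best[c.protein] = c
    kept = []
    for c in chains:
        is_best = best[c.protein] is c
        if c.query_coverage > 0.6 and (is_best or c.aa_identity >= 0.6):
            kept.append(c)
    return kept


chains = [
    ExonChain("P1", 900.0, 0.70, 0.50),  # best for P1, >60% coverage: kept
    ExonChain("P1", 600.0, 0.80, 0.65),  # not best, coverage and identity pass: kept
    ExonChain("P1", 500.0, 0.50, 0.90),  # coverage too low: dropped
    ExonChain("P2", 700.0, 0.55, 0.95),  # best for P2, coverage too low: dropped
]
print([(c.protein, c.score) for c in select_chains(chains)])
# [('P1', 900.0), ('P1', 600.0)]
```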

    Credits

    tBLASTn is part of the NCBI BLAST tool set. For more information on BLAST, see Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-410.

    Blat was written by Jim Kent. The remaining utilities used to produce this track were written by Jim Kent or Brian Raney.

    \ genes 1 blastRef hg18.blastKGRef04\ colorChromDefault off\ group genes\ longLabel Human Proteins Mapped by Chained tBLASTn\ pred hg18.blastKGPep04\ priority 250.2\ shortLabel Human Proteins\ track blastHg18KG\ type psl protein\ visibility pack\ chainNetRheMac1 rheMac1 Chain/Net bed 3 Rhesus (Jan. 2005 (Baylor Mmul_0.1/rheMac1)), Chain and Net Alignments 0 250.3 0 0 0 100 50 0 0 0 0

    Description

    Chain Track

    The chain track shows alignments of rhesus (Jan. 2005 (Baylor Mmul_0.1/rheMac1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rhesus and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rhesus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best rhesus/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The rhesus sequence used in this annotation is from the Jan. 2005 (Baylor Mmul_0.1/rheMac1) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.
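The level scheme is a tree: each chain's gaps may hold lower-scoring chains, and those chains' gaps may hold further chains. A chain's display level is its depth in that tree. The Python sketch below shows this with a hypothetical `Fill` record. It does not reproduce the actual net file format, which also records the gaps themselves as separate nodes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Fill:
    chain: str
    start: int
    end: int
    children: List["Fill"] = field(default_factory=list)  # chains filling this chain's gaps


def levels(fill: Fill, level: int = 1) -> List[Tuple[str, int]]:
    """Walk the hierarchy depth-first, reporting each chain's display level."""
    out = [(fill.chain, level)]
    for child in fill.children:
        out.extend(levels(child, level + 1))
    return out


net = Fill("chainA", 0, 10000, [
    Fill("chainB", 2000, 3000, [Fill("chainC", 2400, 2600)]),
    Fill("chainD", 6000, 7000),
])
print(levels(net))
# [('chainA', 1), ('chainB', 2), ('chainC', 3), ('chainD', 2)]
```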

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the rhesus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rhesus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
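The chaining step finds the highest-scoring sequence of blocks that are co-linear in both genomes. A block can follow another only if it starts after the previous one ends in both sequences. The Python sketch below uses a plain O(n²) dynamic program instead of axtChain's kd-tree search. It also uses a hypothetical stand-in gap penalty rather than the table above, so scores will not match real axtChain output.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    t_start: int  # start in the human (target) sequence
    q_start: int  # start in the aligning (query) sequence
    size: int     # ungapped block length
    score: int    # e.g. summed substitution scores for the block


def gap_penalty(dt: int, dq: int) -> int:
    """Hypothetical stand-in for axtChain's gap cost table."""
    if dt == 0 and dq == 0:
        return 0
    base = 750 if dt > 0 and dq > 0 else 350  # double-sided gaps cost more
    return base + (dt + dq) // 10


def best_chain(blocks: List[Block]) -> Tuple[int, List[Block]]:
    """O(n^2) chaining DP: highest-scoring co-linear, non-overlapping chain."""
    if not blocks:
        return 0, []
    blocks = sorted(blocks)
    best = [b.score for b in blocks]   # best chain score ending at each block
    prev = [-1] * len(blocks)          # back-pointers for traceback
    for i, bi in enumerate(blocks):
        for j in range(i):
            bj = blocks[j]
            dt = bi.t_start - (bj.t_start + bj.size)
            dq = bi.q_start - (bj.q_start + bj.size)
            if dt < 0 or dq < 0:
                continue  # bj must end before bi starts in both sequences
            cand = best[j] + bi.score - gap_penalty(dt, dq)
            if cand > best[i]:
                best[i], prev[i] = cand, j
    k = max(range(len(blocks)), key=best.__getitem__)
    top = best[k]
    chain = []
    while k != -1:
        chain.append(blocks[k])
        k = prev[k]
    return top, chain[::-1]


blocks = [Block(0, 0, 100, 5000), Block(120, 110, 100, 4000),
          Block(150, 500, 50, 1000), Block(300, 300, 80, 3000)]
score, chain = best_chain(blocks)
print(score, [b.t_start for b in chain])  # 10480 [0, 120, 300]
```

The block at target position 150 is excluded: it overlaps the second block on the target side and lies out of order on the query side.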

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description page, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb rheMac1\ priority 250.3\ shortLabel $o_db Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetRheMac1\ type bed 3\ visibility hide\ chainNetCe6 ce6 Chain/Net bed 3 C. elegans (May 2008 (WS190/ce6)), Chain and Net Alignments 0 250.3 0 0 0 255 255 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of C. elegans (May 2008 (WS190/ce6)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both C. elegans and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the C. elegans assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best C. elegans/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The C. elegans sequence used in this annotation is from the May 2008 (WS190/ce6) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the C. elegans/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single C. elegans chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=loose

    tablesize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description page, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
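The kind of bookkeeping attributed to netClass can be illustrated for one gap in one species. The Python sketch below counts N bases in the gap sequence and tallies how many gap bases overlap repeats. Each repeat is flagged as inserted before or after the species split. The interval representation, the `Repeat` tuple, and the output keys are hypothetical. netClass itself draws on genome annotation tables, and this sketch also assumes the repeat intervals do not overlap one another.

```python
from typing import Dict, List, Tuple

Repeat = Tuple[int, int, bool]  # (start, end, inserted_after_split), half-open


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def annotate_gap(seq: str, gap_start: int, repeats: List[Repeat]) -> Dict[str, int]:
    """Count Ns and new/old repeat bases within one gap (simplified sketch)."""
    gap_end = gap_start + len(seq)
    counts = {"N": sum(1 for b in seq if b in "Nn"), "newRepeat": 0, "oldRepeat": 0}
    for r_start, r_end, is_new in repeats:
        key = "newRepeat" if is_new else "oldRepeat"
        counts[key] += overlap(gap_start, gap_end, r_start, r_end)
    return counts


gap_seq = "ACGT" + "NNNNN" + "ACGTACGTACG"  # 20 bases starting at position 1000
repeats = [(1005, 1012, True), (990, 1003, False), (2000, 2100, True)]
print(annotate_gap(gap_seq, 1000, repeats))
# {'N': 5, 'newRepeat': 7, 'oldRepeat': 3}
```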

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb ce6\ priority 250.3\ shortLabel $o_db Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetCe6\ type bed 3\ visibility hide\ chainNetRheMac2 Rhesus Chain/Net bed 3 Rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)), Chain and Net Alignments 0 250.3 0 0 0 100 50 0 0 0 0

    Description

    Chain Track

    The chain track shows alignments of rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rhesus and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rhesus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best rhesus/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The rhesus sequence used in this annotation is from the Jan. 2006 (MGSC Merged 1.0/rheMac2) assembly.

    \ \

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.
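    The four categories above can be illustrated with a small classifier. This is a simplified sketch, not the netSyntenic code: it assumes each fill records the chromosome and strand of its match in the aligning organism and knows which chain's gap it occupies, and it ignores any additional criteria the real tools may apply. `Fill` and `classify_fill` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fill:
    """A chain placed in the net (hypothetical record for illustration)."""
    level: int      # 1 = top level, 2 = fills a gap in level 1, and so on
    q_chrom: str    # chromosome of the match in the aligning organism
    q_strand: str   # '+' or '-' relative to the human genome


def classify_fill(fill: Fill, parent: Optional[Fill]) -> str:
    """Return Top, Syn, Inv, or NonSyn following the rules listed above.

    `parent` is the chain whose gap this fill occupies (None at level 1).
    """
    if parent is None:
        return "Top"
    if fill.q_chrom != parent.q_chrom:
        return "NonSyn"   # different chromosome from the gap above
    if fill.q_strand == parent.q_strand:
        return "Syn"      # same chromosome, same orientation
    return "Inv"          # same chromosome, opposite orientation


top = Fill(level=1, q_chrom="chr7", q_strand="+")
print(classify_fill(top, None))                  # Top
print(classify_fill(Fill(2, "chr7", "+"), top))  # Syn
print(classify_fill(Fill(2, "chr7", "-"), top))  # Inv
print(classify_fill(Fill(2, "chr3", "+"), top))  # NonSyn
```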

    Methods

    Chain track

    Transposons that have been inserted since the rhesus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rhesus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    90  -330  -236  -356
    C  -330   100  -318  -236
    G  -236  -318   100  -330
    T  -356  -236  -330    90
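    As a rough illustration of how a gapless block could be scored with this matrix, the sketch below sums the per-column substitution scores. It assumes bases outside A, C, G, T (for example N) contribute 0, a simplification chosen for the example rather than documented lastz or axtChain behavior; `block_score` is a hypothetical helper.

```python
# Substitution scores from the matrix above (rows and columns in A, C, G, T order).
BASES = "ACGT"
MATRIX = [
    [90, -330, -236, -356],
    [-330, 100, -318, -236],
    [-236, -318, 100, -330],
    [-356, -236, -330, 90],
]
SCORE = {(a, b): MATRIX[i][j]
         for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def block_score(target: str, query: str) -> int:
    """Score one gapless block by summing per-column substitution scores."""
    if len(target) != len(query):
        raise ValueError("a gapless block must have equal-length sequences")
    total = 0
    for t, q in zip(target.upper(), query.upper()):
        # Assumption: pairs involving non-ACGT characters score 0 here.
        total += SCORE.get((t, q), 0)
    return total


print(block_score("ACGT", "ACGT"))  # 380
print(block_score("ACGT", "ACGA"))  # -66
```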

    Chains scoring below a minimum score of 5000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
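    The gap table lists costs only at selected gap sizes. A minimal sketch of looking up the cost of an arbitrary gap size, assuming linear interpolation between listed positions and linear extrapolation past the last one (treat both as assumptions, not a description of axtChain internals); `gap_cost` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import List

# Copied from the -linearGap=medium table above (tGap is identical to qGap).
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def gap_cost(size: int, costs: List[int]) -> float:
    """Piecewise-linear cost of a gap of `size` bases (assumed scheme)."""
    if size < 1:
        raise ValueError("gap size must be at least 1")
    i = bisect_right(POSITIONS, size) - 1   # largest listed position <= size
    if POSITIONS[i] == size:
        return float(costs[i])
    if i == len(POSITIONS) - 1:
        i -= 1                              # past the table: reuse last slope
    x0, x1 = POSITIONS[i], POSITIONS[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


print(gap_cost(61, Q_GAP))     # 750.0
print(gap_cost(61, BOTH_GAP))  # 1150.0
```

    How the qGap, tGap, and bothGap rows are chosen for a particular gap, and how the two gap lengths are combined, is not specified here.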

    Net track

    Chains were derived from lastz alignments, using the methods described for the chain track above, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
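    The greedy placement step can be sketched in one dimension. This simplified illustration is not chainNet: it tracks only coverage on the human genome, represents each chain as a list of half-open block intervals, trims later chains to the still-uncovered sequence, and does not build the level hierarchy or treat a chain's own gaps as space that lower chains may fill. `subtract` and `place_chains` are hypothetical names.

```python
from typing import List, Tuple

Interval = Tuple[int, int]   # half-open [start, end) on the human genome


def subtract(iv: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of `iv` not overlapped by any covered interval."""
    pieces = [iv]
    for cs, ce in covered:
        remaining = []
        for s, e in pieces:
            if ce <= s or cs >= e:         # no overlap: keep whole piece
                remaining.append((s, e))
                continue
            if s < cs:
                remaining.append((s, cs))  # part left of the covered span
            if ce < e:
                remaining.append((ce, e))  # part right of the covered span
        pieces = remaining
    return pieces


def place_chains(chains):
    """chains: list of (score, chain_id, [blocks]); highest score placed first."""
    covered: List[Interval] = []
    placed = []
    for _score, chain_id, blocks in sorted(chains, key=lambda c: c[0], reverse=True):
        kept = [piece for b in blocks for piece in subtract(b, covered)]
        if kept:                           # drop chains trimmed away entirely
            placed.append((chain_id, kept))
            covered.extend(kept)
    return placed


chains = [(5000, "a", [(0, 100), (200, 300)]), (9000, "b", [(50, 150)])]
print(place_chains(chains))
# [('b', [(50, 150)]), ('a', [(0, 50), (200, 300)])]
```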

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 5000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 90,-330,-236,-356,-330,100,-318,-236,-236,-318,100,-330,-356,-236,-330,90\ matrixHeader A, C, G, T\ noInherit on\ otherDb rheMac2\ priority 250.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetRheMac2\ type bed 3\ visibility hide\ chainNetCe9 ce9 Chain/Net bed 3 ce9 (ce9), Chain and Net Alignments 0 250.3 0 0 0 255 255 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of the ce9 assembly to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both ce9 and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the ce9 assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best ce9/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The ce9 sequence used in this annotation is from the ce9 assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the ce9/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single ce9 chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=loose

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000
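    The chaining step described above can be sketched as a dynamic program over gapless blocks. This sketch swaps axtChain's kd-tree search for an exhaustive O(n²) scan of earlier blocks and charges gaps with a hypothetical `flat_gap` function instead of the loose gap table; only the minimum score of 1000 comes from this track's settings. Blocks are assumed to lie on one human chromosome and one ce9 chromosome, on the same strand. `Block` and `best_chain` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Block:
    t_start: int   # human (target) coordinates, half-open
    t_end: int
    q_start: int   # ce9 (query) coordinates, half-open
    q_end: int
    score: int     # substitution score of this gapless block


def best_chain(blocks: List[Block],
               gap_cost: Callable[[int, int], float]) -> Tuple[float, List[Block]]:
    """Highest-scoring chain of blocks ordered in both species (O(n^2) scan)."""
    if not blocks:
        return 0.0, []
    blocks = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [float(b.score) for b in blocks]   # best chain score ending at i
    prev = [-1] * len(blocks)                 # predecessor index, -1 = none
    for i, bi in enumerate(blocks):
        for j in range(i):
            bj = blocks[j]
            # bj must end before bi starts in both species.
            if bj.t_end <= bi.t_start and bj.q_end <= bi.q_start:
                cand = best[j] + bi.score - gap_cost(bi.t_start - bj.t_end,
                                                     bi.q_start - bj.q_end)
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    i = max(range(len(blocks)), key=best.__getitem__)
    score, chain = best[i], []
    while i != -1:                            # walk back through predecessors
        chain.append(blocks[i])
        i = prev[i]
    return score, chain[::-1]


MIN_SCORE = 1000   # chainMinScore for this track


def flat_gap(t_gap: int, q_gap: int) -> float:
    """Hypothetical gap cost; NOT the loose table listed above."""
    return 100.0 + 0.5 * (t_gap + q_gap)


blocks = [Block(0, 10, 0, 10, 800), Block(20, 30, 15, 25, 900),
          Block(5, 8, 40, 43, 50)]
score, chain = best_chain(blocks, flat_gap)
print(score, [b.t_start for b in chain], score >= MIN_SCORE)
# 1592.5 [0, 20] True
```

    The quadratic scan keeps the example short; a spatial index such as the kd-tree mentioned above is what makes this kind of search practical on whole chromosomes.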

    Net track

    Chains were derived from lastz alignments, using the methods described for the chain track above, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb ce9\ priority 250.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetCe9\ type bed 3\ visibility hide\ chainNetRheMac1Viewchain Chain bed 3 Rhesus (Jan. 2005 (Baylor Mmul_0.1/rheMac1)), Chain and Net Alignments 3 250.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetRheMac1\ shortLabel Chain\ spectrum on\ track chainNetRheMac1Viewchain\ view chain\ visibility pack\ chainNetRheMac2Viewchain Chain bed 3 Rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)), Chain and Net Alignments 3 250.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetRheMac2\ shortLabel Chain\ spectrum on\ track chainNetRheMac2Viewchain\ view chain\ visibility pack\ chainNetCe6Viewchain Chain bed 3 C. elegans (May 2008 (WS190/ce6)), Chain and Net Alignments 3 250.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCe6\ shortLabel Chain\ spectrum on\ track chainNetCe6Viewchain\ view chain\ visibility pack\ chainNetCe9Viewchain Chain bed 3 ce9 (ce9), Chain and Net Alignments 3 250.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCe9\ shortLabel Chain\ spectrum on\ track chainNetCe9Viewchain\ view chain\ visibility pack\ blastHg17KG Hg17 Proteins psl protein Hg17 Proteins Mapped by Chained tBLASTn 3 250.3 0 0 0 127 127 127 0 0 0

    Description

    This track contains tBLASTn alignments of the peptides from the predicted and known genes identified in the hg17 Known Genes track.

    Methods

    First, the predicted proteins from the human Known Genes track were aligned with the human genome using the Blat program to discover exon boundaries. Next, the amino acid sequences that make up each exon were aligned with the human sequence using the tBLASTn program. Finally, the putative human exons were chained together using an organism-specific maximum gap size but no gap penalty. For each query protein, the single best exon chain was included if it extended over more than 60% of the protein. Additional exon chains that extended over 60% of the query and matched at least 60% of the protein's amino acids were also included.
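    The final chaining step can be sketched as follows. The sketch assumes each tBLASTn exon hit has been reduced to a protein range, a plus-strand genomic range, and a score; hits must be ordered along both protein and genome, consecutive hits may be at most `max_gap` bases apart (the organism-specific value is not given here, so 50,000 below is only a placeholder), and links carry no penalty. `ExonHit` and `chain_exons` are hypothetical names, not the utilities used to build this track.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExonHit:
    p_start: int    # start in the query protein (amino acids)
    p_end: int
    g_start: int    # start on the genome (bases, plus strand assumed)
    g_end: int
    score: float


def chain_exons(hits: List[ExonHit], max_gap: int) -> Tuple[float, List[ExonHit]]:
    """Best-scoring chain of hits with a hard genomic gap limit, no gap penalty."""
    if not hits:
        return 0.0, []
    hits = sorted(hits, key=lambda h: (h.g_start, h.p_start))
    best = [h.score for h in hits]
    prev = [-1] * len(hits)
    for i, hi in enumerate(hits):
        for j in range(i):
            hj = hits[j]
            ordered = hj.p_end <= hi.p_start and hj.g_end <= hi.g_start
            if ordered and hi.g_start - hj.g_end <= max_gap:
                cand = best[j] + hi.score      # no penalty for the gap itself
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    i = max(range(len(hits)), key=best.__getitem__)
    total, chain = best[i], []
    while i != -1:
        chain.append(hits[i])
        i = prev[i]
    return total, chain[::-1]


hits = [ExonHit(0, 40, 1000, 1120, 60.0),
        ExonHit(40, 90, 5000, 5150, 80.0),
        ExonHit(90, 120, 900000, 900090, 45.0)]   # too far away to link
total, chain = chain_exons(hits, max_gap=50000)
print(total, [h.p_start for h in chain])   # 140.0 [0, 40]
```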

    Credits

    tBLASTn is part of the NCBI BLAST tool set. For more information on BLAST, see Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-410.

    Blat was written by Jim Kent. The remaining utilities used to produce this track were written by Jim Kent or Brian Raney.

    \ genes 1 blastRef hg17.blastKGRef01\ colorChromDefault off\ group genes\ longLabel Hg17 Proteins Mapped by Chained tBLASTn\ pred hg17.blastKGPep01\ priority 250.3\ shortLabel Hg17 Proteins\ track blastHg17KG\ type psl protein\ visibility pack\ chainNetRheMac1Viewnet Net bed 3 Rhesus (Jan. 2005 (Baylor Mmul_0.1/rheMac1)), Chain and Net Alignments 2 250.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetRheMac1\ shortLabel Net\ track chainNetRheMac1Viewnet\ view net\ visibility full\ chainNetRheMac2Viewnet Net bed 3 Rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)), Chain and Net Alignments 2 250.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetRheMac2\ shortLabel Net\ track chainNetRheMac2Viewnet\ view net\ visibility full\ chainNetCe6Viewnet Net bed 3 C. elegans (May 2008 (WS190/ce6)), Chain and Net Alignments 1 250.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCe6\ shortLabel Net\ track chainNetCe6Viewnet\ view net\ visibility dense\ chainNetCe9Viewnet Net bed 3 ce9 (ce9), Chain and Net Alignments 1 250.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCe9\ shortLabel Net\ track chainNetCe9Viewnet\ view net\ visibility dense\ blastHg16KG Human Proteins psl protein Human Proteins (hg16) Mapped by Chained tBLASTn 0 250.4 0 0 0 127 127 127 0 0 0

    Description

    This track contains tBLASTn alignments of the peptides from the predicted and known genes identified in the hg16 Known Genes track.

    Methods

    First, the predicted proteins from the human Known Genes track were aligned with the human genome using the Blat program to discover exon boundaries. Next, the amino acid sequences that make up each exon were aligned with the human sequence using the tBLASTn program. Finally, the putative human exons were chained together using an organism-specific maximum gap size but no gap penalty. For each query protein, the single best exon chain was included if it extended over more than 60% of the protein. Additional exon chains that extended over 60% of the query and matched at least 60% of the protein's amino acids were also included.
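    The two inclusion rules can be expressed as a filter. This is a sketch under assumptions: "extended over" is taken as the fraction of the query protein spanned by a chain, "matched" as the fraction of the protein's amino acids aligned as matches, and "best" as highest score. `ExonChain` and `select_chains` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExonChain:
    protein_id: str
    spanned_aa: int    # query amino acids covered by the chain's extent
    matched_aa: int    # query amino acids that align as matches
    protein_len: int
    score: float


def select_chains(chains: List[ExonChain], cutoff: float = 0.60) -> List[ExonChain]:
    """Keep the best chain per protein if it spans > cutoff of the protein;
    keep other chains only if they also match >= cutoff of its amino acids."""
    groups: Dict[str, List[ExonChain]] = {}
    for c in chains:
        groups.setdefault(c.protein_id, []).append(c)
    kept = []
    for group in groups.values():
        group.sort(key=lambda c: c.score, reverse=True)   # best chain first
        for rank, c in enumerate(group):
            spans = c.spanned_aa / c.protein_len > cutoff
            matches = c.matched_aa / c.protein_len >= cutoff
            if spans and (rank == 0 or matches):
                kept.append(c)
    return kept


chains = [ExonChain("P1", 90, 80, 100, 500.0), ExonChain("P1", 70, 50, 100, 300.0),
          ExonChain("P1", 65, 62, 100, 250.0), ExonChain("P2", 50, 50, 100, 400.0)]
print([(c.protein_id, c.score) for c in select_chains(chains)])
# [('P1', 500.0), ('P1', 250.0)]
```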

    Credits

    tBLASTn is part of the NCBI BLAST tool set. For more information on BLAST, see Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-410.

    Blat was written by Jim Kent. The remaining utilities used to produce this track were written by Jim Kent or Brian Raney.

    \ genes 1 blastRef hg16.blastKGRef00\ colorChromDefault off\ group genes\ longLabel Human Proteins (hg16) Mapped by Chained tBLASTn\ pred hg16.blastKGPep00\ priority 250.4\ shortLabel Human Proteins\ track blastHg16KG\ type psl protein\ visibility hide\ chainNetCalJac1 calJac1 Chain/Net bed 3 Marmoset (June 2007 (WUGSC 2.0.2/calJac1)), Chain and Net Alignments 0 260.3 0 0 0 100 50 0 0 0 0

    Description

    Chain Track

    The chain track shows alignments of marmoset (June 2007 (WUGSC 2.0.2/calJac1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both marmoset and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the marmoset assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best marmoset/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The marmoset sequence used in this annotation is from the June 2007 (WUGSC 2.0.2/calJac1) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the marmoset/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single marmoset chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    90  -330  -236  -356
    C  -330   100  -318  -236
    G  -236  -318   100  -330
    T  -356  -236  -330    90

    Chains scoring below a minimum score of 5000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
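    Removing lineage-specific transposons before alignment and adding them back afterwards implies translating coordinates from the abbreviated sequence back to the full one. A minimal sketch, assuming the removed repeats are supplied as sorted, non-overlapping half-open intervals on the full sequence; `build_index` and `to_full_coord` are hypothetical helpers, not UCSC programs.

```python
from bisect import bisect_right
from typing import List, Tuple


def build_index(removed: List[Tuple[int, int]]):
    """For each removed interval, record where it sat in abbreviated
    coordinates and the cumulative number of bases removed so far."""
    abbrev_starts, removed_through = [], []
    total = 0
    for start, end in removed:
        abbrev_starts.append(start - total)  # insertion point, abbreviated coords
        total += end - start
        removed_through.append(total)        # bases removed up to this interval
    return abbrev_starts, removed_through


def to_full_coord(pos: int, index) -> int:
    """Map a 0-based position in the abbreviated sequence to the full sequence."""
    abbrev_starts, removed_through = index
    k = bisect_right(abbrev_starts, pos)     # removed intervals at or before pos
    return pos + (removed_through[k - 1] if k else 0)


# Full sequence "AAAAARRRCCCCRRGG" with repeats (R) at [5, 8) and [12, 14).
idx = build_index([(5, 8), (12, 14)])
print([to_full_coord(i, idx) for i in range(11)])
# [0, 1, 2, 3, 4, 8, 9, 10, 11, 14, 15]
```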

    Net track

    Chains were derived from lastz alignments, using the methods described for the chain track above, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
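    One piece of the netClass bookkeeping, measuring how much of a region consists of Ns, can be illustrated in a few lines. This sketch only reports the fraction of N bases in an interval; it does not reproduce netClass's output fields or its transposon accounting, and `n_fraction` is a hypothetical name.

```python
def n_fraction(seq: str, start: int, end: int) -> float:
    """Fraction of bases in seq[start:end] that are N (sequencing gap)."""
    if not 0 <= start < end <= len(seq):
        raise ValueError("invalid interval")
    region = seq[start:end]
    return sum(1 for base in region if base in "Nn") / len(region)


genome = "ACGTNNNNNNACGT"
print(n_fraction(genome, 4, 10))            # 1.0
print(n_fraction(genome, 0, len(genome)))   # 6 of 14 bases are N
```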

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 5000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 90,-330,-236,-356,-330,100,-318,-236,-236,-318,100,-330,-356,-236,-330,90\ matrixHeader A, C, G, T\ noInherit on\ otherDb calJac1\ priority 260.3\ shortLabel $o_db Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetCalJac1\ type bed 3\ visibility hide\ chainNetCalJac3 Marmoset Chain/Net bed 3 Marmoset (March 2009 (WUGSC 3.2/calJac3)), Chain and Net Alignments 0 260.3 0 0 0 100 50 0 0 0 0

    Description

    Chain Track

    The chain track shows alignments of marmoset (March 2009 (WUGSC 3.2/calJac3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both marmoset and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the marmoset assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.
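    The single-line versus double-line distinction can be sketched by measuring the unaligned bases on each side of a gap between consecutive blocks of a chain. The rule used here (double when both species have unaligned bases, single otherwise) is an assumption for illustration; the browser's exact drawing rule is not described here. `Block` and `gap_styles` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    t_start: int   # human (target) coordinates, half-open
    t_end: int
    q_start: int   # marmoset (query) coordinates, half-open
    q_end: int


def gap_styles(blocks: List[Block]) -> List[str]:
    """Label each gap between consecutive blocks as 'single' or 'double'."""
    styles = []
    for prev, nxt in zip(blocks, blocks[1:]):
        t_gap = nxt.t_start - prev.t_end   # unaligned human bases
        q_gap = nxt.q_start - prev.q_end   # unaligned marmoset bases
        styles.append("double" if t_gap > 0 and q_gap > 0 else "single")
    return styles


chain = [Block(0, 100, 0, 100), Block(150, 200, 100, 150), Block(260, 300, 190, 230)]
print(gap_styles(chain))   # ['single', 'double']
```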

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best marmoset/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The marmoset sequence used in this annotation is from the March 2009 (WUGSC 3.2/calJac3) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the marmoset/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single marmoset chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    90  -330  -236  -356
    C  -330   100  -318  -236
    G  -236  -318   100  -330
    T  -356  -236  -330    90

    Chains scoring below a minimum score of 5000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300

    Net track

    Chains were derived from lastz alignments, using the methods described for the chain track above, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    \ \

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains and nets were created\ by Robert Baertsch and Jim Kent.

    \

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 5000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 90,-330,-236,-356,-330,100,-318,-236,-236,-318,100,-330,-356,-236,-330,90\ matrixHeader A, C, G, T\ noInherit on\ otherDb calJac3\ priority 260.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetCalJac3\ type bed 3\ visibility hide\ chainNetCalJac1Viewchain Chain bed 3 Marmoset (June 2007 (WUGSC 2.0.2/calJac1)), Chain and Net Alignments 3 260.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetCalJac1\ shortLabel Chain\ spectrum on\ track chainNetCalJac1Viewchain\ view chain\ visibility pack\ chainNetCalJac3Viewchain Chain bed 3 Marmoset (March 2009 (WUGSC 3.2/calJac3)), Chain and Net Alignments 3 260.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetCalJac3\ shortLabel Chain\ spectrum on\ track chainNetCalJac3Viewchain\ view chain\ visibility pack\ chainNetCalJac1Viewnet Net bed 3 Marmoset (June 2007 (WUGSC 2.0.2/calJac1)), Chain and Net Alignments 2 260.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetCalJac1\ shortLabel Net\ track chainNetCalJac1Viewnet\ view net\ visibility full\ chainNetCalJac3Viewnet Net bed 3 Marmoset (March 2009 (WUGSC 3.2/calJac3)), Chain and Net Alignments 2 260.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetCalJac3\ shortLabel Net\ track chainNetCalJac3Viewnet\ view net\ visibility full\ chainNetTarSyr1 Tarsier Chain/Net bed 3 Tarsier (Aug. 2008 (Broad/tarSyr1)), Chain and Net Alignments 0 270.3 0 0 0 100 50 0 0 0 0

    Description

    \

    Chain Track

    \

    \ The chain track shows alignments of tarsier (Aug. 2008 (Broad/tarSyr1)) to the\ human genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ tarsier and human simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

    \ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ tarsier assembly or an insertion in the human \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the human genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

    \

    \ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

    \ \

    Net Track

    \

    \ The net track shows the best tarsier/human chain for \ every part of the human genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The tarsier sequence used in this annotation is from\ the Aug. 2008 (Broad/tarSyr1) (tarSyr1) assembly.

    \ \

    Display Conventions and Configuration

    \

    Chain Track

    \

    By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

    \

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    \ \

    Net Track

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.\
    • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
    • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
    • NonSyn - a match to a chromosome different from the gap in the \ level above.\
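The four categories above can be expressed as a small decision rule. The sketch compares a lower-level chain with the chain whose gap it fills, using only the chromosome and strand in the aligning organism. It deliberately ignores any distance thresholds or additional categories that the real netSyntenic program may apply. The `Fill` class and `classify` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fill:
    """One chain placed in the net; only the aligning-organism side is modeled."""
    q_chrom: str   # chromosome in the aligning organism
    q_strand: str  # '+' or '-', relative to the human strand


def classify(child: Fill, parent: Optional[Fill]) -> str:
    """Label a fill as Top, Syn, Inv, or NonSyn relative to the chain it sits under."""
    if parent is None:
        return "Top"     # level 1: nothing above it
    if child.q_chrom != parent.q_chrom:
        return "NonSyn"  # different chromosome from the gap above
    return "Syn" if child.q_strand == parent.q_strand else "Inv"


top = Fill("chr5", "+")
print(classify(top, None))               # Top
print(classify(Fill("chr5", "-"), top))  # Inv
print(classify(Fill("chr9", "+"), top))  # NonSyn
```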

    \ \

    Methods

    \

    Chain track

    \

    Transposons that have been inserted since the tarsier/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single tarsier chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

    \

         A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91
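Two properties of this matrix are worth checking. It is symmetric, so swapping which genome counts as target and which counts as query leaves every score unchanged. It is also strand-symmetric: scoring a pair of bases gives the same result as scoring their complements, so a reverse-complemented alignment scores the same. The sketch below is an illustrative sanity check. It is not a step of the pipeline described above, and the function names are hypothetical.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# The matrix above, rows and columns in A, C, G, T order.
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]


def is_strand_symmetric(matrix) -> bool:
    """True if every pair scores the same as the pair of complementary bases."""
    idx = {b: i for i, b in enumerate(BASES)}
    return all(
        matrix[idx[a]][idx[b]] == matrix[idx[COMPLEMENT[a]]][idx[COMPLEMENT[b]]]
        for a in BASES
        for b in BASES
    )


def is_symmetric(matrix) -> bool:
    """True if transposing the matrix (swapping the two genomes) changes nothing."""
    return all(matrix[i][j] == matrix[j][i] for i in range(4) for j in range(4))


print(is_strand_symmetric(MATRIX), is_symmetric(MATRIX))  # True True
```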

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
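The chaining step finds the highest-scoring way to string gapless blocks together so that they advance in both genomes. According to the text, axtChain uses a kd-tree to find candidate predecessors efficiently. The sketch below substitutes a plain O(n²) dynamic program over all block pairs, which is easier to read but slow on real data. The gap penalty `toy_gap_cost` is a made-up function for illustration only. It is not the gap table above, and all names here are hypothetical.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int, int]  # (t_start, q_start, size, score) of one gapless block


def best_chain(blocks: List[Block], gap_cost: Callable[[int, int], float]) -> Tuple[float, List[Block]]:
    """Return the highest-scoring chain of blocks that increase on both axes.

    Chain score = sum of block scores - gap_cost(t_gap, q_gap) for each pair of
    consecutive blocks. Quadratic in the number of blocks.
    """
    if not blocks:
        return 0.0, []
    blocks = sorted(blocks)
    best = [float(b[3]) for b in blocks]  # best chain score ending at each block
    prev = [-1] * len(blocks)             # back-pointer to the previous block
    for i, (t_i, q_i, _size_i, score_i) in enumerate(blocks):
        for j in range(i):
            t_j, q_j, size_j, _ = blocks[j]
            t_gap, q_gap = t_i - (t_j + size_j), q_i - (q_j + size_j)
            if t_gap < 0 or q_gap < 0:
                continue  # block j does not end before block i on both axes
            candidate = best[j] + score_i - gap_cost(t_gap, q_gap)
            if candidate > best[i]:
                best[i], prev[i] = candidate, j
    k = max(range(len(blocks)), key=best.__getitem__)
    score, chain = best[k], []
    while k != -1:  # follow back-pointers to recover the chain
        chain.append(blocks[k])
        k = prev[k]
    return score, chain[::-1]


def toy_gap_cost(t_gap: int, q_gap: int) -> float:
    """Hypothetical penalty for illustration only (not the axtChain gap table)."""
    return 0.0 if t_gap == q_gap == 0 else 100.0 + 2.0 * (t_gap + q_gap)


blocks = [(0, 0, 10, 500), (20, 20, 10, 500), (100, 5, 10, 300)]
score, chain = best_chain(blocks, toy_gap_cost)
print(score, chain)  # 860.0 [(0, 0, 10, 500), (20, 20, 10, 500)]
```

The third block cannot follow either of the others because it starts earlier in the query, so it is left out. With this track's minimum score of 3000, a toy chain scoring 860 would be discarded.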

    \ \

    Net track

    \

    Chains were derived from lastz alignments, using the methods described in the Chain track section above, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    \ \

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains and nets were created\ by Robert Baertsch and Jim Kent.

    \

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb tarSyr1\ priority 270.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetTarSyr1\ type bed 3\ visibility hide\ chainNetCaeJap3 caeJap3 Chain/Net bed 3 caeJap3 (caeJap3), Chain and Net Alignments 0 270.3 0 0 0 255 255 0 1 0 0

    Description

    \

    Chain Track

    \

    \ The chain track shows alignments of caeJap3 (caeJap3) to the\ human genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ caeJap3 and human simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

    \ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ caeJap3 assembly or an insertion in the human \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the human genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

    \

    \ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

    \ \

    Net Track

    \

    \ The net track shows the best caeJap3/human chain for \ every part of the human genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The caeJap3 sequence used in this annotation is from\ the caeJap3 (caeJap3) assembly.

    \ \

    Display Conventions and Configuration

    \

    Chain Track

    \

    By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

    \

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    \ \

    Net Track

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.\
    • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
    • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
    • NonSyn - a match to a chromosome different from the gap in the \ level above.\

    \ \

    Methods

    \

    Chain track

    \

    Transposons that have been inserted since the caeJap3/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single caeJap3 chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

    \

         A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91
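This track's settings record the same matrix in flattened form, as `matrix 16 91,-114,-31,-123,...` together with `matrixHeader A, C, G, T`. The sketch below turns such a value back into a lookup table. It assumes that the first token is the entry count and that the remaining values are listed row by row in matrixHeader order. That ordering is an assumption, but it is harmless here because the matrix is symmetric. The function name is hypothetical.

```python
from typing import Dict, Tuple


def parse_matrix_setting(value: str, header: str = "A, C, G, T") -> Dict[Tuple[str, str], int]:
    """Turn a 'matrix' setting value such as '16 91,-114,...' into a score lookup.

    Assumes the first token is the number of entries and the rest is a
    comma-separated, row-major list in matrixHeader order.
    """
    bases = [b.strip() for b in header.split(",")]
    count_text, _, numbers = value.strip().partition(" ")
    scores = [int(x) for x in numbers.split(",")]
    n = len(bases)
    if int(count_text) != n * n or len(scores) != n * n:
        raise ValueError("matrix size does not match matrixHeader")
    return {(bases[i], bases[j]): scores[i * n + j] for i in range(n) for j in range(n)}


setting = "16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91"
scores = parse_matrix_setting(setting)
print(scores[("A", "A")], scores[("C", "G")], scores[("T", "A")])  # 91 -125 -123
```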

    Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=loose

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000
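The loose table above uses the same gap-size positions as the medium table used for the other tracks in this section, but it raises the cost of long gaps much more slowly. For example, a one-sided gap of 2111 bases costs 1100 here and 2900 under the medium table. The sketch below compares the per-base slope of the final segment of each qGap row. The function name is hypothetical.

```python
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
MEDIUM_Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
LOOSE_Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]


def segment_slopes(positions, costs):
    """Per-base cost increase between each pair of consecutive table entries."""
    return [
        (c1 - c0) / (p1 - p0)
        for (p0, c0), (p1, c1) in zip(zip(positions, costs), zip(positions[1:], costs[1:]))
    ]


for name, costs in (("medium", MEDIUM_Q_GAP), ("loose", LOOSE_Q_GAP)):
    print(name, segment_slopes(POSITIONS, costs)[-1])
# medium 1.0
# loose 0.25
```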

    \ \

    Net track

    \

    Chains were derived from lastz alignments, using the methods described in the Chain track section above, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    \ \

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains and nets were created\ by Robert Baertsch and Jim Kent.

    \

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb caeJap3\ priority 270.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetCaeJap3\ type bed 3\ visibility hide\ chainNetTarSyr1Viewchain Chain bed 3 Tarsier (Aug. 2008 (Broad/tarSyr1)), Chain and Net Alignments 3 270.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetTarSyr1\ shortLabel Chain\ spectrum on\ track chainNetTarSyr1Viewchain\ view chain\ visibility pack\ chainNetCaeJap3Viewchain Chain bed 3 caeJap3 (caeJap3), Chain and Net Alignments 3 270.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCaeJap3\ shortLabel Chain\ spectrum on\ track chainNetCaeJap3Viewchain\ view chain\ visibility pack\ chainNetTarSyr1Viewnet Net bed 3 Tarsier (Aug. 2008 (Broad/tarSyr1)), Chain and Net Alignments 2 270.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetTarSyr1\ shortLabel Net\ track chainNetTarSyr1Viewnet\ view net\ visibility full\ chainNetCaeJap3Viewnet Net bed 3 caeJap3 (caeJap3), Chain and Net Alignments 1 270.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetCaeJap3\ shortLabel Net\ track chainNetCaeJap3Viewnet\ view net\ visibility dense\ chainOtoGar1Best Bushbaby Best Chain chain otoGar1 Bushbaby (Dec. 2006 (Broad/otoGar1)) Chained Alignments Recip Best 0 274.1 100 50 0 255 240 200 1 0 0 compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Chained Alignments Recip Best\ otherDb otoGar1\ priority 274.1\ shortLabel $o_Organism Best Chain\ spectrum on\ track chainOtoGar1Best\ type chain otoGar1\ visibility hide\ chainNetOtoGar1 Bushbaby Chain/Net bed 3 Bushbaby (Dec. 
2006 (Broad/otoGar1)), Chain and Net Alignments 0 280.3 0 0 0 100 50 0 0 0 0

    Description

    \

    Chain Track

    \

    \ The chain track shows alignments of bushbaby (Dec. 2006 (Broad/otoGar1)) to the\ human genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ bushbaby and human simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

    \ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ bushbaby assembly or an insertion in the human \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the human genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

    \

    \ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

    \ \

    Net Track

    \

    \ The net track shows the best bushbaby/human chain for \ every part of the human genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The bushbaby sequence used in this annotation is from\ the Dec. 2006 (Broad/otoGar1) (otoGar1) assembly.

    \ \

    Display Conventions and Configuration

    \

    Chain Track

    \

    By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

    \

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    \ \

    Net Track

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.\
    • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
    • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
    • NonSyn - a match to a chromosome different from the gap in the \ level above.\

    \ \

    Methods

    \

    Chain track

    \

    Transposons that have been inserted since the bushbaby/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single bushbaby chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

    \

         A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
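The minimum-score cutoff can be applied to a file in the UCSC chain format. In that format, each chain begins with a header line of the form `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, and the header is followed by that chain's block lines. The kent utilities include a chainFilter program for this kind of task. The sketch below is an independent illustration and not that program; `filter_chains` is a hypothetical name.

```python
from typing import Iterable, Iterator


def filter_chains(lines: Iterable[str], min_score: float) -> Iterator[str]:
    """Yield only the chains whose header score is at least min_score.

    Block lines that follow a header are kept or dropped along with it.
    Comment lines starting with '#' are passed through unchanged.
    """
    keep = False
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        if line.startswith("chain "):
            keep = float(line.split()[1]) >= min_score  # field 2 is the score
        if keep:
            yield line


chain_text = [
    "chain 4500 chr1 1000 + 0 100 chr2 900 + 0 100 1\n",
    "100\n",
    "\n",
    "chain 1200 chr1 1000 + 200 250 chr3 800 - 10 60 2\n",
    "50\n",
    "\n",
]
print("".join(filter_chains(chain_text, 3000)), end="")  # only the chain scoring 4500
```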

    \ \

    Net track

    \

    Chains were derived from lastz alignments, using the methods described in the Chain track section above, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
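The netClass step summarizes how much of each region is sequencing gap and how much is transposon. As a rough illustration of that kind of per-region measurement, the sketch below computes both fractions directly from a sequence string. It assumes that repeats are soft-masked in lowercase. This is not how netClass reads its inputs, and the function name is hypothetical.

```python
def gap_composition(seq: str) -> dict:
    """Fractions of a region that are N (sequencing gap) or soft-masked repeat.

    Assumes repeats are marked in lowercase; a lowercase 'n' counts only as N.
    """
    if not seq:
        return {"N": 0.0, "repeat": 0.0}
    n_count = sum(1 for c in seq if c in "Nn")
    repeat_count = sum(1 for c in seq if c.islower() and c != "n")
    return {"N": n_count / len(seq), "repeat": repeat_count / len(seq)}


print(gap_composition("ACGTNNNNacgtACGT"))  # {'N': 0.25, 'repeat': 0.25}
```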

    \ \

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains and nets were created\ by Robert Baertsch and Jim Kent.

    \

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb otoGar1\ priority 280.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetOtoGar1\ type bed 3\ visibility hide\ chainNetOtoGar1Viewchain Chain bed 3 Bushbaby (Dec. 2006 (Broad/otoGar1)), Chain and Net Alignments 3 280.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetOtoGar1\ shortLabel Chain\ spectrum on\ track chainNetOtoGar1Viewchain\ view chain\ visibility pack\ chainNetOtoGar1Viewnet Net bed 3 Bushbaby (Dec. 2006 (Broad/otoGar1)), Chain and Net Alignments 2 280.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetOtoGar1\ shortLabel Net\ track chainNetOtoGar1Viewnet\ view net\ visibility full\ chainNetMicMur1 Mouse lemur Chain/Net bed 3 Mouse lemur (Jun. 2003 (Broad/micMur1)), Chain and Net Alignments 0 290.3 0 0 0 100 50 0 0 0 0

    Description

    \

    Chain Track

    \

    \ The chain track shows alignments of mouse lemur (Jun. 2003 (Broad/micMur1)) to the\ human genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ mouse lemur and human simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

    \ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ mouse lemur assembly or an insertion in the human \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the human genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

    \

    \ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

    \ \

    Net Track

    \

    The net track shows the best mouse lemur/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The mouse lemur sequence used in this annotation is from the Jun. 2003 (Broad/micMur1) (micMur1) assembly.

    \ \

    Display Conventions and Configuration

    \

    Chain Track

    \

    By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

    \

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    \ \

    Net Track

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.\
    • Syn - line-ups on the same chromosome as the gap in the level above\ it.\
    • Inv - a line-up on the same chromosome as the gap above it, but in \ the opposite orientation.\
    • NonSyn - a match to a chromosome different from the gap in the \ level above.\

    \ \

    Methods

    \

    Chain track

    \

    Transposons that have been inserted since the mouse lemur/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse lemur chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

    \

         A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    \ \ \ Chains scoring below a minimum score of '3000' were discarded;\ the remaining chains are displayed in this track. The linear gap\ matrix used with axtChain:
    \
    -linearGap=medium\
    \
    tableSize    11\
    smallSize   111\
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111\
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900\
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900\
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300\
    
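The gap table gives costs only at selected gap sizes. This sketch assumes, without confirmation from this description, that costs for intermediate sizes are linearly interpolated between neighbouring positions. The hypothetical `gap_cost` helper applies that interpolation to the qGap row of the medium table. It rejects sizes outside the table because extrapolation is not modeled.

```python
from bisect import bisect_left

# "-linearGap=medium" table as listed above: gap sizes and qGap costs.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]


def gap_cost(size: int, positions=POSITIONS, costs=Q_GAP) -> float:
    """Cost of a gap of the given size, interpolated between tabulated sizes."""
    if not positions[0] <= size <= positions[-1]:
        raise ValueError("size outside the tabulated range")
    i = bisect_left(positions, size)
    if positions[i] == size:
        return costs[i]  # exact table entry
    x0, x1 = positions[i - 1], positions[i]
    y0, y1 = costs[i - 1], costs[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)
```

For example, a 7-base gap falls between the entries for 3 (450) and 11 (600), giving 525.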
    \

    \ \

    Net track

    \

    \ Chains were derived from lastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.
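The greedy placement described above can be sketched in one dimension. This simplified model uses hypothetical names and tracks only coverage on the reference (human) genome. Chains are placed best-first, and each block is trimmed to the bases that no higher-scoring chain has claimed. A piece that lies within the span of an already-placed chain goes one level down. chainNet itself also tracks the other species' coordinates and handles details not modeled here, so read this as an illustration of the idea, not its algorithm.

```python
from dataclasses import dataclass

Interval = tuple[int, int]  # half-open [start, end) on the reference genome


@dataclass
class Chain:
    name: str
    score: int
    blocks: list[Interval]  # sorted, non-overlapping aligned blocks


def subtract(iv: Interval, covered: list[Interval]) -> list[Interval]:
    """Return the parts of iv not overlapped by any interval in covered."""
    pieces = [iv]
    for cs, ce in covered:
        nxt = []
        for s, e in pieces:
            if ce <= s or cs >= e:
                nxt.append((s, e))   # no overlap with this covered interval
                continue
            if s < cs:
                nxt.append((s, cs))  # keep the part left of the overlap
            if ce < e:
                nxt.append((ce, e))  # keep the part right of the overlap
        pieces = nxt
    return pieces


def net(chains: list[Chain]) -> list[tuple[str, int, int, int]]:
    """Place chains best-first; return (name, level, start, end) for each kept piece."""
    covered: list[Interval] = []
    spans: list[tuple[int, int, int]] = []  # (level, start, end) of placed chains
    placed = []
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        pieces = [p for b in chain.blocks for p in subtract(b, covered)]
        if not pieces:
            continue  # entirely covered by higher-scoring chains
        start, end = pieces[0][0], pieces[-1][1]
        # One level below the deepest placed chain whose span encloses these pieces.
        level = 1 + max((lvl for lvl, s, e in spans if s <= start and end <= e), default=0)
        spans.append((level, start, end))
        covered.extend(pieces)
        placed.extend((chain.name, level, s, e) for s, e in pieces)
    return placed
```

For example, a chain whose only block falls inside a gap of the top chain is placed at level 2. A chain whose block is already covered by the top chain is dropped entirely.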

    \ \

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains and nets were created\ by Robert Baertsch and Jim Kent.

    \

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb MicMur1\ priority 290.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ subGroup1 view Views chain=Chain net=Net\ track chainNetMicMur1\ type bed 3\ visibility hide\ chainNetHaeCon1 haeCon1 Chain/Net bed 3 haeCon1 (haeCon1), Chain and Net Alignments 0 290.3 0 0 0 255 255 0 1 0 0

    Description

    \

    Chain Track

    \

    \ The chain track shows alignments of haeCon1 (haeCon1) to the\ human genome using a gap scoring system that allows longer gaps \ than traditional affine gap scoring systems. It can also tolerate gaps in both\ haeCon1 and human simultaneously. These \ "double-sided" gaps can be caused by local inversions and \ overlapping deletions in both species. \

    \ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ haeCon1 assembly or an insertion in the human \ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the human genome, the chains with single-lined gaps are often \ due to processed pseudogenes, while chains with double-lined gaps are more \ often due to paralogs and unprocessed pseudogenes.

    \

    \ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

    \ \

    Net Track

    \

    \ The net track shows the best haeCon1/human chain for \ every part of the human genome. It is useful for\ finding orthologous regions and for studying genome\ rearrangement. The haeCon1 sequence used in this annotation is from\ the haeCon1 (haeCon1) assembly.

    \ \

    Display Conventions and Configuration

    \

    Chain Track

    \

    By default, the chains to chromosome-based assemblies are colored\ based on which chromosome they map to in the aligning organism. To turn\ off the coloring, check the "off" button next to: Color\ track based on chromosome.

    \

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome.

    \ \

    Net Track

    \

    \ In full display mode, the top-level (level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases gaps exist in the\ top-level chain. When possible, these are filled in by\ other chains that are displayed at level 2. The gaps in \ level 2 chains may be filled by level 3 chains and so\ forth.

    \

    \ In the graphical display, the boxes represent ungapped \ alignments; the lines represent gaps. Click\ on a box to view detailed information about the chain\ as a whole; click on a line to display information\ about the gap. The detailed information is useful in determining\ the cause of the gap or, for lower level chains, the genomic\ rearrangement.

    \

    \ Individual items in the display are categorized as one of four types\ (other than gap):

    \

      \
    • Top - the best, longest match. Displayed on level 1.
    • Syn - alignments on the same chromosome as the gap in the level above it.
    • Inv - an alignment on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from that of the gap in the level above.

    \ \

    Methods

    \

    Chain track

    \

    \ Transposons that have been inserted since the haeCon1/human\ split were removed from the assemblies. The abbreviated genomes were\ aligned with lastz, and the transposons were added back in.\ The resulting alignments were converted into axt format using the lavToAxt\ program. The axt alignments were fed into axtChain, which organizes all\ alignments between a single haeCon1 chromosome and a single\ human chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks.\ \ The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    \ \ \ Chains scoring below a minimum score of '1000' were discarded;\ the remaining chains are displayed in this track. The linear gap\ matrix used with axtChain:
    \
    -linearGap=loose\
    \
    tableSize    11\
    smallSize   111\
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111\
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600\
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600\
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000\
    
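A chain's score can be pictured as its summed block scores minus the cost of each gap between blocks, which brings the block scores, gap costs, and minimum score together. The sketch below assumes single-sided gaps whose sizes appear exactly in the loose table, with no interpolation. It uses the `chainMinScore` of 1000 given for this track. `chain_score` and `keep_chains` are hypothetical names, and axtChain's exact scoring details may differ.

```python
# Exact qGap/tGap entries from the "-linearGap=loose" table; other sizes are not modeled.
LOOSE_GAP_COST = {1: 325, 2: 360, 3: 400, 11: 450, 111: 600, 2111: 1100}
MIN_SCORE = 1000  # chainMinScore for this track


def chain_score(block_scores: list[int], gap_sizes: list[int]) -> int:
    """Summed block scores minus the cost of each single-sided gap between blocks."""
    if len(gap_sizes) != len(block_scores) - 1:
        raise ValueError("expected one gap between each pair of consecutive blocks")
    return sum(block_scores) - sum(LOOSE_GAP_COST[g] for g in gap_sizes)


def keep_chains(chains: list[tuple[list[int], list[int]]]) -> list[tuple[list[int], list[int]]]:
    """Discard chains scoring below the minimum score."""
    return [c for c in chains if chain_score(*c) >= MIN_SCORE]
```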
    \

    \ \

    Net track

    \

    \ Chains were derived from lastz alignments, using the methods\ described on the chain tracks description pages, and sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain. \ The program netClass was then used to fill in how much of the gaps and chains \ contained Ns (sequencing gaps) in one or both species and how much\ was filled with transposons inserted before and after the two organisms \ diverged.

    \ \

    Credits

    \

    \ Lastz (previously known as blastz) was developed at\ Pennsylvania State University by \ Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his \ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at \ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains and nets were created\ by Robert Baertsch and Jim Kent.

    \

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    \

    \ \

    References

    \

    \ Chiaromonte F, Yap VB, Miller W. \ Scoring pairwise genomic sequence alignments. \ Pac Symp Biocomput. 2002;:115-26.

    \

    \ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    \

    \ Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC,\ Haussler D, Miller W.\ Human-Mouse Alignments with BLASTZ. \ Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb haeCon1\ priority 290.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetHaeCon1\ type bed 3\ visibility hide\ chainNetMicMur1Viewchain Chain bed 3 Mouse lemur (Jun. 2003 (Broad/micMur1)), Chain and Net Alignments 3 290.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetMicMur1\ shortLabel Chain\ spectrum on\ track chainNetMicMur1Viewchain\ view chain\ visibility pack\ chainNetHaeCon1Viewchain Chain bed 3 haeCon1 (haeCon1), Chain and Net Alignments 3 290.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetHaeCon1\ shortLabel Chain\ spectrum on\ track chainNetHaeCon1Viewchain\ view chain\ visibility pack\ chainNetMicMur1Viewnet Net bed 3 Mouse lemur (Jun. 2003 (Broad/micMur1)), Chain and Net Alignments 2 290.3 0 0 0 100 50 0 0 0 0 compGeno 1 parent chainNetMicMur1\ shortLabel Net\ track chainNetMicMur1Viewnet\ view net\ visibility full\ chainNetHaeCon1Viewnet Net bed 3 haeCon1 (haeCon1), Chain and Net Alignments 1 290.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetHaeCon1\ shortLabel Net\ track chainNetHaeCon1Viewnet\ view net\ visibility dense\ rBestChainPanTro1 Chimp Recip Chain chain panTro1 Chimp (Nov. 2003 (CGSC 1.1/panTro1)) Reciprocal Best Chained Alignments 0 291 100 50 0 255 240 200 1 0 0

    Description

    \

    \ This track shows "reciprocal best" alignments of chimp \ (panTro1, Nov. 2003 (CGSC 1.1/panTro1)) to the human genome. These alignments were generated \ using blastz and blat alignments of chimp genomic sequence from the \ Nov. 2003 Arachne draft assembly. Alignments were made using a gap scoring \ system that allows longer gaps than traditional affine gap scoring systems. \ It can also tolerate gaps in both chimp and human simultaneously. \ These "double-sided" gaps can be caused by local inversions and\ overlapping deletions in both species. \

    \ The chain track displays boxes joined together by either single or\ double lines. The boxes represent aligning regions.\ Single lines indicate gaps that are largely due to a deletion in the\ chimp assembly or an insertion in the human\ assembly. Double lines represent more complex gaps that involve substantial\ sequence in both species. This may result from inversions, overlapping\ deletions, an abundance of local mutation, or an unsequenced gap in one\ species. In cases where multiple chains align over a particular region of\ the human genome, the chains with single-lined gaps are often\ due to processed pseudogenes, while chains with double-lined gaps are more\ often due to paralogs and unprocessed pseudogenes.

    \

    \ In the "pack" and "full" display\ modes, the individual feature names indicate the chromosome, strand, and\ location (in thousands) of the match for each matching alignment.

    \ \

    Methods

    \

    The alignments were generated by blastz on repeatmasked sequence using the following human/chimp scoring matrix:

         A    C    G    T
    A   100 -300 -150 -300
    C  -300  100 -300 -150
    G  -150 -300  100 -300
    T  -300 -150 -300  100

    K = 4500, L = 3000, Y = 3400, H = 2000
    
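As we understand blastz's conventions, K is the score an ungapped high-scoring segment (HSP) must reach. L is the threshold for gapped alignments, Y is the X-drop value for gapped extension, and H is the threshold used when searching for further alignments between existing ones. The sketch below illustrates only the first of these. It extends a seed rightward without gaps and stops once the running score drops more than an X-drop below its best. The `X_DROP` value is a hypothetical choice that this description does not give, and real blastz extends in both directions.

```python
BASES = "ACGT"
CHIMP_MATRIX = [
    [100, -300, -150, -300],
    [-300, 100, -300, -150],
    [-150, -300, 100, -300],
    [-300, -150, -300, 100],
]
SUB = {(a, b): CHIMP_MATRIX[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)}
K = 4500       # HSP score threshold from the parameters above
X_DROP = 1000  # hypothetical ungapped X-drop value (not given in this description)


def extend_right(t: str, q: str, ti: int, qi: int) -> int:
    """Extend an ungapped match rightward from (ti, qi); return the best score reached."""
    score = best = 0
    while ti < len(t) and qi < len(q):
        score += SUB[(t[ti], q[qi])]
        best = max(best, score)
        if best - score > X_DROP:
            break  # score fell too far below the running maximum
        ti += 1
        qi += 1
    return best


def is_hsp(t: str, q: str, ti: int = 0, qi: int = 0) -> bool:
    """True if the extension from (ti, qi) reaches the HSP threshold K."""
    return extend_right(t, q, ti, qi) >= K
```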

    \

    \ The resulting alignments were fed into axtChain, which organizes all\ alignments between a single chimp chromosome and a single\ human chromosome into a group and creates a kd-tree out\ of the gapless subsections (blocks) of the alignments. A dynamic program\ was then run over the kd-trees to find the maximally scoring chains of these\ blocks. Chains scoring below a threshold were discarded.
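A quadratic-time version illustrates the dynamic program behind axtChain. In the sketch below, each `Block` (a hypothetical type) has target and query coordinates and a precomputed score. A block may follow another only if it starts after it on both genomes. Each link pays a placeholder gap cost from `gap_penalty`, which is an assumption: axtChain uses linear gap tables such as those described for the other chain tracks. axtChain finds candidate predecessors with a kd-tree instead of scanning them all, and this sketch does not attempt that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    score: int


def gap_penalty(a: Block, b: Block) -> int:
    """Hypothetical placeholder gap cost between consecutive blocks a and b."""
    return 100 + 10 * ((b.t_start - a.t_end) + (b.q_start - a.q_end))


def best_chain(blocks: list[Block]) -> tuple[int, list[Block]]:
    """Quadratic chaining DP: best-scoring chain of blocks colinear on both genomes."""
    if not blocks:
        return 0, []
    blocks = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in blocks]  # best chain score ending at each block
    prev = [-1] * len(blocks)         # back-pointer to the previous block
    for i, b in enumerate(blocks):
        for j in range(i):
            a = blocks[j]
            if a.t_end <= b.t_start and a.q_end <= b.q_start:  # a precedes b on both
                cand = best[j] + b.score - gap_penalty(a, b)
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    i = max(range(len(blocks)), key=best.__getitem__)
    score, chain = best[i], []
    while i != -1:  # follow back-pointers to recover the chain
        chain.append(blocks[i])
        i = prev[i]
    return score, chain[::-1]
```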

    \

    \ To place additional chimp sequences not initially aligned by blastz,\ a DNA blat of the unmasked sequence was performed. The resulting\ blat alignments were also chained and then merged with the blastz-based \ chains from the previous step to produce a set of "all chains".

    \

    \ Due to the draft nature of this initial genome assembly,\ the chain track and the companion net track were generated using\ a "reciprocal best" strategy. This strategy attempts to minimize\ paralog fill-in for missing orthologous chimp sequence by filtering\ out of the human net all sequences not on the chimp side of the net.

    \

    \ First, the merged blastz and blat chains were used to generate an alignment \ net using the program chainNet (described on the Chimp Recip Net track \ description page). Next, the subset of chains in the chimp-reference net were \ extracted and used for an additional netting step. The resulting \ human-reference net was used to generate the reciprocal best Chimp Recip Net \ browser track. Non-syntenic sequences smaller than 50 bases were filtered \ out. Chains extracted from this net are displayed on the Chimp Recip Chain\ browser track.
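The reciprocal-best step can be approximated as a filter. The sketch assumes each human-net item is a dict with the hypothetical keys `chain_id`, `type`, and `size`. It also assumes the IDs of chains kept in the chimp-reference net are known. The actual pipeline re-runs chainNet on the extracted chains rather than filtering an existing net, so this only illustrates which items survive.

```python
def reciprocal_best(human_net: list[dict], chimp_chain_ids: set, min_nonsyn_size: int = 50) -> list[dict]:
    """Keep human-net items whose chain also appears in the chimp-reference net."""
    kept = []
    for item in human_net:
        if item["chain_id"] not in chimp_chain_ids:
            continue  # chain not selected in the chimp-reference net
        if item["type"] == "nonSyn" and item["size"] < min_nonsyn_size:
            continue  # discard small non-syntenic pieces
        kept.append(item)
    return kept
```

With chimp-net chains {1, 3, 4}, a human-net item from chain 2 is treated as likely paralog fill-in and removed. An item from chain 3 that is non-syntenic and under 50 bases is also dropped.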

    \ \

    Credits

    \

    The chimp sequence used in this track was obtained from the 13 Nov. 2003 Arachne assembly. We'd like to thank the National Human Genome Research Institute (NHGRI), the Broad Institute at MIT/Harvard, and Washington University School of Medicine in St. Louis for providing this sequence.

    \

    \ Blastz was developed at Pennsylvania State University by\ Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his\ RepeatMasker\ program.

    \

    \ The axtChain program was developed at the University of California at\ Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    \

    \ The browser display and database storage of the chains were generated\ by Robert Baertsch and Jim Kent.

    \ \

    References

    \

    \ Chiaromonte, F., Yap, V.B., Miller, W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput 2002, 115-26 (2002).

    \

    \ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

    \

    \ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

    \ compGeno 1 altColor 255,240,200\ color 100,50,0\ group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Chained Alignments\ otherDb panTro1\ priority 291.0\ shortLabel Chimp Recip Chain\ spectrum on\ track rBestChainPanTro1\ type chain panTro1\ visibility hide\ rBestNetPanTro1 Chimp Recip Net netAlign panTro1 rBestChainPanTro1 Chimp (Nov. 2003/panTro1) Reciprocal Best Net 0 291.1 0 0 0 127 127 127 1 0 0

    Description

    \

    \ This track shows the "reciprocal best" human/chimpanzee alignment \ net. It is useful for finding orthologous regions and for studying genome\ rearrangement. \

    \ In the graphical display, the boxes represent ungapped \ alignments, while the lines represent gaps. \ In full display mode, the top-level (Level 1)\ chains are the largest, highest-scoring chains that\ span this region. In many cases, gaps exist in the\ top-level chains. When possible, these are filled in by\ other chains displayed at Level 2. The gaps in \ Level 2 chains may be filled by Level 3 chains and so\ forth.\ Clicking on a box displays detailed information about the chain\ as a whole, while clicking on a line shows information\ on the gap. The detailed information is useful in determining\ the cause of the gap or, for lower-level chains, the genomic\ rearrangement. \

    \ Individual track features are categorized as one of four types\ (other than gap):\

      \
    • Top - the best, longest match. Displayed on Level 1.\
    • Syn - aligns to the same chromosome as the gap in the level\ above it.\
    • Inv - aligns to the same chromosome as the gap above it, but in the\ opposite orientation.\
    • NonSyn - matches a different chromosome from the gap in the level \ above it.\

    \ \

    Methods

    \

    \ These alignments were generated using blastz and blat alignments of chimpanzee\ genomic sequence from the 13 Nov. 2003 Arachne chimpanzee draft assembly.\ The initial alignments were generated using blastz on \ repeatmasked sequence using the following chimp/human scoring matrix:\

    \
         A    C    G    T\
    A   100 -300 -150 -300\
    C  -300  100 -300 -150\
    G  -150 -300  100 -300\
    T  -300 -150 -300  100\
    \
    K = 4500, L = 3000, Y = 3400, H = 2000\
    
    \

    \

    \ The resulting alignments were processed by the axtChain program.\ AxtChain organizes all the alignments between a single chimp chromosome\ and a single human chromosome into a group and makes a kd-tree out\ of all the gapless subsections (blocks) of the alignments.\ The maximally-scoring chains of these blocks were found by running a\ dynamic program over the kd-tree. Chains scoring below a certain\ threshold were discarded.\

    To place additional chimp scaffolds that weren't initially aligned by blastz, a DNA blat of the unmasked sequence was performed. The resulting blat alignments were also chained and then merged with the blastz-based chains from the previous step to produce "all chains".

    \ These chains were sorted with the \ highest-scoring chains in the genome ranked first. The program\ chainNet was then used to place the chains one at a time, trimming them as \ necessary to fit into sections not already covered by a higher-scoring chain. \ During this process, a natural hierarchy emerged in which a chain that filled \ a gap in a higher-scoring chain was placed underneath that chain. The program \ netSyntenic was used to fill in information about the relationship between \ higher- and lower-level chains, such as whether a lower-level\ chain was syntenic or inverted relative to the higher-level chain.

    \

    \ Due to the draft nature of this initial genome assembly,\ this net track (and the companion chain track) was generated using\ a "reciprocal best" strategy. This strategy attempts to minimize\ paralog fill-in for missing orthologous chimp sequence by filtering\ from the human net all sequences not found in the chimp side of the\ net. After generating the human alignment net, \ the subset of chains in the chimp-reference net was extracted\ and used for an additional netting step, which was then filtered\ for non-syntenic sequences smaller than 50 bases.

    \ \

    Credits

    \

    \ The chimp sequence used in this track was obtained from the 13 Nov. 2003 \ Arachne assembly. We'd like to thank the National Human Genome Research \ Institute (NHGRI), the Eli & Edythe L. Broad Institute at MIT/Harvard, and \ Washington University School of Medicine for providing this sequence.

    \

    \ Blastz was developed at Pennsylvania State University by\ Scott Schwartz, Zheng Zhang, and Webb Miller with advice from\ Ross Hardison.

    \

    \ Lineage-specific repeats were identified by Arian Smit and his program \ RepeatMasker.

    \

    \ The browser display and database storage of the nets were made\ by Robert Baertsch and Jim Kent.

    \ \

    References

    \

    \ Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.\ Evolution's cauldron: Duplication, deletion, and rearrangement\ in the mouse and human genomes.\ Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

    \

    \ Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,\ Haussler, D., and Miller, W.\ Human-Mouse Alignments with BLASTZ.\ Genome Res. 13(1), 103-7 (2003).

    \ compGeno 0 group compGeno\ longLabel Chimp (Nov. 2003/panTro1) Reciprocal Best Net\ priority 291.1\ shortLabel Chimp Recip Net\ spectrum on\ track rBestNetPanTro1\ type netAlign panTro1 rBestChainPanTro1\ visibility hide\ chimpDels Chimp Deletions bed 4 . Deletions in Chimp (Nov. 2003/panTro1) Relative to Human 0 291.2 0 0 0 127 127 127 0 0 0

    Description

    \

    This track displays regions of the human genome assembly (hg16) that are deleted in the chimpanzee draft assembly (panTro1). Only regions 80 to 12,000 bases long are included. The name of each deletion is a unique identifier for that deletion, followed by an underscore and its length. A similar track, showing human deletions relative to the chimpanzee assembly, appears in the chimp Genome Browser.

    \ \

    Methods

    \

    The human/chimpanzee alignments were created at UCSC with blastz and blat, using a reciprocal best strategy with chaining and netting. The initial alignments were generated using blastz on repeatmasked sequence with the following matrix:

    \
           A    C    G    T\
     A   100 -300 -150 -300\
     C  -300  100 -300 -150\
     G  -150 -300  100 -300\
     T  -300 -150 -300  100\
    \
     O = 400, E = 30, K = 4500, L = 4500, M = 50\
    

    \

    The overall alignment score is the sum of the matrix scores over all aligned pairs of bases, minus penalties for any gaps.
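The sketch below is a worked illustration of this scoring applied to a gapped alignment. The matrix above reduces to +100 for a match, -150 for a transition (A/G, C/T), and -300 for a transversion. The sketch assumes that a gap of length k costs O + E * k, with O = 400 and E = 30. Both helper names are hypothetical.

```python
O, E = 400, 30  # gap open / extension penalties listed above
PURINES = set("AG")


def sub_score(a: str, b: str) -> int:
    """Matrix above: +100 match, -150 transition (A<->G, C<->T), -300 transversion."""
    if a == b:
        return 100
    return -150 if (a in PURINES) == (b in PURINES) else -300


def alignment_score(top: str, bottom: str) -> int:
    """Score two equal-length aligned rows, where '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score, gap_row, gap_len = 0, None, 0
    for a, b in zip(top.upper(), bottom.upper()):
        row = "top" if a == "-" else "bottom" if b == "-" else None
        if row != gap_row and gap_len:
            score -= O + E * gap_len  # close the previous gap (assumed cost O + E*k)
            gap_len = 0
        gap_row = row
        if row is None:
            score += sub_score(a, b)
        else:
            gap_len += 1
    if gap_len:
        score -= O + E * gap_len  # gap running to the end of the alignment
    return score
```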

    \

    \ The resulting alignments were processed by the axtChain program. To place \ additional chimp scaffolds that weren't initially aligned by blastz, a DNA blat \ of the unmasked sequence was performed. The resulting blat alignments were also \ chained, and then merged with the blastz-based chains produced in the previous \ step to produce "all chains", which were further processed by the \ chainNet and netSyntenic programs. Finally, a "reciprocal best" \ strategy was employed to minimize paralog fill-in for missing orthologous chimp \ sequence. Details of the alignment methods can be found in the descriptions of the Chimp Chain and Chimp Net tracks.\

    \ \

    Chimp deletions relative to human were determined from the collection of indels implied by these alignments. To be included, a deletion had to be (i) within, not between, scaffolds; (ii) a simple gap (no opposing unmatched bases and no double-sided gap); (iii) 80-12,000 bp long; and (iv) not attributable to a missed overlap or an incorrect gap size in the assembly. These criteria aim to include plausible repeat insertions while excluding assembly and alignment artifacts.
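The inclusion criteria and the naming scheme above can be written as a small filter. This description does not say how criteria (i), (ii), and (iv) were detected, so the sketch takes them as precomputed boolean fields. `Indel` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Indel:
    indel_id: str            # hypothetical unique identifier
    length: int              # human bases with no counterpart in chimp
    within_scaffold: bool    # (i) both flanks lie on the same scaffold
    simple: bool             # (ii) single-sided gap, no opposing unmatched bases
    assembly_artifact: bool  # (iv) missed overlap or incorrect gap size


def is_chimp_deletion(d: Indel) -> bool:
    """Apply criteria (i)-(iv) listed above."""
    return (d.within_scaffold
            and d.simple
            and 80 <= d.length <= 12000  # (iii)
            and not d.assembly_artifact)


def deletion_name(d: Indel) -> str:
    """Track item name: unique identifier, underscore, length."""
    return f"{d.indel_id}_{d.length}"
```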

    \ \

    Credits

    \

    \ The chimpanzee sequence used in this track was obtained from the 13 Nov. 2003 \ Arachne assembly. This sequence was provided by the National Human Genome \ Research Institute (NHGRI), the Eli & Edythe L. Broad Institute at MIT/Harvard, \ and Washington University School of Medicine.

    \

    \ The BLASTZ program was created by Webb Miller of the \ Penn State Bioinformatics \ Group.

    \

    \ Jim Kent at UCSC wrote the blat program, the chaining and netting programs, and \ the scripts for displaying the alignments in this browser.

    \

    \ The list of mid-sized (80-12000 bp) chimp deletions relative to human was \ provided by Tarjei Mikkelsen at MIT. The UCSC alignments of complete \ chimpanzee scaffolds to the human genome assembly were used to generate this list.

    \ \

    References

    \

    \ ARACHNE: A Whole-Genome Shotgun Assembler. \ Serafim Batzoglou, David B. Jaffe, Ken Stanley, Jonathan Butler, Sante Gnerre, \ Evan Mauceli, Bonnie Berger, Jill P. Mesirov, and Eric S. Lander. \ Genome Research 2002 Jan;12:177-189.

    \

    \ Whole-Genome Sequence Assembly for Mammalian Genomes: ARACHNE 2.\ David B. Jaffe, Jonathan Butler, Sante Gnerre, Evan Mauceli, Kerstin Lindblad-Toh,\ Jill P. Mesirov, Michael C. Zody, and Eric S. Lander. \ Genome Research 2003 Jan;13(1):91-96.

    \

    \ Human-Mouse Alignments with BLASTZ. Schwartz S, Kent WJ, \ Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, and Miller W. \ Genome Research 2003 Jan;13(1):103-7.

    \

    \ Scoring pairwise genomic sequence alignments. \ Chiaromonte F, Yap VB, Miller W. Pac Symp Biocomput 2002;:115-26.

    \ compGeno 1 group compGeno\ longLabel Deletions in Chimp (Nov. 2003/panTro1) Relative to Human\ priority 291.2\ shortLabel Chimp Deletions\ track chimpDels\ type bed 4 .\ visibility hide\ chimpSimpleDiff Chimp Diff bed 3 + Chimp (Nov. 2003 (CGSC 1.1/panTro1)) Simple Differences in Regions of High Quality Sequence 0 292.3 0 0 0 127 127 127 0 0 0

    Description

    \

    \ This track shows simple differences between chimp alignments and the human\ assembly within regions of high quality chimp sequence. The chimp data \ was obtained from the 13 Nov. 2003 Arachne assembly.\ A total of 28,889,041 differences are displayed. The difference rate in \ coding sequence is approximately half that of the chimp genome as a whole.

    \ \

    Methods

    \

    \ For a difference to be included in this track, it had to meet the following\ criteria: \

      \
    • the difference must occur at a base of quality 30 or better \
    • all bases within an 11-base window around this base must have a quality \ of 25 or better\
    • the 11-base window must contain no more than two base differences\
    • no insertions or deletions may be present within the window \
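The sketch below applies the four criteria to a single aligned column. It assumes the human and chimp rows are already aligned, with '-' for gaps, and that a chimp quality value is available for every column. The function name and the rejection of windows that run off either end of the alignment are assumptions. The window is taken to be the base plus five bases on each side, and the difference at the centre counts toward the limit of two.

```python
def passes_filters(human: str, chimp: str, quals: list[int], pos: int) -> bool:
    """Check the four inclusion criteria for a difference at aligned column pos."""
    if human[pos] == chimp[pos]:
        return False  # not a difference
    lo, hi = pos - 5, pos + 6  # 11-base window centred on pos
    if lo < 0 or hi > len(human):
        return False  # window would run off the alignment (assumption)
    window = range(lo, hi)
    if any(human[i] == "-" or chimp[i] == "-" for i in window):
        return False  # no insertions or deletions in the window
    if quals[pos] < 30:
        return False  # the differing base must have quality 30 or better
    if any(quals[i] < 25 for i in window):
        return False  # every window base must have quality 25 or better
    return sum(human[i] != chimp[i] for i in window) <= 2  # at most two differences
```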

    \

    \ Only reciprocal best chimp alignments were considered for this track (see the\ Chimp Net description for more information about this alignment strategy).

    \ \

    Credits

    \

    \ This track was generated at UCSC by Jim Kent.

    \

    \ The chimp sequence was obtained from the 13 Nov. 2003 Arachne assembly. We'd \ like to thank the National Human Genome Research Institute (NHGRI), the Broad \ Institute, and Washington University School of Medicine in St. Louis for \ providing this sequence.

    \ \ compGeno 1 group compGeno\ longLabel $o_Organism ($o_date) Simple Differences in Regions of High Quality Sequence\ otherDb panTro1\ priority 292.3\ shortLabel Chimp Diff\ track chimpSimpleDiff\ type bed 3 +\ visibility hide\ darned Human RNA Editing bed 9 Human RNA Editing from the DAtabase of RNa EDiting 0 300 0 0 0 127 127 127 0 0 0

    Description

    \

    This track provides information on RNA nucleotides that are edited after transcription, together with their corresponding genomic coordinates. Only post-transcriptional editing that results in small changes to the identity of a nucleotide is included; other RNA processing, such as splicing or methylation, is not. The track contains information on A-to-I (adenosine-to-inosine) and C-to-U (cytidine-to-uridine) editing, which occur through deamination by ADAR and APOBEC enzymes, respectively. Most of the data in this track concern A-to-I editing, which is known to be highly abundant in humans.

    \ \

    Display

    \

    \ Track items are colored depending on their occurrence within RNA transcripts:\

    • Dark Green: 5' UTR
    • Blue: CDS
    • Red: Intron
    • Deep Pink: 3' UTR
    • Black: Other (exon/intron status is unclear or unknown)
    \ \

    Methods

    \

    The data were obtained from several research papers on RNA editing and were mapped to the reference genome. More information can be obtained from the DARNED database.

    \ \

    References

    \

    \ Kiran A, Baranov PV. \ \ DARNED: a DAtabase of RNa EDiting in humans. Bioinformatics. 2010;26(14):1772-6.\

    \ rna 1 group rna\ itemRgb on\ longLabel Human RNA Editing from the DAtabase of RNa EDiting\ noScoreFilter .\ priority 300\ shortLabel Human RNA Editing\ track darned\ type bed 9\ visibility hide\ laminB1 LaminB1 (Tig3) wig -6.602 5.678 NKI LaminB1 DamID Map (log2-ratio scores, Tig3 cells) 0 300 0 0 127 127 127 191 0 0 0

    Description

    \ \

    \ Please see the NKI Nuc Lamina "super-track" link above for description and methods.\

    \ regulation 0 autoScale Off\ color 0,0,127\ group regulation\ longLabel NKI LaminB1 DamID Map (log2-ratio scores, Tig3 cells)\ maxHeightPixels 100:40:11\ priority 300\ shortLabel LaminB1 (Tig3)\ smoothingWindow 2\ spanList 60\ superTrack laminB1Super dense\ track laminB1\ type wig -6.602 5.678\ viewLimits -2:2\ visibility hide\ windowingFunction mean\ chainMm3XSingle Mm3X Best Recip chain mm3X Mm3X Best Reciprocal (best w/no overlap) 0 300 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel Mm3X Best Reciprocal (best w/no overlap)\ otherDb mm3X\ priority 300\ shortLabel Mm3X Best Recip\ spectrum on\ track chainMm3XSingle\ type chain mm3X\ visibility hide\ laminB1Lads NKI LADs (Tig3) bed 3 NKI LADs (Lamina Associated Domains, Tig3 cells) 0 300 0 0 127 127 127 191 0 0 0

    Description

    \ \

    \ Please see the NKI Nuc Lamina "super-track" link above for description and methods.\

    \ regulation 1 color 0,0,127\ group regulation\ longLabel NKI LADs (Lamina Associated Domains, Tig3 cells)\ priority 300\ shortLabel NKI LADs (Tig3)\ superTrack laminB1Super dense\ track laminB1Lads\ type bed 3\ visibility hide\ laminB1Super NKI Nuc Lamina NKI Nuclear Lamina Associated Domains (LaminB1 DamID) 0 300 0 0 0 127 127 127 0 0 0

    Overview

    \ \
    \

    Nuclear Lamina and Chromosomal Organization\

    Model of chromosome organization in interphase, summarizing the main results of Guelen et al. (2008). Large, discrete chromosomal domains are dynamically associated (double arrows) with the nuclear lamina, and demarcated by putative insulator elements that include CTCF binding sites, promoters that are oriented away from the lamina, and CpG islands (Fig. S1, Guelen et al., 2008).

    \ \

    \ The architecture of human chromosomes in interphase nuclei is\ still largely unknown. Microscopy studies have indicated that specific\ regions of chromosomes are located in close proximity to the\ nuclear lamina (NL, a dense fibrillar network associated with the inner face \ of the nuclear envelope). \ This has led to the idea that certain genomic elements may be attached to the \ NL, which may contribute to\ the spatial organization of chromosomes inside the nucleus.\ This track represents a high-resolution map of genome-NL interactions in human \ Tig3 lung fibroblasts, \ as determined by the DamID technique. \

    NKI LaminB1 track

    The LaminB1 track shows a high-resolution map of the interaction sites of the entire genome with Lamin B1 (a key NL component) in human fibroblasts. This map shows that genome-lamina interactions occur through more than 1,300 sharply defined large domains 0.1-10 megabases in size. Microscopy evidence indicates that most of these domains are preferentially located at the nuclear periphery. These lamina associated domains (LADs) are characterized by low gene-expression levels, indicating that LADs represent a repressive chromatin environment. The borders of LADs are demarcated by the insulator protein CTCF, by promoters that are oriented away from LADs, or by CpG islands, suggesting possible mechanisms of LAD confinement. Taken together, these results demonstrate that the human genome is divided into large, discrete domains that are units of chromosome organization within the nucleus (see Guelen et al., 2008).

    NKI LADs track

    The LADs track shows Lamina Associated Domains (LADs), based on a genome-wide DamID profile of LaminB1 (above). To define LADs, the full-genome lamin B1 DamID data set was binarized by setting tiling array probes with positive DamID log ratios to 1 and all other probes to -1. Next, a two-step algorithm was used to identify LADs. First, sharp transitions were identified with a sliding edge filter, which calculates the difference in average binary values in two windows of 99 neighbouring probes immediately left and right of a queried probe. The cutoff for this difference was chosen such that the number of edges detected in randomly permuted data sets was less than 5% of the number of edges detected in the original lamin B1 data set. Second, pairs of adjacent 'left' and 'right' edges were identified that together enclosed a region of arbitrary size with at least 70% of the enclosed probes reporting a positive log2 ratio. A total of 1,344 regions fulfilled these criteria and were termed LADs. In 20 randomly permuted data sets, fewer than 13 domains were identified by the same criteria. Note that there are also lamin-B1-positive domains flanked by one or two gradual or irregular transitions. Because it is difficult to define the borders of such domains precisely, these 'fuzzy' domains are not analyzed here (see Guelen et al., 2008).
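
    The two-step LAD definition above can be sketched in Python as follows. This is a minimal illustration, not the NKI code: the function names are hypothetical, the edge cutoff is passed in directly instead of being calibrated against permuted data sets, one edge is kept per run of above-cutoff scores (at its peak), and, as a choice of this sketch, the right window starts at the queried probe so that each score describes the boundary just before that probe. The 99-probe windows and the 70% threshold are taken from the text.

```python
from typing import List, Tuple


def binarize(log_ratios: List[float]) -> List[int]:
    """Set probes with a positive log2 ratio to 1 and all other probes to -1."""
    return [1 if r > 0 else -1 for r in log_ratios]


def edge_scores(binary: List[int], window: int = 99) -> List[float]:
    """Mean of the right window minus mean of the left window at each probe.

    Left window: probes [i - window, i). Right window: probes [i, i + window),
    so each score describes the boundary just before probe i. Probes without
    two full windows keep a score of 0.0.
    """
    n = len(binary)
    prefix = [0]                      # prefix[k] = sum of binary[:k]
    for b in binary:
        prefix.append(prefix[-1] + b)
    scores = [0.0] * n
    for i in range(window, n - window + 1):
        left = (prefix[i] - prefix[i - window]) / window
        right = (prefix[i + window] - prefix[i]) / window
        scores[i] = right - left
    return scores


def find_edges(scores: List[float], cutoff: float) -> List[Tuple[int, str]]:
    """One edge per run of scores beyond the cutoff, placed at the run's peak.

    Runs of scores >= cutoff give 'left' edges (LAD starts); runs of
    scores <= -cutoff give 'right' edges (LAD ends).
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    edges = []
    i, n = 0, len(scores)
    while i < n:
        if abs(scores[i]) < cutoff:
            i += 1
            continue
        sign = 1 if scores[i] > 0 else -1
        j = i
        while j < n and sign * scores[j] >= cutoff:
            j += 1
        peak = max(range(i, j), key=lambda k: sign * scores[k])
        edges.append((peak, "left" if sign > 0 else "right"))
        i = j
    return edges


def call_lads(log_ratios: List[float], cutoff: float, window: int = 99,
              min_positive: float = 0.7) -> List[Tuple[int, int]]:
    """Return LADs as half-open probe-index ranges [start, end)."""
    binary = binarize(log_ratios)
    edges = find_edges(edge_scores(binary, window), cutoff)
    lads = []
    for (start, kind_a), (end, kind_b) in zip(edges, edges[1:]):
        # Only an adjacent left/right pair can enclose a candidate LAD.
        if kind_a == "left" and kind_b == "right":
            region = binary[start:end]
            if sum(b > 0 for b in region) / len(region) >= min_positive:
                lads.append((start, end))
    return lads
```

    For a toy profile of 150 negative, 200 positive and 150 negative probes, call_lads(profile, cutoff=1.0) returns [(150, 350)], i.e. probes 150 through 349.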

    Display Conventions and Configuration

    The LaminB1 wiggle track values range from -6.602 to 5.678 and were normalized to have a median of 0 and a standard deviation of 1.037. The default vertical viewing range for the wiggle track was set to -2 to 2 because this is roughly +/- 2 standard deviations.

    For an example region, see genomic location chr4:35,000,001-45,000,000 (Fig. 1, Guelen et al., 2008).

    Methods

    The DamID technique was applied to generate a high-resolution map of NL interactions for the entire human genome. DamID is based on targeted adenine methylation of DNA sequences that interact in vivo with a protein of interest.

    DamID was performed with lentiviral transduction as described (Guelen et al., 2008). In short, a fusion protein consisting of Escherichia coli DNA adenine methyltransferase (Dam) fused to human LaminB1 was introduced into cultured Tig3 human lung fibroblasts. Dam methylates adenines in the sequence GATC, a mark absent in most eukaryotes. Here, the LaminB1-Dam fusion protein is incorporated into the nuclear lamina, as verified by immunofluorescence staining. Hence, the sequences near the nuclear lamina are marked with a unique methylation tag. The adenine methylation pattern was detected with genomic tiling arrays. Unfused Dam was used as a reference (http://research.nki.nl/vansteensellab/DamID.htm). The data shown are the log2 ratio of LaminB1-Dam fusion protein over Dam-only.

    Sample labelling and hybridizations were performed by NimbleGen Inc. on a set of 8 custom-designed oligonucleotide arrays, with a median probe spacing of ~750 bp. All probes recognize unique (non-repetitive) sequences. The raw data were log2 transformed and loess normalized. Between-array median/scale normalization was based on 6,979 probes common to all arrays. Replicate arrays were averaged, and the full data set was normalized to the genome-wide median.
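
    A minimal sketch of the post-hybridization steps, assuming each array is a mapping from probe ID to signal. Loess normalization is omitted, and "scale" normalization is assumed here to mean rescaling every array to a common median absolute deviation (MAD) over the shared probes; the function names and that choice are illustrative and are not taken from the NimbleGen or NKI pipeline.

```python
import math
import statistics
from typing import Dict, List

Array = Dict[str, float]  # probe ID -> value


def log2_ratios(fusion: Array, dam_only: Array) -> Array:
    """log2(LaminB1-Dam signal / Dam-only signal) for every probe."""
    return {p: math.log2(fusion[p] / dam_only[p]) for p in fusion}


def median_scale_normalize(arrays: List[Array], common: List[str]) -> List[Array]:
    """Center each array on the median of the shared probes and rescale it
    so that all arrays share the same MAD over those probes (assumption)."""
    centers = [statistics.median([a[p] for p in common]) for a in arrays]
    mads = [statistics.median([abs(a[p] - c) for p in common])
            for a, c in zip(arrays, centers)]
    target = statistics.mean(mads)
    return [{p: (v - c) / m * target for p, v in a.items()}
            for a, c, m in zip(arrays, centers, mads)]


def average_replicates(arrays: List[Array]) -> Array:
    """Average each probe over the arrays that contain it."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for a in arrays:
        for p, v in a.items():
            sums[p] = sums.get(p, 0.0) + v
            counts[p] = counts.get(p, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}


def center_on_median(values: Array) -> Array:
    """Shift the whole data set so that its genome-wide median is 0."""
    m = statistics.median(values.values())
    return {p: v - m for p, v in values.items()}
```

    With this scheme, a replicate that is an affine rescaling of another (for example 2x + 1) becomes identical to it after median/scale normalization, so averaging then only reduces noise.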

    Verification

    The data are based on two independent biological replicates. Fluorescence in situ hybridization microscopy confirmed that most of the LaminB1-associated regions are preferentially located at the nuclear periphery. The array platform and the raw and normalized data have been deposited at the NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE8854.

    Credits

    The data for this track were generated by Lars Guelen, Ludo Pagie, and Bas van Steensel at the Van Steensel Lab, Netherlands Cancer Institute.

    References

    Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, Talhout W, Eussen BH, de Klein A, Wessels L, de Laat W, van Steensel B. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature. 2008 Jun 12;453:948-51.

    \ \ regulation 0 group regulation\ longLabel NKI Nuclear Lamina Associated Domains (LaminB1 DamID)\ priority 300\ shortLabel NKI Nuc Lamina\ superTrack on\ track laminB1Super\ syntenicNet Syntenic Nets netAlign Syntenic Alignment Nets for Chimp, Macaque, Mouse, Rat, and Dog 0 300 0 0 0 127 127 127 1 0 0 compGeno 0 compositeTrack on\ group compGeno\ longLabel Syntenic Alignment Nets for Chimp, Macaque, Mouse, Rat, and Dog\ priority 300\ shortLabel Syntenic Nets\ spectrum on\ track syntenicNet\ type netAlign\ visibility hide\ chainMm2XSingle Mm2X Best Recip chain mm2X Mm2X Best Reciprocal (best w/no overlap) 0 301 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel Mm2X Best Reciprocal (best w/no overlap)\ otherDb mm2X\ priority 301\ shortLabel Mm2X Best Recip\ spectrum on\ track chainMm2XSingle\ type chain mm2X\ visibility hide\ rBestNet Reciprocal Best Nets netAlign Reciprocal Best Alignment Nets 0 301 0 0 0 127 127 127 1 0 0 compGeno 0 compositeTrack on\ group compGeno\ longLabel Reciprocal Best Alignment Nets\ priority 301\ shortLabel Reciprocal Best Nets\ spectrum on\ track rBestNet\ type netAlign\ visibility hide\ netRBestEchTel1 Tenrec RBest Net netAlign echoTel1 chainEchTel1 Tenrec (July 2005 (Broad/echTel1)) Reciprocal Best Alignment Net 0 301.11 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Alignment Net\ otherDb echTel1\ parent rBestNet\ priority 301.11\ shortLabel Tenrec RBest Net\ spectrum on\ track netRBestEchTel1\ type netAlign echoTel1 chainEchTel1\ visibility hide\ netRBestLoxAfr1 Elephant RBest Net netAlign loxAfr1 chainLoxAfr1 Elephant (May 2005 (Broad/loxAfr1)) Reciprocal Best Alignment Net 0 301.13 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Alignment Net\ otherDb loxAfr1\ parent rBestNet\ priority 301.13\ shortLabel Elephant RBest Net\ spectrum on\ track netRBestLoxAfr1\ type netAlign loxAfr1 
chainLoxAfr1\ visibility hide\ netRBestDasNov1 Armadillo RBest Net netAlign dasNov1 chainDasNov1 Armadillo (May 2005 (Broad/dasNov1)) Reciprocal Best Alignment Net 0 301.15 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Alignment Net\ otherDb dasNov1\ parent rBestNet\ priority 301.15\ shortLabel Armadillo RBest Net\ spectrum on\ track netRBestDasNov1\ type netAlign dasNov1 chainDasNov1\ visibility hide\ netRBestOryCun1 Rabbit RBest Net netAlign oryCun1 chainOryCun1 Rabbit (May 2005 (Broad/oryCun1)) Reciprocal Best Alignment Net 0 301.7 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Alignment Net\ otherDb oryCun1\ parent rBestNet\ priority 301.7\ shortLabel Rabbit RBest Net\ spectrum on\ track netRBestOryCun1\ type netAlign oryCun1 chainOryCun1\ visibility hide\ netRBestCavPor2 Guineapig RBest Net netAlign cavPor2 chainCavPor2 Guinea pig (Oct. 2005 (Broad/cavPor2)) Reciprocal Best Alignment Net 0 301.9 0 0 0 127 127 127 1 0 0 compGeno 0 group compGeno\ longLabel $o_Organism ($o_date) Reciprocal Best Alignment Net\ otherDb cavPor2\ parent rBestNet\ priority 301.9\ shortLabel Guineapig RBest Net\ spectrum on\ track netRBestCavPor2\ type netAlign cavPor2 chainCavPor2\ visibility hide\ chainHg16ProtEx chainHg16ProtEx chain hg16 chainHg16ProtEx 0 302 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel chainHg16ProtEx\ otherDb hg16\ priority 302\ shortLabel chainHg16ProtEx\ spectrum on\ track chainHg16ProtEx\ type chain hg16\ visibility hide\ chainHg16MergeEx chainHg16MergeEx chain hg16 chainHg16MergeEx 0 303 100 50 0 255 240 200 1 0 0 x 1 altColor 255,240,200\ color 100,50,0\ group x\ longLabel chainHg16MergeEx\ otherDb hg16\ priority 303\ shortLabel chainHg16MergeEx\ spectrum on\ track chainHg16MergeEx\ type chain hg16\ visibility hide\ chainDm1MergeEx chainDm1MergeEx chain dm1 chainDm1MergeEx 0 304 100 50 0 255 240 200 1 0 0 x 1 
altColor 255,240,200\ color 100,50,0\ group x\ longLabel chainDm1MergeEx\ otherDb dm1\ priority 304\ shortLabel chainDm1MergeEx\ spectrum on\ track chainDm1MergeEx\ type chain dm1\ visibility hide\ chainNetPriPac1 P. pacificus Chain/Net bed 3 P. pacificus (Feb. 2007 (WUGSC 5.0/priPac1)), Chain and Net Alignments 0 310.3 0 0 0 255 255 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of P. pacificus (Feb. 2007 (WUGSC 5.0/priPac1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both P. pacificus and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the P. pacificus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best P. pacificus/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The P. pacificus sequence used in this annotation is from the Feb. 2007 (WUGSC 5.0/priPac1) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.
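
    The four item types listed above can be expressed as a small classification rule. This sketch assumes a hypothetical NetFill record that stores, for each placed chain, its chromosome and strand in the aligning species and the fill whose gap it occupies; it mirrors only the definitions given here, not the full logic of netSyntenic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetFill:
    """Hypothetical record for one chain placed in a net."""
    other_chrom: str                     # chromosome in the aligning species
    other_strand: str                    # "+" or "-"
    parent: Optional["NetFill"] = None   # fill whose gap this chain fills


def classify(fill: NetFill) -> str:
    """Return Top, Syn, Inv or NonSyn following the list above."""
    if fill.parent is None:
        return "Top"                     # level 1: nothing above it
    if fill.other_chrom != fill.parent.other_chrom:
        return "NonSyn"                  # different chromosome from the level above
    if fill.other_strand != fill.parent.other_strand:
        return "Inv"                     # same chromosome, opposite orientation
    return "Syn"                         # same chromosome, same orientation
```

    For example, a level 2 chain on chr2 (-) under a top-level chain on chr2 (+) is classified as Inv.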

    Methods

    Chain track

    Transposons that have been inserted since the P. pacificus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single P. pacificus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91
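
    As an illustration of how this matrix is applied, the score of a gapless block is the sum of the matrix entries for each aligned base pair. The sketch below assumes that soft-masked (lowercase) bases are scored like uppercase ones and that any pair involving a non-ACGT base (such as N) contributes 0; both are assumptions of the example, not documented axtChain behavior.

```python
# Substitution scores from the matrix above (the matrix is symmetric).
MATRIX = {
    "A": {"A": 91, "C": -114, "G": -31, "T": -123},
    "C": {"A": -114, "C": 100, "G": -125, "T": -31},
    "G": {"A": -31, "C": -125, "G": 100, "T": -114},
    "T": {"A": -123, "C": -31, "G": -114, "T": 91},
}


def block_score(target: str, query: str) -> int:
    """Sum of substitution scores over one gapless, equal-length block."""
    if len(target) != len(query):
        raise ValueError("a gapless block needs sequences of equal length")
    total = 0
    for t, q in zip(target.upper(), query.upper()):
        # Assumption: any pair involving a non-ACGT base (e.g. N) scores 0.
        total += MATRIX.get(t, {}).get(q, 0)
    return total
```

    Transitions (A/G, C/T) cost -31 while transversions cost -114 to -125, so block_score("ACGT", "ATGT"), which contains one C/T transition, is 251 rather than the 382 of a perfect match.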

    Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=loose

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000
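
    The table lists gap costs only at selected gap lengths. A minimal sketch of reading a cost for any length from one row, assuming linear interpolation between the listed positions and linear extrapolation of the last segment beyond 252111 (both assumptions of this example; smallSize, the threshold below which costs are precomputed, is not modeled):

```python
from bisect import bisect_right

# "-linearGap=loose" table from above.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
LOOSE = {
    "qGap":    [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600],
    "tGap":    [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600],
    "bothGap": [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000],
}


def gap_cost(row: str, size: int) -> float:
    """Cost of a gap of `size` bases read from one row of the table.

    Assumption: costs between listed positions are linearly interpolated,
    and sizes beyond the last position extrapolate the last segment.
    """
    if size < 1:
        raise ValueError("gap size must be at least 1")
    costs = LOOSE[row]
    if size >= POSITIONS[-1]:
        i = len(POSITIONS) - 2            # use the last segment
    else:
        i = bisect_right(POSITIONS, size) - 1
    x0, x1 = POSITIONS[i], POSITIONS[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)
```

    Under this scheme a 7-base gap in one species falls between positions 3 and 11 and costs 425, while a 61-base double-sided gap costs 825.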

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
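
    The greedy placement described above can be illustrated on a single reference chromosome. In this simplified sketch (hypothetical names; real nets also apply minimum fill and gap sizes and track both species), chains are placed from highest to lowest score, each aligned block is trimmed to the bases not already covered, and a surviving chain is nested under the deepest placed chain whose span contains it.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]   # half-open [start, end) on the reference


def subtract(block: Interval, covered: List[Interval]) -> List[Interval]:
    """Parts of `block` not overlapped by any covered interval."""
    pieces = [block]
    for cs, ce in covered:
        remaining = []
        for s, e in pieces:
            if ce <= s or cs >= e:        # no overlap: keep the whole piece
                remaining.append((s, e))
                continue
            if s < cs:                    # part left of the covered interval
                remaining.append((s, cs))
            if ce < e:                    # part right of the covered interval
                remaining.append((ce, e))
        pieces = remaining
    return pieces


def build_net(chains: List[Tuple[int, str, List[Interval]]]):
    """chains: (score, name, aligned blocks).
    Returns (name, level, parent, kept blocks) in placement order."""
    placed = []                           # (name, level, parent, span, kept)
    covered: List[Interval] = []
    for score, name, blocks in sorted(chains, key=lambda c: c[0], reverse=True):
        kept = [p for b in blocks for p in subtract(b, covered)]
        if not kept:
            continue                      # entirely hidden by higher-scoring chains
        span = (min(s for s, _ in kept), max(e for _, e in kept))
        parent: Optional[str] = None
        level = 1
        for pname, plevel, _, (ps, pe), _ in placed:
            # Nest under the deepest placed chain whose span contains this one.
            if ps <= span[0] and span[1] <= pe and plevel + 1 > level:
                parent, level = pname, plevel + 1
        placed.append((name, level, parent, span, kept))
        covered.extend(kept)
    return [(n, lvl, par, kept) for n, lvl, par, _, kept in placed]
```

    With chains A (score 100, blocks 0-100 and 200-300), B (50, 120-180), C (30, 50-80) and D (20, 90-130), chain C is discarded because A already covers it, while D is trimmed to 100-120 and placed at level 2 inside the gap of A.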

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California at Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb priPac1\ priority 310.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetPriPac1\ type bed 3\ visibility hide\ chainNetPriPac1Viewchain Chain bed 3 P. pacificus (Feb. 2007 (WUGSC 5.0/priPac1)), Chain and Net Alignments 3 310.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetPriPac1\ shortLabel Chain\ spectrum on\ track chainNetPriPac1Viewchain\ view chain\ visibility pack\ chainNetPriPac1Viewnet Net bed 3 P. pacificus (Feb. 2007 (WUGSC 5.0/priPac1)), Chain and Net Alignments 1 310.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetPriPac1\ shortLabel Net\ track chainNetPriPac1Viewnet\ view net\ visibility dense\ chainNetOryCun1 oryCun1 Chain/Net bed 3 Rabbit (May 2005 (Broad/oryCun1)), Chain and Net Alignments 0 320.3 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of rabbit (May 2005 (Broad/oryCun1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rabbit and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rabbit assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best rabbit/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The rabbit sequence used in this annotation is from the May 2005 (Broad/oryCun1) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the rabbit/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rabbit chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
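
    The dynamic program that axtChain runs over gapless blocks can be sketched as follows. For clarity this version compares every pair of blocks (O(n²)) instead of using a kd-tree, skips overlapping blocks instead of trimming them, and uses a hypothetical flat-plus-linear gap penalty in place of the medium gap table above; the names are illustrative.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int, int]   # (target start, query start, size, score)


def simple_gap_cost(dt: int, dq: int) -> float:
    """Hypothetical penalty: 400 to open any gap plus 1 per unaligned base."""
    if dt == 0 and dq == 0:
        return 0.0
    return 400.0 + dt + dq


def best_chain(blocks: List[Block],
               gap_cost: Callable[[int, int], float] = simple_gap_cost
               ) -> Tuple[float, List[Block]]:
    """Highest-scoring chain of blocks increasing in both target and query."""
    if not blocks:
        return 0.0, []
    ordered = sorted(blocks)                 # by target start, then query start
    best = [0.0] * len(ordered)              # best chain score ending at block i
    prev = [-1] * len(ordered)               # predecessor of block i in that chain
    for i, (t, q, _, score) in enumerate(ordered):
        best[i] = float(score)
        for j in range(i):
            tj, qj, sj, _ = ordered[j]
            dt, dq = t - (tj + sj), q - (qj + sj)
            if dt < 0 or dq < 0:
                continue                     # j does not end before i in both species
            candidate = best[j] + score - gap_cost(dt, dq)
            if candidate > best[i]:
                best[i], prev[i] = candidate, j
    end = max(range(len(ordered)), key=lambda k: best[k])
    total = best[end]
    chain = []
    while end != -1:                         # walk predecessors back to the start
        chain.append(ordered[end])
        end = prev[end]
    return total, chain[::-1]
```

    With blocks A (0, 0, 100, 5000), B (150, 160, 100, 4000), C (120, 5000, 50, 1000) and D (300, 300, 50, 300), joining B to A gives 8,490, while extending the chain to D would lower the total to 8,300, so the best chain is A followed by B.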

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
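
    The N-counting part of the netClass annotation described above can be illustrated with a small helper. This sketch assumes sequences are plain strings with soft-masked repeats in lowercase and uses lowercase bases only as a rough stand-in for repeat content; it does not distinguish transposons inserted before and after the split, and the dictionary keys are illustrative rather than netClass's own field names.

```python
from typing import Dict, Tuple


def count_bases(seq: str, start: int, end: int) -> Tuple[int, int]:
    """Return (N bases, lowercase soft-masked bases) in seq[start:end]."""
    region = seq[start:end]
    n_bases = sum(1 for b in region if b in "Nn")
    masked = sum(1 for b in region if b in "acgt")
    return n_bases, masked


def annotate_gap(t_seq: str, t_range: Tuple[int, int],
                 q_seq: str, q_range: Tuple[int, int]) -> Dict[str, int]:
    """Illustrative per-gap counts in the reference (t) and other species (q)."""
    t_n, t_masked = count_bases(t_seq, *t_range)
    q_n, q_masked = count_bases(q_seq, *q_range)
    return {"tN": t_n, "qN": q_n, "tMasked": t_masked, "qMasked": q_masked}
```

    A gap whose human side is mostly N is likely a sequencing gap rather than a real deletion, which is why these counts are useful when interpreting nets.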

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California at Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb oryCun1\ priority 320.3\ shortLabel $o_db Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetOryCun1\ type bed 3\ visibility hide\ chainNetOryCun2 Rabbit Chain/Net bed 3 Rabbit (Apr. 2009 (Broad/oryCun2)), Chain and Net Alignments 0 320.3 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of rabbit (Apr. 2009 (Broad/oryCun2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rabbit and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rabbit assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best rabbit/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The rabbit sequence used in this annotation is from the Apr. 2009 (Broad/oryCun2) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - line-ups on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the rabbit/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rabbit chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
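
    Removing lineage-specific transposons before alignment and adding them back afterwards, as described in the chain track methods above, requires mapping coordinates from the abbreviated sequence back to the full one. A minimal sketch, assuming the removed intervals are known, sorted, non-overlapping and half-open; the function names are hypothetical and this is not the UCSC implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]   # half-open [start, end) in full-sequence coordinates


def strip(seq: str, removed: List[Interval]) -> str:
    """Build the abbreviated sequence with the removed intervals cut out."""
    parts, prev = [], 0
    for s, e in removed:
        parts.append(seq[prev:s])
        prev = e
    parts.append(seq[prev:])
    return "".join(parts)


def to_full(pos: int, removed: List[Interval]) -> int:
    """Map a position in the abbreviated sequence to the full sequence."""
    shift = 0
    for s, e in removed:
        if s <= pos + shift:       # this interval was cut out before pos
            shift += e - s
        else:
            break
    return pos + shift


def block_to_full(start: int, end: int, removed: List[Interval]) -> List[Interval]:
    """Map an abbreviated block [start, end) to full-sequence pieces,
    splitting it wherever a removed interval is added back."""
    pieces = []
    cur = start
    while cur < end:
        full_start = to_full(cur, removed)
        nxt = next((s for s, _ in removed if s > full_start), None)
        take = end - cur if nxt is None else min(nxt - full_start, end - cur)
        pieces.append((full_start, full_start + take))
        cur += take
    return pieces
```

    For example, with full = "AAAAxxxxCCCCyyGG", where x and y mark transposon bases removed as [(4, 8), (12, 14)], the abbreviated block [2, 9) maps to the pieces (2, 4), (8, 12) and (14, 15); the re-inserted transposons would therefore show up as gaps in the final alignment.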

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California at Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb oryCun2\ priority 320.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetOryCun2\ type bed 3\ visibility hide\ chainNetOryCun1Viewchain Chain bed 3 Rabbit (May 2005 (Broad/oryCun1)), Chain and Net Alignments 3 320.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetOryCun1\ shortLabel Chain\ spectrum on\ track chainNetOryCun1Viewchain\ view chain\ visibility pack\ chainNetOryCun2Viewchain Chain bed 3 Rabbit (Apr. 2009 (Broad/oryCun2)), Chain and Net Alignments 3 320.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetOryCun2\ shortLabel Chain\ spectrum on\ track chainNetOryCun2Viewchain\ view chain\ visibility pack\ chainNetOryCun1Viewnet Net bed 3 Rabbit (May 2005 (Broad/oryCun1)), Chain and Net Alignments 2 320.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetOryCun1\ shortLabel Net\ track chainNetOryCun1Viewnet\ view net\ visibility full\ chainNetOryCun2Viewnet Net bed 3 Rabbit (Apr. 2009 (Broad/oryCun2)), Chain and Net Alignments 2 320.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetOryCun2\ shortLabel Net\ track chainNetOryCun2Viewnet\ view net\ visibility full\ chimpParalogy Chimp Seg Dups bed 3 . Chimp Segmental Duplications 0 330 0 0 0 127 127 127 0 0 0

    Description

    High-depth sequence reads from an individual male chimpanzee were used to detect paralogy in the human genome reference sequence. This track shows confirmed chimpanzee segmental duplications on the human reference, defined as duplicated sequences at least 10 kb in length with greater than 94% similarity.

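Read as a length cutoff of 10 kb and a similarity cutoff above 94%, the definition above can be expressed as a simple filter. The minimal Python sketch below assumes that reading. The `Duplication` record and its `identity` field are hypothetical stand-ins for whatever the original analysis used.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Duplication:
    chrom: str
    start: int       # 0-based, inclusive
    end: int         # 0-based, exclusive
    identity: float  # fraction of matching bases, 0.0-1.0 (hypothetical field)

    @property
    def length(self) -> int:
        return self.end - self.start


MIN_LENGTH = 10_000  # "at least 10 kb"
MIN_IDENTITY = 0.94  # "greater than 94% similarity" (strictly greater)


def confirmed(dups: Iterable[Duplication]) -> List[Duplication]:
    """Keep duplications that pass both the length and the similarity cutoff."""
    return [d for d in dups if d.length >= MIN_LENGTH and d.identity > MIN_IDENTITY]
```

A 12 kb duplication at 97% similarity passes. A 9,999 bp one fails the length test, and one at exactly 94% fails the strict similarity test.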
    Credits

    The data were provided by Ginger Cheng and Evan Eichler as part of their efforts to map chimp paralogy at the University of Washington.

    References

    Cheng Z, Ventura M, She X, Khaitovich P, Graves T, Osoegawa K, Church D, DeJong P, Wilson RK, Paabo S, et al. A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature. 2005 Sep 1;437(7055):88-93.

    Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE. Recent segmental duplications in the human genome. Science. 2002 Aug 9;297(5583):1003-7.

    \ \ varRep 1 group varRep\ longLabel Chimp Segmental Duplications\ priority 330\ shortLabel Chimp Seg Dups\ track chimpParalogy\ type bed 3 .\ visibility hide\ chainNetMelInc1 melInc1 Chain/Net bed 3 melInc1 (melInc1), Chain and Net Alignments 0 330.3 0 0 0 255 255 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of melInc1 to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both melInc1 and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the melInc1 assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

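The single-line versus double-line distinction depends on whether a gap contains unaligned sequence in one species or in both. The minimal Python sketch below assumes a chain is given as a hypothetical list of ungapped blocks (tStart, tEnd, qStart, qEnd), ordered along both sequences. It calls a gap double-sided whenever both sides are non-empty. The description above calls single-line gaps "largely" one-sided, so the browser may apply a threshold that this sketch does not model.

```python
from typing import List, Tuple

Block = Tuple[int, int, int, int]  # (t_start, t_end, q_start, q_end), half-open, same strand


def classify_gaps(blocks: List[Block]) -> List[str]:
    """Label the gap between each pair of consecutive blocks as 'single' or 'double'."""
    labels = []
    for (_, t_end, _, q_end), (t_next, _, q_next, _) in zip(blocks, blocks[1:]):
        t_gap = t_next - t_end  # unaligned human bases between the two blocks
        q_gap = q_next - q_end  # unaligned bases in the other species
        labels.append("double" if t_gap > 0 and q_gap > 0 else "single")
    return labels
```

Given blocks `(0, 100, 0, 100)`, `(150, 200, 100, 150)` and `(260, 300, 170, 210)`, the first gap has 50 human bases and no query bases, so it is single. The second has sequence on both sides, so it is double.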
    Net Track

    The net track shows the best melInc1/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The melInc1 sequence used in this annotation is from the melInc1 assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - a line-up on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

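The four categories amount to a small decision rule. The simplified Python sketch below assumes a hypothetical `NetItem` record that stores an item's level, the chromosome and strand it matches in the aligning organism, and the item whose gap it fills. netSyntenic itself may weigh additional evidence that this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetItem:
    level: int
    q_chrom: str   # chromosome matched in the aligning organism
    q_strand: str  # "+" or "-"
    parent: Optional["NetItem"] = None  # item whose gap this one fills


def classify(item: NetItem) -> str:
    """Return 'top', 'syn', 'inv' or 'nonSyn' following the rules listed above."""
    if item.level == 1 or item.parent is None:
        return "top"
    if item.q_chrom != item.parent.q_chrom:
        return "nonSyn"  # different chromosome from the gap above
    if item.q_strand == item.parent.q_strand:
        return "syn"     # same chromosome, same orientation
    return "inv"         # same chromosome, opposite orientation
```

A level-2 item on chr4 under a top-level chr4 "+" chain is "syn" if it is on "+", "inv" if it is on "-", and "nonSyn" if it matches chr7 instead.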
    Methods

    Chain track

    Transposons that have been inserted since the melInc1/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single melInc1 chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

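The following minimal Python sketch shows how the matrix above is applied. It scores one ungapped block (one of the boxes in the display) by summing the matrix entry for each aligned pair of bases. It assumes A/C/G/T input and treats lowercase (soft-masked) bases like uppercase. It does not model how lastz handles N or other characters. `block_score` is a hypothetical name.

```python
# Substitution scores from the matrix above; it is symmetric, so row/column order does not matter
BASES = "ACGT"
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
SCORE = {(a, b): MATRIX[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def block_score(t_seq: str, q_seq: str) -> int:
    """Sum substitution scores over an ungapped, equal-length alignment block."""
    if len(t_seq) != len(q_seq):
        raise ValueError("an ungapped block needs sequences of equal length")
    return sum(SCORE[(a, b)] for a, b in zip(t_seq.upper(), q_seq.upper()))
```

For example, `block_score("ACGT", "ACGT")` gives 382. The single mismatches A/G and A/T score -31 and -123, so transitions are penalized less than transversions.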
    Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=loose

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000

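The chaining step and the minimum-score cutoff can be sketched in Python under several simplifying assumptions. Blocks are hypothetical (tStart, tEnd, qStart, qEnd, score) tuples on one chromosome pair and one strand. A block may follow another only if it starts after the previous one ends in both species. The gap penalty is a placeholder function standing in for the table above. axtChain uses a kd-tree to find candidate predecessors quickly; this O(n²) version checks every pair instead. It returns only the single best chain and applies the 1000 cutoff to it.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int, int, int]  # (t_start, t_end, q_start, q_end, score)


def placeholder_gap(t_gap: int, q_gap: int) -> int:
    """Hypothetical gap cost standing in for the linear gap table."""
    return 0 if t_gap == 0 and q_gap == 0 else 100 + t_gap + q_gap


def best_chain(blocks: List[Block],
               gap_cost: Callable[[int, int], int] = placeholder_gap,
               min_score: int = 1000) -> Tuple[int, List[Block]]:
    """Return (score, blocks) of the best chain, or (0, []) if it scores below min_score."""
    if not blocks:
        return 0, []
    order = sorted(blocks, key=lambda b: (b[0], b[2]))
    best = [b[4] for b in order]  # best chain score ending at each block
    prev = [-1] * len(order)      # back-pointer to the preceding block
    for j, (tj, _, qj, _, sj) in enumerate(order):
        for i in range(j):
            ti_end, qi_end = order[i][1], order[i][3]
            if ti_end <= tj and qi_end <= qj:  # block i lies before block j in both species
                cand = best[i] + sj - gap_cost(tj - ti_end, qj - qi_end)
                if cand > best[j]:
                    best[j], prev[j] = cand, i
    end = max(range(len(order)), key=best.__getitem__)
    chain = []
    k = end
    while k != -1:  # follow back-pointers to recover the chain
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    score = best[end]
    return (score, chain) if score >= min_score else (0, [])
```

Consider blocks scoring 900, 800 and 300. Under the placeholder penalty, the first two chain together for 900 + 800 - 130 = 1570. The distant third block would cost far more in gap penalty than it adds.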
    Net track

    Chains were derived from lastz alignments, using the methods described for the chain track, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to record how much of each gap and chain contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California at Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb melInc1\ priority 330.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetMelInc1\ type bed 3\ visibility hide\ chainNetMelInc1Viewchain Chain bed 3 melInc1 (melInc1), Chain and Net Alignments 3 330.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetMelInc1\ shortLabel Chain\ spectrum on\ track chainNetMelInc1Viewchain\ view chain\ visibility pack\ chainNetMelInc1Viewnet Net bed 3 melInc1 (melInc1), Chain and Net Alignments 1 330.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetMelInc1\ shortLabel Net\ track chainNetMelInc1Viewnet\ view net\ visibility dense\ chainNetCavPor2 cavPor2 Chain/Net bed 3 Guinea pig (Oct. 2005 (Broad/cavPor2)), Chain and Net Alignments 0 340.3 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of guinea pig (Oct. 2005 (Broad/cavPor2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both guinea pig and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the guinea pig assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best guinea pig/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The guinea pig sequence used in this annotation is from the Oct. 2005 (Broad/cavPor2) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - a line-up on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the guinea pig/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single guinea pig chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300

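Removing lineage-specific transposons before alignment and adding them back afterward requires translating coordinates from the abbreviated sequence back to the full one. The minimal Python sketch below shows that bookkeeping. It assumes the removed repeats are known as sorted, non-overlapping, half-open intervals on the original sequence. `make_restorer` is hypothetical and is not the pipeline's own code.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple


def make_restorer(removed: List[Tuple[int, int]]) -> Callable[[int], int]:
    """Map positions in a repeat-stripped sequence back to the original sequence.

    `removed` holds the sorted, non-overlapping, half-open intervals that were cut
    out of the original sequence before alignment.
    """
    cut_points: List[int] = []  # stripped-sequence position where each cut occurred
    cumulative = [0]            # cumulative[k] = bases removed by the first k cuts
    for start, end in removed:
        cut_points.append(start - cumulative[-1])
        cumulative.append(cumulative[-1] + end - start)

    def restore(pos: int) -> int:
        # every cut at or before `pos` shifts it right by that cut's length
        k = bisect_right(cut_points, pos)
        return pos + cumulative[k]

    return restore
```

Take a 30-base sequence with [5, 10) and [20, 25) removed. `restore(5)` returns 10 and `restore(15)` returns 25: the first bases after each excised repeat.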
    Net track

    Chains were derived from lastz alignments, using the methods described for the chain track, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to record how much of each gap and chain contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California at Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb cavPor2\ priority 340.3\ shortLabel $o_db Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetCavPor2\ type bed 3\ visibility hide\ chainNetCavPor3 Guinea pig Chain/Net bed 3 Guinea pig (Feb. 2008 (Broad/cavPor3)), Chain and Net Alignments 0 340.3 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of guinea pig (Feb. 2008 (Broad/cavPor3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both guinea pig and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the guinea pig assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best guinea pig/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The guinea pig sequence used in this annotation is from the Feb. 2008 (Broad/cavPor3) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - a line-up on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the guinea pig/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single guinea pig chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300

    Net track

    Chains were derived from lastz alignments, using the methods described for the chain track, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to record how much of each gap and chain contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

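Part of the netClass step described above is counting unsequenced bases in gaps. The minimal Python sketch below does that N count for a single gap. It assumes both sequences are available as strings and that the gap is given as a half-open range on each. `n_count`, `gap_n_summary`, and the "tN"/"qN" keys are names chosen for this sketch. The transposon annotation that netClass also performs is not modeled.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # half-open [start, end)


def n_count(seq: str, region: Range) -> int:
    """Count N (unsequenced) bases in seq[start:end], ignoring case."""
    start, end = region
    return seq[start:end].upper().count("N")


def gap_n_summary(t_seq: str, t_gap: Range, q_seq: str, q_gap: Range) -> Dict[str, int]:
    """N counts for one gap on the human (target) side and the other-species (query) side."""
    return {"tN": n_count(t_seq, t_gap), "qN": n_count(q_seq, q_gap)}
```

A gap covering `NNNN` on the human side and `acnnt` on the other side yields 4 and 2 Ns respectively.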
    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California at Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb cavPor3\ priority 340.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetCavPor3\ type bed 3\ visibility hide\ chainNetMelHap1 melHap1 Chain/Net bed 3 melHap1 (melHap1), Chain and Net Alignments 0 340.3 0 0 0 255 255 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of melHap1 to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both melHap1 and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the melHap1 assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best melHap1/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The melHap1 sequence used in this annotation is from the melHap1 assembly.

    Display Conventions and Configuration

    Chain Track

    By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to "Filter by chromosome".

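The "Filter by chromosome" option amounts to keeping only chains whose aligning-organism chromosome matches the name entered. The minimal Python sketch below illustrates that behavior with hypothetical `ChainRecord` objects that have `t_name`, `q_name`, and `score` fields. It shows the idea, not the browser's implementation. Treating an empty entry as "no filter" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ChainRecord:
    t_name: str  # human chromosome
    q_name: str  # chromosome in the aligning organism
    score: int


def filter_by_chrom(chains: Iterable[ChainRecord], chrom: Optional[str]) -> List[ChainRecord]:
    """Keep chains whose aligning-organism chromosome equals `chrom` (no filter if empty)."""
    if not chrom:
        return list(chains)
    return [c for c in chains if c.q_name == chrom]
```

Entering chr4 keeps chains to chr4 and hides those to any other chromosome in the aligning organism.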
    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - a line-up on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the melHap1/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single melHap1 chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

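The same matrix also appears in this track's configuration line as a `matrix` setting of 16 comma-separated values, together with `matrixHeader A, C, G, T`. The minimal Python sketch below rebuilds the 4x4 table from those two settings and checks that it is symmetric. `parse_matrix` is a hypothetical helper, not part of the browser's code.

```python
from typing import Dict, Tuple

MATRIX_SETTING = "16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91"
HEADER_SETTING = "A, C, G, T"


def parse_matrix(setting: str, header: str) -> Dict[Tuple[str, str], int]:
    """Turn '<count> v1,v2,...' plus a base header into a {(row, col): score} dict."""
    count_text, values_text = setting.split(maxsplit=1)
    values = [int(v) for v in values_text.split(",")]
    bases = [b.strip() for b in header.split(",")]
    if int(count_text) != len(values) or len(values) != len(bases) ** 2:
        raise ValueError("matrix size does not match its header")
    n = len(bases)
    # values are stored row by row: entry (i, j) sits at index i * n + j
    return {(bases[i], bases[j]): values[i * n + j] for i in range(n) for j in range(n)}


scores = parse_matrix(MATRIX_SETTING, HEADER_SETTING)
assert all(scores[(a, b)] == scores[(b, a)] for a, b in scores)  # matrix is symmetric
```

Reading the values row by row reproduces the table above; for example, the C/G entry is -125.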
    Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=loose

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000

    Net track

    Chains were derived from lastz alignments, using the methods described for the chain track, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to record how much of each gap and chain contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb melHap1\ priority 340.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetMelHap1\ type bed 3\ visibility hide\ chainNetCavPor2Viewchain Chain bed 3 Guinea pig (Oct. 2005 (Broad/cavPor2)), Chain and Net Alignments 3 340.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetCavPor2\ shortLabel Chain\ spectrum on\ track chainNetCavPor2Viewchain\ view chain\ visibility pack\ chainNetCavPor3Viewchain Chain bed 3 Guinea pig (Feb. 2008 (Broad/cavPor3)), Chain and Net Alignments 3 340.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetCavPor3\ shortLabel Chain\ spectrum on\ track chainNetCavPor3Viewchain\ view chain\ visibility pack\ chainNetMelHap1Viewchain Chain bed 3 melHap1 (melHap1), Chain and Net Alignments 3 340.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetMelHap1\ shortLabel Chain\ spectrum on\ track chainNetMelHap1Viewchain\ view chain\ visibility pack\ chainNetCavPor2Viewnet Net bed 3 Guinea pig (Oct. 2005 (Broad/cavPor2)), Chain and Net Alignments 2 340.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetCavPor2\ shortLabel Net\ track chainNetCavPor2Viewnet\ view net\ visibility full\ chainNetCavPor3Viewnet Net bed 3 Guinea pig (Feb. 
2008 (Broad/cavPor3)), Chain and Net Alignments 2 340.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetCavPor3\ shortLabel Net\ track chainNetCavPor3Viewnet\ view net\ visibility full\ chainNetMelHap1Viewnet Net bed 3 melHap1 (melHap1), Chain and Net Alignments 1 340.3 0 0 0 255 255 0 1 0 0 compGeno 1 parent chainNetMelHap1\ shortLabel Net\ track chainNetMelHap1Viewnet\ view net\ visibility dense\ chainNetRn3 Rat Chain/Net bed 3 Rat (June 2003 (Baylor 3.1/rn3)), Chain and Net Alignments 0 360.3 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of rat (June 2003, Baylor 3.1/rn3) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rat and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rat assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. Where multiple chains align over a particular region of the human genome, chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.
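The difference between single and double lines comes down to whether a gap between two consecutive blocks leaves unaligned sequence on one side or on both. The Python sketch below assumes blocks are given as half-open coordinate tuples on the plus strand. `classify_gaps` is a hypothetical helper. Treating any nonzero gap on both sides as double-sided is a simplification, because the text above speaks of "substantial" sequence without giving a threshold.

```python
from typing import List, Tuple

# A gapless block: (tStart, tEnd, qStart, qEnd), half-open, both on the + strand.
Block = Tuple[int, int, int, int]


def classify_gaps(blocks: List[Block]) -> List[str]:
    """Label the gap between each pair of consecutive blocks in one chain.

    'single': unaligned sequence in only one species
    'double': unaligned sequence in both species
    'none':   the blocks abut in both sequences
    """
    labels = []
    for prev, nxt in zip(blocks, blocks[1:]):
        t_gap = nxt[0] - prev[1]  # unaligned bases in the human (target) genome
        q_gap = nxt[2] - prev[3]  # unaligned bases in the other genome
        if t_gap < 0 or q_gap < 0:
            raise ValueError("blocks must be sorted and non-overlapping")
        if t_gap > 0 and q_gap > 0:
            labels.append("double")
        elif t_gap > 0 or q_gap > 0:
            labels.append("single")
        else:
            labels.append("none")
    return labels


chain = [(0, 100, 0, 100), (150, 200, 100, 150), (260, 300, 170, 210)]
print(classify_gaps(chain))  # ['single', 'double']
```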

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best rat/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The rat sequence used in this annotation is from the June 2003 Baylor 3.1 (rn3) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, chains to chromosome-based assemblies are colored by the chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments and the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - an alignment on the same chromosome as the gap in the level above it, in the same orientation.
    • Inv - an alignment on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.
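The four item types can be expressed as a small decision rule. The Python sketch below captures only what the list above describes, namely chromosome and orientation relative to the parent gap. netSyntenic may apply further criteria. `classify_fill` and its parameters are hypothetical names.

```python
from typing import Optional


def classify_fill(child_chrom: str, child_strand: str,
                  parent_chrom: Optional[str] = None,
                  parent_strand: Optional[str] = None) -> str:
    """Label a net item using the four types listed above (simplified).

    child_* describe where the item aligns in the other organism;
    parent_* describe the higher-level chain whose gap the item fills.
    parent_chrom=None means the item is itself on level 1.
    """
    if parent_chrom is None:
        return "Top"
    if child_chrom != parent_chrom:
        return "NonSyn"
    return "Syn" if child_strand == parent_strand else "Inv"


print(classify_fill("chr4", "+"))                # Top
print(classify_fill("chr4", "-", "chr4", "+"))   # Inv
print(classify_fill("chr7", "+", "chr4", "+"))   # NonSyn
```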

    Methods

    Chain track

    Transposons that have been inserted since the rat/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rat chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic programming algorithm was then run over the kd-trees to find the maximally scoring chains of these blocks.

    The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=loose

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000
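The chaining step described above finds the highest-scoring set of co-linear blocks, with gap penalties subtracted. The Python sketch below uses a brute-force O(n²) dynamic program instead of the kd-tree search axtChain uses, and it handles only the plus strand. The `toy_gap` penalty is a hypothetical stand-in for the gap table, and `Block` and `best_chain` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Block:
    t_start: int  # half-open [t_start, t_end) on the human (target) genome
    t_end: int
    q_start: int  # half-open [q_start, q_end) on the other genome
    q_end: int
    score: int    # e.g. the summed substitution score of the block


def best_chain(blocks: List[Block],
               gap_cost: Callable[[int, int], float]) -> Tuple[float, List[Block]]:
    """Highest-scoring chain of co-linear blocks, by brute-force O(n^2) DP."""
    if not blocks:
        return 0.0, []
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [float(b.score) for b in order]  # best chain score ending at block j
    prev = [-1] * len(order)                # back-pointer for traceback
    for j, bj in enumerate(order):
        for i in range(j):
            bi = order[i]
            # bi may precede bj only if it ends before bj starts in both genomes
            if bi.t_end <= bj.t_start and bi.q_end <= bj.q_start:
                dt, dq = bj.t_start - bi.t_end, bj.q_start - bi.q_end
                cand = best[i] + bj.score - gap_cost(dt, dq)
                if cand > best[j]:
                    best[j], prev[j] = cand, i
    k = max(range(len(order)), key=best.__getitem__)
    total, chain = best[k], []
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    return total, chain[::-1]


def toy_gap(dt: int, dq: int) -> float:
    """Hypothetical gap penalty, only for this example."""
    return 0.0 if dt == dq == 0 else 100.0 + dt + dq


blocks = [
    Block(0, 100, 0, 100, 5000),
    Block(150, 250, 120, 220, 4000),
    Block(90, 200, 500, 610, 3000),  # overlaps the first block on the target
]
score, chain = best_chain(blocks, toy_gap)
print(score, [b.score for b in chain])  # 8830.0 [5000, 4000]
```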

    Net track

    Chains were derived from lastz alignments using the methods described on the chain track description pages, and were sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before or after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb rn3\ priority 360.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetRn3\ type bed 3\ visibility hide\ chainNetRn4 Rat Chain/Net bed 3 Rat (Nov. 2004 (Baylor 3.4/rn4)), Chain and Net Alignments 0 360.3 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of rat (Nov. 2004, Baylor 3.4/rn4) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rat and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rat assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. Where multiple chains align over a particular region of the human genome, chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best rat/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The rat sequence used in this annotation is from the Nov. 2004 Baylor 3.4 (rn4) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, chains to chromosome-based assemblies are colored by the chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments and the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - an alignment on the same chromosome as the gap in the level above it, in the same orientation.
    • Inv - an alignment on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the rat/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rat chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic programming algorithm was then run over the kd-trees to find the maximally scoring chains of these blocks.

    The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
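The gap table is plain whitespace-separated text, with one named row per line. The Python sketch below reads it into lists of integers and checks that each coordinate row has `tableSize` entries. `parse_linear_gap` is a hypothetical helper. This is not a description of how axtChain parses its own input.

```python
from typing import Dict, List

# The medium table above, copied verbatim.
MEDIUM = """\
tableSize    11
smallSize   111
position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
"""


def parse_linear_gap(text: str) -> Dict[str, List[int]]:
    """Map each row name to its integer values and check row lengths."""
    rows: Dict[str, List[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows[fields[0]] = [int(x) for x in fields[1:]]
    size = rows["tableSize"][0]
    for name in ("position", "qGap", "tGap", "bothGap"):
        if len(rows[name]) != size:
            raise ValueError(f"{name} has {len(rows[name])} values, expected {size}")
    return rows


rows = parse_linear_gap(MEDIUM)
print(rows["qGap"][:3])  # [350, 425, 450]
```

Parsed this way, it is easy to confirm that every bothGap entry in the medium table is exactly 400 more than the corresponding qGap entry.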

    Net track

    Chains were derived from lastz alignments using the methods described on the chain track description pages, and were sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before or after the two organisms diverged.
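netClass records how much of each gap or chain consists of Ns. The Python sketch below computes that kind of quantity for a single region of an in-memory sequence string. Reading bases from a string is an assumption made for illustration; this page does not say how netClass obtains its N counts. `n_fraction` is a hypothetical name.

```python
def n_fraction(seq: str, start: int, end: int) -> float:
    """Fraction of bases in the half-open region seq[start:end] that are N or n."""
    if not 0 <= start < end <= len(seq):
        raise ValueError("need 0 <= start < end <= len(seq)")
    region = seq[start:end]
    return sum(1 for base in region if base in "Nn") / len(region)


print(n_fraction("ACGTNNNNACGT", 2, 10))  # 4 of 8 bases are N -> 0.5
```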

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb rn4\ priority 360.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetRn4\ type bed 3\ visibility hide\ chainNetRn3Viewchain Chain bed 3 Rat (June 2003 (Baylor 3.1/rn3)), Chain and Net Alignments 3 360.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetRn3\ shortLabel Chain\ spectrum on\ track chainNetRn3Viewchain\ view chain\ visibility pack\ chainNetRn4Viewchain Chain bed 3 Rat (Nov. 2004 (Baylor 3.4/rn4)), Chain and Net Alignments 3 360.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetRn4\ shortLabel Chain\ spectrum on\ track chainNetRn4Viewchain\ view chain\ visibility pack\ chainNetRn3Viewnet Net bed 3 Rat (June 2003 (Baylor 3.1/rn3)), Chain and Net Alignments 2 360.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetRn3\ shortLabel Net\ track chainNetRn3Viewnet\ view net\ visibility full\ chainNetRn4Viewnet Net bed 3 Rat (Nov. 2004 (Baylor 3.4/rn4)), Chain and Net Alignments 2 360.3 0 0 0 100 50 0 1 0 0 compGeno 1 parent chainNetRn4\ shortLabel Net\ track chainNetRn4Viewnet\ view net\ visibility full\ chainNetMm7 mm7 Chain/Net bed 3 Mouse (Aug. 2005 (NCBI35/mm7)), Chain and Net Alignments 0 370.3 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of mouse (Aug. 2005, NCBI35/mm7) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. Where multiple chains align over a particular region of the human genome, chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best mouse/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The mouse sequence used in this annotation is from the Aug. 2005 NCBI35 (mm7) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, chains to chromosome-based assemblies are colored by the chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome.
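The chromosome filter is a display setting, but the same selection can be sketched offline on UCSC chain-format text. This minimal Python example assumes, as in these tracks, that human is the target (`tName`) and the aligning organism is the query (`qName`). The helper names and the two sample records are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class ChainHeader(NamedTuple):
    score: float
    t_name: str    # reference (human) chromosome
    q_name: str    # chromosome in the aligning organism
    q_strand: str
    chain_id: str


def parse_headers(lines: Iterable[str]) -> Iterator[ChainHeader]:
    """Yield header records from chain-format text, skipping block lines."""
    for line in lines:
        f = line.split()
        # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
        if len(f) >= 12 and f[0] == "chain":
            chain_id = f[12] if len(f) > 12 else ""
            yield ChainHeader(float(f[1]), f[2], f[7], f[9], chain_id)


def filter_by_query_chrom(text: str, chrom: str) -> List[ChainHeader]:
    """Keep only chains whose aligning-organism chromosome is `chrom`."""
    return [h for h in parse_headers(text.splitlines()) if h.q_name == chrom]


SAMPLE = """chain 5000 chr1 1000 + 0 100 chr4 900 + 10 110 1
100

chain 3200 chr1 1000 + 200 300 chr7 800 - 50 150 2
100
"""
print(filter_by_query_chrom(SAMPLE, "chr4"))
```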

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments and the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - an alignment on the same chromosome as the gap in the level above it, in the same orientation.
    • Inv - an alignment on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the mouse/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic programming algorithm was then run over the kd-trees to find the maximally scoring chains of these blocks.

    The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300
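Applying a minimum score such as the 3000 used here amounts to dropping whole chain records. The Python sketch below does this on text in the UCSC chain format, where each record is a `chain` header line whose second field is the score, followed by block lines and a blank line. The function name is hypothetical, and the sample chains are made up for illustration.

```python
def filter_chains_by_score(text: str, min_score: float = 3000) -> str:
    """Keep whole chain records (header, block lines, trailing blank line)
    whose header score is at least min_score; drop the rest."""
    out = []
    keep = False
    for line in text.splitlines():
        if line.startswith("chain "):
            # Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            keep = float(line.split()[1]) >= min_score
        if keep:
            out.append(line)
    return "\n".join(out) + "\n" if out else ""


SAMPLE = """chain 5000 chr1 1000 + 0 100 chr4 900 + 10 110 1
100

chain 1200 chr1 1000 + 200 300 chr7 800 - 50 150 2
100

"""
print(filter_chains_by_score(SAMPLE), end="")  # only the score-5000 chain remains
```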

    Net track

    Chains were derived from lastz alignments using the methods described on the chain track description pages, and were sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before or after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb mm7\ priority 370.3\ shortLabel $o_db Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetMm7\ type bed 3\ visibility hide\ chainNetMm8 mm8 Chain/Net bed 3 Mouse (Feb. 2006 (NCBI36/mm8)), Chain and Net Alignments 0 370.3 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of mouse (Feb. 2006, NCBI36/mm8) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. Where multiple chains align over a particular region of the human genome, chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best mouse/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The mouse sequence used in this annotation is from the Feb. 2006 NCBI36 (mm8) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, chains to chromosome-based assemblies are colored by the chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome.

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments and the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - an alignment on the same chromosome as the gap in the level above it, in the same orientation.
    • Inv - an alignment on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons that have been inserted since the mouse/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic programming algorithm was then run over the kd-trees to find the maximally scoring chains of these blocks.

    The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300

    Net track

    Chains were derived from lastz alignments using the methods described on the chain track description pages, and were sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before or after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb mm8\ priority 370.3\ shortLabel $o_db Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetMm8\ type bed 3\ visibility hide\ chainNetBruMal1 bruMal1 Chain/Net bed 3 bruMal1 (bruMal1), Chain and Net Alignments 0 370.3 0 0 0 255 255 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of bruMal1 to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both bruMal1 and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the bruMal1 assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.
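A minimal Python sketch of the single-line versus double-line distinction, assuming a gap is drawn with a single line when only one species has unaligned bases between two consecutive blocks and with a double line when both do. The browser's actual drawing rule may apply additional thresholds; the `Block` type and `gap_styles` function are hypothetical.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One ungapped aligned block; coordinates are 0-based, end-exclusive."""
    t_start: int  # human (target) start
    t_end: int
    q_start: int  # aligning-species (query) start
    q_end: int


def gap_styles(blocks: List[Block]) -> List[str]:
    """Label each gap between consecutive blocks of one chain as 'single' or 'double'."""
    styles = []
    for prev, nxt in zip(blocks, blocks[1:]):
        t_gap = nxt.t_start - prev.t_end  # unaligned human bases
        q_gap = nxt.q_start - prev.q_end  # unaligned bases in the other species
        styles.append("double" if t_gap > 0 and q_gap > 0 else "single")
    return styles


chain = [Block(0, 100, 0, 100), Block(150, 200, 100, 150), Block(260, 300, 170, 210)]
print(gap_styles(chain))  # ['single', 'double']
```

The first gap skips 50 human bases but no query bases (a one-sided insertion or deletion), while the second skips bases on both sides.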

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best bruMal1/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The bruMal1 sequence used in this annotation is from the bruMal1 assembly.

    Display Conventions and Configuration

    Chain Track

    By default, chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - a line-up on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.
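The four categories can be expressed as a small decision function. A minimal Python sketch, assuming an item is compared only with the chromosome and strand of the chain whose gap it fills; netSyntenic's real classification may consider more context. The function name `net_item_type` and its arguments are hypothetical.

```python
def net_item_type(level: int, chrom: str, strand: str,
                  parent_chrom: str = "", parent_strand: str = "") -> str:
    """Classify a net item as Top, Syn, Inv, or NonSyn relative to the level above."""
    if level == 1:
        return "Top"  # top-level chains have no parent gap
    if chrom != parent_chrom:
        return "NonSyn"  # the gap is filled from a different chromosome
    # Same chromosome: orientation decides between syntenic and inverted.
    return "Syn" if strand == parent_strand else "Inv"


print(net_item_type(1, "chr4", "+"))               # Top
print(net_item_type(2, "chr4", "+", "chr4", "+"))  # Syn
print(net_item_type(2, "chr4", "-", "chr4", "+"))  # Inv
print(net_item_type(2, "chr7", "+", "chr4", "+"))  # NonSyn
```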

    Methods

    Chain track

    Transposons inserted since the bruMal1/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single bruMal1 chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. Dynamic programming was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=loose

    tablesize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
    qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
    bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000
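The chaining step can be illustrated with a small dynamic program. This Python sketch swaps axtChain's kd-tree search for a plain quadratic scan over all earlier blocks, which is simpler but far too slow for genome-scale input, and it takes the gap cost as a caller-supplied function instead of reading the table above. The `Block` type and `best_chain` function are hypothetical.

```python
from typing import Callable, List, NamedTuple, Tuple


class Block(NamedTuple):
    t_start: int  # target (human) start, 0-based
    t_end: int    # target end, exclusive
    q_start: int  # query start, 0-based
    q_end: int    # query end, exclusive
    score: int    # substitution score of the ungapped block


def best_chain(blocks: List[Block],
               gap_cost: Callable[[int, int], float]) -> Tuple[float, List[Block]]:
    """Return the highest-scoring chain of blocks that increase in both target and query."""
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [float(b.score) for b in order]  # best chain score ending at each block
    prev = [-1] * len(order)                # back-pointer to the preceding block
    for i, cur in enumerate(order):
        for j in range(i):
            earlier = order[j]
            if earlier.t_end <= cur.t_start and earlier.q_end <= cur.q_start:
                dq = cur.q_start - earlier.q_end
                dt = cur.t_start - earlier.t_end
                candidate = best[j] + cur.score - gap_cost(dq, dt)
                if candidate > best[i]:
                    best[i], prev[i] = candidate, j
    if not order:
        return 0.0, []
    i = max(range(len(order)), key=lambda k: best[k])
    total, chain = best[i], []
    while i != -1:  # follow back-pointers to recover the chain
        chain.append(order[i])
        i = prev[i]
    return total, chain[::-1]


blocks = [Block(0, 100, 0, 100, 500),
          Block(50, 150, 300, 400, 300),  # overlaps the first block on the target
          Block(110, 200, 105, 195, 400)]
total, chain = best_chain(blocks, lambda dq, dt: 10 * (dq + dt))  # toy gap cost
print(total, [b.score for b in chain])  # 750.0 [500, 400]
```

After chaining, a minimum-score cutoff such as the 1000 used for this track would simply drop any chain whose total falls below it.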

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 255,255,0\ chainLinearGap loose\ chainMinScore 1000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb bruMal1\ priority 370.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetBruMal1\ type bed 3\ visibility hide\ chainNetMm6 Mouse Chain/Net bed 3 Mouse (Mar. 2005 (NCBI34/mm6)), Chain and Net Alignments 0 370.3 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of mouse (Mar. 2005 (NCBI34/mm6)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best mouse/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The mouse sequence used in this annotation is from the Mar. 2005 (NCBI34/mm6) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - a line-up on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons inserted since the mouse/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. Dynamic programming was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91
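The track's settings record stores this same matrix as `matrix 16 91,-114,...` together with `matrixHeader A, C, G, T`. A minimal Python sketch of turning that setting back into the table above, assuming the first number is the entry count and the values are listed row by row; `parse_matrix_setting` is a hypothetical helper, not a UCSC library function.

```python
from typing import Dict


def parse_matrix_setting(value: str, header: str = "A, C, G, T") -> Dict[str, Dict[str, int]]:
    """Convert a 'matrix' setting value into a nested row -> column -> score mapping."""
    count_text, numbers_text = value.split(None, 1)  # leading entry count, then the values
    values = [int(v) for v in numbers_text.split(",") if v.strip()]
    bases = [b.strip() for b in header.split(",")]
    n = len(bases)
    if len(values) != int(count_text) or len(values) != n * n:
        raise ValueError("matrix entry count does not match the header")
    # Values are assumed to be row-major: the first n entries form row A, and so on.
    return {row: dict(zip(bases, values[i * n:(i + 1) * n])) for i, row in enumerate(bases)}


setting = "16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91"
matrix = parse_matrix_setting(setting)
print(matrix["A"]["A"], matrix["C"]["G"], matrix["T"]["C"])  # 91 -125 -31
```

Because the matrix is symmetric, reading it row-major or column-major gives the same table, so this particular setting does not distinguish the two layouts.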

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb mm6\ priority 370.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetMm6\ type bed 3\ visibility hide\ chainNetMm9 Mouse Chain/Net bed 3 Mouse (July 2007 (NCBI37/mm9)), Chain and Net Alignments 0 370.3 0 0 0 100 50 0 1 0 0

    Description

    Chain Track

    The chain track shows alignments of mouse (July 2007 (NCBI37/mm9)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.

    The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. These may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.

    In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    Net Track

    The net track shows the best mouse/human chain for every part of the human genome. It is useful for finding orthologous regions and for studying genome rearrangement. The mouse sequence used in this annotation is from the July 2007 (NCBI37/mm9) assembly.

    Display Conventions and Configuration

    Chain Track

    By default, chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, select the "off" button next to "Color track based on chromosome".

    To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to "Filter by chromosome".

    Net Track

    In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains, and so forth.
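The level numbering can be pictured as a tree in which each chain (fill) owns gaps and each gap may hold lower-level fills. A minimal Python sketch that assigns display levels by walking such a tree; the `Fill` and `Gap` classes and the `assign_levels` function are hypothetical and only mirror the nesting described above, not the browser's data structures.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Fill:
    name: str
    gaps: List["Gap"] = field(default_factory=list)  # gaps inside this chain


@dataclass
class Gap:
    fills: List[Fill] = field(default_factory=list)  # lower-level chains filling the gap


def assign_levels(top_fills: List[Fill]) -> List[Tuple[str, int]]:
    """Depth-first walk: top-level fills are level 1, fills in their gaps level 2, ..."""
    out: List[Tuple[str, int]] = []

    def walk(fill: Fill, level: int) -> None:
        out.append((fill.name, level))
        for gap in fill.gaps:
            for child in gap.fills:
                walk(child, level + 1)

    for fill in top_fills:
        walk(fill, 1)
    return out


top = Fill("chainA", [Gap([Fill("chainB", [Gap([Fill("chainC")])])]),
                      Gap([Fill("chainD")])])
print(assign_levels([top]))
# [('chainA', 1), ('chainB', 2), ('chainC', 3), ('chainD', 2)]
```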

    In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement.

    Individual items in the display are categorized as one of four types (other than gap):

    • Top - the best, longest match. Displayed on level 1.
    • Syn - a line-up on the same chromosome as the gap in the level above it.
    • Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
    • NonSyn - a match to a chromosome different from the gap in the level above.

    Methods

    Chain track

    Transposons inserted since the mouse/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. Dynamic programming was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91

    Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

    -linearGap=medium

    tableSize    11
    smallSize   111
    position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
    qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
    bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300

    Net track

    Chains were derived from lastz alignments, using the methods described on the chain track description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
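The N accounting done by netClass can be illustrated with a short Python sketch that counts sequencing-gap bases (N) on each side of one alignment gap. The function name `n_content`, its result keys, and the 0-based, end-exclusive coordinates are assumptions for illustration; netClass's actual inputs and implementation are not reproduced here, and the transposon accounting is left out.

```python
from typing import Dict, Tuple


def n_content(target_seq: str, query_seq: str,
              t_range: Tuple[int, int], q_range: Tuple[int, int]) -> Dict[str, float]:
    """Count N (sequencing-gap) bases on each side of one alignment gap."""
    t_start, t_end = t_range  # 0-based, end-exclusive
    q_start, q_end = q_range
    t_n = target_seq[t_start:t_end].upper().count("N")
    q_n = query_seq[q_start:q_end].upper().count("N")
    return {
        "target_n": t_n,
        "query_n": q_n,
        # An empty side (a one-sided gap) reports a fraction of 0.0.
        "target_fraction": t_n / (t_end - t_start) if t_end > t_start else 0.0,
        "query_fraction": q_n / (q_end - q_start) if q_end > q_start else 0.0,
    }


print(n_content("ACGTNNNNACGT", "ACGTACGT", (2, 10), (4, 4)))
# {'target_n': 4, 'query_n': 0, 'target_fraction': 0.5, 'query_fraction': 0.0}
```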

    Credits

    Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

    Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

    The axtChain program was developed at the University of California, Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

    The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

    The chainNet, netSyntenic, and netClass programs were developed at the University of California, Santa Cruz by Jim Kent.

    References

    Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.

    Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.

    Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.

    \ compGeno 1 altColor 100,50,0\ chainLinearGap medium\ chainMinScore 3000\ color 0,0,0\ compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html chainNet\ longLabel $o_Organism ($o_date), Chain and Net Alignments\ matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91\ matrixHeader A, C, G, T\ noInherit on\ otherDb mm9\ priority 370.3\ shortLabel $o_Organism Chain/Net\ sortOrder view=+\ spectrum on\ subGroup1 view Views chain=Chain net=Net\ track chainNetMm9\ type bed 3\ visibility hide\ chainNetBruMal1Viewchain Chain bed 3 bruMal1 (bruMal1), Chain and Net Alignments 3 370.3 0 0 0 255 255 0 1 0 0 compGeno 1 chainNetMm6Viewchain Chain bed 3 Mouse (Mar. 2005 (NCBI34/mm6)), Chain and Net Alignments 3 370.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMm7Viewchain Chain bed 3 Mouse (Aug. 2005 (NCBI35/mm7)), Chain and Net Alignments 3 370.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMm8Viewchain Chain bed 3 Mouse (Feb. 2006 (NCBI36/mm8)), Chain and Net Alignments 3 370.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMm9Viewchain Chain bed 3 Mouse (July 2007 (NCBI37/mm9)), Chain and Net Alignments 3 370.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBruMal1Viewnet Net bed 3 bruMal1 (bruMal1), Chain and Net Alignments 1 370.3 0 0 0 255 255 0 1 0 0 compGeno 1 chainNetMm6Viewnet Net bed 3 Mouse (Mar. 2005 (NCBI34/mm6)), Chain and Net Alignments 2 370.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMm7Viewnet Net bed 3 Mouse (Aug. 2005 (NCBI35/mm7)), Chain and Net Alignments 2 370.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMm8Viewnet Net bed 3 Mouse (Feb. 
2006 (NCBI36/mm8)), Chain and Net Alignments 2 370.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMm9Viewnet Net bed 3 Mouse (July 2007 (NCBI37/mm9)), Chain and Net Alignments 2 370.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetSorAra1 Shrew Chain/Net bed 3 Shrew (June 2006 (Broad/sorAra1)), Chain and Net Alignments 0 380.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetSorAra1Viewchain Chain bed 3 Shrew (June 2006 (Broad/sorAra1)), Chain and Net Alignments 3 380.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetSorAra1Viewnet Net bed 3 Shrew (June 2006 (Broad/sorAra1)), Chain and Net Alignments 2 380.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetEriEur1 Hedgehog Chain/Net bed 3 Hedgehog (June 2006 (Broad/eriEur1)), Chain and Net Alignments 0 390.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetEriEur1Viewchain Chain bed 3 Hedgehog (June 2006 (Broad/eriEur1)), Chain and Net Alignments 3 390.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetEriEur1Viewnet Net bed 3 Hedgehog (June 2006 (Broad/eriEur1)), Chain and Net Alignments 2 390.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainSelf Self Chain chain hg16 Chained Self Alignments 0 400 100 50 0 255 240 200 1 0 0 varRep 1 netSelf Self Net netAlign hg16 chainSelf Chained Self Alignment Net 0 401 0 0 0 127 127 127 1 0 0 varRep 0 syntenySelf Self Synteny bed 4 . Self Synteny Using Blastz Single Coverage (100k window) 0 402 0 100 0 255 240 200 0 0 0 varRep 1 chainNetAilMel1 Panda Chain/Net bed 3 Panda (Dec. 2009 (BGI-Shenzhen 1.0/ailMel1)), Chain and Net Alignments 0 415.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetAilMel1Viewchain Chain bed 3 Panda (Dec. 2009 (BGI-Shenzhen 1.0/ailMel1)), Chain and Net Alignments 3 415.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetAilMel1Viewnet Net bed 3 Panda (Dec. 
2009 (BGI-Shenzhen 1.0/ailMel1)), Chain and Net Alignments 2 415.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetCanFam2 Dog Chain/Net bed 3 Dog (May 2005 (Broad/canFam2)), Chain and Net Alignments 0 420.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetCanFam2Viewchain Chain bed 3 Dog (May 2005 (Broad/canFam2)), Chain and Net Alignments 3 420.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetCanFam2Viewnet Net bed 3 Dog (May 2005 (Broad/canFam2)), Chain and Net Alignments 2 420.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFelCat3 Cat Chain/Net bed 3 Cat (Mar. 2006 (Broad/felCat3)), Chain and Net Alignments 0 430.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFelCat4 Cat Chain/Net bed 3 Cat (Dec. 2008 (NHGRI/GTB V17e/felCat4)), Chain and Net Alignments 0 430.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFelCat3Viewchain Chain bed 3 Cat (Mar. 2006 (Broad/felCat3)), Chain and Net Alignments 3 430.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFelCat4Viewchain Chain bed 3 Cat (Dec. 2008 (NHGRI/GTB V17e/felCat4)), Chain and Net Alignments 3 430.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFelCat3Viewnet Net bed 3 Cat (Mar. 2006 (Broad/felCat3)), Chain and Net Alignments 2 430.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFelCat4Viewnet Net bed 3 Cat (Dec. 2008 (NHGRI/GTB V17e/felCat4)), Chain and Net Alignments 2 430.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetEquCab1 Horse Chain/Net bed 3 Horse (Jan. 2007 (Broad/equCab1)), Chain and Net Alignments 0 440.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetEquCab2 Horse Chain/Net bed 3 Horse (Sep. 2007 (Broad/equCab2)), Chain and Net Alignments 0 440.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetEquCab1Viewchain Chain bed 3 Horse (Jan. 2007 (Broad/equCab1)), Chain and Net Alignments 3 440.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetEquCab2Viewchain Chain bed 3 Horse (Sep. 2007 (Broad/equCab2)), Chain and Net Alignments 3 440.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetEquCab1Viewnet Net bed 3 Horse (Jan. 
2007 (Broad/equCab1)), Chain and Net Alignments 2 440.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetEquCab2Viewnet Net bed 3 Horse (Sep. 2007 (Broad/equCab2)), Chain and Net Alignments 2 440.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau1 bosTau1 Chain/Net bed 3 Cow (Sep. 2004 (Baylor 1.0/bosTau1)), Chain and Net Alignments 0 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau2 bosTau2 Chain/Net bed 3 Cow (Mar. 2005 (Baylor 2.0/bosTau2)), Chain and Net Alignments 0 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau3 bosTau3 Chain/Net bed 3 Cow (Aug. 2006 (Baylor 3.1/bosTau3)), Chain and Net Alignments 0 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau6 bosTau6 Chain/Net bed 3 Cow (Nov. 2009 (Bos_taurus_UMD_3.1/bosTau6)), Chain and Net Alignments 0 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau4 Cow Chain/Net bed 3 Cow (Oct. 2007 (Baylor 4.0/bosTau4)), Chain and Net Alignments 0 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau1Viewchain Chain bed 3 Cow (Sep. 2004 (Baylor 1.0/bosTau1)), Chain and Net Alignments 3 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau2Viewchain Chain bed 3 Cow (Mar. 2005 (Baylor 2.0/bosTau2)), Chain and Net Alignments 3 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau3Viewchain Chain bed 3 Cow (Aug. 2006 (Baylor 3.1/bosTau3)), Chain and Net Alignments 3 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau4Viewchain Chain bed 3 Cow (Oct. 2007 (Baylor 4.0/bosTau4)), Chain and Net Alignments 3 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau6Viewchain Chain bed 3 Cow (Nov. 2009 (Bos_taurus_UMD_3.1/bosTau6)), Chain and Net Alignments 3 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau1Viewnet Net bed 3 Cow (Sep. 2004 (Baylor 1.0/bosTau1)), Chain and Net Alignments 2 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau2Viewnet Net bed 3 Cow (Mar. 2005 (Baylor 2.0/bosTau2)), Chain and Net Alignments 2 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau3Viewnet Net bed 3 Cow (Aug. 
2006 (Baylor 3.1/bosTau3)), Chain and Net Alignments 2 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau4Viewnet Net bed 3 Cow (Oct. 2007 (Baylor 4.0/bosTau4)), Chain and Net Alignments 2 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBosTau6Viewnet Net bed 3 Cow (Nov. 2009 (Bos_taurus_UMD_3.1/bosTau6)), Chain and Net Alignments 2 450.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetSusScr1 susScr1 Chain/Net bed 3 susScr1 (susScr1), Chain and Net Alignments 0 475.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetSusScr2 Pig Chain/Net bed 3 Pig (Nov. 2009 (SGSC Sscrofa9.2/susScr2)), Chain and Net Alignments 0 475.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetSusScr1Viewchain Chain bed 3 susScr1 (susScr1), Chain and Net Alignments 3 475.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetSusScr2Viewchain Chain bed 3 Pig (Nov. 2009 (SGSC Sscrofa9.2/susScr2)), Chain and Net Alignments 3 475.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetSusScr1Viewnet Net bed 3 susScr1 (susScr1), Chain and Net Alignments 2 475.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetSusScr2Viewnet Net bed 3 Pig (Nov. 2009 (SGSC Sscrofa9.2/susScr2)), Chain and Net Alignments 2 475.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDasNov2 dasNov2 Chain/Net bed 3 Armadillo (Jul. 2008 (Broad/dasNov2)), Chain and Net Alignments 0 490.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDasNov2Viewchain Chain bed 3 Armadillo (Jul. 2008 (Broad/dasNov2)), Chain and Net Alignments 3 490.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDasNov2Viewnet Net bed 3 Armadillo (Jul. 2008 (Broad/dasNov2)), Chain and Net Alignments 2 490.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetHg19Patch2 hg19Patch2 Chain/Net bed 3 hg19Patch2/GRCh37.p2 (Aug. 2009 (GRCh37.p2/hg19Patch2)), Chain and Net Alignments 0 498 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetHg17 Human Chain/Net bed 3 Human (May 2004 (NCBI35/hg17)), Chain and Net Alignments 0 498 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetHg18 Human Chain/Net bed 3 Human (Mar. 
2006 (NCBI36/hg18)), Chain and Net Alignments 0 498 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetHg19 Human Chain/Net bed 3 Human (Feb. 2009 (GRCh37/hg19)), Chain and Net Alignments 0 498 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetPanTro3 Chimp Chain/Net bed 3 Chimp (Oct. 2010 (CGSC 2.1.3/panTro3)), Chain and Net Alignments 0 498 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetTupBel1 Tree shrew Chain/Net bed 3 Tree shrew (Dec. 2006 (Broad/tupBel1)), Chain and Net Alignments 0 498 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetHg17Viewchain Chain bed 3 Human (May 2004 (NCBI35/hg17)), Chain and Net Alignments 3 498 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetHg18Viewchain Chain bed 3 Human (Mar. 2006 (NCBI36/hg18)), Chain and Net Alignments 3 498 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetHg19Viewchain Chain bed 3 Human (Feb. 2009 (GRCh37/hg19)), Chain and Net Alignments 3 498 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetHg19Patch2Viewchain Chain bed 3 hg19Patch2/GRCh37.p2 (Aug. 2009 (GRCh37.p2/hg19Patch2)), Chain and Net Alignments 3 498 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetPanTro3Viewchain Chain bed 3 Chimp (Oct. 2010 (CGSC 2.1.3/panTro3)), Chain and Net Alignments 3 498 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetTupBel1Viewchain Chain bed 3 Tree shrew (Dec. 2006 (Broad/tupBel1)), Chain and Net Alignments 3 498 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetHg17Viewnet Net bed 3 Human (May 2004 (NCBI35/hg17)), Chain and Net Alignments 2 498 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetHg18Viewnet Net bed 3 Human (Mar. 2006 (NCBI36/hg18)), Chain and Net Alignments 2 498 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetHg19Viewnet Net bed 3 Human (Feb. 2009 (GRCh37/hg19)), Chain and Net Alignments 2 498 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetHg19Patch2Viewnet Net bed 3 hg19Patch2/GRCh37.p2 (Aug. 2009 (GRCh37.p2/hg19Patch2)), Chain and Net Alignments 2 498 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetPanTro3Viewnet Net bed 3 Chimp (Oct. 
2010 (CGSC 2.1.3/panTro3)), Chain and Net Alignments 2 498 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetTupBel1Viewnet Net bed 3 Tree shrew (Dec. 2006 (Broad/tupBel1)), Chain and Net Alignments 2 498 0 0 0 100 50 0 0 0 0 compGeno 1 pgSnp1kG 1000GenomesPilot bed 4 + Personal Genome Variants 3 500 0 0 0 127 127 127 0 0 0 varRep 1 chainNetVenterViewchain Chains bed 3 J. Craig Venter Chain and Net Alignments 3 500 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetPoodleViewchain Chains bed 3 Poodle Chain and Net Alignments 3 500 0 0 0 100 50 0 1 0 0 compGeno 1 pgSnp Genome Variants bed 4 + Personal Genome Variants 0 500 0 0 0 127 127 127 0 0 0 varRep 1 chainNetVenterViewnet Nets bed 3 J. Craig Venter Chain and Net Alignments 2 500 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetPoodleViewnet Nets bed 3 Poodle Chain and Net Alignments 2 500 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetPoodle Poodle Chain/Net bed 3 Poodle Chain and Net Alignments 0 500 0 0 0 100 50 0 0 0 0 compGeno 1 pgSnpPSU PSU Bushmen bed 4 + Personal Genome Variants 3 500 0 0 0 127 127 127 0 0 0 varRep 1 pgSnp1off Single Genomes bed 4 + Personal Genome Variants 3 500 0 0 0 127 127 127 0 0 0 varRep 1 chainNetVenter Venter Chain/Net bed 3 J. 
Craig Venter Chain and Net Alignments 0 500 0 0 0 100 50 0 0 0 0 compGeno 1 pgSnpCg2 Complete Genomics bed 4 + Personal Genome Variants - Hgwdev only 3 500.01 0 0 0 127 127 127 0 0 0 varRep 1 pgSnpHgwdev GenomeVarsHgwdev bed 4 + Personal Genome Variants - Hgwdev only 0 500.01 0 0 0 127 127 127 0 0 0 varRep 1 pgSnpPGP Personal Genome Project bed 4 + Personal Genome Variants - Hgwdev only 3 500.01 0 0 0 127 127 127 0 0 0 varRep 1 pgSnpPSU2 PSU Bushmen bed 4 + Personal Genome Variants - Hgwdev only 3 500.01 0 0 0 127 127 127 0 0 0 varRep 1 pgSnp1off2 Single Genomes bed 4 + Personal Genome Variants - Hgwdev only 3 500.01 0 0 0 127 127 127 0 0 0 varRep 1 chainNetEchTel1 Tenrec Chain/Net bed 3 Tenrec (July 2005 (Broad/echTel1)), Chain and Net Alignments 0 500.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetEchTel1Viewchain Chain bed 3 Tenrec (July 2005 (Broad/echTel1)), Chain and Net Alignments 3 500.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetEchTel1Viewnet Net bed 3 Tenrec (July 2005 (Broad/echTel1)), Chain and Net Alignments 2 500.3 0 0 0 100 50 0 1 0 0 compGeno 1 numtSeq NumtS Sequence bed 3 . 
Human NumtS mitochondrial sequence 0 500.5 0 0 0 127 127 127 0 0 0 varRep 1 encodeGencodeGeneFiltered Gencode Filtered genePred Gencode Filtered (Known and Novel) 0 501 0 0 0 127 127 127 0 0 20 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr8,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr9,chrX, encodeAnalysis 1 encodePromoters Promoters bed 3 + ENCODE Promoters 0 503 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 encodeTransFrags Transfrags bed 4 ENCODE Transfrags 0 504 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 encodeDnase DNAseI HS bed 4 ENCODE DNAseI HS 0 505 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 encodeRegulatory Regulatory Elements bed 3 + ENCODE Regulatory Elements 0 506 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 encodeDNDS dN/dS bed 4 + ENCODE Exons by dN/dS bins 0 507 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 encodeGencodeRegions Gencode Unann bed 4 . Gencode Unannotated Region Classification 0 508 0 0 0 127 127 127 0 0 0 encodeAnalysis 1 encodeWorkshopIntersections Intersect bed 5 ENCODE Workshop Intersections 0 508 0 0 0 127 127 127 0 0 0 encodeAnalysis 1 encodeWorkshopSelections Consens Unann TF bed 4 ENCODE Consensus Unannotated Transfrags 0 509 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 encodeNoncodingTransFrags Unann Transfrags bed 4 . 
Yale and Affymetrix Unannotated TransFrags 0 510 0 0 0 127 127 127 0 0 21 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX, encodeAnalysis 1 chainNetLoxAfr2 loxAfr2 Chain/Net bed 3 Elephant (Jul. 2008 (Broad/loxAfr2)), Chain and Net Alignments 0 520.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetLoxAfr1 Elephant Chain/Net bed 3 Elephant (May 2005 (Broad/loxAfr1)), Chain and Net Alignments 0 520.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetLoxAfr1Viewchain Chain bed 3 Elephant (May 2005 (Broad/loxAfr1)), Chain and Net Alignments 3 520.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetLoxAfr2Viewchain Chain bed 3 Elephant (Jul. 2008 (Broad/loxAfr2)), Chain and Net Alignments 3 520.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetLoxAfr1Viewnet Net bed 3 Elephant (May 2005 (Broad/loxAfr1)), Chain and Net Alignments 2 520.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetLoxAfr2Viewnet Net bed 3 Elephant (Jul. 2008 (Broad/loxAfr2)), Chain and Net Alignments 2 520.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMonDom1 Opossum Chain/Net bed 3 Opossum (Oct. 2004 (Broad prelim/monDom1)), Chain and Net Alignments 0 540.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMonDom4 Opossum Chain/Net bed 3 Opossum (Jan. 2006 (Broad/monDom4)), Chain and Net Alignments 0 540.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMonDom5 Opossum Chain/Net bed 3 Opossum (Oct. 2006 (Broad/monDom5)), Chain and Net Alignments 0 540.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMonDom1Viewchain Chain bed 3 Opossum (Oct. 2004 (Broad prelim/monDom1)), Chain and Net Alignments 3 540.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMonDom4Viewchain Chain bed 3 Opossum (Jan. 2006 (Broad/monDom4)), Chain and Net Alignments 3 540.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMonDom5Viewchain Chain bed 3 Opossum (Oct. 2006 (Broad/monDom5)), Chain and Net Alignments 3 540.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMonDom1Viewnet Net bed 3 Opossum (Oct. 
2004 (Broad prelim/monDom1)), Chain and Net Alignments 2 540.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMonDom4Viewnet Net bed 3 Opossum (Jan. 2006 (Broad/monDom4)), Chain and Net Alignments 2 540.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMonDom5Viewnet Net bed 3 Opossum (Oct. 2006 (Broad/monDom5)), Chain and Net Alignments 2 540.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetOrnAna1 Platypus Chain/Net bed 3 Platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)), Chain and Net Alignments 0 550.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetOrnAna1Viewchain Chain bed 3 Platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)), Chain and Net Alignments 3 550.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetOrnAna1Viewnet Net bed 3 Platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)), Chain and Net Alignments 2 550.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetAnoCar1 Lizard Chain/Net bed 3 Lizard (Feb. 2007 (Broad/anoCar1)), Chain and Net Alignments 0 560.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetAnoCar2 Lizard Chain/Net bed 3 Lizard (May 2010 (Broad AnoCar2.0/anoCar2)), Chain and Net Alignments 0 560.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetAnoCar1Viewchain Chain bed 3 Lizard (Feb. 2007 (Broad/anoCar1)), Chain and Net Alignments 3 560.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetAnoCar2Viewchain Chain bed 3 Lizard (May 2010 (Broad AnoCar2.0/anoCar2)), Chain and Net Alignments 3 560.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetAnoCar1Viewnet Net bed 3 Lizard (Feb. 2007 (Broad/anoCar1)), Chain and Net Alignments 2 560.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetAnoCar2Viewnet Net bed 3 Lizard (May 2010 (Broad AnoCar2.0/anoCar2)), Chain and Net Alignments 2 560.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetTaeGut1 Zebra finch Chain/Net bed 3 Zebra finch (Jul. 2008 (WUGSC 3.2.4/taeGut1)), Chain and Net Alignments 0 570.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetTaeGut1Viewchain Chain bed 3 Zebra finch (Jul. 
2008 (WUGSC 3.2.4/taeGut1)), Chain and Net Alignments 3 570.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetTaeGut1Viewnet Net bed 3 Zebra finch (Jul. 2008 (WUGSC 3.2.4/taeGut1)), Chain and Net Alignments 2 570.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetGalGal2 Chicken Chain/Net bed 3 Chicken (Feb. 2004 (WUGSC 1.0/galGal2)), Chain and Net Alignments 0 580.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetGalGal3 Chicken Chain/Net bed 3 Chicken (May 2006 (WUGSC 2.1/galGal3)), Chain and Net Alignments 0 580.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetGalGal2Viewchain Chain bed 3 Chicken (Feb. 2004 (WUGSC 1.0/galGal2)), Chain and Net Alignments 3 580.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetGalGal3Viewchain Chain bed 3 Chicken (May 2006 (WUGSC 2.1/galGal3)), Chain and Net Alignments 3 580.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetGalGal2Viewnet Net bed 3 Chicken (Feb. 2004 (WUGSC 1.0/galGal2)), Chain and Net Alignments 2 580.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetGalGal3Viewnet Net bed 3 Chicken (May 2006 (WUGSC 2.1/galGal3)), Chain and Net Alignments 2 580.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMelGal1 melGal1 Chain/Net bed 3 melGal1 (melGal1), Chain and Net Alignments 0 585.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMelGal1Viewchain Chain bed 3 melGal1 (melGal1), Chain and Net Alignments 3 585.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetMelGal1Viewnet Net bed 3 melGal1 (melGal1), Chain and Net Alignments 2 585.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetXenTro1 xenTro1 Chain/Net bed 3 X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)), Chain and Net Alignments 0 590.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetXenTro2 X. tropicalis Chain/Net bed 3 X. tropicalis (Aug. 2005 (JGI 4.1/xenTro2)), Chain and Net Alignments 0 590.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetXenTro1Viewchain Chain bed 3 X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)), Chain and Net Alignments 3 590.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetXenTro2Viewchain Chain bed 3 X. tropicalis (Aug. 
2005 (JGI 4.1/xenTro2)), Chain and Net Alignments 3 590.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetXenTro1Viewnet Net bed 3 X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)), Chain and Net Alignments 2 590.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetXenTro2Viewnet Net bed 3 X. tropicalis (Aug. 2005 (JGI 4.1/xenTro2)), Chain and Net Alignments 2 590.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer2 danRer2 Chain/Net bed 3 Zebrafish (June 2004 (Zv4/danRer2)), Chain and Net Alignments 0 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer3 danRer3 Chain/Net bed 3 Zebrafish (May 2005 (Zv5/danRer3)), Chain and Net Alignments 0 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer4 danRer4 Chain/Net bed 3 Zebrafish (Mar. 2006 (Zv6/danRer4)), Chain and Net Alignments 0 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer5 Zebrafish Chain/Net bed 3 Zebrafish (July 2007 (Zv7/danRer5)), Chain and Net Alignments 0 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer6 Zebrafish Chain/Net bed 3 Zebrafish (Dec. 2008 (Zv8/danRer6)), Chain and Net Alignments 0 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer7 Zebrafish Chain/Net bed 3 Zebrafish (Jul. 2010 (Zv9/danRer7)), Chain and Net Alignments 0 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer2Viewchain Chain bed 3 Zebrafish (June 2004 (Zv4/danRer2)), Chain and Net Alignments 3 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer3Viewchain Chain bed 3 Zebrafish (May 2005 (Zv5/danRer3)), Chain and Net Alignments 3 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer4Viewchain Chain bed 3 Zebrafish (Mar. 2006 (Zv6/danRer4)), Chain and Net Alignments 3 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer5Viewchain Chain bed 3 Zebrafish (July 2007 (Zv7/danRer5)), Chain and Net Alignments 3 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer6Viewchain Chain bed 3 Zebrafish (Dec. 2008 (Zv8/danRer6)), Chain and Net Alignments 3 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer7Viewchain Chain bed 3 Zebrafish (Jul. 
2010 (Zv9/danRer7)), Chain and Net Alignments 3 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer2Viewnet Net bed 3 Zebrafish (June 2004 (Zv4/danRer2)), Chain and Net Alignments 2 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer3Viewnet Net bed 3 Zebrafish (May 2005 (Zv5/danRer3)), Chain and Net Alignments 2 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer4Viewnet Net bed 3 Zebrafish (Mar. 2006 (Zv6/danRer4)), Chain and Net Alignments 2 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer5Viewnet Net bed 3 Zebrafish (July 2007 (Zv7/danRer5)), Chain and Net Alignments 2 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer6Viewnet Net bed 3 Zebrafish (Dec. 2008 (Zv8/danRer6)), Chain and Net Alignments 2 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetDanRer7Viewnet Net bed 3 Zebrafish (Jul. 2010 (Zv9/danRer7)), Chain and Net Alignments 2 600.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetOryLat1 oryLat1 Chain/Net bed 3 Medaka (Apr. 2006 (NIG/UT MEDAKA1/oryLat1)), Chain and Net Alignments 0 610.3 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetOryLat2 Medaka Chain/Net bed 3 Medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)), Chain and Net Alignments 0 610.3 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetOryLat1Viewchain Chain bed 3 Medaka (Apr. 2006 (NIG/UT MEDAKA1/oryLat1)), Chain and Net Alignments 3 610.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetOryLat2Viewchain Chain bed 3 Medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)), Chain and Net Alignments 3 610.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetOryLat1Viewnet Net bed 3 Medaka (Apr. 2006 (NIG/UT MEDAKA1/oryLat1)), Chain and Net Alignments 2 610.3 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetOryLat2Viewnet Net bed 3 Medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)), Chain and Net Alignments 2 610.3 0 0 0 100 50 0 0 0 0 compGeno 1 chainNetGasAcu1 Stickleback Chain/Net bed 3 Stickleback (Feb. 2006 (Broad/gasAcu1)), Chain and Net Alignments 0 620.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetGasAcu1Viewchain Chain bed 3 Stickleback (Feb. 
2006 (Broad/gasAcu1)), Chain and Net Alignments 3 620.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetGasAcu1Viewnet Net bed 3 Stickleback (Feb. 2006 (Broad/gasAcu1)), Chain and Net Alignments 2 620.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFr1 Fugu Chain/Net bed 3 Fugu (Aug. 2002 (JGI 3.0/fr1)), Chain and Net Alignments 0 630.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFr2 Fugu Chain/Net bed 3 Fugu (Oct. 2004 (JGI 4.0/fr2)), Chain and Net Alignments 0 630.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFr1Viewchain Chain bed 3 Fugu (Aug. 2002 (JGI 3.0/fr1)), Chain and Net Alignments 3 630.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFr2Viewchain Chain bed 3 Fugu (Oct. 2004 (JGI 4.0/fr2)), Chain and Net Alignments 3 630.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFr1Viewnet Net bed 3 Fugu (Aug. 2002 (JGI 3.0/fr1)), Chain and Net Alignments 2 630.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetFr2Viewnet Net bed 3 Fugu (Oct. 2004 (JGI 4.0/fr2)), Chain and Net Alignments 2 630.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetTetNig1 Tetraodon Chain/Net bed 3 Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)), Chain and Net Alignments 0 640.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetTetNig2 Tetraodon Chain/Net bed 3 Tetraodon (Mar. 2007 (Genoscope 8.0/tetNig2)), Chain and Net Alignments 0 640.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetTetNig1Viewchain Chain bed 3 Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)), Chain and Net Alignments 3 640.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetTetNig2Viewchain Chain bed 3 Tetraodon (Mar. 2007 (Genoscope 8.0/tetNig2)), Chain and Net Alignments 3 640.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetTetNig1Viewnet Net bed 3 Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)), Chain and Net Alignments 2 640.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetTetNig2Viewnet Net bed 3 Tetraodon (Mar. 2007 (Genoscope 8.0/tetNig2)), Chain and Net Alignments 2 640.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetPetMar1 Lamprey Chain/Net bed 3 Lamprey (Mar. 
2007 (WUGSC 3.0/petMar1)), Chain and Net Alignments 0 650.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetPetMar1Viewchain Chain bed 3 Lamprey (Mar. 2007 (WUGSC 3.0/petMar1)), Chain and Net Alignments 3 650.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetPetMar1Viewnet Net bed 3 Lamprey (Mar. 2007 (WUGSC 3.0/petMar1)), Chain and Net Alignments 2 650.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetAplCal1 Sea hare Chain/Net bed 3 Sea hare (Sept. 2008 (Broad 2.0/aplCal1)), Chain and Net Alignments 0 655.3 0 0 0 255 255 0 1 0 0 compGeno 1 chainNetAplCal1Viewchain Chain bed 3 Sea hare (Sept. 2008 (Broad 2.0/aplCal1)), Chain and Net Alignments 3 655.3 0 0 0 255 255 0 1 0 0 compGeno 1 chainNetAplCal1Viewnet Net bed 3 Sea hare (Sept. 2008 (Broad 2.0/aplCal1)), Chain and Net Alignments 1 655.3 0 0 0 255 255 0 1 0 0 compGeno 1 chainNetBraFlo1 Lancelet Chain/Net bed 3 Lancelet (Mar. 2006 (JGI 1.0/braFlo1)), Chain and Net Alignments 0 660.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBraFlo1Viewchain Chain bed 3 Lancelet (Mar. 2006 (JGI 1.0/braFlo1)), Chain and Net Alignments 3 660.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetBraFlo1Viewnet Net bed 3 Lancelet (Mar. 2006 (JGI 1.0/braFlo1)), Chain and Net Alignments 2 660.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetStrPur2 S. purpuratus Chain/Net bed 3 S. purpuratus (Sep. 2006 (Baylor 2.1/strPur2)), Chain and Net Alignments 0 670.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetStrPur2Viewchain Chain bed 3 S. purpuratus (Sep. 2006 (Baylor 2.1/strPur2)), Chain and Net Alignments 3 670.3 0 0 0 100 50 0 1 0 0 compGeno 1 chainNetStrPur2Viewnet Net bed 3 S. purpuratus (Sep. 
2006 (Baylor 2.1/strPur2)), Chain and Net Alignments 2 670.3 0 0 0 100 50 0 1 0 0 compGeno 1 t2g Publications bed 12 Mapped sequences from publications 0 800 20 50 100 137 152 177 0 0 0 map 1 ucsfBrainMethylViewCG CpG score bed 3 UCSF Brain DNA Methylation 2 1999 0 0 0 127 127 127 0 0 0 regulation 1 ucsfBrainMethylViewCOV Raw Signal bed 3 UCSF Brain DNA Methylation 2 1999 0 0 0 127 127 127 0 0 0 regulation 1 ucsfBrainMethyl UCSF Brain Methyl bed 3 UCSF Brain DNA Methylation 0 1999 0 0 0 127 127 127 0 0 0 regulation 1 blastCe3WB C. elegans Proteins psl protein C. elegans Proteins 0 2002 0 0 0 127 127 127 0 0 0 genes 1 blastDm2FB D. mel. Proteins (dm2) psl protein D. melanogaster Proteins (dm2) 0 2002 0 0 0 127 127 127 0 0 0 http://flybase.bio.indiana.edu/.bin/fbidq.html? genes 1 chainOryLat1 oryLat1 Chain chain oryLat1 Medaka (Apr. 2006 (NIG/UT MEDAKA1/oryLat1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainRheMac1 rheMac1 Chain chain rheMac1 Rhesus (Jan. 2005 (Baylor Mmul_0.1/rheMac1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainDanRer2 danRer2 Chain chain danRer2 Zebrafish (June 2004 (Zv4/danRer2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainDanRer3 danRer3 Chain chain danRer3 Zebrafish (May 2005 (Zv5/danRer3)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainXenTro1 xenTro1 Chain chain xenTro1 X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainCe6 ce6 Chain chain ce6 C. elegans (May 2008 (WS190/ce6)) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainCalJac1 calJac1 Chain chain calJac1 Marmoset (June 2007 (WUGSC 2.0.2/calJac1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainBosTau1 bosTau1 Chain chain bosTau1 Cow (Sep. 2004 (Baylor 1.0/bosTau1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainBosTau2 bosTau2 Chain chain bosTau2 Cow (Mar. 
2005 (Baylor 2.0/bosTau2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainBosTau3 bosTau3 Chain chain bosTau3 Cow (Aug. 2006 (Baylor 3.1/bosTau3)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainMm7 mm7 Chain chain mm7 Mouse (Aug. 2005 (NCBI35/mm7)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainMm8 mm8 Chain chain mm8 Mouse (Feb. 2006 (NCBI36/mm8)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainSusScr1 susScr1 Chain chain susScr1 susScr1 (susScr1) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainDanRer4 danRer4 Net chain danRer4 Zebrafish (Mar. 2006 (Zv6/danRer4)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainGorGor1 Gorilla Chain chain gorGor1 Gorilla (Oct. 2008 (Sanger 0.1/gorGor1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainGorGor2 gorGor2 Chain chain gorGor2 gorGor2 (gorGor2) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainMelInc1 melInc1 Chain chain melInc1 melInc1 (melInc1) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainVenter1 venter1 Chain chain venter1 venter1 (venter1) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainRn3 Rat Chain chain rn3 Rat (June 2003 (Baylor 3.1/rn3)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainRn4 Rat Chain chain rn4 Rat (Nov. 2004 (Baylor 3.4/rn4)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainBraFlo1 Lancelet Chain chain braFlo1 Lancelet (Mar. 2006 (JGI 1.0/braFlo1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainOryLat2 Medaka Chain chain oryLat2 Medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainLoxAfr1 Elephant Chain chain loxAfr1 Elephant (May 2005 (Broad/loxAfr1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainLoxAfr2 Elephant Chain chain loxAfr2 Elephant (Jul. 
2008 (Broad/loxAfr2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainLoxAfr3 Elephant Chain chain loxAfr3 Elephant (Jul. 2009 (Broad/loxAfr3)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainBruMal1 bruMal1 Chain chain bruMal1 bruMal1 (bruMal1) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainEriEur1 Hedgehog Chain chain eriEur1 Hedgehog (June 2006 (Broad/eriEur1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainCaeRem3 C. remanei Chain chain caeRem3 C. remanei (May 2007 (WUGSC 15.0.1/caeRem3)) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainEquCab1 Horse Chain chain equCab1 Horse (Jan. 2007 (Broad/equCab1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainEquCab2 Horse Chain chain equCab2 Horse (Sep. 2007 (Broad/equCab2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainMicMur1 Mouse lemur Chain chain MicMur1 Mouse lemur (Jun. 2003 (Broad/micMur1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainTaeGut1 Zebra finch Chain chain taeGut1 Zebra finch (Jul. 2008 (WUGSC 3.2.4/taeGut1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainCavPor2 Guinea pig Chain chain cavPor2 Guinea pig (Oct. 2005 (Broad/cavPor2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainCavPor3 Guinea pig Chain chain cavPor3 Guinea pig (Feb. 2008 (Broad/cavPor3)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainTarSyr1 Tarsier Chain chain tarSyr1 Tarsier (Aug. 2008 (Broad/tarSyr1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainRheMac2 Rhesus Chain chain rheMac2 Rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainAilMel1 Panda Chain chain ailMel1 Panda (Dec. 2009 (BGI-Shenzhen 1.0/ailMel1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainGalGal2 Chicken Chain chain galGal2 Chicken (Feb. 
2004 (WUGSC 1.0/galGal2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainGalGal3 Chicken Chain chain galGal3 Chicken (May 2006 (WUGSC 2.1/galGal3)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainDanRer1 Zebrafish Chain chain danRer1 Zebrafish (Nov. 2003 (Zv3/danRer1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainDanRer5 Zebrafish Chain chain danRer5 Zebrafish (July 2007 (Zv7/danRer5)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainDanRer6 Zebrafish Chain chain danRer6 Zebrafish (Dec. 2008 (Zv8/danRer6)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainDanRer7 Zebrafish Chain chain danRer7 Zebrafish (Jul. 2010 (Zv9/danRer7)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainAnoCar1 Lizard Chain chain anoCar1 Lizard (Feb. 2007 (Broad/anoCar1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainAnoCar2 Lizard Chain chain anoCar2 Lizard (May 2010 (Broad AnoCar2.0/anoCar2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainCanFam2 Dog Chain chain canFam2 Dog (May 2005 (Broad/canFam2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainAplCal1 Sea hare Chain chain aplCal1 Sea hare (Sept. 2008 (Broad 2.0/aplCal1)) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainMacEug1 Wallaby Chain chain macEug1 Wallaby (Nov. 2007 (Baylor 1.0/macEug1)) Chained Alignments 0 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainCaePb1 C. brenneri Chain chain caePb1 C. brenneri (Jan. 2007 (WUGSC 4.0/caePb1)) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainCaePb2 C. brenneri Chain chain caePb2 C. brenneri (Feb. 
2008 (WUGSC 6.0.1/caePb2)) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainHaeCon1 haeCon1 Chain chain haeCon1 haeCon1 (haeCon1) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainMelGal1 melGal1 Chain chain melGal1 melGal1 (melGal1) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainCanFamPoodle1 canFamPoodle1 Chain chain canFamPoodle1 canFamPoodle1 (canFamPoodle1) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainOviAri1 Sheep Chain chain oviAri1 Sheep (Feb. 2010 (ISGC Ovis_aries_1.0/oviAri1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainGasAcu1 Stickleback Chain chain gasAcu1 Stickleback (Feb. 2006 (Broad/gasAcu1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainXenTro2 X. tropicalis Chain chain xenTro2 X. tropicalis (Aug. 2005 (JGI 4.1/xenTro2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainXenTro3 xenTro3 Chain chain xenTro3 xenTro3 (xenTro3) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainFelCat3 Cat Chain chain felCat3 Cat (Mar. 2006 (Broad/felCat3)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainFelCat4 Cat Chain chain felCat4 Cat (Dec. 2008 (NHGRI/GTB V17e/felCat4)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainMonDom1 Opossum Chain chain monDom1 Opossum (Oct. 2004 (Broad prelim/monDom1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainMonDom4 Opossum Chain chain monDom4 Opossum (Jan. 2006 (Broad/monDom4)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainMonDom5 Opossum Chain chain monDom5 Opossum (Oct. 2006 (Broad/monDom5)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainPetMar1 Lamprey Chain chain petMar1 Lamprey (Mar. 
2007 (WUGSC 3.0/petMar1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainHg17 Human Chain chain hg17 Human (May 2004 (NCBI35/hg17)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainHg18 Human Chain chain hg18 Human (Mar. 2006 (NCBI36/hg18)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainHg19 Human Chain chain hg19 Human (Feb. 2009 (GRCh37/hg19)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainPapHam1 Baboon Chain chain papHam1 Baboon (Nov. 2008 (Baylor 1.0/papHam1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainEchTel1 Tenrec Chain chain echTel1 Tenrec (July 2005 (Broad/echTel1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainMelHap1 melHap1 Chain chain melHap1 melHap1 (melHap1) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainCaeJap3 caeJap3 Chain chain caeJap3 caeJap3 (caeJap3) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainOrnAna1 Platypus Chain chain ornAna1 Platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainCb3 C. briggsae Chain chain cb3 C. briggsae (Jan. 2007 (WUGSC 1.0/cb3)) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainCe9 ce9 Chain chain ce9 ce9 (ce9) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainStrPur2 S. purpuratus Chain chain strPur2 S. purpuratus (Sep. 2006 (Baylor 2.1/strPur2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainDasNov1 Armadillo Chain chain dasNov1 Armadillo (May 2005 (Broad/dasNov1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainDasNov2 Armadillo Chain chain dasNov2 Armadillo (Jul. 2008 (Broad/dasNov2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainFr1 Fugu Chain chain fr1 Fugu (Aug. 2002 (JGI 3.0/fr1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainFr2 Fugu Chain chain fr2 Fugu (Oct. 
2004 (JGI 4.0/fr2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainCalJac3 Marmoset Chain chain calJac3 Marmoset (March 2009 (WUGSC 3.2/calJac3)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainHg19Patch2 GRCh37.p2 Chain chain hg19Patch2 GRCh37.p2 (Aug. 2009 (GRCh37.p2/hg19Patch2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainOryCun1 Rabbit Chain chain oryCun1 Rabbit (May 2005 (Broad/oryCun1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainOryCun2 Rabbit Chain chain oryCun2 Rabbit (Apr. 2009 (Broad/oryCun2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainPriPac1 P. pacificus Chain chain priPac1 P. pacificus (Feb. 2007 (WUGSC 5.0/priPac1)) Chained Alignments 3 10001 0 0 0 255 255 0 1 0 0 compGeno 1 chainTetNig1 Tetraodon Chain chain tetNig1 Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainTetNig2 Tetraodon Chain chain tetNig2 Tetraodon (Mar. 2007 (Genoscope 8.0/tetNig2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainPanTro2 Chimp Chain chain panTro2 Chimp (Mar. 2006 (CGSC 2.1/panTro2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainPanTro3 Chimp Chain chain panTro3 Chimp (Oct. 2010 (CGSC 2.1.3/panTro3)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainTupBel1 Tree shrew Chain chain tupBel1 Tree shrew (Dec. 2006 (Broad/tupBel1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainSorAra1 Shrew Chain chain sorAra1 Shrew (June 2006 (Broad/sorAra1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainBosTau4 Cow Chain chain bosTau4 Cow (Oct. 2007 (Baylor 4.0/bosTau4)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainBosTau6 Cow Chain chain bosTau6 Cow (Nov. 2009 (Bos_taurus_UMD_3.1/bosTau6)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainMm6 Mouse Chain chain mm6 Mouse (Mar. 
2005 (NCBI34/mm6)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainMm9 Mouse Chain chain mm9 Mouse (July 2007 (NCBI37/mm9)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainSusScr2 Pig Chain chain susScr2 Pig (Nov. 2009 (SGSC Sscrofa9.2/susScr2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainPonAbe2 Orangutan Chain chain ponAbe2 Orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 chainOtoGar1 Bushbaby Chain chain otoGar1 Bushbaby (Dec. 2006 (Broad/otoGar1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 A375CytosolicPolyAPlusTnFg A375 TnFg bed 4 + A375 Cytosolic polyA+, Affy Transfrags 3 10001 35 35 175 160 160 188 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 bamAll All bam Contigs Generated from All Data 0 10001 0 0 0 127 127 127 0 0 0 neandertal 1 chainPanTro1 Chimp Chain chain panTro1 Chimp (Nov. 2003 (CGSC 1.1/panTro1)) Chained Alignments 3 10001 0 0 0 100 50 0 1 0 0 compGeno 1 covMask1kGPilotLowCovCeuDepth Depth CEU bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase: Abnormal Depth, CEU 0 10001 180 0 0 217 127 127 0 0 0 varRep 1 bamSLFeld1 Feld1 Sequence bam Feld1 Sequence Reads 0 10001 0 0 0 127 127 127 0 0 0 neandertal 1 encodeGisChipPetStat1Gif GIS STAT1 +gIF bed 12 GIS ChIP-PET: STAT1 Ab on gIF treated HeLa cells 0 10001 125 140 35 67 79 35 1 0 24 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX,chrY, encodeChip 1 ucsfChipSeqH3K4me3BrainCoverage H3K4me3 RawSignal bedGraph 4 H3K4me3 ChIP-seq Raw Signal 2 10001 0 200 0 200 100 0 0 0 0 regulation 0 pgKb1Comb KB1 bed 4 + KB1 Genome Variants, combination of 454, Illumina, and genotyping 3 10001 0 0 0 127 127 127 0 0 0 varRep 1 pgKb1454 KB1 454 bed 4 + KB1 Genome Variants, 454 3 10001 0 0 0 127 127 127 0 0 0 varRep 1 bamMMS9MbutiPygmy Mbuti Pygmy bam Mbuti Pygmy (HGDP00456) 
Sequence Reads 0 10001 0 0 0 127 127 127 0 0 0 denisova 1 chainMm5 Mm5 Chain chain mm5 mm5 (mm5) Chained Alignments 0 10001 0 0 0 100 50 0 1 0 0 compGeno 1 burgeRnaSeqGemMapperAlignBT474 RNA-seq BT474 bed 12 Burge Lab RNA-seq 32mer Reads from BT474 Breast Tumour Cell Line 1 10001 12 12 120 133 133 187 0 0 0 expression 1 bamMMS7 San Seq bam San (HGDP01029) Sequence Reads 0 10001 0 0 0 127 127 127 0 0 0 neandertal 1 netDanRer4 danRer4 Chain netAlign danRer4 chainDanRer4 Zebrafish (Mar. 2006 (Zv6/danRer4)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netOryLat1 oryLat1 Net netAlign oryLat1 chainOryLat1 Medaka (Apr. 2006 (NIG/UT MEDAKA1/oryLat1)) Alignment Net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netRheMac1 rheMac1 Net netAlign rheMac1 chainRheMac1 Rhesus (Jan. 2005 (Baylor Mmul_0.1/rheMac1)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netDanRer2 danRer2 Net netAlign danRer2 chainDanRer2 Zebrafish (June 2004 (Zv4/danRer2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netDanRer3 danRer3 Net netAlign danRer3 chainDanRer3 Zebrafish (May 2005 (Zv5/danRer3)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netXenTro1 xenTro1 Net netAlign xenTro1 chainXenTro1 X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netCe6 ce6 Net netAlign ce6 chainCe6 C. elegans (May 2008 (WS190/ce6)) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netCalJac1 calJac1 Net netAlign calJac1 chainCalJac1 Marmoset (June 2007 (WUGSC 2.0.2/calJac1)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netPanTro1 panTro1 Net netAlign panTro1 chainPanTro1 Chimp (Nov. 2003 (CGSC 1.1/panTro1)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netBosTau1 bosTau1 Net netAlign bosTau1 chainBosTau1 Cow (Sep. 2004 (Baylor 1.0/bosTau1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netBosTau2 bosTau2 Net netAlign bosTau2 chainBosTau2 Cow (Mar. 
2005 (Baylor 2.0/bosTau2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netBosTau3 bosTau3 Net netAlign bosTau3 chainBosTau3 Cow (Aug. 2006 (Baylor 3.1/bosTau3)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netMm7 mm7 Net netAlign mm7 chainMm7 Mouse (Aug. 2005 (NCBI35/mm7)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netMm8 mm8 Net netAlign mm8 chainMm8 Mouse (Feb. 2006 (NCBI36/mm8)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netSusScr1 susScr1 Net netAlign susScr1 chainSusScr1 susScr1 (susScr1) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netGorGor1 Gorilla Net netAlign gorGor1 chainGorGor1 Gorilla (Oct. 2008 (Sanger 0.1/gorGor1)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netGorGor2 gorGor2 Net netAlign gorGor2 chainGorGor2 gorGor2 (gorGor2) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netMelInc1 melInc1 Net netAlign melInc1 chainMelInc1 melInc1 (melInc1) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netVenter1 venter1 Net netAlign venter1 chainVenter1 venter1 (venter1) Alignment Net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netRn3 Rat Net netAlign rn3 chainRn3 Rat (June 2003 (Baylor 3.1/rn3)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netRn4 Rat Net netAlign rn4 chainRn4 Rat (Nov. 2004 (Baylor 3.4/rn4)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netBraFlo1 Lancelet Net netAlign braFlo1 chainBraFlo1 Lancelet (Mar. 2006 (JGI 1.0/braFlo1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netOryLat2 Medaka Net netAlign oryLat2 chainOryLat2 Medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)) Alignment Net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netLoxAfr1 Elephant Net netAlign loxAfr1 chainLoxAfr1 Elephant (May 2005 (Broad/loxAfr1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netLoxAfr2 Elephant Net netAlign loxAfr2 chainLoxAfr2 Elephant (Jul. 
2008 (Broad/loxAfr2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netLoxAfr3 Elephant Net netAlign loxAfr3 chainLoxAfr3 Elephant (Jul. 2009 (Broad/loxAfr3)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netBruMal1 bruMal1 Net netAlign bruMal1 chainBruMal1 bruMal1 (bruMal1) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netEriEur1 Hedgehog Net netAlign eriEur1 chainEriEur1 Hedgehog (June 2006 (Broad/eriEur1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netCaeRem3 C. remanei Net netAlign caeRem3 chainCaeRem3 C. remanei (May 2007 (WUGSC 15.0.1/caeRem3)) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netEquCab1 Horse Net netAlign equCab1 chainEquCab1 Horse (Jan. 2007 (Broad/equCab1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netEquCab2 Horse Net netAlign equCab2 chainEquCab2 Horse (Sep. 2007 (Broad/equCab2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netMicMur1 Mouse lemur Net netAlign MicMur1 chainMicMur1 Mouse lemur (Jun. 2003 (Broad/micMur1)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netTaeGut1 Zebra finch Net netAlign taeGut1 chainTaeGut1 Zebra finch (Jul. 2008 (WUGSC 3.2.4/taeGut1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netCavPor2 Guinea pig Net netAlign cavPor2 chainCavPor2 Guinea pig (Oct. 2005 (Broad/cavPor2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netCavPor3 Guinea pig Net netAlign cavPor3 chainCavPor3 Guinea pig (Feb. 2008 (Broad/cavPor3)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netTarSyr1 Tarsier Net netAlign tarSyr1 chainTarSyr1 Tarsier (Aug. 2008 (Broad/tarSyr1)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netRheMac2 Rhesus Net netAlign rheMac2 chainRheMac2 Rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netAilMel1 Panda Net netAlign ailMel1 chainAilMel1 Panda (Dec. 
2009 (BGI-Shenzhen 1.0/ailMel1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netGalGal2 Chicken Net netAlign galGal2 chainGalGal2 Chicken (Feb. 2004 (WUGSC 1.0/galGal2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netGalGal3 Chicken Net netAlign galGal3 chainGalGal3 Chicken (May 2006 (WUGSC 2.1/galGal3)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netDanRer1 Zebrafish Net netAlign danRer1 chainDanRer1 Zebrafish (Nov. 2003 (Zv3/danRer1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netDanRer5 Zebrafish Net netAlign danRer5 chainDanRer5 Zebrafish (July 2007 (Zv7/danRer5)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netDanRer6 Zebrafish Net netAlign danRer6 chainDanRer6 Zebrafish (Dec. 2008 (Zv8/danRer6)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netDanRer7 Zebrafish Net netAlign danRer7 chainDanRer7 Zebrafish (Jul. 2010 (Zv9/danRer7)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netAnoCar1 Lizard Net netAlign anoCar1 chainAnoCar1 Lizard (Feb. 2007 (Broad/anoCar1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netAnoCar2 Lizard Net netAlign anoCar2 chainAnoCar2 Lizard (May 2010 (Broad AnoCar2.0/anoCar2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netCanFam2 Dog Net netAlign canFam2 chainCanFam2 Dog (May 2005 (Broad/canFam2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netAplCal1 Sea hare Net netAlign aplCal1 chainAplCal1 Sea hare (Sept. 2008 (Broad 2.0/aplCal1)) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netMacEug1 Wallaby Net netAlign macEug1 chainMacEug1 Wallaby (Nov. 2007 (Baylor 1.0/macEug1)) Alignment Net 0 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netCaePb1 C. brenneri Net netAlign caePb1 chainCaePb1 C. brenneri (Jan. 2007 (WUGSC 4.0/caePb1)) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netCaePb2 C. brenneri Net netAlign caePb2 chainCaePb2 C. brenneri (Feb. 
2008 (WUGSC 6.0.1/caePb2)) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netHaeCon1 haeCon1 Net netAlign haeCon1 chainHaeCon1 haeCon1 (haeCon1) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netMelGal1 melGal1 Net netAlign melGal1 chainMelGal1 melGal1 (melGal1) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netCanFamPoodle1 canFamPoodle1 Net netAlign canFamPoodle1 chainCanFamPoodle1 canFamPoodle1 (canFamPoodle1) Alignment Net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netOviAri1 Sheep Net netAlign oviAri1 chainOviAri1 Sheep (Feb. 2010 (ISGC Ovis_aries_1.0/oviAri1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netGasAcu1 Stickleback Net netAlign gasAcu1 chainGasAcu1 Stickleback (Feb. 2006 (Broad/gasAcu1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netXenTro2 X. tropicalis Net netAlign xenTro2 chainXenTro2 X. tropicalis (Aug. 2005 (JGI 4.1/xenTro2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netXenTro3 xenTro3 Net netAlign xenTro3 chainXenTro3 xenTro3 (xenTro3) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netFelCat3 Cat Net netAlign felCat3 chainFelCat3 Cat (Mar. 2006 (Broad/felCat3)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netFelCat4 Cat Net netAlign felCat4 chainFelCat4 Cat (Dec. 2008 (NHGRI/GTB V17e/felCat4)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netMonDom1 Opossum Net netAlign monDom1 chainMonDom1 Opossum (Oct. 2004 (Broad prelim/monDom1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netMonDom4 Opossum Net netAlign monDom4 chainMonDom4 Opossum (Jan. 2006 (Broad/monDom4)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netMonDom5 Opossum Net netAlign monDom5 chainMonDom5 Opossum (Oct. 2006 (Broad/monDom5)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netPetMar1 Lamprey Net netAlign petMar1 chainPetMar1 Lamprey (Mar. 
2007 (WUGSC 3.0/petMar1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netHg17 Human Net netAlign hg17 chainHg17 Human (May 2004 (NCBI35/hg17)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netHg18 Human Net netAlign hg18 chainHg18 Human (Mar. 2006 (NCBI36/hg18)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netHg19 Human Net netAlign hg19 chainHg19 Human (Feb. 2009 (GRCh37/hg19)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netPapHam1 Baboon Net netAlign papHam1 chainPapHam1 Baboon (Nov. 2008 (Baylor 1.0/papHam1)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netEchTel1 Tenrec Net netAlign echTel1 chainEchTel1 Tenrec (July 2005 (Broad/echTel1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netMelHap1 melHap1 Net netAlign melHap1 chainMelHap1 melHap1 (melHap1) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netCaeJap3 caeJap3 Net netAlign caeJap3 chainCaeJap3 caeJap3 (caeJap3) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netOrnAna1 Platypus Net netAlign ornAna1 chainOrnAna1 Platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netCb3 C. briggsae Net netAlign cb3 chainCb3 C. briggsae (Jan. 2007 (WUGSC 1.0/cb3)) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netCe9 ce9 Net netAlign ce9 chainCe9 ce9 (ce9) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netStrPur2 S. purpuratus Net netAlign strPur2 chainStrPur2 S. purpuratus (Sep. 2006 (Baylor 2.1/strPur2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netDasNov1 Armadillo Net netAlign dasNov1 chainDasNov1 Armadillo (May 2005 (Broad/dasNov1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netDasNov2 Armadillo Net netAlign dasNov2 chainDasNov2 Armadillo (Jul. 2008 (Broad/dasNov2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netFr1 Fugu Net netAlign fr1 chainFr1 Fugu (Aug. 
2002 (JGI 3.0/fr1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netFr2 Fugu Net netAlign fr2 chainFr2 Fugu (Oct. 2004 (JGI 4.0/fr2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netCalJac3 Marmoset Net netAlign calJac3 chainCalJac3 Marmoset (March 2009 (WUGSC 3.2/calJac3)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netHg19Patch2 GRCh37.p2 Net netAlign hg19Patch2 chainHg19Patch2 GRCh37.p2 (Aug. 2009 (GRCh37.p2/hg19Patch2)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netOryCun1 Rabbit Net netAlign oryCun1 chainOryCun1 Rabbit (May 2005 (Broad/oryCun1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netOryCun2 Rabbit Net netAlign oryCun2 chainOryCun2 Rabbit (Apr. 2009 (Broad/oryCun2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netPriPac1 P. pacificus Net netAlign priPac1 chainPriPac1 P. pacificus (Feb. 2007 (WUGSC 5.0/priPac1)) Alignment Net 1 10002 0 0 0 255 255 0 1 0 0 compGeno 0 netTetNig1 Tetraodon Net netAlign tetNig1 chainTetNig1 Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netTetNig2 Tetraodon Net netAlign tetNig2 chainTetNig2 Tetraodon (Mar. 2007 (Genoscope 8.0/tetNig2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netPanTro2 Chimp Net netAlign panTro2 chainPanTro2 Chimp (Mar. 2006 (CGSC 2.1/panTro2)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netPanTro3 Chimp Net netAlign panTro3 chainPanTro3 Chimp (Oct. 2010 (CGSC 2.1.3/panTro3)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netTupBel1 Tree shrew Net netAlign tupBel1 chainTupBel1 Tree shrew (Dec. 2006 (Broad/tupBel1)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netSorAra1 Shrew Net netAlign sorAra1 chainSorAra1 Shrew (June 2006 (Broad/sorAra1)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netBosTau4 Cow Net netAlign bosTau4 chainBosTau4 Cow (Oct. 
2007 (Baylor 4.0/bosTau4)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netBosTau6 Cow Net netAlign bosTau6 chainBosTau6 Cow (Nov. 2009 (Bos_taurus_UMD_3.1/bosTau6)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netMm6 Mouse Net netAlign mm6 chainMm6 Mouse (Mar. 2005 (NCBI34/mm6)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netMm9 Mouse Net netAlign mm9 chainMm9 Mouse (July 2007 (NCBI37/mm9)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netSusScr2 Pig Net netAlign susScr2 chainSusScr2 Pig (Nov. 2009 (SGSC Sscrofa9.2/susScr2)) Alignment Net 2 10002 0 0 0 100 50 0 1 0 0 compGeno 0 netPonAbe2 Orangutan Net netAlign ponAbe2 chainPonAbe2 Orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 netOtoGar1 Bushbaby Net netAlign otoGar1 chainOtoGar1 Bushbaby (Dec. 2006 (Broad/otoGar1)) Alignment net 2 10002 0 0 0 100 50 0 0 0 0 compGeno 0 A375CytosolicPolyAPlusTxn A375 Txn wig 0 5251.55 A375 Cytosolic polyA+, Affy Transcriptome 0 10002 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 covMask1kGPilotLowCovChbJptDepth Depth CHB/JPT bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase: Abnormal Depth, CHB/JPT 0 10002 180 0 0 217 127 127 0 0 0 varRep 1 bamFeld1 Feld1 Contigs bam Feld1 Contigs 0 10002 0 0 0 127 127 127 0 0 0 neandertal 1 encodeGisChipPetStat1NoGif GIS STAT1 -gIF bed 12 GIS ChIP-PET: STAT1 Ab on untreated HeLa cells 0 10002 125 140 35 67 79 35 1 0 24 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX,chrY, encodeChip 1 pgKb1Illum KB1 Illumina bed 4 + KB1 Genome Variants, Illumina 23.2X 3 10002 0 0 0 127 127 127 0 0 0 varRep 1 pgKb1Indel KB1 indels bed 4 + KB1 indels from 454 and Illumina 3 10002 0 0 0 127 127 127 0 0 0 varRep 1 bamMMS12Melanesian Melanesian Seq bam Melanesian (HGDP00491) Sequence Reads 0 10002 0 0 0 127 127 127 0 0 0 denisova 1 
bamSLMez1 Mez1 Sequence bam Mez1 Sequence Reads 0 10002 0 0 0 127 127 127 0 0 0 neandertal 1 netMm5 Mm5 Net netAlign mm5 chainMm5 mm5 (mm5) Alignment Net 0 10002 0 0 0 100 50 0 1 0 0 compGeno 0 ucsfMreSeqBrainCpG MRE CpG bedGraph 4 MRE-seq CpG Score 2 10002 0 100 0 200 100 0 0 0 0 regulation 0 burgeRnaSeqGemMapperAlignHME RNA-seq HME bed 12 Burge Lab RNA-seq 32mer Reads from HME (Human Mammary Epithelial) Cell Line 1 10002 12 12 120 133 133 187 0 0 0 expression 1 bamMMS6 Yoruba Seq bam Yoruba (HGDP00927) Sequence Reads 0 10002 0 0 0 127 127 127 0 0 0 neandertal 1 covMask1kGPilotLowCovYriDepth Depth YRI bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase: Abnormal Depth, YRI 0 10003 180 0 0 217 127 127 0 0 0 varRep 1 FHs738LuCytosolicPolyAPlusTnFg FHs738Lu TnFg bed 4 + FHs738Lu Cytosolic polyA+, Affy Transfrags 3 10003 35 35 175 160 160 188 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 bamMMS4 Han Seq bam Han (HGDP00778) Sequence Reads 0 10003 0 0 0 127 127 127 0 0 0 neandertal 1 ucsfMedipSeqBrainCpG MeDIP CpG bedGraph 4 MeDIP-seq CpG Score 2 10003 100 0 0 200 100 0 0 0 0 regulation 0 bamMez1 Mez1 Contigs bam Mez1 Contigs 0 10003 0 0 0 127 127 127 0 0 0 neandertal 1 pgNb1 NB1 bed 4 + NB1 Genome Variants (all SNPs, 2X genome plus 16x exome) 3 10003 0 0 0 127 127 127 0 0 0 varRep 1 bamMMS16Papuan Papuan Seq bam Papuan (HGDP00551) Sequence Reads 0 10003 0 0 0 127 127 127 0 0 0 denisova 1 burgeRnaSeqGemMapperAlignMB435 RNA-seq MB435 bed 12 Burge Lab RNA-seq 32mer Reads from MB-435 Cell Line 1 10003 12 12 120 133 133 187 0 0 0 expression 1 bamSLSid1253 Sid1253 Sequence bam Sid1253 Sequence Reads 0 10003 0 0 0 127 127 127 0 0 0 neandertal 1 pgYoruban2 YRI '8507, SOLiD bed 4 + YRI NA18507 Sequenced on the SOLiD Platform (PSU) 3 10003 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12878 CEU NA12878 bed 4 + CEU pedigree 1463, NA12878 (Complete Genomics) 3 10004 0 0 0 127 127 127 0 0 0 varRep 1 FHs738LuCytosolicPolyAPlusTxn FHs738Lu Txn wig 
0 5451.35 FHs738Lu Cytosolic polyA+, Affy Transcriptome 0 10004 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 covMask1kGPilotLowCovCeuMapQ Map Qual CEU bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase: Mapping Quality Failure, CEU 0 10004 224 108 108 239 181 181 0 0 0 varRep 1 ucsfMedipSeqBrainCoverage MeDIP RawSignal bedGraph 4 MeDIP-seq Raw Signal 2 10004 100 0 0 200 100 0 0 0 0 regulation 0 pgNb1Indel NB1 indels bed 4 + NB1 Genome Variants indels 3 10004 0 0 0 127 127 127 0 0 0 varRep 1 bamMMS5 Papuan Seq bam Papuan (HGDP00542) Sequence Reads 0 10004 0 0 0 127 127 127 0 0 0 neandertal 1 burgeRnaSeqGemMapperAlignMCF7 RNA-seq MCF7 bed 12 Burge Lab RNA-seq 32mer Reads from MCF-7 Breast Adenocarcinoma Cell Line 1 10004 12 12 120 133 133 187 0 0 0 expression 1 bamMMS11Sardinian Sardinian Seq bam Sardinian (HGDP00665) Sequence Reads 0 10004 0 0 0 127 127 127 0 0 0 denisova 1 bamSid1253 Sid1253 Contigs bam Sid1253 Contigs 0 10004 0 0 0 127 127 127 0 0 0 neandertal 1 bamSLVi33dot16 Vi33.16 Sequence bam Vi33.16 Sequence Reads 0 10004 0 0 0 127 127 127 0 0 0 neandertal 1 bamMMS13Cambodian Cambodian Seq bam Cambodian (HGDP00711) Sequence Reads 0 10005 0 0 0 127 127 127 0 0 0 denisova 1 pgGS12878indel CEU NA12878 indel bed 4 + CEU NA12878 indel (Complete Genomics) 3 10005 0 0 0 127 127 127 0 0 0 varRep 1 bamMMS8 French Seq bam French (HGDP00521) Sequence Reads 0 10005 0 0 0 127 127 127 0 0 0 neandertal 1 HepG2CytosolicPolyAPlusTnFg HepG2+ Cyto TnFg bed 4 + HepG2 Cytosolic polyA+, Affy Transfrags 3 10005 35 35 175 160 160 188 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 covMask1kGPilotLowCovChbJptMapQ Map Qual CHB/JPT bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase: Mapping Quality Failure, CHB/JPT 0 10005 224 108 108 239 181 181 0 0 0 varRep 1 pgMd8 MD8 bed 4 + MD8 Genome Variants (all SNPs, 16x exome) 3 10005 0 0 0 127 127 127 0 0 0 varRep 1 
ucsfRnaSeqBrainAllCoverage RNA-seq RawSignal bedGraph 4 RNA-seq Raw Signal 2 10005 0 0 200 100 0 0 0 0 0 regulation 0 burgeRnaSeqGemMapperAlignT47D RNA-seq T47D bed 12 Burge Lab RNA-seq 32mer Reads from T-47D Breast Ductal Carcinoma Cell Line 1 10005 12 12 120 133 133 187 0 0 0 expression 1 bamVi33dot16 Vi33.16 Contigs bam Vi33.16 Contigs 0 10005 0 0 0 127 127 127 0 0 0 neandertal 1 bamSLVi33dot25 Vi33.25 Sequence bam Vi33.25 Sequence Reads 0 10005 0 0 0 127 127 127 0 0 0 neandertal 1 pgGS12891 CEU NA12891 bed 4 + CEU pedigree 1463, NA12891 (Complete Genomics) 3 10006 0 0 0 127 127 127 0 0 0 varRep 1 HepG2CytosolicPolyAPlusTxn HepG2+ Cyto Txn wig 0 6183.74 HepG2 Cytosolic polyA+, Affy Transcriptome 0 10006 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 covMask1kGPilotLowCovYriMapQ Map Qual YRI bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase: Mapping Quality Failure, YRI 0 10006 224 108 108 239 181 181 0 0 0 varRep 1 pgMd8Indel MD8 indels bed 4 + MD8 Genome Variants indels 3 10006 0 0 0 127 127 127 0 0 0 varRep 1 bamMMS10NativeAmerican Native American bam Native American (HGDP00998) Sequence Reads 0 10006 0 0 0 127 127 127 0 0 0 denisova 1 burgeRnaSeqGemMapperAlignAdipose RNA-seq Adipose bed 12 Burge Lab RNA-seq 32mer Reads from Adipose 1 10006 12 12 120 133 133 187 0 0 0 expression 1 ucsfRnaSeqBrainSmartCoverage Smart RawSignal bedGraph 4 RNA-seq Smart-Tagged Raw Signal 2 10006 0 0 200 100 0 0 0 0 0 regulation 0 bamVi33dot25 Vi33.25 Contigs bam Vi33.25 Contigs 0 10006 0 0 0 127 127 127 0 0 0 neandertal 1 bamSLVi33dot26 Vi33.26 Sequence bam Vi33.26 Sequence Reads 0 10006 0 0 0 127 127 127 0 0 0 neandertal 1 pgGS12891indel CEU NA12891 indel bed 4 + CEU NA12891 indel (Complete Genomics) 3 10007 0 0 0 127 127 127 0 0 0 varRep 1 HepG2NuclearPolyAPlusTnFg HepG2+ Nuc TnFg bed 4 + HepG2 Nuclear polyA+, Affy Transfrags 3 10007 35 35 175 160 160 188 0 0 10 
chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 bamMMS14Mongolian Mongolian Seq bam Mongolian (HGDP01224) Sequence Reads 0 10007 0 0 0 127 127 127 0 0 0 denisova 1 covMask1kGPilotLowCovCeuUncov No Cov CEU bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase: No Coverage, CEU 0 10007 150 150 150 202 202 202 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignBrain RNA-seq Brain bed 12 Burge Lab RNA-seq 32mer Reads from Brain 1 10007 12 12 120 133 133 187 0 0 0 expression 1 pgTk1 TK1 bed 4 + TK1 Genome Variants (all SNPs, 16x exome) 3 10007 0 0 0 127 127 127 0 0 0 varRep 1 bamVi33dot26 Vi33.26 Contigs bam Vi33.26 Contigs 0 10007 0 0 0 127 127 127 0 0 0 neandertal 1 pgGS12892 CEU NA12892 bed 4 + CEU pedigree 1463, NA12892 (Complete Genomics) 3 10008 0 0 0 127 127 127 0 0 0 varRep 1 HepG2NuclearPolyAPlusTxn HepG2+ Nuc Txn wig 0 4206.84 HepG2 Nuclear polyA+, Affy Transcriptome 0 10008 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 covMask1kGPilotLowCovChbJptUncov No Cov CHB/JPT bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase: No Coverage, CHB/JPT 0 10008 150 150 150 202 202 202 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignBreast RNA-seq Breast bed 12 Burge Lab RNA-seq 32mer Reads from Breast 1 10008 12 12 120 133 133 187 0 0 0 expression 1 pgTk1Indel TK1 indels bed 4 + TK1 Genome Variants indels 3 10008 0 0 0 127 127 127 0 0 0 varRep 1 pgAbtSolid ABT bed 4 + ABT Genome Variants, SOLiD 3 10009 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12892indel CEU NA12892 indel bed 4 + CEU NA12892 indel (Complete Genomics) 3 10009 0 0 0 127 127 127 0 0 0 varRep 1 HepG2CytosolicPolyAMinusTnFg HepG2- Cyto TnFg bed 4 + HepG2 Cytosolic polyA-, Affy Transfrags 3 10009 35 35 175 160 160 188 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 covMask1kGPilotLowCovYriUncov No Cov YRI bed 3 Coverage Analysis from the 1000 Genomes Project Pilot Phase: No Coverage, YRI 0 10009 150 150 150 
202 202 202 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignColon RNA-seq Colon bed 12 Burge Lab RNA-seq 32mer Reads from Colon 1 10009 12 12 120 133 133 187 0 0 0 expression 1 pgAbt454 ABT exome bed 4 + ABT Genome Variants, 454 exome 3 10010 0 0 0 127 127 127 0 0 0 varRep 1 covMask1kGPilotLowCovUnionDepth Depth Union bed 3 Union of all populations' depth masks 1 10010 180 0 0 217 127 127 0 0 0 varRep 1 HepG2CytosolicPolyAMinusTxn HepG2- Cyto Txn wig 0 3571.88 HepG2 Cytosolic polyA-, Affy Transcriptome 0 10010 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 burgeRnaSeqGemMapperAlignHeart RNA-seq Heart bed 12 Burge Lab RNA-seq 32mer Reads from Heart 1 10010 12 12 120 133 133 187 0 0 0 expression 1 pgGS19240 YRI NA19240 bed 4 + YRI NA19240 (Daughter) (Complete Genomics) 3 10010 0 0 0 127 127 127 0 0 0 varRep 1 pgAbt454indels ABT exome indels bed 4 + ABT Genome Variants, 454 exome indels 3 10011 0 0 0 127 127 127 0 0 0 varRep 1 HepG2NuclearPolyAMinusTnFg HepG2- Nuc TnFg bed 4 + HepG2 Nuclear polyA-, Affy Transfrags 3 10011 35 35 175 160 160 188 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 covMask1kGPilotLowCovUnionMapQ MapQ Union bed 3 Union of all populations' mapping quality masks 1 10011 224 108 108 239 181 181 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignLiver RNA-seq Liver bed 12 Burge Lab RNA-seq 32mer Reads from Liver 1 10011 12 12 120 133 133 187 0 0 0 expression 1 pgGS19240indel YRI NA19240 indel bed 4 + YRI NA19240 (Daughter) indel (Complete Genomics) 3 10011 0 0 0 127 127 127 0 0 0 varRep 1 pgAbtIllum ABT Illum bed 4 + ABT Genome Variants, Illumina 7.2X 3 10012 0 0 0 127 127 127 0 0 0 varRep 1 HepG2NuclearPolyAMinusTxn HepG2- Nuc Txn wig 0 2656.57 HepG2 Nuclear polyA-, Affy Transcriptome 0 10012 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 burgeRnaSeqGemMapperAlignLymphNode RNA-seq Lymph Node bed 12 Burge Lab RNA-seq 32mer Reads from 
Lymph Node 1 10012 12 12 120 133 133 187 0 0 0 expression 1 covMask1kGPilotLowCovIntersectionUncov Uncov Intsct bed 3 Intersection of all populations' uncovered regions 1 10012 150 150 150 202 202 202 0 0 0 varRep 1 pgGS19238 YRI NA19238 bed 4 + YRI NA19238 (Mother) (Complete Genomics) 3 10012 0 0 0 127 127 127 0 0 0 varRep 1 pgNA12878 CEU daught '2878 bed 4 + CEU Trio Daughter NA12878 (1000 Genomes Project) 3 10013 128 64 0 191 159 127 0 0 0 varRep 1 JurkatCytosolicPolyAPlusTnFg Jurkat TnFg bed 4 + Jurkat Cytosolic polyA+, Affy Transfrags 3 10013 35 35 175 160 160 188 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 burgeRnaSeqGemMapperAlignSkelMuscle RNA-seq Muscle bed 12 Burge Lab RNA-seq 32mer Reads from Skeletal Muscle 1 10013 12 12 120 133 133 187 0 0 0 expression 1 covMask1kGPilotLowCovUnionUncov Uncov Union bed 3 Union of all populations' uncovered regions 1 10013 150 150 150 202 202 202 0 0 0 varRep 1 pgGS19238indel YRI NA19238 indel bed 4 + YRI NA19238 (Mother) indel (Complete Genomics) 3 10013 0 0 0 127 127 127 0 0 0 varRep 1 pgNA12891 CEU father '2891 bed 4 + CEU Trio Father NA12891 (1000 Genomes Project) 3 10014 128 64 0 191 159 127 0 0 0 varRep 1 JurkatCytosolicPolyAPlusTxn Jurkat Txn wig 0 5203.76 Jurkat Cytosolic polyA+, Affy Transcriptome 0 10014 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 burgeRnaSeqGemMapperAlignTestes RNA-seq Testes bed 12 Burge Lab RNA-seq 32mer Reads from Testes 1 10014 12 12 120 133 133 187 0 0 0 expression 1 covMask1kGPilotLowCovUnion Union bed 3 Union of all masks 1 10014 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19239 YRI NA19239 bed 4 + YRI NA19239 (Father) (Complete Genomics) 3 10014 0 0 0 127 127 127 0 0 0 varRep 1 pgNA12892 CEU mother '2892 bed 4 + CEU Trio Mother NA12892 (1000 Genomes Project) 3 10015 128 64 0 191 159 127 0 0 0 varRep 1 NCCITCytosolicPolyAPlusTnFg NCCIT TnFg bed 4 + NCCIT Cytosolic polyA+, Affy Transfrags 3 10015 35 35 175 
160 160 188 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 burgeRnaSeqGemMapperAlignBT474AllRawSignal RNA-seq BT474 Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from BT474 Breast Tumour Cell Line, Raw Signal 2 10015 46 0 184 150 127 219 0 0 0 expression 0 pgGS19239indel YRI NA19239 indel bed 4 + YRI NA19239 (Father) indel (Complete Genomics) 3 10015 0 0 0 127 127 127 0 0 0 varRep 1 NCCITCytosolicPolyAPlusTxn NCCIT Txn wig 0 6320.77 NCCIT Cytosolic polyA+, Affy Transcriptome 0 10016 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 pgHG00731 PUR father '731 bed 4 + PUR Trio Father HG00731 (Complete Genomics) 3 10016 128 64 0 191 159 127 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignHMEAllRawSignal RNA-seq HME Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from HME (Human Mammary Epithelial) Cell Line, Raw Signal 2 10016 46 0 184 150 127 219 0 0 0 expression 0 pgNA19240 YRI daught '9240 bed 4 + YRI Trio Daughter NA19240 (1000 Genomes Project) 3 10016 128 64 0 191 159 127 0 0 0 varRep 1 PC3CytosolicPolyAPlusTnFg PC3 TnFg bed 4 + PC3 Cytosolic polyA+, Affy Transfrags 3 10017 35 35 175 160 160 188 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 pgHG00731indel PUR HG00731 indel bed 4 + PUR HG00731 (Father) indel (Complete Genomics) 3 10017 0 0 0 127 127 127 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignMB435AllRawSignal RNA-seq MB435 Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from MB-435 Cell Line, Raw Signal 2 10017 46 0 184 150 127 219 0 0 0 expression 0 pgNA19238 YRI mother '9238 bed 4 + YRI Trio Mother NA19238 (1000 Genomes Project) 3 10017 128 64 0 191 159 127 0 0 0 varRep 1 pgGS19700 ASW NA19700 bed 4 + ASW NA19700 (Complete Genomics) 3 10018 0 0 0 127 127 127 0 0 0 varRep 1 PC3CytosolicPolyAPlusTxn PC3 Txn wig 0 2993.96 PC3 Cytosolic polyA+, Affy Transcriptome 0 10018 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 
burgeRnaSeqGemMapperAlignMCF7AllRawSignal RNA-seq MCF7 Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from MCF-7 Breast Adenocarcinoma Cell Line, Raw Signal 2 10018 46 0 184 150 127 219 0 0 0 expression 0 pgNA19239 YRI father '9239 bed 4 + YRI Trio Father NA19239 (1000 Genomes Project) 3 10018 128 64 0 191 159 127 0 0 0 varRep 1 pgGS19700indel ASW NA19700 indel bed 4 + ASW NA19700 indel (Complete Genomics) 3 10019 0 0 0 127 127 127 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignT47DAllRawSignal RNA-seq T47D Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from T-47D Breast Ductal Carcinoma Cell Line, Raw Signal 2 10019 46 0 184 150 127 219 0 0 0 expression 0 SKNASCytosolicPolyAPlusTnFg SK-N-AS TnFg bed 4 + SK-N-AS Cytosolic polyA+, Affy Transfrags 3 10019 35 35 175 160 160 188 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 pgVenter Venter bed 4 + J. Craig Venter - Published Method 1, Variant in Original Form (JCVI) 3 10019 128 0 128 191 127 191 0 0 0 varRep 1 pgGS19701 ASW NA19701 bed 4 + ASW NA19701 (Complete Genomics) 3 10020 0 0 0 127 127 127 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignAdiposeAllRawSignal RNA-seq Adipose Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from Adipose, Raw Signal 2 10020 46 0 184 150 127 219 0 0 0 expression 0 SKNASCytosolicPolyAPlusTxn SK-N-AS Txn wig 0 4395.02 SK-N-AS Cytosolic polyA+, Affy Transcriptome 0 10020 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 pgWatson Watson bed 4 + James Watson (CSHL) 3 10020 153 0 0 204 127 127 0 0 0 varRep 1 pgGS19701indel ASW NA19701 indel bed 4 + ASW NA19701 indel (Complete Genomics) 3 10021 0 0 0 127 127 127 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignBrainAllRawSignal RNA-seq Brain Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from Brain, Raw Signal 2 10021 46 0 184 150 127 219 0 0 0 expression 0 U87CytosolicPolyAPlusTnFg U87 TnFg bed 4 + U87 Cytosolic polyA+, Affy Transfrags 3 10021 35 35 175 160 160 188 0 0 10 
chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 pgYoruban3 YRI NA18507 bed 4 + YRI NA18507 (Illumina Cambridge/Solexa, SNPs called by PSU) 3 10021 128 0 128 191 127 191 0 0 0 varRep 1 pgGS19703 ASW NA19703 bed 4 + ASW NA19703 (Complete Genomics) 3 10022 0 0 0 127 127 127 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignBreastAllRawSignal RNA-seq Breast Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from Breast, Raw Signal 2 10022 46 0 184 150 127 219 0 0 0 expression 0 U87CytosolicPolyAPlusTxn U87 Txn wig 0 5939.33 U87 Cytosolic polyA+, Affy Transcriptome 0 10022 175 150 128 255 128 0 0 0 10 chr6,chr7,chr13,chr14,chr19,chr20,chr21,chr22,chrX,chrY, expression 0 pgYh1 YanHuang bed 4 + Han Chinese Individual (YanHuang Project) 3 10022 0 128 128 127 191 191 0 0 0 varRep 1 pgGS19703indel ASW NA19703 indel bed 4 + ASW NA19703 indel (Complete Genomics) 3 10023 0 0 0 127 127 127 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignColonAllRawSignal RNA-seq Colon Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from Colon, Raw Signal 2 10023 46 0 184 150 127 219 0 0 0 expression 0 pgSjk SJK bed 4 + Seong-Jin Kim (SJK, GUMS/KOBIC) 3 10023 0 128 128 127 191 191 0 0 0 varRep 1 pgAk1 AK1 bed 4 + Anonymous Korean individual, AK1 (Genomic Medicine Institute) 3 10024 0 128 64 127 191 159 0 0 0 varRep 1 pgGS19704 ASW NA19704 bed 4 + ASW NA19704 (Complete Genomics) 3 10024 0 0 0 127 127 127 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignHeartAllRawSignal RNA-seq Heart Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from Heart, Raw Signal 2 10024 46 0 184 150 127 219 0 0 0 expression 0 pgGS19704indel ASW NA19704 indel bed 4 + ASW NA19704 indel (Complete Genomics) 3 10025 0 0 0 127 127 127 0 0 0 varRep 1 pgIrish Irish Male bed 4 + Anonymous Irish Male 3 10025 0 100 100 127 177 177 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignLiverAllRawSignal RNA-seq Liver Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from Liver, Raw Signal 2 10025 46 0 184 150 127 219 0 0 0 expression 0 pgGS19834 ASW NA19834 bed 4 + ASW 
NA19834 (Complete Genomics) 3 10026 0 0 0 127 127 127 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignLymphNodeAllRawSignal RNA-seq Lymph Node Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from Lymph Node, Raw Signal 2 10026 46 0 184 150 127 219 0 0 0 expression 0 pgQuake S. Quake bed 4 + Stephen Quake (Stanford) 3 10026 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19834indel ASW NA19834 indel bed 4 + ASW NA19834 indel (Complete Genomics) 3 10027 0 0 0 127 127 127 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignSkelMuscleAllRawSignal RNA-seq Muscle Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from Skeletal Muscle, Raw Signal 2 10027 46 0 184 150 127 219 0 0 0 expression 0 pgSaqqaq Saqqaq bed 4 + Individual from the Extinct Palaeo-Eskimo Saqqaq (Saqqaq Genome Project) 3 10027 0 0 0 127 127 127 0 0 0 varRep 1 pgHG00732 PUR mother '732 bed 4 + PUR Trio Mother HG00732 (Complete Genomics) 3 10028 128 64 0 191 159 127 0 0 0 varRep 1 burgeRnaSeqGemMapperAlignTestesAllRawSignal RNA-seq Testes Sig bedGraph 4 Burge Lab RNA-seq 32mer Reads from Testes, Raw Signal 2 10028 46 0 184 150 127 219 0 0 0 expression 0 pgSaqqaqHc Saqqaq HC bed 4 + Individual from the Extinct Palaeo-Eskimo Saqqaq, high confidence SNPs (Saqqaq Genome Project) 3 10028 0 0 0 127 127 127 0 0 0 varRep 1 pgHG00732indel PUR HG00732 indel bed 4 + PUR HG00732 (Mother) indel (Complete Genomics) 3 10029 0 0 0 127 127 127 0 0 0 varRep 1 pgGS06985 CEU NA06985 bed 4 + CEU NA06985 (Complete Genomics) 3 10030 0 0 0 127 127 127 0 0 0 varRep 1 pgGS06985indel CEU NA06985 indel bed 4 + CEU NA06985 indel (Complete Genomics) 3 10031 0 0 0 127 127 127 0 0 0 varRep 1 pgGS06994 CEU NA06994 bed 4 + CEU NA06994 (Complete Genomics) 3 10032 0 0 0 127 127 127 0 0 0 varRep 1 pgGS06994indel CEU NA06994 indel bed 4 + CEU NA06994 indel (Complete Genomics) 3 10033 0 0 0 127 127 127 0 0 0 varRep 1 pgGS07357 CEU NA07357 bed 4 + CEU NA07357 (Complete Genomics) 3 10034 0 0 0 127 127 127 0 0 0 varRep 1 pgGS07357indel CEU NA07357 indel bed 4 + CEU NA07357 indel 
(Complete Genomics) 3 10035 0 0 0 127 127 127 0 0 0 varRep 1 pgGS10851 CEU NA10851 bed 4 + CEU NA10851 (Complete Genomics) 3 10036 0 0 0 127 127 127 0 0 0 varRep 1 pgGS10851indel CEU NA10851 indel bed 4 + CEU NA10851 indel (Complete Genomics) 3 10037 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12004 CEU NA12004 bed 4 + CEU NA12004 (Complete Genomics) 3 10038 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12004indel CEU NA12004 indel bed 4 + CEU NA12004 indel (Complete Genomics) 3 10039 0 0 0 127 127 127 0 0 0 varRep 1 pgHG00733 PUR daughter '733 bed 4 + PUR Trio Daughter HG00733 (Complete Genomics) 3 10040 128 64 0 191 159 127 0 0 0 varRep 1 pgHG00733indel PUR HG00733 indel bed 4 + PUR HG00733 (Daughter) indel (Complete Genomics) 3 10041 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18526 CHB NA18526 bed 4 + CHB NA18526 (Complete Genomics) 3 10042 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18526indel CHB NA18526 indel bed 4 + CHB NA18526 indel (Complete Genomics) 3 10043 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18537 CHB NA18537 bed 4 + CHB NA18537 (Complete Genomics) 3 10044 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18537indel CHB NA18537 indel bed 4 + CHB NA18537 indel (Complete Genomics) 3 10045 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18555 CHB NA18555 bed 4 + CHB NA18555 (Complete Genomics) 3 10046 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18555indel CHB NA18555 indel bed 4 + CHB NA18555 indel (Complete Genomics) 3 10047 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18558 CHB NA18558 bed 4 + CHB NA18558 (Complete Genomics) 3 10048 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18558indel CHB NA18558 indel bed 4 + CHB NA18558 indel (Complete Genomics) 3 10049 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20845 GIH NA20845 bed 4 + GIH NA20845 (Complete Genomics) 3 10050 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20845indel GIH NA20845 indel bed 4 + GIH NA20845 indel (Complete Genomics) 3 10051 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20846 GIH NA20846 bed 4 + GIH NA20846 (Complete Genomics) 3 10052 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20846indel 
GIH NA20846 indel bed 4 + GIH NA20846 indel (Complete Genomics) 3 10053 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20847 GIH NA20847 bed 4 + GIH NA20847 (Complete Genomics) 3 10054 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20847indel GIH NA20847 indel bed 4 + GIH NA20847 indel (Complete Genomics) 3 10055 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20850 GIH NA20850 bed 4 + GIH NA20850 (Complete Genomics) 3 10056 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20850indel GIH NA20850 indel bed 4 + GIH NA20850 indel (Complete Genomics) 3 10057 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18940 JPT NA18940 bed 4 + JPT NA18940 (Complete Genomics) 3 10058 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18940indel JPT NA18940 indel bed 4 + JPT NA18940 indel (Complete Genomics) 3 10059 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18942 JPT NA18942 bed 4 + JPT NA18942 (Complete Genomics) 3 10060 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18942indel JPT NA18942 indel bed 4 + JPT NA18942 indel (Complete Genomics) 3 10061 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18947 JPT NA18947 bed 4 + JPT NA18947 (Complete Genomics) 3 10062 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18947indel JPT NA18947 indel bed 4 + JPT NA18947 indel (Complete Genomics) 3 10063 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18956 JPT NA18956 bed 4 + JPT NA18956 (Complete Genomics) 3 10064 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18956indel JPT NA18956 indel bed 4 + JPT NA18956 indel (Complete Genomics) 3 10065 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19017 LWK NA19017 bed 4 + LWK NA19017 (Complete Genomics) 3 10066 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19017indel LWK NA19017 indel bed 4 + LWK NA19017 indel (Complete Genomics) 3 10067 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19020 LWK NA19020 bed 4 + LWK NA19020 (Complete Genomics) 3 10068 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19020indel LWK NA19020 indel bed 4 + LWK NA19020 indel (Complete Genomics) 3 10069 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19025 LWK NA19025 bed 4 + LWK NA19025 (Complete Genomics) 3 10070 0 0 0 127 127 127 0 0 0 varRep 1 
pgGS19025indel LWK NA19025 indel bed 4 + LWK NA19025 indel (Complete Genomics) 3 10071 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19026 LWK NA19026 bed 4 + LWK NA19026 (Complete Genomics) 3 10072 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19026indel LWK NA19026 indel bed 4 + LWK NA19026 indel (Complete Genomics) 3 10073 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19648 MXL NA19648 bed 4 + MXL NA19648 (Complete Genomics) 3 10074 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19648indel MXL NA19648 indel bed 4 + MXL NA19648 indel (Complete Genomics) 3 10075 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19649 MXL NA19649 bed 4 + MXL NA19649 (Complete Genomics) 3 10076 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19649indel MXL NA19649 indel bed 4 + MXL NA19649 indel (Complete Genomics) 3 10077 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19669 MXL NA19669 bed 4 + MXL NA19669 (Complete Genomics) 3 10078 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19669indel MXL NA19669 indel bed 4 + MXL NA19669 indel (Complete Genomics) 3 10079 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19670 MXL NA19670 bed 4 + MXL NA19670 (Complete Genomics) 3 10080 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19670indel MXL NA19670 indel bed 4 + MXL NA19670 indel (Complete Genomics) 3 10081 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19735 MXL NA19735 bed 4 + MXL NA19735 (Complete Genomics) 3 10082 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19735indel MXL NA19735 indel bed 4 + MXL NA19735 indel (Complete Genomics) 3 10083 0 0 0 127 127 127 0 0 0 varRep 1 pgGS21732 MKK NA21732 bed 4 + MKK NA21732 (Complete Genomics) 3 10084 0 0 0 127 127 127 0 0 0 varRep 1 pgGS21732indel MKK NA21732 indel bed 4 + MKK NA21732 indel (Complete Genomics) 3 10085 0 0 0 127 127 127 0 0 0 varRep 1 pgGS21733 MKK NA21733 bed 4 + MKK NA21733 (Complete Genomics) 3 10086 0 0 0 127 127 127 0 0 0 varRep 1 pgGS21733indel MKK NA21733 indel bed 4 + MKK NA21733 indel (Complete Genomics) 3 10087 0 0 0 127 127 127 0 0 0 varRep 1 pgGS21737 MKK NA21737 bed 4 + MKK NA21737 (Complete Genomics) 3 10088 0 0 0 127 127 127 0 0 0 
varRep 1 pgGS21737indel MKK NA21737 indel bed 4 + MKK NA21737 indel (Complete Genomics) 3 10089 0 0 0 127 127 127 0 0 0 varRep 1 pgGS21767 MKK NA21767 bed 4 + MKK NA21767 (Complete Genomics) 3 10090 0 0 0 127 127 127 0 0 0 varRep 1 pgGS21767indel MKK NA21767 indel bed 4 + MKK NA21767 indel (Complete Genomics) 3 10091 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20502 TSI NA20502 bed 4 + TSI NA20502 (Complete Genomics) 3 10092 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20502indel TSI NA20502 indel bed 4 + TSI NA20502 indel (Complete Genomics) 3 10093 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20509 TSI NA20509 bed 4 + TSI NA20509 (Complete Genomics) 3 10094 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20509indel TSI NA20509 indel bed 4 + TSI NA20509 indel (Complete Genomics) 3 10095 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20510 TSI NA20510 bed 4 + TSI NA20510 (Complete Genomics) 3 10096 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20510indel TSI NA20510 indel bed 4 + TSI NA20510 indel (Complete Genomics) 3 10097 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20511 TSI NA20511 bed 4 + TSI NA20511 (Complete Genomics) 3 10098 0 0 0 127 127 127 0 0 0 varRep 1 pgGS20511indel TSI NA20511 indel bed 4 + TSI NA20511 indel (Complete Genomics) 3 10099 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18501 YRI NA18501 bed 4 + YRI NA18501 (Complete Genomics) 3 10100 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18501indel YRI NA18501 indel bed 4 + YRI NA18501 indel (Complete Genomics) 3 10101 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18502 YRI NA18502 bed 4 + YRI NA18502 (Complete Genomics) 3 10102 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18502indel YRI NA18502 indel bed 4 + YRI NA18502 indel (Complete Genomics) 3 10103 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18504 YRI NA18504 bed 4 + YRI NA18504 (Complete Genomics) 3 10104 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18504indel YRI NA18504 indel bed 4 + YRI NA18504 indel (Complete Genomics) 3 10105 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18505 YRI NA18505 bed 4 + YRI NA18505 (Complete Genomics) 3 10106 0 0 0 127 127 
127 0 0 0 varRep 1 pgGS18505indel YRI NA18505 indel bed 4 + YRI NA18505 indel (Complete Genomics) 3 10107 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18508 YRI NA18508 bed 4 + YRI NA18508 (Complete Genomics) 3 10108 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18508indel YRI NA18508 indel bed 4 + YRI NA18508 indel (Complete Genomics) 3 10109 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18517 YRI NA18517 bed 4 + YRI NA18517 (Complete Genomics) 3 10110 0 0 0 127 127 127 0 0 0 varRep 1 pgGS18517indel YRI NA18517 indel bed 4 + YRI NA18517 indel (Complete Genomics) 3 10111 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19129 YRI NA19129 bed 4 + YRI NA19129 (Complete Genomics) 3 10112 0 0 0 127 127 127 0 0 0 varRep 1 pgGS19129indel YRI NA19129 indel bed 4 + YRI NA19129 indel (Complete Genomics) 3 10113 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12877 CEU NA12877 bed 4 + CEU pedigree 1463, NA12877 (Complete Genomics) 3 10114 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12877indel CEU NA12877 indel bed 4 + CEU NA12877 indel (Complete Genomics) 3 10115 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12879 CEU NA12879 bed 4 + CEU pedigree 1463, NA12879 (Complete Genomics) 3 10116 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12879indel CEU NA12879 indel bed 4 + CEU NA12879 indel (Complete Genomics) 3 10117 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12880 CEU NA12880 bed 4 + CEU pedigree 1463, NA12880 (Complete Genomics) 3 10118 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12880indel CEU NA12880 indel bed 4 + CEU NA12880 indel (Complete Genomics) 3 10119 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12881 CEU NA12881 bed 4 + CEU pedigree 1463, NA12881 (Complete Genomics) 3 10120 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12881indel CEU NA12881 indel bed 4 + CEU NA12881 indel (Complete Genomics) 3 10121 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12882 CEU NA12882 bed 4 + CEU pedigree 1463, NA12882 (Complete Genomics) 3 10122 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12882indel CEU NA12882 indel bed 4 + CEU NA12882 indel (Complete Genomics) 3 10123 0 0 0 127 127 127 0 0 0 varRep 1 
pgGS12883 CEU NA12883 bed 4 + CEU pedigree 1463, NA12883 (Complete Genomics) 3 10124 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12883indel CEU NA12883 indel bed 4 + CEU NA12883 indel (Complete Genomics) 3 10125 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12884 CEU NA12884 bed 4 + CEU pedigree 1463, NA12884 (Complete Genomics) 3 10126 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12884indel CEU NA12884 indel bed 4 + CEU NA12884 indel (Complete Genomics) 3 10127 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12885 CEU NA12885 bed 4 + CEU pedigree 1463, NA12885 (Complete Genomics) 3 10128 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12885indel CEU NA12885 indel bed 4 + CEU NA12885 indel (Complete Genomics) 3 10129 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12886 CEU NA12886 bed 4 + CEU pedigree 1463, NA12886 (Complete Genomics) 3 10130 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12886indel CEU NA12886 indel bed 4 + CEU NA12886 indel (Complete Genomics) 3 10131 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12887 CEU NA12887 bed 4 + CEU pedigree 1463, NA12887 (Complete Genomics) 3 10132 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12887indel CEU NA12887 indel bed 4 + CEU NA12887 indel (Complete Genomics) 3 10133 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12888 CEU NA12888 bed 4 + CEU pedigree 1463, NA12888 (Complete Genomics) 3 10134 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12888indel CEU NA12888 indel bed 4 + CEU NA12888 indel (Complete Genomics) 3 10135 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12889 CEU NA12889 bed 4 + CEU pedigree 1463, NA12889 (Complete Genomics) 3 10136 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12889indel CEU NA12889 indel bed 4 + CEU NA12889 indel (Complete Genomics) 3 10137 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12890 CEU NA12890 bed 4 + CEU pedigree 1463, NA12890 (Complete Genomics) 3 10138 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12890indel CEU NA12890 indel bed 4 + CEU NA12890 indel (Complete Genomics) 3 10139 0 0 0 127 127 127 0 0 0 varRep 1 pgGS12893 CEU NA12893 bed 4 + CEU pedigree 1463, NA12893 (Complete Genomics) 3 10140 0 0 0 127 
127 127 0 0 0 varRep 1 pgGS12893indel CEU NA12893 indel bed 4 + CEU NA12893 indel (Complete Genomics) 3 10141 0 0 0 127 127 127 0 0 0 varRep 1 pgAngrist Misha Angrist bed 4 + Misha Angrist (Personal Genome Project) 3 10142 0 0 0 127 127 127 0 0 0 varRep 1 pgChurch George Church bed 4 + George Church (Personal Genome Project) 3 10143 0 0 0 127 127 127 0 0 0 varRep 1 pgGatesJr Gates Jr bed 4 + Henry Louis Gates Jr (Personal Genome Project) 3 10144 0 0 0 127 127 127 0 0 0 varRep 1 pgGatesSr Gates Sr bed 4 + Henry Louis Gates Sr (Personal Genome Project) 3 10145 0 0 0 127 127 127 0 0 0 varRep 1 pgGill Rosalynn Gill bed 4 + Rosalynn Gill (Personal Genome Project) 3 10146 0 0 0 127 127 127 0 0 0 varRep 1 pgVenterSnp Venter bed 4 + J. Craig Venter - Published Method 1, Variant in Original Form (JCVI) 3 10147 128 0 128 191 127 191 0 0 0 varRep 1 pgVenterIndel Venter indels bed 4 + J. Craig Venter - Published Method 1, Indels in Original Form (JCVI) 3 10148 128 0 128 191 127 191 0 0 0 varRep 1 pgKriek M. Kriek bed 4 + Marjolein Kriek (Leiden University Medical Centre) 3 10149 0 0 0 127 127 127 0 0 0 varRep 1 pgLucier Greg Lucier bed 4 + Gregory Lucier (Life Technologies) 3 10150 0 0 0 127 127 127 0 0 0 varRep 1